CN112437301B

CN112437301B - Code rate control method and device for visual analysis, storage medium and terminal

Info

Publication number: CN112437301B
Application number: CN202011089723.5A
Authority: CN
Inventors: 马思伟; 张启; 王苫社
Original assignee: Peking University
Current assignee: Peking University
Priority date: 2020-10-13
Filing date: 2020-10-13
Publication date: 2021-11-02
Anticipated expiration: 2040-10-13
Also published as: CN112437301A

Abstract

The invention discloses a code rate control method, device, storage medium and terminal for visual analysis. The method comprises: determining a target code rate before video coding; inputting the target code rate into a pre-established R-λ model based on a code rate-joint distortion model to generate the Lagrange multiplier λ corresponding to the target code rate; inputting that Lagrange multiplier λ into a pre-established λ-QP model based on the code rate-joint distortion model to generate the quantization parameter QP corresponding to the target code rate; and setting the quantization parameter QP as the coding quantization parameter to complete rate-controlled coding. The embodiments of the application require changes only at the encoder side, the changes are small, and the decoder side need not be modified, so the method is easy to deploy, saves code rate, and improves the accuracy of visual analysis.

Description

Code rate control method and device for visual analysis, storage medium and terminal

Technical Field

The present invention relates to the field of digital signal processing, and in particular, to a method, an apparatus, a storage medium, and a terminal for controlling a code rate for visual analysis.

Background

Video coding is a data compression method for digital video that removes redundancy from the original video images to save storage and transmission cost. Because video is generally watched by people, traditional video coding mainly optimizes the subjective and objective quality of video images for human vision at a given code rate.

More and more video images are now used for machine vision analysis tasks such as object detection and pose estimation. Coding compression distorts video features and thereby degrades visual analysis performance. Traditional coding does not consider visual analysis distortion when solving the rate-distortion optimization problem, so it is difficult to achieve optimal code rate-visual analysis coding performance, and the accuracy of visual analysis is reduced.

Disclosure of Invention

The embodiment of the application provides a code rate control method and device, a storage medium and a terminal for visual analysis. The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview and is intended to neither identify key/critical elements nor delineate the scope of such embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.

In a first aspect, an embodiment of the present application provides a code rate control method for visual analysis, where the method includes:

before video coding, determining a target code rate;

sequentially inputting the target code rate into a pre-established R-lambda model based on a code rate-joint distortion model to generate a Lagrangian multiplier lambda corresponding to the target code rate;

inputting a Lagrange multiplier lambda corresponding to the target code rate into a lambda-QP model which is created in advance and is based on a code rate-joint distortion model, and generating a quantization parameter QP corresponding to the target code rate;

and setting the quantization parameter QP as a coding quantization parameter to complete the coding of code rate control.

Optionally, the method further includes generating the code rate-joint distortion model, the R-λ model, and the λ-QP model as follows:

collecting a plurality of video sequence image frames;

coding a plurality of video sequence image frames through a plurality of preset quantization parameters to generate a plurality of coded image frames;

carrying out visual analysis on a plurality of encoded image frames to generate an analysis result, and recording a code rate, a signal distortion degree and a visual analysis distortion degree according to the encoding and analysis result;

combining the signal distortion degree and the visual analysis distortion degree to generate joint distortion, and determining a function model of the joint distortion as a pre-established code rate-joint distortion model;

fitting the relation between the code rate and the joint distortion by adopting a hyperbolic function;

solving a code rate-joint distortion optimization problem by a Lagrange multiplier method, determining a functional relation between a code rate and the Lagrange multiplier, and taking the functional relation between the code rate and the Lagrange multiplier as an R-lambda model;

training on a plurality of sequences by fixing the Lagrange multiplier, encoding with different quantization parameters, and calculating the corresponding code rate-joint distortion cost, so as to obtain the quantization parameter that minimizes the code rate-joint distortion cost under the fixed Lagrange multiplier, and determining the functional relation between the fixed Lagrange multiplier and that cost-minimizing quantization parameter as the λ-QP model.

Optionally, the setting the quantization parameter QP as an encoding quantization parameter to complete encoding for rate control includes:

obtaining parameters of an R-lambda model and a lambda-QP model;

replacing the parameters of the R-λ model and the λ-QP model in High Efficiency Video Coding (HEVC) with the obtained parameters;

and coding according to the replaced model parameters.

Optionally, the step of constructing a functional relationship between the fixed lagrangian multiplier and the minimum quantization parameter includes:

selecting a set of Lagrange multipliers and a set of quantization parameters;

in the selected group of Lagrange multipliers, for a fixed Lagrange multiplier, coding is carried out one by using all quantization parameters, the code rate, the signal distortion and the visual analysis distortion are recorded, and the corresponding code rate-joint distortion cost is calculated;

obtaining an optimal quantization parameter in a quantization parameter search interval corresponding to the Lagrange multiplier according to the distortion cost;

and performing the operation on all the selected Lagrangian multipliers one by one so as to determine the relationship between the Lagrangian multipliers and the quantization parameters.
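The search described in the steps above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: `encode` is a hypothetical callback standing in for a real encoder plus visual analysis model, `toy_encode` is a made-up stand-in that returns synthetic numbers rather than measured data, and the weights `w_t` and `w_p` are supplied by the caller.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical hook: encode with (lambda, QP) and return
# (code rate, signal distortion D_t, visual analysis distortion D_p).
EncodeFn = Callable[[float, int], Tuple[float, float, float]]


def best_qp_for_lambda(lam: float, qps: Iterable[int], encode: EncodeFn,
                       w_t: float, w_p: float) -> int:
    """Return the QP that minimizes J* = D* + lam * R for a fixed lambda."""
    best_qp, best_cost = None, float("inf")
    for qp in qps:
        rate, d_t, d_p = encode(lam, qp)
        joint = w_t * d_t + w_p * d_p   # joint distortion D*
        cost = joint + lam * rate       # code rate-joint distortion cost
        if cost < best_cost:
            best_qp, best_cost = qp, cost
    if best_qp is None:
        raise ValueError("qps must not be empty")
    return best_qp


def lambda_qp_table(lambdas: Iterable[float], qps: Iterable[int], encode: EncodeFn,
                    w_t: float, w_p: float) -> Dict[float, int]:
    """Repeat the search for every selected lambda."""
    candidates = list(qps)  # reuse the same candidate set for every lambda
    return {lam: best_qp_for_lambda(lam, candidates, encode, w_t, w_p)
            for lam in lambdas}


def toy_encode(lam: float, qp: int) -> Tuple[float, float, float]:
    """Synthetic stand-in for an encoder and detector (not measured data)."""
    x = qp - 22
    rate = 2.0 ** (-x / 6.0)          # rate falls as QP grows
    d_t = 2.0 ** (x / 3.0)            # signal distortion grows with QP
    d_p = min(1.0, 0.02 * (x + 2))    # analysis distortion grows with QP
    return rate, d_t, d_p


table = lambda_qp_table([0.01, 1.0, 100.0], range(22, 38), toy_encode,
                        w_t=0.1, w_p=0.9)
```

With the toy model, a larger Lagrange multiplier penalizes rate more heavily, so the selected QP does not decrease as λ grows. A real run would replace `toy_encode` with actual encoding and analysis.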

Optionally, the calculation formula of the visual analysis distortion degree is

where P(0) represents the optimal performance of object detection on the original image, and P(R) represents the object detection performance on a distorted image at code rate R.

Optionally, the calculation formula of the joint distortion is $D^* = \omega_t D_t + \omega_p D_p$, with $\omega_t + \omega_p = 1$, where $D^*$ represents the joint distortion resulting from coding compression, $D_t$ represents the signal distortion, $D_p$ represents the visual analysis distortion, and $\omega_t$ and $\omega_p$ represent the weights of the two distortions.

In a second aspect, an embodiment of the present application provides a device for controlling a code rate for visual analysis, where the device includes:

the code rate determining module is used for determining a target code rate before video coding;

the Lagrange multiplier generation module is used for inputting the target code rate into a pre-established R-λ model based on a code rate-joint distortion model to generate a Lagrange multiplier λ corresponding to the target code rate;

the quantization parameter QP generation module is used for inputting the Lagrange multiplier lambda corresponding to the target code rate into a lambda-QP model which is created in advance and is based on a code rate-joint distortion model, and generating the quantization parameter QP corresponding to the target code rate;

and the parameter setting module is used for setting the quantization parameter QP as a coding quantization parameter to complete the coding of code rate control.

Optionally, the apparatus further comprises:

the image frame acquisition module is used for acquiring a plurality of video sequence image frames;

the image frame coding module is used for coding a plurality of video sequence image frames through a plurality of preset quantization parameters to generate a plurality of coded image frames;

the image frame analysis module is used for carrying out visual analysis on the coded image frames to generate an analysis result and recording a code rate, a signal distortion degree and a visual analysis distortion degree according to the coding and analysis result;

the code rate-joint distortion model building module is used for fusing the signal distortion degree and the visual analysis distortion degree to generate joint distortion, and determining a function model of the joint distortion as a pre-established code rate-joint distortion model;

the relation fitting module is used for fitting the relation between the code rate and the joint distortion by adopting a hyperbolic function;

the R-lambda model construction module is used for solving a code rate-joint distortion optimization problem through a Lagrange multiplier method, determining a functional relation between a code rate and the Lagrange multiplier, and taking the functional relation between the code rate and the Lagrange multiplier as an R-lambda model;

and the λ-QP model construction module is used for training on a plurality of sequences by fixing the Lagrange multiplier, encoding with different quantization parameters, and calculating the corresponding code rate-joint distortion cost, so as to obtain the quantization parameter that minimizes the code rate-joint distortion cost under the fixed Lagrange multiplier, and for determining the functional relation between the fixed Lagrange multiplier and that cost-minimizing quantization parameter as the λ-QP model.

In a third aspect, embodiments of the present application provide a computer storage medium having stored thereon a plurality of instructions adapted to be loaded by a processor and to perform the above-mentioned method steps.

In a fourth aspect, an embodiment of the present application provides a terminal, which may include: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the above-mentioned method steps.

The technical scheme provided by the embodiment of the application can have the following beneficial effects:

In the embodiment of the application, before video coding, the code rate control device for visual analysis determines a target code rate, inputs the target code rate into a pre-established R-λ model based on a code rate-joint distortion model to generate the Lagrange multiplier λ corresponding to the target code rate, inputs that Lagrange multiplier λ into a pre-established λ-QP model based on the code rate-joint distortion model to generate the quantization parameter QP corresponding to the target code rate, and sets the quantization parameter QP as the coding quantization parameter to complete rate-controlled coding. The method introduces visual analysis performance distortion into the rate-distortion model of a traditional coding framework to form a code rate-joint distortion model oriented to visual analysis, establishes a hyperbolic model relating the Lagrange multiplier to the code rate under the code rate-joint distortion optimization problem, and determines the functional relation between the Lagrange multiplier and the quantization parameter that minimizes the joint rate-distortion cost. During coding, setting a target code rate then determines the Lagrange multiplier and the quantization parameter in turn, which realizes code rate control, saves code rate, and improves the accuracy of visual analysis.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.

Fig. 1 is a schematic flowchart of a visual analysis-oriented code rate control method according to an embodiment of the present disclosure;

Fig. 2A is a schematic diagram of the experiment sequence information for visual analysis-oriented code rate control according to an embodiment of the present application;

Fig. 2B is a schematic diagram of the experiment code rate point settings for visual analysis-oriented code rate control according to an embodiment of the present application;

Fig. 2C is a schematic diagram illustrating a comparison of the visual analysis performance of different code rate control methods according to an embodiment of the present disclosure;

Fig. 3 is a schematic flowchart of another visual analysis-oriented code rate control method according to an embodiment of the present disclosure;

Figs. 4A to 4D are schematic diagrams illustrating the relationship between code rate and visual analysis distortion for different sequences according to embodiments of the present application;

Figs. 5A to 5D are schematic diagrams illustrating the relationship between code rate and joint distortion for different sequences according to embodiments of the present application;

Fig. 6 is a schematic diagram illustrating a comparison between the R-$D^*$ model provided in an embodiment of the present application and the R-$D_t$ model in HEVC;

Fig. 7 is a schematic diagram of a λ-QP model for visual analysis according to an embodiment of the present application;

Fig. 8 is a schematic diagram of an apparatus for rate control for visual analysis according to an embodiment of the present disclosure;

Fig. 9 is a schematic diagram of an apparatus for controlling a rate for visual analysis according to an embodiment of the present application;

Fig. 10 is a schematic structural diagram of a terminal according to an embodiment of the present application.

Detailed Description

The following description and the drawings sufficiently illustrate specific embodiments of the invention to enable those skilled in the art to practice them.

It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.

In the description of the present invention, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific case. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified. "And/or" describes an association between associated objects and covers three relationships; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.

More and more video images are now used in machine vision analysis tasks such as object detection and pose estimation. Coding compression distorts video features and thereby degrades visual analysis performance. Traditional coding does not consider visual analysis distortion when solving the rate-distortion optimization problem, so it is difficult to achieve optimal code rate-visual analysis coding performance, and the accuracy of visual analysis is reduced. The present application therefore provides a code rate control method, apparatus, storage medium and terminal for visual analysis to solve these problems in the related art. In the technical scheme provided by the application, visual analysis performance distortion is introduced into the rate-distortion model of a traditional coding framework to form a code rate-joint distortion model oriented to visual analysis; a hyperbolic model relating the Lagrange multiplier to the code rate under the code rate-joint distortion optimization problem is established; and the functional relation between the Lagrange multiplier and the quantization parameter that minimizes the joint rate-distortion cost is determined. During coding, setting a target code rate determines the Lagrange multiplier and the quantization parameter in turn, which realizes code rate control, saves code rate, and improves the accuracy of visual analysis. Exemplary embodiments are described in detail below.

The following describes in detail a code rate control method for visual analysis according to an embodiment of the present application with reference to Figs. 1 to 7. The method may be implemented by a computer program and may run on a visual analysis-oriented code rate control device based on the von Neumann architecture. The computer program may be integrated into an application or may run as a separate tool-like application.

Please refer to fig. 1, which is a flowchart illustrating a visual analysis-oriented code rate control method according to an embodiment of the present disclosure. As shown in fig. 1, the method of the embodiment of the present application may include the following steps:

S101, determining a target code rate before video coding;

The code rate is the number of data bits transmitted per unit time during data transmission.

In a possible implementation manner, when performing code rate control for visual analysis, a target code rate must first be determined before video coding. The target code rate is determined by the application terminal according to parameters of the current environment.
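Because the description later expresses code rate in bits per pixel (bpp) while the experimental code rate points are given in kbps, a conversion between the two is often useful. The sketch below assumes 1 kbps = 1000 bits per second; the function name and the example resolution and frame rate are illustrative, not taken from the patent.

```python
def bits_per_pixel(bitrate_kbps: float, width: int, height: int, fps: float) -> float:
    """Convert a target code rate in kbps to average bits per pixel."""
    if bitrate_kbps < 0 or width <= 0 or height <= 0 or fps <= 0:
        raise ValueError("bitrate must be non-negative; width, height, fps must be positive")
    return bitrate_kbps * 1000.0 / (width * height * fps)


# Illustrative values only: 2000 kbps at 1920x1080, 50 frames per second.
target_bpp = bits_per_pixel(2000, 1920, 1080, 50)
```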

S102, sequentially inputting the target code rate into a pre-established R-lambda model based on a code rate-joint distortion model to generate a Lagrange multiplier lambda corresponding to the target code rate;

The code rate-joint distortion model is formed by introducing visual analysis performance distortion into the rate-distortion model of a traditional coding framework, yielding a model oriented to visual analysis. The R-λ model is created based on the code rate-joint distortion model, where R represents the target code rate and λ is the Lagrange multiplier.

Typically, the rate-distortion model of the conventional coding framework is a model under a video coding standard such as HEVC, AVS, etc.

In a possible implementation manner, after the target code rate is determined in step S101, the code rate control device for visual analysis inputs it into the pre-created R-λ model based on the code rate-joint distortion model and generates the Lagrange multiplier λ corresponding to the target code rate.

S103, inputting the Lagrange multiplier lambda corresponding to the target code rate into a lambda-QP model which is created in advance and is based on a code rate-joint distortion model, and generating a quantization parameter QP corresponding to the target code rate;

The λ-QP model is based on the code rate-joint distortion model, where λ is the Lagrange multiplier and QP is the quantization parameter required for coding.

In a possible implementation manner, after the Lagrange multiplier λ corresponding to the target code rate is determined in step S102, the code rate control device for visual analysis inputs it into the pre-created λ-QP model based on the code rate-joint distortion model, and the model outputs the quantization parameter QP corresponding to the target code rate.

S104, setting the quantization parameter QP as a coding quantization parameter to complete the coding of code rate control.

The quantization parameter QP is the index of the quantization step size Qstep. When QP takes its minimum value of 0, quantization is finest; when QP takes its maximum value of 51, quantization is coarsest. In H.264, the quantization parameter is specified at three levels: picture parameter set (PPS), slice header (slice_header), and macroblock (MB).
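As a rough illustration of how QP indexes the quantization step, the step size in H.264 and HEVC approximately doubles for every increase of 6 in QP. The sketch below uses the common approximation Qstep ≈ 2^((QP − 4)/6); the standards define the exact behavior through tables and integer scaling, so this function is an approximation and its name is illustrative.

```python
def qstep_from_qp(qp: int) -> float:
    """Approximate quantization step size for a QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in [0, 51]")
    # Approximation: the step doubles every 6 QP and equals 1 at QP = 4.
    return 2.0 ** ((qp - 4) / 6.0)


finest = qstep_from_qp(0)     # smallest step, finest quantization
coarsest = qstep_from_qp(51)  # largest step, coarsest quantization
```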

In the embodiment of the application, the parameters of the R-λ model and the λ-QP model fitted under joint distortion are first obtained, the corresponding parameters of the R-λ model and the λ-QP model in High Efficiency Video Coding (HEVC) are then replaced with them, and coding is finally performed with the replaced model parameters.

In a possible implementation, the visual analysis performance loss is first taken as the visual analysis distortion, and the signal distortion and the visual analysis distortion are fused to form a code rate-joint distortion model. The code rate-joint distortion optimization problem is then solved by the Lagrange multiplier method to establish the relationship between the code rate R and the Lagrange multiplier λ. Next, encoding with different QPs establishes the relationship between λ and the QP that minimizes the code rate-joint distortion cost. Finally, based on the R-λ and λ-QP models under joint distortion, λ and QP are calculated in sequence from the target code rate before coding, which realizes code rate control and completes the coding.
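Steps S101 to S104 can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: it assumes the power-law R-λ form λ = α·R^β and the logarithmic λ-QP form QP = a·ln λ + b used by HEVC rate control, and the numeric parameters are values commonly cited for the HEVC reference encoder, used purely as placeholders because the parameters fitted under joint distortion are not given in this text. All function names are illustrative.

```python
import math


def lambda_from_rate(target_bpp: float, alpha: float, beta: float) -> float:
    """R-lambda model: lambda = alpha * R**beta, with R in bits per pixel."""
    if target_bpp <= 0:
        raise ValueError("target_bpp must be positive")
    return alpha * target_bpp ** beta


def qp_from_lambda(lam: float, a: float, b: float,
                   qp_min: int = 0, qp_max: int = 51) -> int:
    """lambda-QP model (assumed logarithmic form), rounded and clipped to the QP range."""
    qp = round(a * math.log(lam) + b)
    return max(qp_min, min(qp_max, qp))


def rate_control(target_bpp: float, alpha: float, beta: float, a: float, b: float):
    lam = lambda_from_rate(target_bpp, alpha, beta)  # S102
    qp = qp_from_lambda(lam, a, b)                   # S103
    return lam, qp                                   # S104: the encoder is configured with qp


# Placeholder parameters (assumption): HEVC-style defaults, not the patent's fitted values.
ALPHA, BETA = 3.2003, -1.367
A, B = 4.2005, 13.7122

lam_low, qp_low = rate_control(0.02, ALPHA, BETA, A, B)    # low target code rate
lam_high, qp_high = rate_control(0.20, ALPHA, BETA, A, B)  # high target code rate
```

With a negative β, a lower target code rate yields a larger λ and therefore a coarser QP, which is the expected direction of the control.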

For example, as shown in Figs. 2A-2C, Fig. 2A gives the experiment sequence information for visual analysis-oriented code rate control, Fig. 2B gives the experiment code rate point settings, and Fig. 2C compares the visual analysis performance of different code rate control methods. To verify the effectiveness of the present invention, experiments were performed on the HEVC common test sequences; the specific sequence information is shown in Fig. 2A.

Each sequence is encoded for 5 seconds using the RA configuration at 5 code rate points. The specific code rate point settings are shown in Fig. 2B, in kilobits per second (kbps).

Further, the performance evaluation of the present invention covers two aspects: visual analysis accuracy and code rate. The detailed comparison with HEVC, a relatively advanced current encoder, is shown in Fig. 2C, where AP1 denotes object detection performance over all classes and all objects, AP2 denotes object detection performance on Person-class objects, AP3 denotes pose estimation performance, and Proposed denotes the proposed method. In terms of coding performance, the average code rate error of the proposed method is 3.44%, slightly higher than that of HEVC (3.08%).

The results show that, at the same code rate, the invention achieves better visual analysis task performance, and at the same visual analysis task performance, it saves a certain amount of code rate. The code rate control method for visual analysis provided by the invention is therefore effective and feasible. In addition, the invention requires changes only at the encoder side, the changes are small, and the decoder side need not be modified, so it is easy to deploy.

Please refer to Fig. 3, which is a schematic flowchart of generating the code rate-joint distortion model, the R-λ model and the λ-QP model according to an embodiment of the present application. The generation of these models comprises the following steps:

S201, collecting a plurality of video sequence image frames;

S202, coding a plurality of video sequence image frames through a plurality of preset quantization parameters to generate a plurality of coded image frames;

S203, carrying out visual analysis on the coded image frames to generate an analysis result, and recording a code rate, a signal distortion degree and a visual analysis distortion degree according to the coding and analysis result;

In general, the main goal of video coding is to increase the data compression ratio while preserving as much of the original image information as possible. The compression ratio is measured by the code rate, and the loss of image information is measured by some distortion metric, so video coding must find the optimal solution of the code rate-distortion problem. In existing video coding standards (such as HEVC and AVS), the rate-distortion model can be characterized by a convex, monotonic curve, and points on the curve represent the theoretically optimal solution of the rate-distortion optimization problem at a given code rate or distortion.

Conventional video coding frameworks mainly represent distortion by the difference between the original signal and the reconstructed signal, usually calculated pixel by pixel, for example as MSE or PSNR. For most high-level visual analysis tasks, however, signal-level distortion alone can hardly reflect the performance loss of the visual analysis model accurately, because the features that the model extracts and uses are generally higher-level, more compact abstract representations of the original image, and the model itself discards most of the original signal.

Therefore, it is necessary to introduce distortion oriented to visual analysis to optimize the existing rate-distortion model.

S204, fusing the signal distortion degree and the visual analysis distortion degree to generate joint distortion, and determining a function model of the joint distortion as a pre-established code rate-joint distortion model;

In one possible implementation, the present invention uses the visual analysis performance loss as the visual analysis distortion. Without loss of generality, taking the object detection task as an example, the visual analysis distortion $D_p$ is defined as:

where $P(0)$ represents the object detection performance on the original image (without loss of generality, using the mean average precision mAP as the index), regarded as the optimal performance, and $P(R)$ represents the object detection performance on a distorted image at code rate $R$. After normalization, the value of $D_p$ lies between 0 and 1.
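The defining formula is not reproduced in this text. One form consistent with the description, a performance loss relative to P(0) that lies between 0 and 1, is D_p = (P(0) − P(R)) / P(0). The sketch below implements that assumed form; it is an assumption for illustration, not the patent's stated formula, and the function name is hypothetical.

```python
def visual_analysis_distortion(p_original: float, p_coded: float) -> float:
    """Assumed form: D_p = (P(0) - P(R)) / P(0), clipped to [0, 1]."""
    if p_original <= 0:
        raise ValueError("P(0) must be positive")
    loss = (p_original - p_coded) / p_original
    # Clip so that a coded image that happens to score above P(0) gives zero distortion.
    return min(1.0, max(0.0, loss))


# Illustrative numbers only: mAP drops from 0.8 on the original to 0.6 after coding.
d_p = visual_analysis_distortion(0.8, 0.6)
```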

If only visual analysis distortion is used, the subjective and objective quality of the video may not be guaranteed. The invention therefore proposes a joint distortion that combines signal distortion and visual analysis distortion, expressed as:

$$D^* = \omega_t D_t + \omega_p D_p, \quad \omega_t + \omega_p = 1$$

where $D^*$ represents the joint distortion resulting from coding compression, $D_t$ represents the signal distortion, and $\omega_t$ and $\omega_p$ represent the weights of the two distortions.

S205, fitting the relation between the code rate and the joint distortion by adopting a hyperbolic function;

S206, solving a code rate-joint distortion optimization problem by a Lagrange multiplier method, determining a functional relation between the code rate and the Lagrange multiplier, and taking the functional relation between the code rate and the Lagrange multiplier as an R-λ model;

In one possible implementation, the specific goal of rate control is to minimize coding distortion under a given code rate constraint, which can be expressed as:

$$\min D \quad \text{s.t.} \quad R \le R_C$$

where $D$ represents distortion, $R$ represents the code rate actually used, and $R_C$ represents the code rate limit. This is a constrained optimization problem. The Lagrange multiplier method moves the constraint into the optimization objective and turns it into an unconstrained problem, namely minimizing the rate-distortion cost function $J$:

$$\min J = D + \lambda \cdot R$$

where $\lambda$ represents the Lagrange multiplier, which controls the relative importance of code rate and distortion. Solving this problem requires the relationship between $R$ and $D$, which can usually be described by a hyperbolic model:

$$D(R) = C \cdot R^{-K}$$

where $C$ and $K$ are the parameters of the hyperbolic function. In this case, $\lambda$ is the negative slope of the tangent at a point on the rate-distortion curve, and can be expressed as:

$$\lambda = -\frac{\partial D}{\partial R} = C \cdot K \cdot R^{-K-1} \triangleq \alpha \cdot R^{\beta}$$

where $\alpha$ and $\beta$ are model parameters related to the source. HEVC fits a set of effective initial values of $\alpha$ and $\beta$ through statistics and training on different sequences. However, the distortion used by HEVC is signal distortion only, so the current R-λ model needs to be optimized.

Likewise, suppose that the visual analysis distortion $D_p$ can also be represented by a hyperbolic model, namely:

To verify that the above equation holds, tests were performed on the HEVC common test sequences. First, each sequence is encoded using different fixed QPs, and the corresponding code rates are recorded, expressed as the number of bits used per pixel (bpp). Then, the resulting distorted images are used as the input of an object detection model (without loss of generality, Fast RCNN is used as the detection model and ResNet-101 as the feature extraction network), and the detection accuracy, expressed as AP, is obtained. The relationship between $R$ and $D_p$ on B frames of different sequences is shown in Figs. 4A-4D.

As shown in Figs. 4A-4D, the relationship between $R$ and $D_p$ can be represented by a hyperbolic model. Visual analysis distortion is therefore introduced to form a code rate-joint distortion optimization problem, expressed as:

$$\min J^* = D^* + \lambda \cdot R$$

Similarly, a hyperbolic function may be used to fit the relationship between $R$ and $D^*$. Because $D_t$ and $D_p$ have different value ranges, the final value of $D_p$ is multiplied by a scaling factor $\gamma_p$ to keep the two at a comparable order of magnitude. Based on statistics of $D_t$, $\gamma_p = 255$ is taken. The R-$D^*$ curve-fitting results on different sequences under different values of $\omega_t$ and $\omega_p$ are shown in Figs. 5A-5D.
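The scaling and weighting just described can be sketched as follows. The function name is illustrative, and the sketch assumes that the scaled visual analysis distortion γ_p·D_p simply takes the place of D_p inside the weighted sum, which is one reading of the description.

```python
def joint_distortion(d_t: float, d_p: float, w_t: float, w_p: float,
                     gamma_p: float = 255.0) -> float:
    """Joint distortion D* = w_t * D_t + w_p * (gamma_p * D_p), with w_t + w_p = 1."""
    if not 0.0 <= d_p <= 1.0:
        raise ValueError("D_p is expected to be normalized to [0, 1]")
    if abs(w_t + w_p - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # gamma_p brings the normalized D_p to an order of magnitude comparable with D_t.
    return w_t * d_t + w_p * gamma_p * d_p


# Illustrative inputs only; the weights are chosen by the caller.
d_star = joint_distortion(d_t=40.0, d_p=0.25, w_t=0.1, w_p=0.9)
```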

As can be seen from FIGS. 5A-5D, the relationship between R and D^* can be represented by a hyperbolic model. Taking ω_p = 0.9 and ω_t = 0.1 and gathering statistics over more sequences, an R-D^* model with better generalization ability can be fitted:

The relationship between R and λ under the R-D^* model can then be obtained:

In actual coding, once the code rate is specified, the theoretically optimal code rate-joint distortion cost and the corresponding λ can be calculated using the above formula.
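The step from a specified code rate to λ can be sketched as below, assuming the R-λ model takes the power-law form λ = α·bpp^β. The function names are hypothetical, and the values of ALPHA and BETA are illustrative placeholders only; they are not the parameters fitted in the patent, which are not reproduced in this text.

```python
def bits_per_pixel(target_bps, fps, width, height):
    """Convert a target bit rate in bits per second to bits per pixel."""
    return target_bps / (fps * width * height)

def lambda_from_rate(bpp, alpha, beta):
    """R-lambda model (assumed form): lambda = alpha * bpp**beta, with beta < 0."""
    if bpp <= 0:
        raise ValueError("bpp must be positive")
    return alpha * bpp ** beta

# Placeholder parameters for illustration, not the patent's fitted values.
ALPHA, BETA = 3.2003, -1.367

bpp = bits_per_pixel(target_bps=2_000_000, fps=30, width=1920, height=1080)
lam = lambda_from_rate(bpp, ALPHA, BETA)
```

Because β is negative, a lower target rate yields a larger λ, which pushes the encoder toward coarser quantization.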

Comparing the R-D^* model with the R-D_t model used in HEVC shows that, at low bit rates, the R-D^* model corresponds to a larger distortion than the R-D_t model, for example as shown in FIG. 6.

S207, with the Lagrange multiplier fixed, encoding with different quantization parameters and calculating the corresponding code rate-joint distortion cost; training on a plurality of sequences in this way to obtain the quantization parameter that minimizes the code rate-joint distortion cost under the fixed Lagrange multiplier; and determining the functional relation between the fixed Lagrange multiplier and that cost-minimizing quantization parameter as the λ-QP model.

In one possible implementation, the λ corresponding to a given code rate can be calculated from the R-λ model, but the corresponding encoding parameters must still be calculated to complete encoding under the code rate constraint. Generally, the quantization parameter QP is the key parameter determining the code rate and the image quality, and the code rate-joint distortion optimization problem can be written as follows:

The λ-QP model is used in HEVC to calculate the QP that minimizes the rate-distortion cost for a given λ; its parameters are likewise obtained by encoding sequences, collecting statistics, and fitting. When joint distortion is used, the corresponding parameters need to be re-fitted.

Under different QPs, the code rate and joint distortion produced by coding differ, and so does the code rate-joint distortion cost J^*; a change in λ also affects the magnitude of J^*. Because QP does not vary continuously, the cost actually obtained does not always coincide with the theoretical optimum, but this discreteness also makes it possible to find the QP corresponding to a given λ by exhaustive search.

The QP with the best rate-joint distortion performance under a given λ is obtained by a method similar to that of HEVC: with λ fixed, different QPs are used for coding and the corresponding R and D^* are recorded; λ is then adjusted, and coding and recording are repeated. After encoding is completed, for each λ the HEVC QP-λ exponential model is inverted to obtain an initial search QP, denoted QP_s, which determines the search interval; the QP that minimizes J^* within that interval is then found, giving a one-to-one correspondence between λ and QP. Similar to HEVC, the function relating λ and QP is fitted using:
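The exhaustive search described above can be sketched as follows. The function name best_qp_for_lambda, the radius parameter that defines the search interval around QP_s, and the recorded (rate, distortion) values are all hypothetical; the text does not state the width of the interval.

```python
def best_qp_for_lambda(lam, records, qp_start, radius=3):
    """Return the QP that minimizes J* = D* + lam * R inside the search interval.

    records:  dict mapping QP -> (rate_bpp, joint_distortion) from trial encodes.
    qp_start: initial search QP (QP_s in the description).
    radius:   assumed half-width of the search interval around qp_start.
    """
    candidates = [qp for qp in records if abs(qp - qp_start) <= radius]
    if not candidates:
        raise ValueError("no encoded QP falls inside the search interval")
    return min(candidates, key=lambda qp: records[qp][1] + lam * records[qp][0])

# Hypothetical trial-encode results: QP -> (bpp, D*).
records = {
    30: (0.20, 10.0),
    31: (0.17, 11.0),
    32: (0.15, 12.5),
    33: (0.13, 14.5),
    34: (0.11, 17.0),
}
qp_low_lambda = best_qp_for_lambda(10.0, records, qp_start=32, radius=2)
qp_high_lambda = best_qp_for_lambda(150.0, records, qp_start=32, radius=2)
```

Repeating the search for every λ in the chosen set yields the (λ, QP) pairs to which the λ-QP function is then fitted.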

For example, FIG. 7 shows the model fit, with the HEVC λ-QP curve added for comparison. The results show that, as λ increases, the proposed model yields a smaller QP than the λ-QP model employed by HEVC.

The curve labeled "Fitted" in FIG. 7 represents the fitted model, which corresponds to the following equation:

QP = 3.6 × ln(λ + 16.0129) + 16.1840, where QP is the quantization parameter.
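The fitted relation above can be evaluated as in the following sketch. The coefficients are those given in the equation; the function name qp_from_lambda, the rounding to an integer, and the clipping to the range 0 to 51 (the QP range of 8-bit HEVC) are assumptions added for illustration.

```python
import math

def qp_from_lambda(lam, a=3.6, b=16.0129, c=16.1840, qp_min=0, qp_max=51):
    """Evaluate QP = a * ln(lam + b) + c, then round and clip to [qp_min, qp_max]."""
    if lam + b <= 0:
        raise ValueError("lam + b must be positive")
    qp = a * math.log(lam + b) + c
    return max(qp_min, min(qp_max, round(qp)))

qp_small = qp_from_lambda(0.0)    # lower end of the curve
qp_mid = qp_from_lambda(100.0)
qp_large = qp_from_lambda(1e6)    # clipped to qp_max
```

The offset inside the logarithm keeps the expression defined as λ approaches zero, which a plain a·ln(λ) + c form would not be.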

The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention. For details which are not disclosed in the embodiments of the apparatus of the present invention, reference is made to the embodiments of the method of the present invention.

Please refer to fig. 8, which shows a schematic structural diagram of a rate control apparatus for visual analytics according to an exemplary embodiment of the present invention. The code rate control device for visual analysis can be realized by software, hardware or a combination of the two to form all or part of the terminal. The device 1 comprises a code rate determining module 10, a Lagrange multiplier generating module 20, a quantization parameter QP generating module 30 and a parameter setting module 40.

A code rate determining module 10, configured to determine a target code rate before video encoding;

a lagrangian multiplier generation module 20, configured to sequentially input the target code rate into a pre-created R- λ model based on a code rate-joint distortion model, and generate a lagrangian multiplier λ corresponding to the target code rate;

a quantization parameter QP generation module 30, configured to input the lagrangian multiplier λ corresponding to the target code rate into a λ -QP model based on a code rate-joint distortion model created in advance, and generate a quantization parameter QP corresponding to the target code rate;

and the parameter setting module 40 is configured to set the quantization parameter QP as an encoding quantization parameter, so as to complete encoding for rate control.

Optionally, for example, as shown in fig. 9, the apparatus 1 further includes:

an image frame acquisition module 50, configured to acquire a plurality of video sequence image frames;

an image frame encoding module 60, configured to encode a plurality of video sequence image frames by using a plurality of preset quantization parameters, and generate a plurality of encoded image frames;

the image frame analysis module 70 is configured to perform visual analysis on the encoded image frames to generate an analysis result, and record a code rate, a signal distortion degree and a visual analysis distortion degree according to the encoding and analysis result;

a code rate-joint distortion model construction module 80, configured to fuse the signal distortion degree and the visual analysis distortion degree to generate joint distortion, and determine a function model of the joint distortion as a pre-created code rate-joint distortion model;

a relation fitting module 90, configured to fit a relation between the code rate and the joint distortion by using a hyperbolic function;

the R-lambda model building module 100 is used for solving a code rate-joint distortion optimization problem through a Lagrange multiplier method, determining a functional relation between a code rate and the Lagrange multiplier, and taking the functional relation between the code rate and the Lagrange multiplier as an R-lambda model;

the λ-QP model building module 110 is configured to, with the Lagrange multiplier fixed, encode with different quantization parameters and calculate the corresponding code rate-joint distortion cost; to train on a plurality of sequences in this way to obtain the quantization parameter that minimizes the code rate-joint distortion cost under the fixed Lagrange multiplier; and to determine the functional relationship between the fixed Lagrange multiplier and that cost-minimizing quantization parameter as the λ-QP model.

It should be noted that, when the code rate control apparatus for visual analysis provided in the foregoing embodiment executes the code rate control method for visual analysis, the division into the above functional modules is merely an example; in practical applications, the functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the code rate control apparatus and the code rate control method provided in the above embodiments belong to the same concept; the implementation process of the apparatus is detailed in the method embodiments and is not described herein again.

The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.

The present invention also provides a computer readable medium, on which program instructions are stored, and when the program instructions are executed by a processor, the code rate control method for visual analysis provided by the above method embodiments is implemented. The present invention also provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the visual analysis oriented rate control method of the above-described method embodiments.

Please refer to fig. 10, which provides a schematic structural diagram of a terminal according to an embodiment of the present application. As shown in fig. 10, terminal 1000 can include: at least one processor 1001, at least one network interface 1004, a user interface 1003, memory 1005, at least one communication bus 1002.

The communication bus 1002 is used to enable connection and communication between these components.

The user interface 1003 may include a display screen (Display) and a camera (Camera); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface.

The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface).

The processor 1001 may include one or more processing cores. The processor 1001 connects the various components of the terminal 1000 using various interfaces and lines, and performs the functions of the terminal 1000 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 1005 and by invoking data stored in the memory 1005. Optionally, the processor 1001 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 1001 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU renders and draws the content to be displayed by the display screen; the modem handles wireless communications. It is understood that the modem may also not be integrated into the processor 1001 and may instead be implemented by a separate chip.

The memory 1005 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 1005 includes a non-transitory computer-readable medium. The memory 1005 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 1005 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like; the data storage area may store the data referred to in the above method embodiments. Optionally, the memory 1005 may also be at least one storage device located remotely from the processor 1001. As shown in fig. 10, the memory 1005, as a kind of computer storage medium, may include an operating system, a network communication module, a user interface module, and a visual analysis-oriented rate control application program.

In the terminal 1000 shown in fig. 10, the user interface 1003 is mainly used as an interface for providing input for a user, and acquiring data input by the user; and the processor 1001 may be configured to invoke the visual analysis oriented rate control application stored in the memory 1005, and specifically perform the following operations:

before video coding, determining a target code rate;

In one embodiment, the processor 1001 further performs the following operations before video encoding:

collecting a plurality of video sequence image frames;

In an embodiment, when performing encoding with the quantization parameter QP set as the encoding quantization parameter and completing rate control, the processor 1001 specifically performs the following operations:

obtaining parameters of an R-lambda model and a lambda-QP model;

and coding according to the replaced model parameters.

In one embodiment, the processor 1001, when executing the step of constructing the functional relationship between the fixed lagrangian multiplier and the minimum quantization parameter, specifically performs the following operations:

selecting a set of lagrangian multipliers and a set of quantization parameters;

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the program can be stored in a computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory, or a random access memory.

The above disclosure describes only preferred embodiments of the present application and is not to be construed as limiting its scope; equivalent variations and modifications made to the present application therefore still fall within the scope covered by the present application.

Claims

1. A method for rate control for visual analytics, the method comprising:

before video coding, determining a target code rate;

inputting the Lagrange multiplier lambda corresponding to the target code rate into a lambda-QP model which is created in advance and is based on a code rate-joint distortion model, and generating a quantization parameter QP corresponding to the target code rate;

setting the quantization parameter QP as a coding quantization parameter to complete coding of code rate control;

wherein the method further comprises generating the code rate-joint distortion model, the R-λ model and the λ-QP model in the following manner:

collecting a plurality of video sequence image frames;

coding the plurality of video sequence image frames through a plurality of preset quantization parameters to generate a plurality of coded image frames;

carrying out visual analysis on the coded image frames to generate an analysis result, and recording a code rate, a signal distortion degree and a visual analysis distortion degree according to the coding and analysis result;

fusing the signal distortion degree and the visual analysis distortion degree to generate joint distortion, and determining a function model of the joint distortion as a pre-established code rate-joint distortion model;

solving a code rate-joint distortion optimization problem by a Lagrange multiplier method, determining a functional relation between the code rate and the Lagrange multiplier, and taking the functional relation between the code rate and the Lagrange multiplier as an R-lambda model;

training on a plurality of sequences to obtain a quantization parameter which enables the code rate-joint distortion cost to be minimum under the fixed Lagrange multiplier in a mode of fixing the Lagrange multiplier, using different quantization parameter codes and calculating the corresponding code rate-joint distortion cost, and determining the functional relation between the fixed Lagrange multiplier and the minimum quantization parameter as a lambda-QP model;

wherein the step of constructing a functional relationship between the fixed lagrangian multiplier and the minimum quantization parameter comprises:

selecting a set of lagrangian multipliers and a set of quantization parameters;

performing the operation one by one on all the selected Lagrange multipliers so as to determine the relationship between the Lagrange multipliers and the quantization parameters;

wherein the visual analysis distortion factor is calculated by the formula

wherein P(0) represents the optimal target detection performance on the original image, and P(R) represents the target detection performance on a distorted image with code rate R;

wherein the joint distortion is calculated as D^* = ω_t·D_t + ω_p·D_p, with ω_t + ω_p = 1, where D^* represents the joint distortion resulting from coding compression, D_t represents the signal distortion, and ω_t and ω_p represent the weights of the two distortions.

2. The method of claim 1, wherein the setting the quantization parameter QP as an encoding quantization parameter completes code rate control encoding, comprising:

obtaining parameters of the R-lambda model and the lambda-QP model;

and coding according to the replaced model parameters.

3. An apparatus for rate control for visual analytics, the apparatus comprising:

the Lagrange multiplier generation module is used for sequentially inputting the target code rate into a pre-established R-lambda model based on a code rate-joint distortion model to generate a Lagrange multiplier lambda corresponding to the target code rate;

the quantization parameter QP generation module is used for inputting the Lagrange multiplier lambda corresponding to the target code rate into a lambda-QP model which is created in advance and is based on a code rate-joint distortion model, and generating a quantization parameter QP corresponding to the target code rate;

the parameter setting module is used for setting the quantization parameter QP as a coding quantization parameter to complete the coding of code rate control;

wherein the apparatus further comprises:

the image frame coding module is used for coding the plurality of video sequence image frames through a plurality of preset quantization parameters to generate a plurality of coded image frames;

the R-lambda model building module is used for solving a code rate-joint distortion optimization problem through a Lagrange multiplier method, determining a functional relation between the code rate and the Lagrange multiplier and taking the functional relation between the code rate and the Lagrange multiplier as an R-lambda model;

the lambda-QP model construction module is used for training on a plurality of sequences to obtain a quantization parameter which enables the code rate-joint distortion cost to be minimum under the fixed Lagrange multiplier in a mode of fixing the Lagrange multiplier, using different quantization parameter codes and calculating the corresponding code rate-joint distortion cost, and determining the functional relation between the fixed Lagrange multiplier and the minimum quantization parameter as a lambda-QP model;

wherein the λ-QP model building module, when constructing the functional relationship between the fixed Lagrange multiplier and the minimum quantization parameter, is specifically configured to perform:

selecting a set of lagrangian multipliers and a set of quantization parameters;

wherein the visual analysis distortion factor is calculated by the following formula

4. A computer storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor and to perform the method steps according to any of claims 1-2.

5. A terminal, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of any of claims 1-2.