WO2023103200A1 - Video bit rate control method and apparatus, and computer-readable storage medium - Google Patents
Video bit rate control method and apparatus, and computer-readable storage medium
- Publication number
- WO2023103200A1 (PCT application PCT/CN2022/080754)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- code rate
- coding
- parameter
- encoding
- information
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
- H04N19/147—Data rate or code amount at the encoder output according to rate distortion criteria
Definitions
- the embodiments of the present application relate to the technical field of video image processing, and in particular, to a video bit rate control method and device, and a computer-readable storage medium.
- Embodiments of the present application provide a video code rate control method and device, and a computer-readable storage medium.
- the embodiment of the present application provides a video bit rate control method, including: inputting the obtained global coding reference data of the video to be compressed into a graph neural network and outputting bit-rate-related data; and determining, according to the bit-rate-related data, the current bit rate parameter used to control the video encoding bit rate; wherein the global coding reference data is used to characterize the compression quality of the video to be compressed, and the bit-rate-related data includes at least one of the following types: partition information of a coding unit, and quantization parameters of the coding blocks in the coding unit.
- the embodiment of the present application also provides a video bit rate control device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the video bit rate control method described in the first aspect above is implemented.
- the embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the video bit rate control method described in the first aspect above.
- Fig. 1 is the flow chart of the video code rate control method that an embodiment of the present application provides
- Fig. 2 is a flow chart of determining the current code rate parameter in the video code rate control method provided by one embodiment of the present application;
- Fig. 3 is a schematic structural diagram of a graph neural network provided by an embodiment of the present application.
- FIG. 4 is a flow chart of outputting code rate-associated data in a video code rate control method provided by an embodiment of the present application
- FIG. 5 is a flow chart of outputting code rate-associated data in a video code rate control method provided in another embodiment of the present application.
- FIG. 6 is a flow chart of determining encoding quality evaluation parameters in a video bit rate control method provided by an embodiment of the present application
- FIG. 7 is a flow chart of determining a first encoding quality evaluation parameter in a video bit rate control method provided by an embodiment of the present application.
- Fig. 8 is a flow chart of determining the encoding quality evaluation index corresponding to the reconstructed frame in the video code rate control method provided by an embodiment of the present application;
- FIG. 9 is an execution flowchart for determining a first encoding quality evaluation parameter provided by an embodiment of the present application.
- FIG. 10 is a flow chart of determining encoding quality evaluation parameters in a video rate control method provided in another embodiment of the present application.
- Fig. 11 is an execution flowchart for determining a second encoding quality evaluation parameter provided by an embodiment of the present application.
- Fig. 12 is a flow chart of obtaining second code rate associated data in the video code rate control method provided by one embodiment of the present application.
- Fig. 13 is a flow chart of obtaining the first code rate associated data in the video code rate control method provided by one embodiment of the present application;
- FIG. 14 is a flow chart before outputting code rate-associated data in a video code rate control method provided by an embodiment of the present application.
- Fig. 15 is a flow chart after determining the current code rate parameter in the video code rate control method provided by one embodiment of the present application.
- Fig. 16 is a schematic diagram of an apparatus for controlling a video code rate provided by an embodiment of the present application.
- the present application provides a video bit rate control method and device, and a computer-readable storage medium.
- the global coding reference data of the video to be compressed is passed through the graph neural network to produce an output under target constraints, yielding bit-rate-related data under global optimization, which can reduce the impact of bit rate fluctuations caused by global errors; then, based on the partition information of the coding unit and/or the quantization parameters of the coding blocks in the bit-rate-related data, a macroblock-level current bit rate parameter suited to the application scene of the video to be compressed is obtained, which helps to improve video coding efficiency and optimize the user viewing experience, and, since no standard-specific information is explicitly introduced, adapts better to multiple coding standards.
- FIG. 1 is a flow chart of a video bit rate control method provided by an embodiment of the present application.
- the video bit rate control method includes but is not limited to steps S100 to S200 .
- Step S100: Input the obtained global coding reference data of the video to be compressed into the graph neural network and output bit-rate-related data, wherein the global coding reference data is used to characterize the compression quality of the video to be compressed, and the bit-rate-related data includes at least one of the following types: partition information of the coding unit, and quantization parameters of the coding blocks in the coding unit.
- the global coding reference data of the video to be compressed is passed through the graph neural network with target constraints to obtain bit-rate-related data under global optimization, which can reduce the influence of bit rate fluctuations caused by global errors; the obtained bit-rate-related data is the partition information of the coding unit and/or the quantization parameters of the coding blocks in the coding unit, which those skilled in the art will recognize as key indicators affecting video compression, so that the relevant rate control parameters can be determined from them.
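To make the data flow of steps S100 and S200 concrete, the following is a minimal Python sketch of the control loop, assuming a pre-trained graph neural network is available as an opaque callable; the names `GlobalCodingReference`, `RateRelatedData`, and `control_bitrate` are illustrative only and do not appear in the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class GlobalCodingReference:
    """Global coding reference data characterising the compression quality (S100 input)."""
    rate_constraint_kbps: float          # bit rate constraint tied to the coding standard
    roi_priority: List[List[float]]      # ROI priority map, 0.1 (lowest) .. 1.0 (highest)
    texture_stats: Dict[str, float]      # coded-frame statistics, e.g. per-CU texture energy
    ref_frame_info: Dict[str, float]     # reference frame information
    cur_frame_info: Dict[str, float]     # current frame information

@dataclass
class RateRelatedData:
    """Bit-rate-related data output by the graph neural network (S100 output)."""
    cu_partition: List[int]              # partition decision per coding unit
    block_qp: List[int]                  # quantization parameter per coding block

def control_bitrate(ref: GlobalCodingReference,
                    gnn_model: Callable[[GlobalCodingReference], RateRelatedData]) -> Dict[str, object]:
    """S100: run the GNN; S200: turn its output into the current rate parameter."""
    rate_data = gnn_model(ref)           # target-constrained GNN inference
    # S200: macroblock-level rate parameter derived from partition info and per-block QPs.
    return {"cu_partition": rate_data.cu_partition, "block_qp": rate_data.block_qp}
```

In a real deployment the placeholder `gnn_model` would be the trained network of Fig. 3 and the returned dictionary would be handed to the encoder's rate-control interface.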
- the type of the video to be compressed is not limited, and the way of obtaining the global coding reference data of the video to be compressed is not limited and is well known to those skilled in the art, so it is not described in detail here; the type of the graph neural network (GNN) is likewise not limited, and it may already be trained, in which case the global coding reference data is input into the trained graph neural network, which outputs the bit-rate-related data; the training of the graph neural network is described step by step in the following embodiments.
- the global bit rate reference data is used to characterize the compression quality of the video to be compressed, so any factor that affects the compression quality of the video to be compressed may be regarded as global bit rate reference data; in particular, the non-structural data among them has strong independence, is not affected by changes or modifications of other data, and therefore serves as a good reference.
- the global bit rate reference data can include, but is not limited to, at least one of the following types: bit rate constraint information associated with the coding standard; region of interest (ROI) information; coding type information; encoder information; coded frame constraint information; coded frame statistical information; and inter-frame information.
- the bit rate constraint information associated with the coding standard may be preset; for different coding standards there is corresponding bit rate constraint information.
- the ROI information may be preset and is used to characterize the encoding formats supported by the encoder, so as to determine whether the encoder supports ROI encoding; if this type of encoding strategy is supported, the ROI is assigned priorities, for example initial values from 0.1 to 1 set according to the priority characteristics, where 1 denotes the highest priority and 0.1 the lowest; convolution characteristics are considered here, so a value of 0 is not used even for the lowest priority. Conversely, if ROI encoding is not supported, the ROI matrix may be initialized to all 1s, thereby realizing a control strategy that works with or without an explicit ROI, greatly alleviating excessive degradation of the video quality in non-regions of interest (NROI) and reducing fluctuations of the overall bit rate caused by control deviation.
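As a hedged illustration of the ROI initialization just described, the sketch below builds a priority map in the 0.1–1.0 range when ROI coding is supported and falls back to an all-ones matrix otherwise; the function name and the rectangle-based ROI description are assumptions made only for this example.

```python
import numpy as np

def init_roi_priority(height: int, width: int,
                      roi_boxes=None, supports_roi: bool = True) -> np.ndarray:
    """Build the ROI priority matrix used as global coding reference data.

    roi_boxes: optional list of (top, left, bottom, right, priority) with
               priority in [0.1, 1.0]; 1.0 is the highest priority.
    """
    if not supports_roi or not roi_boxes:
        # Encoder without ROI support: all-ones matrix, so the strategy still works
        # and NROI quality is not excessively degraded.
        return np.ones((height, width), dtype=np.float32)

    # Lowest priority is 0.1 rather than 0 so that convolution layers still see
    # a non-zero signal everywhere.
    prio = np.full((height, width), 0.1, dtype=np.float32)
    for top, left, bottom, right, p in roi_boxes:
        prio[top:bottom, left:right] = np.clip(p, 0.1, 1.0)
    return prio

# Example: one high-priority region in a 1080p frame.
roi_map = init_roi_priority(1080, 1920, roi_boxes=[(200, 800, 600, 1200, 1.0)])
```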
- the encoding type information may lead to different video compression scenarios, that is, it affects the video bit rate; the encoder information reflects the influence of the encoder itself on the bit-rate-related data, which may arise from the encoder's own structure, specifications, and the like, and needs to be analyzed and determined for a specific encoder, which is not limited in this embodiment.
- the coded frame constraint information reflects the influence of coded frame information during the encoding process and can be further determined based on reference frame information, current frame information, and the like.
- the coded frame statistical information may be, but is not limited to, macroblock-level texture information, texture information of coding units, and the like; it is also not limited to the texture information of preceding and following frame images, and may refer to residual information between matching blocks of coding units, the median absolute deviation (MAD), and so on, where MAD is used to represent the difficulty of residual coding of the coding block.
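The coded-frame statistics paragraph above uses MAD as a per-block difficulty measure; the following sketch computes it for the residual between a coding block and its matched block, assuming the block matching itself has already been performed elsewhere. The function and variable names are illustrative, not taken from the patent.

```python
import numpy as np

def residual_mad(cur_block: np.ndarray, matched_block: np.ndarray) -> float:
    """Median absolute deviation of the residual between a coding block and its
    matched block, used as a rough indicator of residual-coding difficulty."""
    residual = cur_block.astype(np.float64) - matched_block.astype(np.float64)
    return float(np.median(np.abs(residual - np.median(residual))))

# Toy 16x16 blocks: a flat residual is "easy", a noisy residual is "hard".
rng = np.random.default_rng(0)
flat = np.full((16, 16), 128.0)
print(residual_mad(flat + 1.0, flat))                          # ~0: easy residual
print(residual_mad(flat + rng.normal(0, 10, (16, 16)), flat))  # larger: harder residual
```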
- inter-frame information reflects inter-frame prediction correlation, so as to better evaluate the video coding process.
- it can be understood that the global bit rate reference data may also include more types and broader data; the above examples of the global bit rate reference data are only used to illustrate its principle and characteristics and should not be construed as imposing any restriction on its composition.
- those skilled in the art can select relevant types of global bit rate reference data according to specific application scenarios and input them into the graph neural network individually or in combination; for example, an optimized setting for ROI coding can be selected according to a specific scene to improve the coding effect; and since there is no mandatory dependence on the coding standard, the specific pairing between the coding standard and the global bit rate reference data need not be considered, so the applicable scenarios are broader.
- based on the partition information of the coding unit, the division of coding units in the video to be compressed can be determined; in one scenario, once the coding unit partition is determined, the quantization parameters of the coding blocks in the coding unit are further determined, which helps to further determine the macroblock-level rate control parameters. It can be understood that whether one of the two is determined separately or both are determined simultaneously does not affect the execution of the steps of this embodiment; only the corresponding emphasis differs, that is, the focus may be on controlling the partition information of the coding unit or the quantization parameters of the coding blocks, which is not limited in this embodiment. In addition, the bit-rate-related data may also be data on the influence of the coded frame on subjective compression quality: although rate control serves as a hard constraint in the above embodiments, during the iterative optimization of the rate parameter output, the influence of the coded frame on subjective quality can be added to the network training process as an input parameter, so as to solve the problem of extreme subjective quality deterioration caused by over-compression of coding blocks in the NROI.
- step S100 can be presented as a specific function in a logical entity, which can be a separate physical device entity or a software entity on the host; this logical entity can be named the data preparation unit, and its role is to input the obtained global coding reference data of the video to be compressed into the graph neural network and thereby obtain the bit-rate-related data output by the graph neural network.
- Step S200: Determine the current bit rate parameter used to control the video encoding bit rate according to the bit-rate-related data.
- the global coding reference data of the video to be compressed is passed through the graph neural network with target constraints to obtain bit-rate-related data under global optimization, which can reduce the impact of bit rate fluctuations caused by global errors; then, based on the partition information of the coding unit and/or the quantization parameters of the coding blocks in the bit-rate-related data, a macroblock-level current bit rate parameter suited to the application scene of the video to be compressed is obtained, which helps to improve video coding efficiency and optimize the user viewing experience, and, since no standard-specific information is explicitly introduced, adapts better to various coding standards.
- steps S100 and S200 have the following significant advantages:
- compared with approaches that, in some cases, mathematically model the relationship between coding-block statistics and bit rate requirements to estimate the quantization parameter information under a specific bit rate requirement, this embodiment does not require a unified coding standard and is suitable for video coding solutions with mixed coding strategies, such as the H.26x, VP9, AV1, and AVSx coding standards; it has no strong coupling with the coding standard or the encoder capability, which makes hardware coding chip integration easier.
- compared with approaches that, in some cases, superimpose ROI information to allocate the target number of coded bits saved by the NROI to the ROI macroblocks, this embodiment reasons from a global perspective and fully considers the effect of visual transitions, which can alleviate the excessive blurring frequently caused by the NROI and optimize the user's video experience.
- compared with deep-learning-based compression methods that, in some cases, realize end-to-end coding, such as outputting video compression parameters (typically taking video as input and outputting a code stream) or estimating network parameters (for example, using statistics such as confidence to estimate the lowest bit rate), this embodiment can provide macroblock-level coding parameters without relying on existing rate control methods, which can improve adaptability to the scene and optimize the user's video experience.
- step S200 includes but not limited to step S210 .
- Step S210: In the case where the partition information of the coding unit is determined, train the quantization parameters of the coding blocks based on the graph neural network to obtain the current bit rate parameter used to control the video encoding bit rate.
- in one embodiment, in the scenario where the partition information of the coding unit is determined, control and adjustment of a specific bit rate is realized by optimizing the quantization parameter configuration; in this case, the coding unit partition is treated as a fixed value and does not participate in the training of the graph neural network, and the adjustment of the quantization parameters is realized through the joint action of several other related items of global coding reference data. This adjustment is highly targeted: only the quantization parameters need to be adjusted to output the corresponding macroblock-level coding parameters, which helps to obtain the current bit rate parameter more accurately and reasonably.
- it can be understood that this embodiment also considers the scenario in which the coding unit partition is trainable; when conditions permit, a result obtained from an advanced coding search can be used as the ground truth to participate in the training of the coding unit partition, which is not limited in this embodiment.
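The idea that the coding unit partition is held fixed while only the quantization parameters are learned can be sketched as follows; the PyTorch-style autograd setup and the toy differentiable rate proxy are assumptions made for illustration and are not the patent's actual training procedure.

```python
import torch

torch.manual_seed(0)

# Fixed CU partition: treated as a constant, it receives no gradients.
cu_partition = torch.tensor([0, 1, 1, 0], dtype=torch.float32)  # e.g. split flags per CU

# Per-block QP offsets are the only trainable quantities in this scenario.
qp_offsets = torch.zeros(4, requires_grad=True)
optimizer = torch.optim.Adam([qp_offsets], lr=0.1)

target_rate = 1.0  # normalised target bit rate (toy value)

for _ in range(200):
    optimizer.zero_grad()
    qp = 32.0 + qp_offsets                      # base QP plus learned per-block offset
    # Toy proxy: estimated rate falls as QP rises; the partition only scales the cost.
    est_rate = ((1.0 + cu_partition) * torch.exp(-0.1 * (qp - 32.0))).sum() / 4.0
    loss = (est_rate - target_rate) ** 2        # drive the estimate towards the target
    loss.backward()
    optimizer.step()

print(qp_offsets.detach())   # learned offsets; cu_partition never changed
```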
- FIG. 3 is a schematic structural diagram of a graph neural network provided by an embodiment of the present application.
- the graph neural network can be applied, but not limited to, to products or application devices involving video encoding and decoding, such as terminals and smart interconnections.
- in the example of Fig. 3, the global coding reference data acquired and input this time includes bit rate constraint information, ROI information, reference frame information, current frame information, and texture statistical information of the corresponding frames; based on the graph neural network shown in Fig. 3, the texture statistical information is applied according to the input global coding reference data, the partition information of the coding units and the quantization parameters of each coding block are determined, and the partition information and quantization parameters are then trained by the graph neural network, which outputs the required current bit rate parameters.
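The following is a minimal PyTorch sketch of a graph network in the spirit of Fig. 3: each node is a coding unit whose feature vector concatenates the inputs listed above (rate constraint, ROI priority, reference/current frame statistics, texture statistics), one round of message passing mixes neighbouring units, and two heads emit a partition score and a quantization parameter per unit. The layer sizes, the single-hop mean aggregation, the QP range mapping, and all names are assumptions; the patent does not disclose a concrete architecture.

```python
import torch
import torch.nn as nn

class RateControlGNN(nn.Module):
    """Toy graph network: node features -> (partition score, QP) per coding unit."""

    def __init__(self, feat_dim: int = 8, hidden: int = 32):
        super().__init__()
        self.encode = nn.Linear(feat_dim, hidden)
        self.message = nn.Linear(hidden, hidden)
        self.partition_head = nn.Linear(hidden, 1)   # probability that the CU is split
        self.qp_head = nn.Linear(hidden, 1)          # continuous QP, quantised later

    def forward(self, x: torch.Tensor, adj: torch.Tensor):
        # x: [num_cu, feat_dim] node features, adj: [num_cu, num_cu] 0/1 adjacency.
        h = torch.relu(self.encode(x))
        # One round of mean aggregation over neighbouring coding units.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        h = torch.relu(h + self.message(adj @ h / deg))
        split_prob = torch.sigmoid(self.partition_head(h)).squeeze(-1)
        qp = 22.0 + 20.0 * torch.sigmoid(self.qp_head(h)).squeeze(-1)  # map into [22, 42]
        return split_prob, qp

# Four coding units on a 2x2 grid, 8 features each (rate constraint, ROI priority, ...).
x = torch.rand(4, 8)
adj = torch.tensor([[0, 1, 1, 0], [1, 0, 0, 1],
                    [1, 0, 0, 1], [0, 1, 1, 0]], dtype=torch.float32)
split_prob, qp = RateControlGNN()(x, adj)
print(split_prob.shape, qp.shape)   # torch.Size([4]) torch.Size([4])
```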
- step S100 includes but not limited to steps S110 to S120 .
- Step S110 Obtain the encoded frame information and historical bit rate parameters of the video to be compressed based on the graph neural network
- Step S120 Input the global encoding reference data, encoding frame information and historical code rate parameters into the graph neural network, and output code rate related data.
- the historical code rate parameters are the current code rate parameters determined last time.
- the graph neural network can be trained and constructed according to the obtained global coding reference data. After the training is completed, the global coding reference data is input into the constructed graph neural network. The constructed graph neural network can match the video coding need.
- in one embodiment, the input data is optimized on the basis of the global coding reference data: the coded frame information and historical bit rate parameters of the video to be compressed are obtained through the graph neural network and input together with the global coding reference data, so as to obtain bit-rate-related data with better coding correlation; the coded frame information reflects the specific influence of the coded frame on the coding, and optimizing on the basis of the previously determined current bit rate parameter takes the historical determination scenario of the bit rate parameter into account, which is equivalent to further outputting bit-rate-related data on top of that historical scenario, thereby realizing an optimized output of the bit rate parameter.
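One way to read steps S110 to S120 is as a feedback loop in which the previously determined rate parameter is fed back as an input feature for the next pass; the sketch below assumes simple dictionary-based features and a placeholder model, and is not the patent's literal interface.

```python
from typing import Callable, Dict, List, Optional

def rate_control_loop(frames: List[dict], gnn: Callable[[dict], dict], global_ref: dict) -> List[dict]:
    """Feed the last rate decision back as the 'historical rate parameter' (S110/S120)."""
    history: Optional[dict] = None            # last determined current rate parameter
    decisions = []
    for frame in frames:
        features = dict(global_ref)
        features["coded_frame_info"] = frame          # coded frame information for this pass
        features["historical_rate_param"] = history   # None on the very first frame
        rate_related = gnn(features)                  # S120: GNN outputs bit-rate-related data
        history = rate_related                        # becomes the historical parameter next time
        decisions.append(rate_related)
    return decisions

# Placeholder GNN: echoes a fixed QP set so the loop can be exercised end to end.
decisions = rate_control_loop(
    frames=[{"frame_idx": i} for i in range(3)],
    gnn=lambda feats: {"block_qp": [30, 32, 34, 30]},
    global_ref={"rate_constraint_kbps": 2000.0},
)
print(len(decisions))  # 3
```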
- step S120 includes but not limited to steps S121 to S122.
- Step S121 determining encoding quality evaluation parameters according to encoding frame information and historical code rate parameters
- Step S122 input the global code rate reference data and the coding quality evaluation parameters into the graph neural network, and output code rate related data.
- in one embodiment, the coding quality evaluation parameter is determined from the coded frame information and the historical bit rate parameters, and its influence further cooperates with that of the global bit rate reference data to realize an optimized output of the bit-rate-related data. It can be understood that when the bit-rate-related data needs to be optimized, the coding quality evaluation parameter of this embodiment can be used as a new influencing factor; in other words, if the bit-rate-related data does not need further optimization, the coding quality evaluation parameter can be set to a null value, which is not limited in this embodiment.
- it should be noted that, in different application scenarios, the obtained coded frame information and historical bit rate parameters differ, so the determined coding quality evaluation parameters also differ; moreover, even in the same application scenario, different calculation methods can be used to obtain corresponding coding quality evaluation parameters, so as to optimize one or more aspects of the bit-rate-related data according to a specific coding quality evaluation parameter, that is, the individual coding quality evaluation parameters may also differ, which is not limited in this embodiment; specific embodiments are given below for illustration.
- step S121 when the coded frame information includes reference frame information, and the coded quality evaluation parameter includes the first coded quality evaluation parameter, step S121 includes but not limited to steps S1211 to S1213.
- Step S1211: Determine, according to the historical bit rate parameters, the coded stream corresponding to the historical bit rate parameters;
- Step S1212: Decode the coded stream according to the reference frame information to obtain a reconstructed frame;
- Step S1213: Determine a first coding quality evaluation parameter according to the reconstructed frame.
- in one embodiment, the coded stream in the historical scenario is determined and decoded to recover the reconstructed frame, realizing a reconstruction strategy aimed at restoring the original frame; since the reconstructed frame is associated both with the reference frame information and with the coded stream corresponding to the historical bit rate parameters, it can represent the coding situation of the historical scenario and the coding situation corresponding to the reference frame information; under this condition, the first coding quality evaluation parameter determined from the reconstructed frame has good forward-propagation characteristics, can meet the optimization training requirements based on the graph neural network, and helps to improve the output of the bit rate parameter results.
- step S1213 includes but not limited to steps S12131 to S12132.
- Step S12131 for each reconstructed frame, obtain the encoding quality evaluation index corresponding to the reconstructed frame according to the reconstructed frame;
- Step S12132 from each coding quality evaluation index, determine the largest coding quality evaluation index as the first coding quality evaluation parameter.
- in one embodiment, for each coded stream the quality of the corresponding decoded data frame needs to be evaluated, which is equivalent to obtaining the coding quality evaluation index corresponding to each reconstructed frame, so that multiple coding quality evaluation indices are obtained; the quality of the reconstructed frame in the current network environment is then used as the objective function to update the training parameters of the graph neural network, and the largest coding quality evaluation index is determined as the first coding quality evaluation parameter, indicating that the decoded data frame corresponding to it has the highest quality; the graph neural network can therefore be trained by reinforcement learning based on this parameter to optimize the bit rate parameter output.
- step S12131 includes but not limited to steps S12133 to S12134.
- Step S12133 according to the reconstruction frame, determine the reconstruction quality parameter, network stall parameter and handover status parameter corresponding to the reconstruction frame;
- Step S12134 performing weighted superposition on the reconstruction quality parameter, the network freeze parameter and the switching status parameter to obtain the encoding quality evaluation index corresponding to the reconstructed frame.
- in one embodiment, by introducing the weighted superposition of the reconstruction quality parameter, the network stall parameter, and the switching status parameter, the coding quality evaluation index corresponding to the reconstructed frame can be obtained accurately; this index is related only to the quality parameters of the reconstructed frame itself and is not mixed with other extraneous content in the calculation, so the error fluctuation is relatively small.
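A hedged numerical sketch of the weighted superposition just described follows: a reconstruction quality term, a network stall term, and a switching-status term are combined into a QoE index per reconstructed frame, and the maximum index is taken as the first coding quality evaluation parameter. The weights, the sign conventions, and the stand-in quality values are assumptions; in practice a no-reference metric such as IFC or DIQA would supply the quality term.

```python
from typing import Dict, List

def qoe_index(reconstruction_quality: float, stall_penalty: float, switch_penalty: float,
              w_quality: float = 1.0, w_stall: float = 0.5, w_switch: float = 0.2) -> float:
    """Weighted superposition of the three terms named in the text (assumed weights/signs).

    reconstruction_quality would come from a no-reference metric (e.g. IFC or DIQA);
    stall_penalty and switch_penalty summarise the network stall and switching status.
    """
    return w_quality * reconstruction_quality - w_stall * stall_penalty - w_switch * switch_penalty

def first_quality_parameter(candidates: List[Dict[str, float]]) -> float:
    """One QoE index per reconstructed frame; the maximum is the first evaluation parameter."""
    return max(qoe_index(c["quality"], c["stall"], c["switch"]) for c in candidates)

candidates = [
    {"quality": 0.78, "stall": 0.10, "switch": 0.0},   # stream A
    {"quality": 0.82, "stall": 0.40, "switch": 1.0},   # stream B: better picture but stalls
]
print(first_quality_parameter(candidates))             # stream A wins under these weights
```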
- FIG. 9 is an execution flowchart of determining a first encoding quality evaluation parameter provided by an embodiment of the present application.
- Step S300 According to the historical code rate parameters obtained from the graph neural network, obtain the encoded code stream corresponding to the historical code rate parameters;
- Step S400 refer to the reference frame, decode the coded code stream through a decoder to generate a decoding result, and obtain a reconstructed frame;
- Step S500 Determine a first coding quality evaluation parameter based on the reconstructed frame.
- corresponding to step 3 above, the quality of the restored frame in the current network environment is used directly as the objective function for updating the network parameters; for example, a weighted combination of the reconstruction quality, the network stall parameter, and the switching status can be introduced as an overall quality of experience (QoE) evaluation index.
- R(n) may use a no-reference image quality evaluation metric, including but not limited to the Information Fidelity Criterion (IFC), the Deep CNN-Based Blind Image Quality Predictor (DIQA), and the like.
- it can be understood that the subjective quality of the reconstructed frame can also be evaluated with a generative adversarial network (GAN), with reference to high-quality reconstruction architectures such as the enhanced super-resolution generative adversarial network (ESRGAN).
- the coding strategy proposed in this embodiment requires the video to be compressed to be coded by region, with different coding parameters and strategies designed according to differences in regional information (such as ROI and texture statistical information), so that the quality degradation of the finally output coded frames is minimized under the overall rate control; the following objective function is considered, where GNN'(X) denotes a coded stream output by this example (there are multiple coded streams), Q(GNN'(X)) denotes the quality of the data frame obtained by decoding that coded stream, and the constraint is BD_{GNN'(X)} <= RATE, that is, the bit rate should not be greater than the specified target bit rate. Each encoding scheme that satisfies the constraint is taken as an Action, the discriminant function f is taken as the evaluation mechanism, and the goal is set to finding the maximum f; under this model the graph neural network can be trained by reinforcement learning, so as to achieve maximum preservation of video quality under the specific bit rate requirement.
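The constrained objective described above (maximise the decoded quality f subject to the stream's bit rate staying at or below the target RATE, treating each feasible encoding scheme as an action) can be sketched as a simple filter-then-argmax over candidate schemes; the candidate structure is an assumption, and a real reinforcement-learning setup would update the GNN weights from the reward rather than merely pick a candidate.

```python
from typing import List, Optional

def best_feasible_encoding(candidates: List[dict], rate_budget: float) -> Optional[dict]:
    """Pick the encoding scheme (action) with maximal decoded quality f among those
    whose measured bit rate satisfies the constraint bitrate <= RATE."""
    feasible = [c for c in candidates if c["bitrate_kbps"] <= rate_budget]
    if not feasible:
        return None                                        # no scheme satisfies the constraint
    return max(feasible, key=lambda c: c["quality"])       # f as the evaluation mechanism

candidates = [
    {"name": "scheme-1", "bitrate_kbps": 1800.0, "quality": 0.71},
    {"name": "scheme-2", "bitrate_kbps": 2100.0, "quality": 0.80},  # violates the budget
    {"name": "scheme-3", "bitrate_kbps": 1950.0, "quality": 0.76},
]
print(best_feasible_encoding(candidates, rate_budget=2000.0)["name"])  # scheme-3
```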
- step S121 also includes, but is not limited to, step S1214.
- Step S1214 performing differential processing on the reconstructed frame information and the current frame information to obtain a second encoding quality evaluation parameter, wherein the reconstructed frame information corresponds to the reconstructed frame.
- in one embodiment, after the reconstructed frame is determined, differential processing is performed on the obtained reconstructed frame information in conjunction with the current frame information, so as to take the coding situation corresponding to the current frame information into account and obtain a second coding quality evaluation parameter that meets the requirements, which can satisfy the optimization training requirements based on the graph neural network and helps to improve the output of the bit rate parameter results.
- the objective function can be obtained based on the differential processing, and then the encoding result can be evaluated based on the determined objective function.
- FIG. 11 is an execution flowchart of determining a second encoding quality evaluation parameter provided by an embodiment of the present application.
- Step S600 According to the historical code rate parameters obtained from the graph neural network, obtain the encoded code stream corresponding to the historical code rate parameters;
- Step S700 refer to the reference frame, and decode the coded code stream through a decoder to generate a decoding result
- Step S800 Compare the decoding result with the real value of the current frame, calculate the difference cost f, and obtain Loss (ie, the second encoding quality evaluation parameter).
- the way of obtaining the Loss is determined according to a specific application scenario, which is not limited in this embodiment, and will be described with an example below.
- for example, f = ||x' - x||_1 may be used, i.e., the L1 norm of the reconstructed image x' and the uncompressed image x is taken as the Loss; alternatively, an implicit discriminative method can be adopted, for example following the idea of GANs and designing a discriminator network to analyze the quality of the coded image, that is, f = ||g(h'(h(x))) - g(x)||_1, where h and h' denote the encoding unit and the decoding unit respectively; since h is lossy compression, the quality of the restored image is degraded, and the coding result is evaluated through the reconstruction objective function g(x), i.e. the discriminator part of the GAN, or alternatively the output of the ESRGAN discriminator network, so as to achieve maximum preservation of video quality under the specific bit rate requirement.
- it can be understood that the Loss calculation based on the current frame and the reconstructed frame can also adopt various similar schemes; for example, in step 3, the L2 norm of the reconstructed image x' and the uncompressed image x may be used as the Loss.
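A small numeric sketch of the two explicit loss choices mentioned above (the L1 norm used as the Loss, and the L2 norm as an alternative) follows; the per-pixel averaging is an added assumption so the two values are comparable, and the discriminator-feature variant f = ||g(h'(h(x))) - g(x)||_1 is only indicated in a comment because g is a learned network.

```python
import numpy as np

def l1_loss(recon: np.ndarray, original: np.ndarray) -> float:
    """f = ||x' - x||_1, averaged per pixel (the normalisation is an added assumption)."""
    return float(np.abs(recon.astype(np.float64) - original.astype(np.float64)).mean())

def l2_loss(recon: np.ndarray, original: np.ndarray) -> float:
    """Alternative from the text: the L2 norm of x' - x, again averaged per pixel."""
    diff = recon.astype(np.float64) - original.astype(np.float64)
    return float(np.sqrt((diff ** 2).mean()))

# For the implicit variant, g would be a learned discriminator/feature extractor and the
# loss would be l1_loss(g(recon), g(original)) -- not reproduced here.

rng = np.random.default_rng(1)
original = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
recon = np.clip(original + rng.normal(0, 3, size=(64, 64)), 0, 255)  # mild coding noise
print(l1_loss(recon, original), l2_loss(recon, original))
```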
- Example 2 and Example 3 can be presented as a specific function in a logical entity.
- This logical entity can be used as a separate physical device entity or as a software entity on the host.
- this logical entity can be named the model training unit; its role is to determine the first coding quality evaluation parameter according to the reconstructed frame, and to perform differential processing on the reconstructed frame information and the current frame information to obtain the second coding quality evaluation parameter.
- step S122 includes but is not limited to step S1221.
- Step S1221 inputting the global code rate reference data and the second coding quality assessment parameters into the graph neural network to obtain second code rate related data.
- in one embodiment, by inputting the global bit rate reference data and the second coding quality evaluation parameter into the graph neural network, the second bit-rate-related data corresponding to the second coding quality evaluation parameter is obtained; compared with the original bit-rate-related data, optimizing the graph neural network with the second coding quality evaluation parameter as a training parameter yields better-optimized second bit-rate-related data, which helps to improve the video compression effect.
- step S122 includes but is not limited to step S1222.
- Step S1222 for each coded code stream, input the global code rate reference data and the first code quality evaluation parameter into the graph neural network to obtain the first code rate associated data corresponding to the coded code stream.
- in one embodiment, for each coded stream, the first bit-rate-related data corresponding to that coded stream is obtained by inputting the global bit rate reference data and the first coding quality evaluation parameter into the graph neural network; that is, in a specific application scenario, during one video compression process, the first bit-rate-related data corresponding to each coded stream can be controlled and adjusted separately, which avoids homogenization and thus significantly improves the video compression effect.
- before step S100, the method also includes, but is not limited to, step S900.
- Step S900 in the case of receiving the resource limitation information corresponding to the graph neural network, perform scale reduction processing on the graph neural network.
- the resource limitation information can be formed under the resource limitation scenario of the application platform.
- in this scenario, the graph neural network is subjected to scale reduction processing according to the requirements of the application scenario, including but not limited to distillation, quantization, pruning, and dynamic network design, so as to reduce the scale and computing power requirements of the overall graph neural network model; correspondingly, in a resource expansion scenario, the graph neural network can be scaled up, or a new graph neural network meeting the requirements can replace the original graph neural network.
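Of the scale-reduction options listed above (distillation, quantization, pruning, dynamic network design), the sketch below illustrates only the simplest, magnitude pruning of weight matrices when a resource-limitation flag is received; the threshold policy and the flag itself are assumptions for this example.

```python
import numpy as np

def prune_weights(weight: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    """Zero out the smallest-magnitude weights, keeping only `keep_ratio` of them."""
    flat = np.abs(weight).ravel()
    k = max(1, int(len(flat) * keep_ratio))
    threshold = np.partition(flat, len(flat) - k)[len(flat) - k]  # k-th largest magnitude
    return np.where(np.abs(weight) >= threshold, weight, 0.0)

def maybe_shrink_model(layers, resource_limited: bool):
    """Apply pruning to every layer only when the resource-limitation signal is present."""
    if not resource_limited:
        return layers
    return [prune_weights(w, keep_ratio=0.5) for w in layers]

rng = np.random.default_rng(2)
layers = [rng.normal(size=(32, 32)) for _ in range(3)]
pruned = maybe_shrink_model(layers, resource_limited=True)
print([float((w == 0).mean()) for w in pruned])  # roughly half of each layer is zeroed
```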
- step S1000 is also included after step S200 .
- Step S1000 in the case of receiving model adaptation information corresponding to the current code rate parameter, optimize the current code rate parameter according to the model adaptation information.
- the model adaptation information can be formed under the condition that the network transmission environment is constrained.
- in this scenario, the current bit rate parameters are optimized according to the model adaptation information, including but not limited to optimizing the coding parameters and considering lowering the bit rate at the expense of subjective quality, so as to adapt the network structure and model parameters.
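A hedged sketch of adaptation step S1000 follows: when the model adaptation information reports that the transmission environment is constrained, the current rate parameter is re-derived with a lower rate target, here by simply raising the quantization parameters, trading some subjective quality for bit rate as described above; the scaling rule and the clamping range are assumptions.

```python
from typing import Dict, List

def adapt_rate_parameter(current: Dict[str, List[int]],
                         target_kbps: float, available_kbps: float) -> Dict[str, List[int]]:
    """Raise per-block QPs when the available bandwidth falls below the target bit rate."""
    if available_kbps >= target_kbps:
        return current                                    # no adaptation needed
    shortfall = 1.0 - available_kbps / target_kbps        # fraction of rate to give up
    qp_delta = int(round(6 * shortfall / 0.2))            # assumed rule: ~+6 QP per -20% rate
    adapted_qp = [min(51, qp + qp_delta) for qp in current["block_qp"]]  # clamp to H.26x-style max
    return {**current, "block_qp": adapted_qp}

current = {"block_qp": [30, 32, 34, 30]}
print(adapt_rate_parameter(current, target_kbps=2000.0, available_kbps=1500.0))
```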
- it should be noted that step S900 and step S1000 can be presented as a specific function in a logical entity, which can be a separate physical device entity or a software entity on the host; this logical entity can be named the inference application unit, and its role is to combine lightweight strategies for deployment optimization, performing scale reduction of the graph neural network in resource-constrained scenarios and optimizing the current bit rate parameters in scenarios where the network transmission environment is constrained, so as to reduce the computing power consumption of the model.
- referring to Fig. 16, an embodiment of the present application also provides a video bit rate control device 100, which includes a memory 110, a processor 120, and a computer program stored in the memory 110 and executable on the processor 120.
- the processor 120 and the memory 110 may be connected through a bus or in other ways.
- the non-transitory software programs and instructions required to implement the video bit rate control method of the above embodiments are stored in the memory 110, and when executed by the processor 120, the video bit rate control methods of the above embodiments are performed, for example, method steps S100 to S200 in Fig. 1, method step S210 in Fig. 2, method steps S110 to S120 in Fig. 4, method steps S121 to S122 in Fig. 5, method steps S1211 to S1213 in Fig. 6, method steps S12131 to S12132 in Fig. 7, method steps S12133 to S12134 in Fig. 8, method steps S300 to S500 in Fig. 9, method step S1214 in Fig. 10, method steps S600 to S800 in Fig. 11, method step S1221 in Fig. 12, method step S1222 in Fig. 13, method step S900 in Fig. 14, or method step S1000 in Fig. 15.
- the device embodiments described above are only illustrative, and the units described as separate components may or may not be physically separated, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- in addition, an embodiment of the present application also provides a computer-readable storage medium storing computer-executable instructions; when the computer-executable instructions are executed by a processor 120 or a controller, for example by a processor 120 in the above device embodiment, the processor 120 is caused to execute the video bit rate control method in the above embodiments, for example, method steps S100 to S200 in Fig. 1, method step S210 in Fig. 2, method steps S110 to S120 in Fig. 4, method steps S121 to S122 in Fig. 5, method steps S1211 to S1213 in Fig. 6, method steps S12131 to S12132 in Fig. 7, method steps S12133 to S12134 in Fig. 8, method steps S300 to S500 in Fig. 9, method step S1214 in Fig. 10, method steps S600 to S800 in Fig. 11, method step S1221 in Fig. 12, method step S1222 in Fig. 13, method step S900 in Fig. 14, or method step S1000 in Fig. 15.
- Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cartridges, tape, magnetic disk storage or other magnetic storage devices, or can Any other medium used to store desired information and which can be accessed by a computer.
- communication media typically embody computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media .
Abstract
A video bit rate control method and apparatus, and a computer-readable storage medium. The method includes: inputting the obtained global coding reference data of a video to be compressed into a graph neural network and outputting bit-rate-related data (S100); and determining, according to the bit-rate-related data, a current bit rate parameter used to control the video encoding bit rate (S200); wherein the global coding reference data is used to characterize the compression quality of the video to be compressed, and the bit-rate-related data includes at least one of the following types: partition information of a coding unit, and quantization parameters of the coding blocks in the coding unit.
Description
Cross-reference to related applications
This application is based on and claims priority to the Chinese patent application with application number 202111508059.8 filed on December 10, 2021, the entire content of which is incorporated herein by reference.
The embodiments of this application relate to the technical field of video and image processing, and in particular to a video bit rate control method and apparatus, and a computer-readable storage medium.
With the continuous development of network technologies, device access requests and environments have become complex and diverse, and overcoming the degradation of experience caused by unstable bandwidth has gradually become an important topic. In general, continuous traffic transmission, such as video signals, is affected most noticeably by bandwidth instability. At present, video coding schemes in some cases are relatively fixed and are usually applied only to a specific coding standard; when facing scenarios with varying bandwidth, they cannot provide an adaptive rate transmission scheme for the same coded content, so the coding efficiency is low, and users frequently encounter problems such as video stalling, blurred region-of-interest (ROI) pictures, or a marked drop in subjective experience when watching videos.
Summary of the invention
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
The embodiments of this application provide a video bit rate control method and apparatus, and a computer-readable storage medium.
In a first aspect, an embodiment of this application provides a video bit rate control method, including: inputting the obtained global coding reference data of a video to be compressed into a graph neural network and outputting bit-rate-related data; and determining, according to the bit-rate-related data, a current bit rate parameter used to control the video encoding bit rate; wherein the global coding reference data is used to characterize the compression quality of the video to be compressed, and the bit-rate-related data includes at least one of the following types: partition information of a coding unit, and quantization parameters of the coding blocks in the coding unit.
In a second aspect, an embodiment of this application further provides a video bit rate control apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the video bit rate control method described in the first aspect when executing the computer program.
In a third aspect, an embodiment of this application further provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the video bit rate control method described in the first aspect.
Other features and advantages of this application will be set forth in the following description, and will in part become apparent from the description or be understood by practicing this application. The objectives and other advantages of this application can be realized and obtained through the structures particularly pointed out in the description, the claims, and the accompanying drawings.
The accompanying drawings are provided for a further understanding of the technical solution of this application and constitute a part of the description; together with the embodiments of this application, they serve to explain the technical solution of this application and do not constitute a limitation on it.
Fig. 1 is a flowchart of a video bit rate control method provided by an embodiment of this application;
Fig. 2 is a flowchart of determining the current bit rate parameter in the video bit rate control method provided by an embodiment of this application;
Fig. 3 is a schematic structural diagram of a graph neural network provided by an embodiment of this application;
Fig. 4 is a flowchart of outputting bit-rate-related data in the video bit rate control method provided by an embodiment of this application;
Fig. 5 is a flowchart of outputting bit-rate-related data in the video bit rate control method provided by another embodiment of this application;
Fig. 6 is a flowchart of determining coding quality evaluation parameters in the video bit rate control method provided by an embodiment of this application;
Fig. 7 is a flowchart of determining a first coding quality evaluation parameter in the video bit rate control method provided by an embodiment of this application;
Fig. 8 is a flowchart of determining the coding quality evaluation index corresponding to a reconstructed frame in the video bit rate control method provided by an embodiment of this application;
Fig. 9 is an execution flowchart of determining a first coding quality evaluation parameter provided by an embodiment of this application;
Fig. 10 is a flowchart of determining coding quality evaluation parameters in the video bit rate control method provided by another embodiment of this application;
Fig. 11 is an execution flowchart of determining a second coding quality evaluation parameter provided by an embodiment of this application;
Fig. 12 is a flowchart of obtaining second bit-rate-related data in the video bit rate control method provided by an embodiment of this application;
Fig. 13 is a flowchart of obtaining first bit-rate-related data in the video bit rate control method provided by an embodiment of this application;
Fig. 14 is a flowchart of the steps before outputting bit-rate-related data in the video bit rate control method provided by an embodiment of this application;
Fig. 15 is a flowchart of the steps after determining the current bit rate parameter in the video bit rate control method provided by an embodiment of this application;
Fig. 16 is a schematic diagram of a video bit rate control apparatus provided by an embodiment of this application.
In order to make the objectives, technical solutions, and advantages of this application clearer, this application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain this application and are not used to limit it.
It should be noted that although functional modules are divided in the device schematic diagram and a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed with a module division different from that in the device or in an order different from that in the flowcharts. The terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.
本申请提供了一种视频码率控制方法及装置、计算机可读存储介质,通过图神经网络对待压缩视频的全局编码参考数据进行目标约束输出,得到全局优化情况下的码率关联数据,能够降低全局误差带来的码率波动影响,进而基于码率关联数据中的编码单元的划分信息或/和各个编码块的量化参数,得到适应于待压缩视频应用场景的宏块级的当前码率参数,有利于提升视频编码效率,优化用户观看体验,并且不显示引入标准关联信息,能够更好地适配多种编码标准。
下面结合附图,对本申请实施例作进一步阐述。
如图1所示,图1是本申请一个实施例提供的视频码率控制方法的流程图,该视频码率控制方法包括但不限于步骤S100至S200。
步骤S100:将获取到的待压缩视频的全局编码参考数据输入到图神经网络,输出码率关 联数据,其中,全局编码参考数据用于表征待压缩视频的压缩质量,码率关联数据包括如下类型中的至少一个:编码单元的划分信息、编码单元中的各个编码块的量化参数。
在一实施例中,通过图神经网络对待压缩视频的全局编码参考数据进行目标约束输出,得到全局优化情况下的码率关联数据,能够降低全局误差带来的码率波动影响,并且所得到的码率关联数据为编码单元的划分信息或/和编码单元中的各个编码块的量化参数,本领域技术人员可知上述两个参数为影响视频压缩的重要指标,以便基于编码单元的划分信息或/和编码单元中的各个编码块的量化参数确定相关的码率控制参数。
在一实施例中,待压缩视频的类型不限定,获取待压缩视频的全局编码参考数据的方式不限制,且为本领域技术人员所熟知,在此不作赘述;图神经网络(Graph Neural Network,GNN)的类型不做限定,可以为已经训练好的,此时将全局编码参考数据输入到已经训练好的图神经网络,则由已经训练好的图神经网络输出码率关联数据,关于图神经网络的训练方式在下述各实施例中逐步说明。
在一实施例中,全局码率参考数据用于表征待压缩视频的压缩质量,因此所有影响待压缩视频的压缩质量的因素都可能被认为是全局码率参考数据,尤其是其中的非结构数据,独立性强,不会因为其他数据的变更、修改等受到影响,具有较好的参考性,例如全局码率参考数据可以包括但不限于如下类型中的至少一个:
与编码标准关联的码率约束信息;
感兴趣区域(Region Of Interest,ROI)信息;
编码类型信息;
编码器信息;
编码帧约束信息;
编码帧统计信息;
帧间信息。
需要说明的是,与编码标准关联的码率约束信息可以为预先设置好的,对于不同的编码标准存在相应的码率约束信息。
需要说明的是,ROI信息可以为预先设置好的,用于表征编码器支持的编码格式,以确定编码器是否支持ROI编码,如果支持该类编码策略,则将ROI进行优先级设定,例如根据优先级特性分别设定从0.1至1的初始值,1表示优先级最高,0.1表示优先级最低,此处考虑卷积特性,即使优先级最低也不采用0值进行描述。相反地,如果不支持ROI编码,则考虑将ROI矩阵初始化全为1,从而实现支持有无明确ROI的控制策越,很大程度缓解对非感兴趣区域(Non Region Of Interest,NROI)视频质量的过度退化,降低整体码率因控制偏差导致的波动性问题。
需要说明的是,编码类型信息可能造成视频压缩场景的不同,即会对视频码率产生影响;编码器信息体现编码器本身对于码率关联数据的编码影响,这可能是由于编码器自身的构造、规格等所产生的,对于具体编码器需要具体分析确定,这在本实施例中并未限制。
需要说明的是,编码帧约束信息体现编码过程中的编码帧信息影响,可以基于参考帧信息、当前帧信息等进行进一步判断得到。
需要说明的是,编码帧统计信息可以但不限于为宏块级的纹理信息、编码单元的纹理信息等,也可以不限于前后帧图像的纹理信息,可以参考编码单元匹配块之间的残差信息、异 常值检测(Median Absolute Deviation,MAD)等,其中MAD用于表征编码块的残差编码难易程度。
需要说明的是,帧间信息体现帧间预测相关性,以便对视频编码过程进行更好地评估。
可以理解地是,全局码率参考数据还可以包括更多的类型、更广泛的数据,上述对全局码率参考数据的示例仅用于说明其原理特征,但不应理解为对其构成进行任意限制,本领域技术人员可以根据具体应用场景选择相关类型的全局码率参考数据进行单独或组合输入到图神经网络中,例如可以根据具体场景选择对ROI编码的优化设置以提升编码效果等,并且由于对编码标准没有强制依赖关系,因此可以不用考虑编码标准与全局码率参考数据之间的专配性,适用场景更为广泛。
在一实施例中,基于编码单元的划分信息可以确定待压缩视频中的编码单元划分情况,在一种情景下,在确定编码单元划分情况的情况下,进一步确定编码单元中的各个编码块的量化参数,有利于进一步确定宏块级的码率控制参数;可以理解地是,无论单独确认两者中的一个还是同时确认两者,对于本实施例的步骤执行均不会产生影响,只不过相应的侧重点不同,即可能侧重于控制编码单元的划分信息或者各个编码块的量化参数,这在本实施例中并未限制;此外,码率关联数据还可以为编码帧对主观压缩质量的影响数据,虽然上述各实施例中的码率控制作为强约束,但在码率参数输出迭代优化过程中,可以将编码帧对主观质量的影响作为输入参数加入网络训练过程中,解决在NROI中对编码块的过度压缩而导致主观质量极度恶化的问题。
在一实施例中,步骤S100可以作为一种具体功能呈现于逻辑实体中,该逻辑实体可以作为单独物理设备实体,也可以作为主机上的软件实体,该逻辑实体可以命名为数据准备单元,在于将获取到的待压缩视频的全局编码参考数据输入到图神经网络,从而获取到由图神经网络输出的码率关联数据。
步骤S200,根据码率关联数据确定用于控制视频编码码率的当前码率参数。
在一实施例中,通过图神经网络对待压缩视频的全局编码参考数据进行目标约束输出,得到全局优化情况下的码率关联数据,能够降低全局误差带来的码率波动影响,进而基于码率关联数据中的编码单元的划分信息或/和各个编码块的量化参数,得到适应于待压缩视频应用场景的宏块级的当前码率参数,有利于提升视频编码效率,优化用户观看体验,并且不显示引入标准关联信息,能够更好地适配多种编码标准。
可以理解地是,步骤S100和S200具有以下显著优点:
相比于在一些情况下,通过对编码块的统计信息与码率需求关系进行数学建模,预估出特定码率要求下的量化参数信息,本实施例无需统一编码标准,适用于混合编码策略的视频编码方案,例如适用于H.26x、VP9、AV1、AVSx等编码标准,与编码标准和编码器能力之间无强耦合关系,更加便于实现硬件编码芯片集成。
相比于在一些情况下,通过叠加ROI信息将NROI节省的目标编码比特数分配给位编码的ROI宏块,本实施例从全局出发进行考虑,充分考虑视觉过度的影响,能够缓解NROI经常导致的过度模糊情况,优化用户视频体验。
相比于在一些情况下,基于深度学习的压缩方法实现端到端编码,例如对视频压缩参数的输出,通常输入视频输出码流,或者,对网络参数的估计,例如采用置信度等统计数据评估最低码率,本实施例能够提供宏块级的编码参数,并且无需依赖现有的码率控制方法,能 够提升对场景的适应程度,优化用户视频体验。
在图2的示例中,在码率关联数据包括编码单元的划分信息和各个编码块的量化参数的情况下,步骤S200包括但不限于步骤S210。
步骤S210,在确定编码单元的划分信息的情况下,基于图神经网络对各个编码块的量化参数进行训练,得到用于控制视频编码码率的当前码率参数。
在一实施例中,考虑在确定编码单元的划分信息的场景下,通过优化量化参数配置实现对特定码率的控制调整,在这种情况下,编码单元作为固定值不参与图神经网络的训练过程,而是通过其他几个相关的全局编码参考数据的共同作用实现对量化参数的调整,该调整方式具有较强的针对性,且只需调整量化参数即可实现对应的宏块级的编码参数的输出,有利于更准确合理地得到当前码率参数。
可以理解地是,本实施例考虑了编码单元可训练的场景,在条件充分的情况下,可以采用基于先进编码搜索得到的结果作为真实值参与编码单元的训练,这在本实施例中并未限制。
以下给出具体示例对上述实施例进行说明。
示例一:
如图3所示,图3是本申请一个实施例提供的图神经网络的结构示意图。
在图3的示例中,该图神经网络可以但不限于应用于终端、智能互联等涉及视频编、解码的产品或应用设备,此次获取并输入的全局编码参考数据包括有码率约束信息、ROI信息、参考帧信息、当前帧信息以及相应帧的纹理统计信息,基于图3所示的图神经网络,根据输入的全局编码参考数据实现对纹理统计信息的应用,确定编码单元的划分信息和每个编码块的量化参数,进而由图神经网络对编码单元的划分信息和每个编码块的量化参数进行训练,输出所需的当前码率参数。
在图4的示例中,在图神经网络为根据获取到的全局编码参考数据训练得到的情况下,步骤S100包括但不限于步骤S110至S120。
步骤S110:基于图神经网络获取待压缩视频的编码帧信息和历史码率参数;
步骤S120:将全局编码参考数据、编码帧信息和历史码率参数输入到图神经网络,输出码率关联数据,历史码率参数为上一次确定的当前码率参数。
需要说明的是,图神经网络可以根据获取到的全局编码参考数据训练构建,在训练完成之后再将全局编码参考数据输入到构建好的图神经网络中,所构建的图神经网络能够匹配视频编码需求。
在一实施例中,考虑在全局编码参考数据的基础上优化输入数据,即通过图神经网络获取待压缩视频的编码帧信息和历史码率参数,并将其混合全局编码参考数据输入到图神经网络,以得到编码关联性更好的码率关联数据;其中,编码帧信息体现编码帧对于编码的具体影响,而基于上一次确定的当前码率参数进行优化,可以将码率参数的历史确定情景考虑在内,即相当于在码率参数的历史确定情景的基础上进一步输出码率关联数据,从而实现码率参数的优化输出。
在图5的示例中,步骤S120包括但不限于步骤S121至S122。
步骤S121,根据编码帧信息和历史码率参数确定编码质量评估参数;
步骤S122,将全局码率参考数据和编码质量评估参数输入到图神经网络,输出码率关联数据。
在一实施例中,通过编码帧信息和历史码率参数确定编码质量评估参数,进而通过编码质量评估参数的影响来进一步配合全局码率参考数据的影响,实现码率关联数据的优化输出,可以理解地是,当需要对码率关联数据进行优化时,可以采用本实施例的编码质量评估参数作为新的因素进行影响实现,换言之,若不需要进一步优化码率关联数据,则可以将编码质量评估参数设置为空值,这在本实施例中并未限制。
需要说明的是,在不同应用场景下,由于获取到的编码帧信息和历史码率参数是不同的,因此所确定的编码质量评估参数也不同;此外,即使在同一应用场景下,可以采用不同的计算方式以分别获取相应的编码质量评估参数,以根据特定的编码质量评估参数对码率关联数据的某一方面或多方面的内容进行输出优化,即各个编码质量评估参数也可以是不同的,这在本实施例中并未限制,以下给出具体实施例举例说明。
在图6的示例中,在编码帧信息包括参考帧信息,编码质量评估参数包括第一编码质量评估参数的情况下,步骤S121包括但不限于步骤S1211至S1213。
步骤S1211,根据历史码率参数确定与历史码率参数对应的编码码流;
步骤S1212,根据参考帧信息对编码码流进行解码,得到重建帧;
步骤S1213,根据重建帧确定第一编码质量评估参数。
在一实施例中,通过确定历史场景下的编码码流并对该编码码流进行解码,从而恢复出重建帧,实现依恢复原始帧为目标的重建策略,由于重建帧同时关联于参考帧信息以及与历史码率参数对应的编码码流,因此重建帧可以表征历史场景的编码情况和参考帧信息对应的编码情况,在这种条件下,基于重建帧所确定的第一编码质量评估参数具有良好的前向传播特性,能够满足基于图神经网络的优化训练需求,有利于改善码率参数结果输出。
在图7的示例中,在重建帧为多个,且每个重建帧对应一个编码码流的情况下,步骤S1213包括但不限于步骤S12131至S12132。
步骤S12131,对于每个重建帧,根据重建帧得到与重建帧对应的编码质量评估指标;
步骤S12132,从各个编码质量评估指标中,确定最大的编码质量评估指标为第一编码质量评估参数。
在一实施例中,对于每个编码码流需要评估其对应的解码数据帧的质量,即相当于需要获取与每个重建帧对应的编码质量评估指标,因此能够得到多个编码质量评估指标,进而针对重建帧在当前网络环境下的质量作为目标函数以更新图神经网络的训练参数,确定最大的编码质量评估指标为第一编码质量评估参数,说明第一编码质量评估参数对应的解码数据帧的质量最大,因此可以基于该参数来对图神经网络进行强化学习训练,以优化码率参数输出。
在图8的示例中,步骤S12131包括但不限于步骤S12133至S12134。
步骤S12133,根据重建帧确定与重建帧对应的重建质量参数、网络卡顿参数和切换状况参数;
步骤S12134,对重建质量参数、网络卡顿参数和切换状况参数进行加权叠加,得到与重建帧对应的编码质量评估指标。
在一实施例中,通过引入重建质量参数、网络卡顿参数和切换状况参数的加权叠加数值,可以准确得到与重建帧对应的编码质量评估指标,且编码质量评估指标只关联于重建帧自身的质量参数内容,不会掺杂其余杂质内容进行计算,因此误差波动相对较小。
以下给出具体示例以说明本实施例的原理。
示例二:
如图9所示,图9是本申请一个实施例提供的确定第一编码质量评估参数的执行流程图。
在图9的示例中,依次执行以下步骤:
步骤S300:根据从图神经网络中获取到的历史码率参数,得到与历史码率参数对应的编码码流;
步骤S400:引用参考帧,通过解码器对编码码流进行解码生成解码结果,得到重建帧;
步骤S500:基于重建帧确定第一编码质量评估参数。
其中,对应步骤3,采用直接针对恢复帧在当前网络环境下的质量作为目标函数来更新网络参数,例如可以综合引入重建质量、网络卡顿参数和切换状况的加权作为整体的体验质量(Quality of Experience,QoE)的评估指标,即
R(n)可以采用无参考图像质量评价指标,包括但不限于信息保真度准则(Information Fidelity Criterion,IFC)、基于深度学习的盲图像质量评估(Deep CNN-Based Blind Image Quality Predictor,DIQA)等。
可以理解地是,重建帧的主观质量也可以生成式对抗网络(Generative Adversarial Network,GAN)进行评估,可参考增强型超分辨率生成对抗网络(Enhanced Generative Adversarial Network,ESRGAN)等高质量重构的网络架构。
本实施例提出的编码策略需要待压缩视频进行分区域编码,根据区域信息的差异(例如ROI、纹理统计信息等)设计不同的编码参数和策略,在整体码率控制下实现最终输出的编码帧质量退化最小,考虑如下的目标函数:
其中,GNN′(X)表示采用本示例所输出的编码码流,编码码流为多个,Q(GNN′(X))表示该编码码流经过解码得到的数据帧的质量,约束条件为 BD_{GNN′(X)} ≤ RATE,码率应当不大于规定的目标码率。将每一次满足约束的编码方案作为一个Action,将判别函数f作为评价机制,目标设定为寻找最大的f,则在该模型下可以基于强化学习的方式训练图神经网络,以便于实现在特定码率要求下对视频质量的最大保存。
在图10的示例中,在编码帧信息还包括当前帧信息,编码质量评估参数还包括第二编码质量评估参数的情况下,步骤S121还包括但不限于步骤S1214。
步骤S1214,对重建帧信息和当前帧信息进行差异化处理,得到第二编码质量评估参数,其中,重建帧信息与重建帧对应。
在一实施例中,在确定重建帧之后,配合当前帧信息对获取到的重建帧信息进行差异化处理,从而将当前帧信息对应的编码情况考虑在内,得到符合要求的第二编码质量评估参数,能够满足基于图神经网络的优化训练需求,有利于改善码率参数结果输出其中,基于差异化处理可以得到目标函数,进而基于所确定的目标函数对编码结果进行评估,以下给出具体示例以说明本实施例的原理。
示例三:
如图11所示,图11是本申请一个实施例提供的确定第二编码质量评估参数的执行流程图。
在图11的示例中,依次执行以下步骤:
步骤S600:根据从图神经网络中获取到的历史码率参数,得到与历史码率参数对应的编码码流;
步骤S700:引用参考帧,通过解码器对编码码流进行解码生成解码结果;
步骤S800:将该解码结果与当前帧的真实值进行比较,计算差异代价f,求取Loss(即第二编码质量评估参数)。
其中,Loss的求取方式根据具体应用场景而确定,这在本实施例中并未限制,以下进行举例说明。
f = ||x' - x||_1
如上式所示,采用重构图像x’与未压缩图像x的L1范数作为Loss,或者,也可以采用隐式的判别方法,例如采用基于GAN的思想,设计判别网络分析编码图像的质量,即
f = ||g(h'(h(x))) - g(x)||_1
其中h和h’分别表示编码单元和解码单元,由于h为有损压缩,所以恢复图像质量存在退化,通过重构目标函数g(x),即GAN的判别器部分,或者,也可以采用ESRGAN的判别器网络的输出,对编码结果进行评估,以便于实现在特定码率要求下对视频质量的最大保存。
可以理解地是,基于当前帧与重建帧的Loss计算还可以采用多种类似方案,例如在步骤3中,采用重构图像x’与未压缩图像x的L2范数作为Loss等。
需要说明的是,示例二和示例三的执行流程可以作为一种具体功能呈现于逻辑实体中,该逻辑实体可以作为单独物理设备实体,也可以作为主机上的软件实体,该逻辑实体可以命名为模型训练单元,在于根据重建帧确定第一编码质量评估参数,以及对重建帧信息和当前帧信息进行差异化处理,得到第二编码质量评估参数。
在图12的示例中,在码率关联数据包括第二码率关联数据的情况下,步骤S122包括但不限于步骤S1221。
步骤S1221,将全局码率参考数据和第二编码质量评估参数输入到图神经网络,得到第二码率关联数据。
在一实施例中,通过将全局码率参考数据和第二编码质量评估参数输入到图神经网络,从而得到与第二编码质量评估参数对应的第二码率关联数据,相比于原有的码率关联数据,以第二编码质量评估参数作为训练参数优化图神经网络,能够得到优化效果更好的二码率关联数据,有利于提升视频压缩效果。
在图13的示例中,在码率关联数据包括第一码率关联数据的情况下,步骤S122包括但不限于步骤S1222。
步骤S1222,对于每个编码码流,将全局码率参考数据和第一编码质量评估参数输入到图神经网络,得到与编码码流对应的第一码率关联数据。
在一实施例中,对于每个编码码流,通过将全局码率参考数据和第一编码质量评估参数输入到图神经网络,从而得到与每个编码码流对应的第一码率关联数据,即在具体应用场景下,在一次视频压缩过程中,对于其中的每个编码码流对应的第一码率关联数据可以实现分别控制调节,能够避免同质化,从而显著提升视频压缩效果。
在图14的示例中,步骤S100之前还包括但不限于步骤S900。
步骤S900,在接收到与图神经网络对应的资源限制信息的情况下,对图神经网络进行规模压缩处理。
在一实施例中,资源限制信息可以为在应用平台进行资源限制的情景下形成的,在这种情景下,根据应用场景的要求对图神经网络进行规模压缩处理,包括但不限于蒸馏、量化、剪枝以及动态网络设计等,以降低整体图神经网络模型的规模和算力需求,相应地,在资源扩张处理的情景下,可以对图神经网络进行规模扩张处理,或者采用新的符合要求的图神经网络替代原有的图神经网络。
在图15的示例中,步骤S200之后还包括但不限于步骤S1000。
步骤S1000,在接收到与当前码率参数对应的模型适配信息的情况下,根据模型适配信息对当前码率参数进行优化处理。
在一实施例中,模型适配信息可以为在网络传输环境受到约束的情景下形成的,在这种情景下,根据模型适配信息对当前码率参数进行优化处理,包括但不限于优化编码参数,考虑在牺牲主观质量的情况下降低码率等,以对网络结构和模型参数进行适配。
需要说明的是,步骤S900和步骤S1000可以作为一种具体功能呈现于逻辑实体中,该逻辑实体可以作为单独物理设备实体,也可以作为主机上的软件实体,该逻辑实体可以命名为推理应用单元,在于结合轻量化策略进行部署优化,在资源限制的场景下对图神经网络进行规模压缩处理,以及在网络传输环境受到约束的情景下对当前码率参数进行优化处理,达到降低模型的算力消耗的目的。
另外,参照图16,本申请的一个实施例还提供了一种视频码率控制装置100,该视频码率控制装置100包括:存储器110、处理器120及存储在存储器110上并可在处理器120上运行的计算机程序。
处理器120和存储器110可以通过总线或者其他方式连接。
实现上述实施例的视频码率控制方法所需的非暂态软件程序以及指令存储在存储器110中,当被处理器120执行时,执行上述各实施例的视频码率控制方法,例如,执行以上描述的图1中的方法步骤S100至S200、图2中的方法步骤S210、图4中的方法步骤S110至S120、图5中的方法步骤S121至S122、图6中的方法步骤S1211至S1213、图7中的方法步骤S12131至S12132、图8中的方法步骤S12133至S12134、图9中的方法步骤S300至S500、图10中的方法步骤S1214、图11中的方法步骤S600至S800、图12中的方法步骤S1221、图13中的方法步骤S1222、图14中的方法步骤S900或图15中的方法步骤S1000。
以上所描述的装置实施例仅仅是示意性的,其中作为分离部件说明的单元可以是或者也可以不是物理上分开的,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。
此外,本申请的一个实施例还提供了一种计算机可读存储介质,该计算机可读存储介质存储有计算机可执行指令,该计算机可执行指令被一个处理器120或控制器执行,例如,被上述设备实施例中的一个处理器120执行,可使得上述处理器120执行上述实施例中的视频码率控制方法,例如,执行以上描述的图1中的方法步骤S100至S200、图2中的方法步骤S210、图4中的方法步骤S110至S120、图5中的方法步骤S121至S122、图6中的方法步骤S1211至S1213、图7中的方法步骤S12131至S12132、图8中的方法步骤S12133至S12134、 图9中的方法步骤S300至S500、图10中的方法步骤S1214、图11中的方法步骤S600至S800、图12中的方法步骤S1221、图13中的方法步骤S1222、图14中的方法步骤S900或图15中的方法步骤S1000。
本领域普通技术人员可以理解,上文中所公开方法中的全部或某些步骤、系统可以被实施为软件、固件、硬件及其适当的组合。某些物理组件或所有物理组件可以被实施为由处理器,如中央处理器、数字信号处理器或微处理器执行的软件,或者被实施为硬件,或者被实施为集成电路,如专用集成电路。这样的软件可以分布在计算机可读介质上,计算机可读介质可以包括计算机存储介质(或非暂时性介质)和通信介质(或暂时性介质)。如本领域普通技术人员公知的,术语计算机存储介质包括在用于存储信息(诸如计算机可读指令、数据结构、程序模块或其他数据)的任何方法或技术中实施的易失性和非易失性、可移除和不可移除介质。计算机存储介质包括但不限于RAM、ROM、EEPROM、闪存或其他存储器技术、CD-ROM、数字多功能盘(DVD)或其他光盘存储、磁盒、磁带、磁盘存储或其他磁存储装置、或者可以用于存储期望的信息并且可以被计算机访问的任何其他的介质。此外,本领域普通技术人员公知的是,通信介质通常包括计算机可读指令、数据结构、程序模块或者诸如载波或其他传输机制之类的调制数据信号中的其他数据,并且可包括任何信息递送介质。
以上是对本申请的若干实施方式进行的具体说明,但本申请并不局限于上述实施方式,熟悉本领域的技术人员在不违背本申请精神的前提下还可作出种种的等同变形或替换,这些等同的变形或替换均包括在本申请权利要求所限定的范围内。
Claims (15)
- A video bit rate control method, comprising: inputting the obtained global coding reference data of a video to be compressed into a graph neural network and outputting bit-rate-related data; and determining, according to the bit-rate-related data, a current bit rate parameter used to control the video encoding bit rate; wherein the global coding reference data is used to characterize the compression quality of the video to be compressed, and the bit-rate-related data comprises at least one of the following types: partition information of a coding unit, or quantization parameters of the coding blocks in the coding unit.
- The rate control method according to claim 1, wherein the graph neural network is trained from the obtained global coding reference data; and inputting the obtained global coding reference data of the video to be compressed into the graph neural network and outputting bit-rate-related data comprises: obtaining coded frame information and historical bit rate parameters of the video to be compressed based on the graph neural network; and inputting the global coding reference data, the coded frame information, and the historical bit rate parameters into the graph neural network and outputting bit-rate-related data, the historical bit rate parameters being the current bit rate parameter determined last time.
- The rate control method according to claim 1, wherein the bit-rate-related data comprises the partition information of the coding unit and the quantization parameters of the coding blocks; and determining, according to the bit-rate-related data, the current bit rate parameter used to control the video encoding bit rate comprises: in the case where the partition information of the coding unit is determined, training the quantization parameters of the coding blocks based on the graph neural network to obtain the current bit rate parameter used to control the video encoding bit rate.
- The rate control method according to claim 1, wherein the global bit rate reference data comprises at least one of the following types: bit rate constraint information associated with a coding standard; region of interest (ROI) information; coding type information; encoder information; coded frame constraint information; coded frame statistical information; or inter-frame information.
- The rate control method according to claim 2, wherein inputting the global coding reference data, the coded frame information, and the historical bit rate parameters into the graph neural network and outputting bit-rate-related data comprises: determining a coding quality evaluation parameter according to the coded frame information and the historical bit rate parameters; and inputting the global bit rate reference data and the coding quality evaluation parameter into the graph neural network and outputting bit-rate-related data.
- The rate control method according to claim 5, wherein the coded frame information comprises reference frame information and the coding quality evaluation parameter comprises a first coding quality evaluation parameter; and determining the coding quality evaluation parameter according to the coded frame information and the historical bit rate parameters comprises: determining, according to the historical bit rate parameters, a coded stream corresponding to the historical bit rate parameters; decoding the coded stream according to the reference frame information to obtain a reconstructed frame; and determining the first coding quality evaluation parameter according to the reconstructed frame.
- The rate control method according to claim 6, wherein the coded frame information further comprises current frame information and the coding quality evaluation parameter further comprises a second coding quality evaluation parameter; and determining the coding quality evaluation parameter according to the coded frame information and the historical bit rate parameters further comprises: performing differential processing on reconstructed frame information and the current frame information to obtain the second coding quality evaluation parameter, wherein the reconstructed frame information corresponds to the reconstructed frame.
- The rate control method according to claim 6, wherein there are multiple reconstructed frames, each reconstructed frame corresponding to one coded stream; and determining the first coding quality evaluation parameter according to the reconstructed frame comprises: for each reconstructed frame, obtaining a coding quality evaluation index corresponding to the reconstructed frame according to the reconstructed frame; and determining, from the coding quality evaluation indices, the largest coding quality evaluation index as the first coding quality evaluation parameter.
- The rate control method according to claim 8, wherein obtaining the coding quality evaluation index corresponding to the reconstructed frame according to the reconstructed frame comprises: determining, according to the reconstructed frame, a reconstruction quality parameter, a network stall parameter, and a switching status parameter corresponding to the reconstructed frame; and performing weighted superposition on the reconstruction quality parameter, the network stall parameter, and the switching status parameter to obtain the coding quality evaluation index corresponding to the reconstructed frame.
- The rate control method according to claim 8, wherein the bit-rate-related data comprises first bit-rate-related data; and inputting the global bit rate reference data and the coding quality evaluation parameter into the graph neural network and outputting bit-rate-related data comprises: for each coded stream, inputting the global bit rate reference data and the first coding quality evaluation parameter into the graph neural network to obtain the first bit-rate-related data corresponding to the coded stream.
- The rate control method according to claim 7, wherein the bit-rate-related data comprises second bit-rate-related data; and inputting the global bit rate reference data and the coding quality evaluation parameter into the graph neural network and outputting bit-rate-related data comprises: inputting the global bit rate reference data and the second coding quality evaluation parameter into the graph neural network to obtain the second bit-rate-related data.
- The rate control method according to claim 1, wherein before inputting the obtained global coding reference data of the video to be compressed into the graph neural network and outputting bit-rate-related data, the method further comprises: in the case of receiving resource limitation information corresponding to the graph neural network, performing scale reduction processing on the graph neural network.
- The rate control method according to claim 1, wherein after determining, according to the bit-rate-related data, the current bit rate parameter used to control the video encoding bit rate, the method further comprises: in the case of receiving model adaptation information corresponding to the current bit rate parameter, optimizing the current bit rate parameter according to the model adaptation information.
- A video bit rate control apparatus, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the video bit rate control method according to any one of claims 1 to 13.
- A computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the video bit rate control method according to any one of claims 1 to 13.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111508059.8A (published as CN116320529A) | 2021-12-10 | 2021-12-10 | 视频码率控制方法及装置、计算机可读存储介质 |
CN202111508059.8 | 2021-12-10 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023103200A1 true WO2023103200A1 (zh) | 2023-06-15 |
Family
ID=86729546
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/080754 WO2023103200A1 (zh) | 2021-12-10 | 2022-03-14 | 视频码率控制方法及装置、计算机可读存储介质 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN116320529A (zh) |
WO (1) | WO2023103200A1 (zh) |
- 2021-12-10: CN application CN202111508059.8A filed, published as CN116320529A (status: pending)
- 2022-03-14: PCT application PCT/CN2022/080754 filed, published as WO2023103200A1
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105898331A (zh) * | 2016-05-12 | 2016-08-24 | 天津大学 | 一种深度视频编码的比特分配和码率控制方法 |
CN107277520A (zh) * | 2017-07-11 | 2017-10-20 | 中国科学技术大学 | 帧内预测的码率控制方法 |
CN110832856A (zh) * | 2017-11-30 | 2020-02-21 | 深圳市大疆创新科技有限公司 | 用于减小视频编码波动的系统及方法 |
CN109862356A (zh) * | 2019-01-17 | 2019-06-07 | 中国科学院计算技术研究所 | 一种基于感兴趣区域的视频编码方法及系统 |
CN110248195A (zh) * | 2019-07-17 | 2019-09-17 | 北京百度网讯科技有限公司 | 用于输出信息的方法和装置 |
CN110650370A (zh) * | 2019-10-18 | 2020-01-03 | 北京达佳互联信息技术有限公司 | 一种视频编码参数确定方法、装置、电子设备及存储介质 |
CN111294595A (zh) * | 2020-02-04 | 2020-06-16 | 清华大学深圳国际研究生院 | 一种基于深度强化学习的视频编码帧内码率控制方法 |
CN110996131A (zh) * | 2020-03-02 | 2020-04-10 | 腾讯科技(深圳)有限公司 | 视频编码方法、装置、计算机设备及存储介质 |
CN111918066A (zh) * | 2020-09-08 | 2020-11-10 | 北京字节跳动网络技术有限公司 | 视频编码方法、装置、设备及存储介质 |
US20210067785A1 (en) * | 2020-11-17 | 2021-03-04 | Intel Corporation | Video encoding rate control for intra and scene change frames using machine learning |
Non-Patent Citations (2)
Title |
---|
WEI LILI, YANG ZHENGLONG, WANG ZHENMING, WANG GUOZHONG: "A CNN-Based Optimal CTU λ Decision for HEVC Intra Rate Control", IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, INFORMATION & SYSTEMS SOCIETY, TOKYO., JP, vol. E104.D, no. 10, 1 October 2021 (2021-10-01), JP , pages 1766 - 1769, XP093072130, ISSN: 0916-8532, DOI: 10.1587/transinf.2021EDL8047 * |
XU YIWEN, LIU HANG, HUANG JINGQUAN, ZHAO TIESON: "VVC rate control algorithm based on deep reinforcement learning", CHINA SCIENCEPAPER., vol. 16, no. 7, 1 July 2021 (2021-07-01), pages 748 - 753, XP093072128 * |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118368433A (zh) * | 2024-06-20 | 2024-07-19 | 深圳金三立视频科技股份有限公司 | 基于带宽自适应的视频编码压缩方法 |
Also Published As
Publication number | Publication date |
---|---|
CN116320529A (zh) | 2023-06-23 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22902644; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE