WO2023134523A1 - Content adaptive video coding method and apparatus, device and storage medium - Google Patents

Content adaptive video coding method and apparatus, device and storage medium Download PDF

Info

Publication number
WO2023134523A1
Authority
WO
WIPO (PCT)
Prior art keywords
encoding
video
coding
rate control
parameter
Prior art date
Application number
PCT/CN2023/070555
Other languages
French (fr)
Chinese (zh)
Inventor
刘芳
袁子逸
洪旭东
崔同兵
Original Assignee
百果园技术(新加坡)有限公司
刘芳
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 百果园技术(新加坡)有限公司 and 刘芳
Publication of WO2023134523A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/149 Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/80 Responding to QoS
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/114 Adapting the group of pictures [GOP] structure, e.g. number of B-frames between two anchor frames
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/189 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N19/192 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding, the adaptation method, adaptation tool or adaptation type being iterative or recursive

Definitions

  • The embodiments of the present application relate to the technical field of video processing, and in particular to a content-adaptive video coding method, apparatus, device, and storage medium.
  • Video codec technology aims to achieve the highest possible compression ratio and video reconstruction quality within the available computing resources, so that the reconstructed video meets storage capacity and bandwidth requirements.
  • Early video service providers usually processed almost all video content with a single, predetermined encoding configuration. As a result, high-motion videos may suffer from an insufficient bit rate and low encoding quality, while low-motion videos may waste bit rate.
  • Content-adaptive encoding sets a different encoding configuration for each video according to its content, finding for each video or video segment the minimum bit rate that still meets clarity and subjective-quality requirements, thereby saving bandwidth.
  • In one approach, training video data is pre-encoded, features are extracted from the encoded data, and a machine learning model is trained on these features together with the corresponding constant rate factor values.
  • Using this model to predict encoding parameters from video features, and then encoding with the predicted values, achieves a balance between encoding bit rate and encoding quality and improves the viewing experience for most viewers.
  • However, this encoding method extracts features by encoding the entire video and then uses a machine learning model to predict a single constant rate factor for the whole video. For long videos with complex, mixed content, the complex parts are encoded at poor quality while bit rate is wasted on the simple parts.
  • Moreover, encoding the entire video first to extract features and predict the constant rate factor, and only then encoding with the predicted value, consumes a great deal of time and is therefore unsuitable for live broadcast scenarios.
  • Embodiments of the present application provide a content-adaptive video coding method, apparatus, device, and storage medium, which solve the problem in the related art of unsatisfactory video coding in complex scenes, improve video coding efficiency, and are applicable to live video scenarios.
  • the embodiment of the present application provides a content-adaptive video coding method, the method including:
  • the embodiment of the present application also provides a content-adaptive video coding device, including:
  • An image set determination module configured to acquire video data to be encoded, and divide the video data into multiple image sets comprising continuous frame images
  • the code rate parameter determination module is configured to determine the encoding characteristics of the image set, and input the encoding characteristics and the set video picture evaluation parameters to the pre-trained machine learning model to output the code rate control parameters;
  • An encoding module configured to encode the set of images according to the encoding feature and the rate control parameter.
  • the embodiment of the present application also provides a content-adaptive video coding device, which includes:
  • one or more processors;
  • a storage device configured to store one or more programs
  • when the one or more programs are executed by the one or more processors, the one or more processors implement the content-adaptive video coding method described in the embodiments of the present application.
  • The embodiment of the present application further provides a storage medium storing computer-executable instructions, where the computer-executable instructions, when executed by a computer processor, are configured to execute the content-adaptive video coding method described in the embodiments of the present application.
  • the embodiment of the present application further provides a computer program product, including a computer program.
  • In the embodiments of the present application, the video data is divided into multiple image sets containing continuous frame images, the encoding features of each image set are determined, and the encoding features and the set video picture evaluation parameter are input into a pre-trained machine learning model to output a rate control parameter; the image set is then encoded according to the encoding features and the rate control parameter. This solves the problem in the related art of unsatisfactory video coding in complex scenes, improves video coding efficiency, and is suitable for real-time video scenarios.
  • FIG. 1 is a flowchart of a content-adaptive video coding method provided by an embodiment of the present application
  • FIG. 2 is a flow chart of a method for performing secondary encoding based on a primary encoding result provided in an embodiment of the present application
  • FIG. 3 is a flow chart of another content-adaptive video coding method provided by an embodiment of the present application.
  • FIG. 4 is a flow chart of another content-adaptive video coding method provided by an embodiment of the present application.
  • FIG. 5 is a structural block diagram of a content-adaptive video coding device provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a content-adaptive video coding device provided by an embodiment of the present application.
  • Fig. 1 is a flow chart of a content-adaptive video coding method provided by an embodiment of the present application. The method can be applied to the encoding of video data and can be executed by computing devices such as notebooks, desktops, smartphones, servers, and tablet computers. It specifically includes the following steps:
  • Step S101 Acquire video data to be encoded, and divide the video data into multiple image sets including continuous frame images.
  • the video data to be encoded includes recorded video data and real-time generated video data that needs to be transmitted and displayed, such as live video data.
  • When encoding a piece of video data, the video data is first divided into multiple image sets containing continuous frame images; that is, video coding is then performed separately for each subdivided image set.
  • the video data may be divided into multiple consecutive GOPs (Group of pictures, a group of pictures), and each GOP represents a group of continuous pictures in a coded video stream.
  • For example, each GOP contains 15 or 20 frames; that is, the video data to be encoded is divided into multiple consecutive image sets, each containing 15 to 20 frames, so that the video data is encoded using the GOP as the encoding unit.
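The GOP division described above can be sketched as follows (a minimal illustration; the frame list and the default `gop_size` of 15 are assumptions, not values mandated by the method):

```python
def split_into_gops(frames, gop_size=15):
    """Divide a sequence of frames into consecutive GOPs (groups of pictures).

    Each GOP holds `gop_size` consecutive frames; the last GOP may be shorter.
    """
    return [frames[i:i + gop_size] for i in range(0, len(frames), gop_size)]

# Example: 35 frames with a GOP size of 15 yield GOPs of 15, 15 and 5 frames.
gops = split_into_gops(list(range(35)), gop_size=15)
print([len(g) for g in gops])  # [15, 15, 5]
```

Each resulting GOP is then encoded independently, which is what makes per-GOP rate control parameters possible.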
  • Step S102 Determine the coding features of the image set, and input the coding features and the set video picture evaluation parameters into the pre-trained machine learning model to output code rate control parameters.
  • The encoding features of the image set may be determined by pre-encoding the image set.
  • an encoder is used to encode an image set to obtain corresponding encoding features.
  • the encoding features of the image set are obtained by performing feature extraction and analysis on each frame of image in the image set.
  • the encoding features include motion vector features, distortion degree parameters, complexity parameters, etc. used to describe each frame of images in the image set.
  • the motion vector feature is used to characterize the degree of change of the image, and the more severe the changes between the frames of images, the larger the motion vector is.
  • The distortion degree parameter represents the degree of distortion of the image: the greater the image distortion, the higher the value of this parameter, and the lower the distortion, the lower the value.
  • The complexity parameter characterizes the complexity of the image; for example, an image containing many different objects with large pixel differences between them has high complexity.
  • the identification and determination of the above-mentioned encoding features can be realized through existing encoder modules and image processing algorithms.
  • the video picture evaluation parameter is a comprehensive evaluation index used to characterize the image quality.
  • the video frame evaluation parameters may be represented by VMAF (Video Multimethod Assessment Fusion, video multimethod assessment fusion).
  • VMAF is an objective evaluation index proposed by Netflix that combines human visual modeling and machine learning.
  • Netflix uses a large amount of subjective data as a training set, and integrates algorithms of different evaluation dimensions by means of machine learning, which is currently a relatively mainstream objective evaluation index.
  • Targeting a set VMAF value makes it possible to save encoding bit rate without changing the subjective quality of the video.
  • The determined encoding features of the image set and the set video picture evaluation parameter are input into the pre-trained machine learning model to output the rate control parameter. The set video picture evaluation parameter can be customized according to different picture quality requirements, different playback devices, and so on, and the set value can also be adjusted.
  • the input machine learning model is a pre-trained neural network model, which can output corresponding bit rate control parameters based on the encoding characteristics of the image set and the set video picture evaluation parameter input.
  • the rate control parameter may be CRF (Constant Rate Factor, constant rate factor) or CQF (Constant Quality Factor, constant quality factor).
  • CRF is a kind of bit rate control.
  • The smaller the CRF value, the higher the video quality and the larger the file size; the larger the CRF value, the higher the video compression rate but the lower the video quality.
  • CRF values correspond to different code rates.
  • a mapping table may be used to record different CRF values and corresponding code rates, or a function curve may be used to characterize the relationship between CRF and code rates.
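The mapping between CRF values and code rates can be illustrated with a small lookup table plus linear interpolation standing in for the "function curve"; the specific kbps figures below are invented for demonstration and in practice depend on the content and encoder:

```python
# Illustrative CRF-to-bitrate mapping (the kbps values are assumptions,
# not measured data; real curves are content- and encoder-dependent).
CRF_TO_BITRATE_KBPS = {18: 4500, 23: 2200, 28: 1100, 33: 550}

def bitrate_for_crf(crf):
    """Look up the bitrate for a CRF value, interpolating linearly
    between recorded entries and clamping outside the table's range."""
    keys = sorted(CRF_TO_BITRATE_KBPS)
    if crf <= keys[0]:
        return CRF_TO_BITRATE_KBPS[keys[0]]
    if crf >= keys[-1]:
        return CRF_TO_BITRATE_KBPS[keys[-1]]
    for lo, hi in zip(keys, keys[1:]):
        if lo <= crf <= hi:
            frac = (crf - lo) / (hi - lo)
            b_lo, b_hi = CRF_TO_BITRATE_KBPS[lo], CRF_TO_BITRATE_KBPS[hi]
            return b_lo + frac * (b_hi - b_lo)

print(bitrate_for_crf(23))    # 2200 (exact table entry)
print(bitrate_for_crf(25.5))  # 1650.0 (interpolated between 23 and 28)
```

The monotone decreasing table reflects the trade-off stated above: a larger CRF gives a lower bitrate at lower quality.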
  • Step S103 encoding the image set according to the encoding feature and the rate control parameter.
  • the image set is finally re-encoded based on the code rate control parameters and the coding features determined in step S101 to output code stream data.
  • FIG. 2 is a flow chart of a method for performing secondary encoding based on the primary encoding result provided in the embodiment of the present application, as shown in FIG. 2 , specifically including:
  • Step S1031. Determine frame type information and scene information according to the encoding feature.
  • the encoding feature records the frame type of each frame, such as different frame types divided by I frame, P frame and B frame.
  • different frame type information requires encoding and compression with different qualities due to their different reference relations.
  • The I frame represents a key frame, which is a fully retained frame.
  • A B frame is a bidirectional difference frame: it records the differences between the current frame and both the preceding and following frames. When decoding a B frame, the decoder needs not only the previously cached picture but also the following decoded picture, and obtains the final picture by superimposing the preceding and following pictures with the current frame data.
  • The scene information can be divided, for example, into moving scenes and static scenes, and can be determined from the encoding features by an integrated scene discrimination module. The encoding features record image features related to motion and displacement changes, such as the motion vector and motion compensation of each frame, and the scene information of the image is determined by analyzing these data.
  • Step S1032 perform prediction analysis according to the frame type information, the scene information and the code rate control parameters to obtain coding parameters.
  • Taking HEVC (High Efficiency Video Coding) as an example, the encoding parameter corresponds to the quantization parameter QP.
  • The quantization parameter QP is the index of the quantization step Qstep.
  • The quantization step Qstep has 52 values in total, so QP ranges from 0 to 51; in some embodiments a narrower range, such as 0 to 39, is used.
  • The encoding parameter, taking the quantization parameter QP as an example, reflects the compression of spatial detail.
  • With QP values from 0 to 51, the minimum value 0 corresponds to the finest quantization, and the maximum value 51 to the coarsest. Quantization reduces the coded length of the image by discarding information unnecessary for visual reconstruction, without reducing the visual effect.
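The relationship between QP and Qstep can be illustrated with the commonly cited H.264/HEVC rule that the quantization step roughly doubles for every 6 QP increments (a sketch of the approximation Qstep ≈ 2^((QP − 4) / 6); the exact step values are defined by the codec specification):

```python
def qstep(qp):
    """Approximate quantization step for a given QP in H.264/HEVC-style
    codecs: the step size doubles every 6 QP increments."""
    return 2 ** ((qp - 4) / 6)

# The 52 QP values (0..51) therefore span a very wide range of step sizes:
print(round(qstep(4), 3))               # 1.0
print(round(qstep(10), 3))              # 2.0 (6 QP later, the step doubles)
print(round(qstep(22) / qstep(16), 3))  # 2.0
```

This exponential spacing is why small QP changes near the fine end affect quality much less than the same change near the coarse end.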
  • The process of obtaining encoding parameters by predictive analysis based on frame type information, scene information, and rate control parameters can be implemented using an integrated encoder module; that is, the frame type information (I frame, B frame, P frame), the scene information (static scene, dynamic scene), and the rate control parameter (CRF) jointly determine the final encoding parameter (the frame-level QP).
  • For example, the frame type (e.g., a key frame), the scene information (e.g., a dynamic scene), and the rate control parameter CRF together determine the frame-level QP.
  • Step S1033 encode the image set based on the encoding parameters.
  • Taking the frame-level QP parameter in HEVC as an example, HEVC encoding is performed to output the code stream.
  • The above process of performing predictive analysis to obtain encoding parameters and encoding the image set based on them includes: performing predictive analysis to obtain a first encoding parameter; determining a second encoding parameter according to the first encoding parameter, encoding feedback information, cache information, frame type information, and scene information; adjusting a quantization offset parameter according to the first encoding parameter; and encoding the image set according to the second encoding parameter and the adjusted quantization offset parameter to output code stream data.
  • The first encoding parameter can be understood as the base QP; the frame-level QP is determined from the first encoding parameter together with the encoding feedback information, cache information, frame type information, and scene information.
  • The cache information represents the state of the buffer memory during video encoding: the larger the cache occupation, the larger the corresponding QP value, which reduces the computation and storage required for video encoding.
  • The encoding feedback information can be obtained during the pre-encoding process, or it can be information fed back after encoding a previous round of this image set or video, such as the degree of distortion.
  • the quantization offset parameter is further adjusted according to the first encoding parameter.
  • The quantization offset parameter can be represented by the strength of the cutree, which indicates the quantization offset adjustment performed according to the degree to which the current block is referenced.
  • If the current block is referenced, it is further determined how many of the subsequent blocks refer to it; if it is referenced by many subsequent image blocks, the current block belongs to a slowly changing scene, and the QP value is correspondingly lowered to improve the image quality.
  • Finally, the image set is encoded using the determined second encoding parameter and the adjusted quantization offset parameter to output code stream data, ensuring an optimal balance between image quality and compression rate.
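The frame-level QP derivation described above (a base QP adjusted by frame type, scene, buffer occupancy, and the cutree quantization offset) can be sketched as follows; every offset magnitude here is an illustrative assumption, not a value from the application:

```python
def frame_level_qp(base_qp, frame_type, scene, buffer_fullness, cutree_offset):
    """Sketch of deriving a frame-level QP from the base QP (the first
    encoding parameter). All offsets are assumed values for illustration.
    """
    qp = base_qp
    # Key frames get higher quality (lower QP) since later frames reference them.
    if frame_type == "I":
        qp -= 3
    elif frame_type == "B":
        qp += 2
    # Dynamic scenes tolerate coarser quantization than static ones.
    if scene == "dynamic":
        qp += 1
    # A fuller encoder buffer pushes QP up to reduce the bits produced.
    if buffer_fullness > 0.8:
        qp += 2
    # Heavily referenced blocks receive a negative (quality-raising) offset.
    qp += cutree_offset
    return max(0, min(51, qp))  # clamp to the valid QP range

print(frame_level_qp(30, "I", "dynamic", 0.5, -1))  # 27
```

The clamp at the end mirrors the valid QP range of 0 to 51 discussed earlier.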
  • When performing video encoding, the video is first divided into image sets; a first encoding pass obtains the encoding features, the trained machine learning model then outputs an accurate rate control parameter, and a second encoding pass encodes the image set based on the rate control parameter and the encoding features from the first pass to produce the final video encoding result.
  • This method uses a two-pass, machine-learning-based content-adaptive encoding technique for live video: through HEVC multi-pass encoding and a machine learning prediction model, the encoding configuration is dynamically adjusted according to the complexity of the video content, realizing content-adaptive encoding and a better balance between video fluency and clarity. It can be applied to real-time live video scenes with good encoding results.
  • Fig. 3 is a flow chart of another content-adaptive video coding method provided by the embodiment of the present application, which provides a method for determining the coding characteristics of an image set, as shown in Fig. 3 , specifically including:
  • Step S201 Acquire video data to be encoded, and divide the video data into multiple image sets including continuous frame images.
  • Step S202 Obtain a preset number of frame images in the image set, encode the preset number of frame images to obtain encoding features, and determine the encoding features as the encoding features of the image set.
  • The preset number of frame images may be a miniGOP within a GOP; for example, for an image set that is a 15-frame GOP, the preset number of frame images may be 5 of those frames.
  • The preset number of frame images may be pre-encoded by an encoder to obtain encoding features, which are then taken as the encoding features of the whole image set.
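The idea of step S202, pre-encoding only a few frames of a GOP and treating their statistics as the features of the whole image set, can be sketched as follows (the dictionary-based frame representation and the `analyze_frame` statistics are assumptions for illustration; a real encoder would report motion vectors, distortion, and complexity):

```python
def gop_features(gop, pre_encode_frames=5):
    """Pre-encode only the first few frames of a GOP and use their averaged
    statistics as the GOP's encoding features."""
    def analyze_frame(frame):
        # Stand-in for a real encoder's per-frame feature extraction.
        return {"motion": frame["motion"], "complexity": frame["complexity"]}

    subset = gop[:pre_encode_frames]
    stats = [analyze_frame(f) for f in subset]
    n = len(stats)
    return {k: sum(s[k] for s in stats) / n for k in stats[0]}

gop = [{"motion": m, "complexity": c}
       for m, c in [(1, 4), (2, 6), (3, 8), (2, 6), (2, 6), (9, 9), (9, 9)]]
print(gop_features(gop))  # averages over the first 5 frames only
```

Because only a fraction of each GOP is pre-encoded, feature extraction stays cheap enough for the live-streaming setting the method targets.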
  • Step S203 input the coding features and the set evaluation parameters of the video picture into the pre-trained machine learning model to output the code rate control parameters.
  • Step S204 encoding the image set according to the encoding feature and the rate control parameter.
  • A two-pass, machine-learning-based content-adaptive encoding technique for live video is adopted, and the encoding configuration is dynamically adjusted according to the complexity of the video content.
  • Fig. 4 is a flow chart of another content-adaptive video coding method provided by an embodiment of the present application, showing a method for outputting rate control parameters through a machine learning model, where the machine learning model is a joint model formed by a first training model and a second training model. As shown in Fig. 4, the method specifically includes:
  • Step S301 Acquire video data to be encoded, and divide the video data into multiple image sets including continuous frame images.
  • Step S302 Determine the encoding features of the image set, input the encoding features and the set video frame evaluation parameters into the first training model and the second training model respectively, and obtain the first training model output by the first training model A code rate control parameter, and a second code rate control parameter output by the second training model.
  • the first training model is an XGBoost model
  • the second training model is a LightGBM model, both of which are decision tree-based machine learning algorithms.
  • the first rate control parameter output by the first training model is denoted as CRF1
  • the second rate control parameter output by the second training model is denoted as CRF2.
  • Step S303 performing weighted average calculation on the first rate control parameter and the second rate control parameter to obtain a rate control parameter.
  • CRF = α1·CRF1 + α2·CRF2, where α1 + α2 = 1, α1 ∈ [0, 1], α2 ∈ [0, 1].
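The weighted fusion of the two models' outputs in step S303 can be sketched as follows; the default 0.5/0.5 weight split is an assumption, since the application only constrains the weights to sum to 1 and lie in [0, 1]:

```python
def fuse_crf(crf1, crf2, alpha1=0.5):
    """Weighted fusion of the two rate control predictions:
    CRF = alpha1 * CRF1 + alpha2 * CRF2, with alpha1 + alpha2 = 1.
    The 0.5/0.5 default split is an assumed value.
    """
    if not 0.0 <= alpha1 <= 1.0:
        raise ValueError("alpha1 must lie in [0, 1]")
    alpha2 = 1.0 - alpha1
    return alpha1 * crf1 + alpha2 * crf2

# Suppose XGBoost predicts CRF1 = 24 and LightGBM predicts CRF2 = 26:
print(fuse_crf(24, 26))                       # 25.0
print(round(fuse_crf(24, 26, alpha1=0.7), 3))  # 24.6
```

Averaging two differently biased tree ensembles is a standard way to reduce the variance of either predictor alone.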
  • Step S304 encode the image set according to the encoding feature and the rate control parameter.
  • Before inputting the encoding features and the set video picture evaluation parameter into the first training model and the second training model respectively, the solution further includes: acquiring video sample data of different scene types and different resolutions; and dividing the data into training set samples, test set samples, and validation set samples, which are respectively input into the first training model and the second training model for training.
  • This solution first distinguishes the scene types of the video images, for example into dynamic scenes and static scenes, and also uses video images of different resolutions as sample data for training.
  • The video sample data is divided into training set samples, test set samples, and validation set samples to obtain a final training model with good prediction performance.
  • Fig. 5 is a structural block diagram of a content-adaptive video coding device provided in an embodiment of the present application.
  • the device is configured to execute the content-adaptive video coding method provided in the above embodiment, and has corresponding functional modules and beneficial effects for executing the method.
  • the device specifically includes: an image set determination module 101, a code rate parameter determination module 102 and an encoding module 103, wherein,
  • the image set determination module 101 is configured to acquire video data to be encoded, and divide the video data into a plurality of image sets comprising continuous frame images;
  • the code rate parameter determination module 102 is configured to determine the encoding characteristics of the image set, and input the encoding characteristics and the set video picture evaluation parameters to the pre-trained machine learning model to output the code rate control parameters;
  • the coding module 103 is configured to code the image set according to the coding feature and the code rate control parameter.
  • When performing video encoding, the video is first divided into image sets; a first encoding pass obtains the encoding features, the trained machine learning model then outputs an accurate rate control parameter, and a second encoding pass encodes the image set based on the rate control parameter and the encoding features from the first pass to produce the final video encoding result.
  • This apparatus uses a two-pass, machine-learning-based content-adaptive encoding technique for live video: through HEVC multi-pass encoding and a machine learning prediction model, the encoding configuration is dynamically adjusted according to the complexity of the video content, realizing content-adaptive encoding and a better balance between video fluency and clarity. It can be deployed in real-time live video scenes with good encoding results.
  • the code rate parameter determination module 102 is specifically configured as:
  • the machine learning model includes a joint model composed of a first training model and a second training model
  • the code rate parameter determination module 102 is specifically configured as:
  • a code rate control parameter is obtained by performing weighted average calculation on the first code rate control parameter and the second code rate control parameter.
  • the code rate parameter determining module 102 is further configured to:
  • the video sample data is divided into training set samples, test set samples and verification set samples, and are respectively input to the first training model and the second training model for training.
  • the encoding module 103 is specifically configured as:
  • the set of images is encoded based on the encoding parameters.
  • the encoding module 103 is specifically configured as:
  • the encoding module 103 is specifically configured as:
  • FIG. 6 is a schematic structural diagram of a content-adaptive video coding device provided in an embodiment of the present application.
  • the device includes a processor 201, a memory 202, an input device 203, and an output device 204;
  • The number of processors may be one or more; one processor 201 is taken as an example in Fig. 6.
  • The processor 201, memory 202, input device 203, and output device 204 in the device can be connected by a bus or in other ways; connection by bus is taken as an example in Fig. 6.
  • the memory 202 can be configured to store software programs, computer-executable programs and modules, such as program instructions/modules corresponding to the content-adaptive video coding method in the embodiment of the present application.
  • the processor 201 executes various functional applications and data processing of the device by running the software programs, instructions and modules stored in the memory 202, that is, implements the above-mentioned content adaptive video coding method.
  • the input device 203 can be configured to receive input numbers or character information, and generate key signal input related to user settings and function control of the device.
  • the output device 204 may include a display device such as a display screen.
  • the embodiment of the present application also provides a storage medium containing computer-executable instructions, and the computer-executable instructions are configured to execute a content-adaptive video coding method described in the above-mentioned embodiments when executed by a computer processor, specifically including:
  • each unit and module included are only divided according to functional logic, but are not limited to the above-mentioned division, as long as the corresponding functions can be realized;
  • the specific names of the functional units are only for the convenience of distinguishing each other, and are not used to limit the protection scope of the embodiments of the present application.
  • Various aspects of the method provided in this application can also be implemented in the form of a program product, which includes program code; when the program product is run on a computer device, the program code causes the computer device to execute the steps in the methods described above in this specification according to the various exemplary implementations of the present application.
  • the computer device may execute the content adaptive video coding method described in the embodiments of the present application.
  • the program product can be implemented using any combination of one or more readable media.

Abstract

Embodiments of the present application provide a content adaptive video coding method and apparatus, a device and a storage medium. The method comprises: obtaining video data to be coded, and dividing the video data into a plurality of image sets containing continuous frame images; determining coding features of the image sets, and inputting the coding features and set video picture evaluation parameters into a pre-trained machine learning model to output code rate control parameters; and coding the image sets according to the coding features and the code rate control parameters. The present solution improves the video coding efficiency, and is suitable for a real-time video scene.

Description

Content Adaptive Video Coding Method, Apparatus, Device and Storage Medium
This application claims priority to Chinese Patent Application No. 202210043241.9, filed with the China Patent Office on January 14, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the technical field of video processing, and in particular to a content-adaptive video coding method, apparatus, device, and storage medium.
Background
With the rapid development of mobile Internet technology, video has become the mainstream medium for users; live streaming, video on demand, short video, and video chat have become part of daily life. However, compared with text and images, the amount of video data is enormous, so video transmission and storage face great challenges. Video codec technology aims to achieve the highest possible compression ratio and the highest possible reconstructed video quality within the available computing resources, so as to meet storage capacity and bandwidth requirements. Early video service providers usually processed almost all video content with a single predetermined general encoding configuration; as a result, high-motion videos could suffer from insufficient bit rate and low encoding quality, while low-motion videos could waste bit rate. Content-adaptive encoding sets different encoding configurations for different videos according to their content, finding for each video or video segment the lowest bit rate that satisfies clarity and subjective-sensitivity requirements, thereby saving bandwidth.
When performing video encoding, encoded data is extracted as features by pre-encoding training video data and, combined with the corresponding constant rate factor values, a machine learning model is trained. In a production environment, the model predicts encoding parameters from video features, and encoding is then performed with the predicted values, striking a balance between encoding bit rate and encoding quality to improve the viewing experience of most viewers. However, this method extracts features by encoding the entire video and then uses the machine learning model to predict a single constant rate factor for the whole video. For long videos with complex and mixed content, this leads to poor encoding quality in the complex parts and wasted bit rate in the simple parts. Moreover, encoding the entire video first to extract features and predict the constant rate factor, and then encoding again with the predicted value, consumes a great deal of time and is therefore unsuitable for live-streaming scenarios.
Summary
The embodiments of the present application provide a content-adaptive video coding method, apparatus, device, and storage medium, which solve the problem in the related art that video coding performs poorly in complex scenes, improve video coding efficiency, and are applicable to real-time video scenarios.
In a first aspect, an embodiment of the present application provides a content-adaptive video coding method, including:
acquiring video data to be encoded, and dividing the video data into a plurality of image sets containing consecutive frame images;
determining coding features of the image sets, and inputting the coding features and set video picture evaluation parameters into a pre-trained machine learning model to output rate control parameters;
encoding the image sets according to the coding features and the rate control parameters.
In a second aspect, an embodiment of the present application further provides a content-adaptive video coding apparatus, including:
an image set determination module, configured to acquire video data to be encoded and divide the video data into a plurality of image sets containing consecutive frame images;
a rate parameter determination module, configured to determine coding features of the image sets, and to input the coding features and set video picture evaluation parameters into a pre-trained machine learning model to output rate control parameters;
an encoding module, configured to encode the image sets according to the coding features and the rate control parameters.
In a third aspect, an embodiment of the present application further provides a content-adaptive video coding device, including:
one or more processors;
a storage apparatus, configured to store one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the content-adaptive video coding method described in the embodiments of the present application.
In a fourth aspect, an embodiment of the present application further provides a storage medium storing computer-executable instructions which, when executed by a computer processor, are configured to perform the content-adaptive video coding method described in the embodiments of the present application.
In a fifth aspect, an embodiment of the present application further provides a computer program product including a computer program which, when executed, implements the steps of the content-adaptive video coding method described above.
In the embodiments of the present application, video data to be encoded is acquired and divided into a plurality of image sets containing consecutive frame images; coding features of the image sets are determined; the coding features and set video picture evaluation parameters are input into a pre-trained machine learning model to output rate control parameters; and the image sets are encoded according to the coding features and the rate control parameters. This solves the problem in the related art that video coding performs poorly in complex scenes, improves video coding efficiency, and is applicable to real-time video scenarios.
Brief Description of the Drawings
FIG. 1 is a flowchart of a content-adaptive video coding method provided by an embodiment of the present application;
FIG. 2 is a flowchart of a method for performing secondary encoding based on a primary encoding result provided by an embodiment of the present application;
FIG. 3 is a flowchart of another content-adaptive video coding method provided by an embodiment of the present application;
FIG. 4 is a flowchart of another content-adaptive video coding method provided by an embodiment of the present application;
FIG. 5 is a structural block diagram of a content-adaptive video coding apparatus provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a content-adaptive video coding device provided by an embodiment of the present application.
Detailed Description
The embodiments of the present application are described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the embodiments of the present application, not to limit them. It should also be noted that, for ease of description, the drawings show only the parts related to the embodiments of the present application rather than all structures.
The terms "first", "second", and the like in the specification and claims of this application are used to distinguish similar objects, not to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the application can be practiced in sequences other than those illustrated or described here. Objects distinguished by "first", "second", and the like are generally of one type, and the number of objects is not limited; for example, there may be one or more first objects. In addition, "and/or" in the specification and claims denotes at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.
FIG. 1 is a flowchart of a content-adaptive video coding method provided by an embodiment of the present application, which can be applied to encoding video data. The method may be executed by a computing device such as a notebook, desktop computer, smartphone, server, or tablet, and specifically includes the following steps:
Step S101: Acquire video data to be encoded, and divide the video data into a plurality of image sets containing consecutive frame images.
The video data to be encoded includes recorded video data as well as video data generated in real time that needs to be transmitted and displayed, such as live-streaming video data.
In one embodiment, when encoding video data, a piece of video data is first divided into a plurality of image sets containing consecutive frame images; that is, during video encoding, each subdivided image set is encoded separately. For example, the video data may be divided into a plurality of consecutive GOPs (Groups of Pictures), where each GOP represents a group of consecutive pictures in a coded video stream. For instance, if each GOP contains 15 or 20 frames, the video data to be encoded is divided into a plurality of consecutive image sets each containing 15 to 20 frames, i.e., the video data is encoded with the GOP as the encoding unit.
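The GOP-based partitioning described above can be sketched as follows. This is an illustrative helper, not part of the application; the 15-frame GOP size is simply the example value used above.

```python
def split_into_gops(frames, gop_size=15):
    """Divide a frame sequence into consecutive image sets (GOPs) of at
    most gop_size frames each; each set is then encoded separately."""
    if gop_size <= 0:
        raise ValueError("gop_size must be positive")
    return [frames[i:i + gop_size] for i in range(0, len(frames), gop_size)]

# 50 frames with a 15-frame GOP yield image sets of 15, 15, 15 and 5 frames.
gops = split_into_gops(list(range(50)), gop_size=15)
```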
Step S102: Determine coding features of the image set, and input the coding features and set video picture evaluation parameters into a pre-trained machine learning model to output rate control parameters.
In one embodiment, the coding features of the image set may be obtained by pre-encoding, for example by encoding the image set with an encoder to obtain the corresponding coding features.
In one embodiment, feature extraction and analysis are performed on each frame image in the image set to obtain the coding features of the image set. Optionally, the coding features include motion vector features, distortion degree parameters, complexity parameters, and the like describing each frame image in the image set. The motion vector features characterize the degree of change of the images: the more drastically the frames change, the larger the motion vectors; conversely, if the frames describe a still picture, the motion vectors are relatively small. The distortion degree parameter characterizes the degree of image distortion: the greater the distortion, the higher the parameter value; if the distortion is low, the parameter value is correspondingly low. The complexity parameter characterizes the complexity of the image; for example, if an image contains many different objects, the greater the pixel differences between the objects, the higher the complexity. Optionally, the above coding features may be identified and determined by existing encoder modules, image processing algorithms, and the like.
The video picture evaluation parameter is a comprehensive evaluation index used to characterize image quality. Optionally, it may be expressed by VMAF (Video Multimethod Assessment Fusion). VMAF is an objective evaluation metric proposed by Netflix that combines human visual modeling and machine learning: it uses a large amount of subjective data as a training set and fuses algorithms covering different evaluation dimensions by means of machine learning, and is currently one of the mainstream objective evaluation metrics. A higher VMAF score generally indicates better video quality; however, from the perspective of human perception, once the VMAF score of a given video rises above a certain threshold, the human eye cannot perceive further improvement in picture quality. Therefore, different VMAF targets can be designed for different videos to save encoding bit rate without changing the subjective quality of the video.
In one embodiment, the determined coding features of the image set and the set video picture evaluation parameter are input into a pre-trained machine learning model to output a rate control parameter. The set video picture evaluation parameter can be customized according to different picture quality requirements, different playback devices, and so on, and the set value can also be adjusted. The machine learning model is a pre-trained neural network model that outputs the corresponding rate control parameter based on the coding features of the image set and the set video picture evaluation parameter. Optionally, the rate control parameter may be a CRF (Constant Rate Factor) or a CQF (Constant Quality Factor). CRF is a form of rate control: the smaller the CRF value, the higher the video quality and the larger the file size; the larger the CRF value, the higher the video compression rate but the lower the video quality. Optionally, different CRF values correspond to different bit rates; a mapping table may be used to record the CRF values and their corresponding encoding bit rates, or a function curve may be used to characterize the relationship between CRF and bit rate.
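The mapping-table option mentioned above (recording CRF values against encoding bit rates) might be sketched as below. The table values are invented for illustration only; real CRF-to-bitrate relations depend on the codec and the content and would be measured empirically.

```python
import bisect

# Hypothetical CRF -> bitrate (kbps) measurements; illustrative values only.
CRF_BITRATE_TABLE = [(18, 4200), (23, 2100), (28, 1050), (33, 520)]

def bitrate_for_crf(crf):
    """Look up, and linearly interpolate between, table entries to estimate
    the bitrate for a CRF value; lower CRF means higher quality/bitrate."""
    crfs = [c for c, _ in CRF_BITRATE_TABLE]
    rates = [r for _, r in CRF_BITRATE_TABLE]
    if crf <= crfs[0]:
        return float(rates[0])
    if crf >= crfs[-1]:
        return float(rates[-1])
    i = bisect.bisect_left(crfs, crf)
    c0, c1 = crfs[i - 1], crfs[i]
    r0, r1 = rates[i - 1], rates[i]
    return r0 + (r1 - r0) * (crf - c0) / (c1 - c0)
```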
Step S103: Encode the image set according to the coding features and the rate control parameter.
In one embodiment, after the rate control parameter is obtained through the machine learning model, a final secondary encoding is performed on the image set based on the rate control parameter and the coding features determined in step S102, so as to output bitstream data.
In one embodiment, FIG. 2 is a flowchart of a method for performing secondary encoding based on a primary encoding result provided by an embodiment of the present application. As shown in FIG. 2, the method specifically includes:
Step S1031: Determine frame type information and scene information according to the coding features.
The coding features record the frame type of each frame, such as the different frame types I-frame, P-frame, and B-frame. Different frame types require encoding compression of different quality because of their different reference relationships. An I-frame is a key frame, a completely self-contained picture: only the data of that frame is needed to decode the image, without reference to other frames. A P-frame represents the difference between the current frame and a previous key frame or P-frame: during decoding, the previously cached picture is superimposed with the difference defined by this frame to generate the final picture. A B-frame is a bidirectional difference frame: it records the differences between the current frame and both the preceding and following frames, so decoding a B-frame requires not only the previously cached picture but also the decoded subsequent picture, and the final picture is obtained by superimposing the preceding and following pictures with the data of this frame.
The scene information can, for example, be divided into moving scenes and static scenes, and can be determined from the coding features by an integrated scene discrimination module. The coding features record image features related to motion displacement, such as the motion vectors and motion compensation of each frame; by analyzing data such as motion vectors and motion compensation, the scene information of the image is determined.
Step S1032: Perform predictive analysis according to the frame type information, the scene information, and the rate control parameter to obtain encoding parameters.
Taking HEVC (High Efficiency Video Coding) as an example, the encoding parameter corresponds to the quantization parameter QP. The quantization parameter QP is the index of the quantization step Qstep: for luma coding, Qstep takes 52 values and QP ranges from 0 to 51; for chroma coding, QP ranges from 0 to 39.
Taking the quantization parameter QP as an example, the encoding parameter reflects the compression of spatial detail. The smaller the value, the finer the quantization, the higher the image quality, and the longer the generated bitstream. When the QP value is small, most of the detail in the image is preserved; as the QP value increases, some detail is correspondingly lost and the bit rate decreases. With QP ranging from 0 to 51 as above, the minimum value 0 represents the finest quantization, while the maximum value 51 represents the coarsest. The purpose of quantization is to reduce the image coding length without degrading the visual effect, removing information unnecessary for visual reconstruction.
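For orientation, in H.264/HEVC-style quantizers the step size roughly doubles for every increase of 6 in QP; a commonly used approximation (an assumption added here, not stated in the application) is Qstep ≈ 2^((QP−4)/6):

```python
def approx_qstep(qp):
    """Approximate luma quantization step for a given QP; the step size
    doubles for every increase of 6 in QP, with Qstep = 1 at QP = 4."""
    return 2.0 ** ((qp - 4) / 6.0)
```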
In one embodiment, the process of obtaining encoding parameters by predictive analysis based on the frame type information, scene information, and rate control parameter, taking HEVC as an example, may be implemented using its integrated encoder module. That is, the frame type information (I-frame, B-frame, P-frame), scene information (static scene, dynamic scene), and rate control parameter (CRF) jointly determine the final encoding parameter (frame-level QP). Exemplarily, when the frame type is a key frame, the scene information is a dynamic scene, and the value of the rate control parameter is higher, the resulting frame-level QP value is lower.
Step S1033: Encode the image set based on the encoding parameters.
In one embodiment, after the encoding parameters are obtained, taking the frame-level QP parameter in HEVC as an example, HEVC high-efficiency video encoding is performed to produce the bitstream output.
In another embodiment, to improve the accuracy of the secondary encoding, the above process of performing predictive analysis to obtain encoding parameters and encoding the image set based on the encoding parameters includes: performing predictive analysis to obtain a first encoding parameter; determining a second encoding parameter based on the first encoding parameter, encoding feedback information, buffer information, the frame type information, and the scene information; adjusting a quantization offset parameter according to the first encoding parameter; and encoding the image set according to the second encoding parameter and the adjusted quantization offset parameter to output bitstream data. Taking HEVC encoding as an example, the first encoding parameter can be understood as the base QP, from which the frame-level QP is determined together with the encoding feedback information, buffer information, frame type information, and scene information. The buffer information characterizes the parameters of the buffer memory during video encoding: the larger the buffer occupancy, the larger the corresponding QP value, so as to reduce the computation and storage required for encoding. The encoding feedback information may be obtained during pre-encoding or fed back after the previous round of encoding of this image set or video, such as the degree of distortion; the higher the distortion, the more the QP value needs to be lowered to improve encoding quality. While the second encoding parameter is determined from the first encoding parameter, the quantization offset parameter is further adjusted according to the first encoding parameter. Taking HEVC video encoding as an example, the quantization offset parameter can be characterized by the cu-tree strength, which represents a quantization offset adjustment made according to the degree to which the current block is referenced. In one embodiment, if the current block is referenced, it is further determined whether a certain number of subsequent blocks reference the current block; the more it is referenced by subsequent image blocks, the more this indicates that the current block belongs to a slowly changing scene, and the QP value is correspondingly lowered to improve picture quality. Finally, the image set is encoded using the determined second encoding parameter together with the adjusted quantization offset parameter to output bitstream data, ensuring an optimal balance between image quality and compression rate.
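The reference-based quantization offset described above might be caricatured as follows. This is a hand-made heuristic for illustration only; real encoders (for example, x265's cutree) propagate estimated costs through a lookahead rather than simply counting references, and the parameter values here are invented.

```python
def qp_offset_from_references(ref_count, max_refs=8, max_offset=4.0):
    """Heuristic: the more subsequent blocks reference the current block
    (suggesting a slowly changing scene), the larger the negative QP
    offset, i.e. the finer the quantization applied to that block."""
    ratio = min(max(ref_count, 0), max_refs) / max_refs
    return -max_offset * ratio
```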
As the above scheme shows, when performing video encoding, the video is first divided into image sets; after a first encoding pass obtains the coding features, the trained machine learning model outputs accurate rate control parameters, and the image sets are then encoded a second time based on the rate control parameters and the coding features obtained in the first pass to produce the final encoding result. This approach is a two-pass, machine-learning-based content-adaptive encoding technique for live video: by leveraging HEVC's multi-pass encoding and a machine-learning prediction model, the encoding configuration is dynamically adjusted according to the complexity of the video content, achieving content-adaptive encoding and a better balance between video smoothness and clarity. It can be applied to real-time live video scenarios with good encoding results.
FIG. 3 is a flowchart of another content-adaptive video coding method provided by an embodiment of the present application, which presents a method for determining the coding features of an image set. As shown in FIG. 3, the method specifically includes:
Step S201: Acquire video data to be encoded, and divide the video data into a plurality of image sets containing consecutive frame images.
Step S202: Acquire a preset number of frame images from the image set, encode the preset number of frame images to obtain coding features, and determine these coding features as the coding features of the image set.
In one embodiment, taking the image set as a GOP, the preset number of frame images may be a mini-GOP within the GOP; for example, for an image set that is a 15-frame GOP, the preset number of frame images may be 5 of those frames. The preset number of frame images may be pre-encoded by an encoder to obtain coding features, and the coding features of the preset number of frame images are then determined as the coding features of the image set.
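A toy sketch of pre-encoding only a mini-GOP to stand in for the whole set: frames are reduced to scalar luma means here, and the mean absolute frame difference is an invented stand-in for the encoder's real feature extraction.

```python
def mini_gop_complexity(gop_luma_means, mini_gop_size=5):
    """Use only the first mini_gop_size frames of a GOP and compute a
    simple motion/complexity proxy (mean absolute difference between
    consecutive frame luma means) over that mini-GOP."""
    mini = gop_luma_means[:mini_gop_size]
    diffs = [abs(b - a) for a, b in zip(mini, mini[1:])]
    return sum(diffs) / len(diffs) if diffs else 0.0
```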
Step S203: Input the coding features and the set video picture evaluation parameters into the pre-trained machine learning model to output rate control parameters.
Step S204: Encode the image set according to the coding features and the rate control parameters.
As the above scheme shows, the video encoding process adopts a two-pass, machine-learning-based content-adaptive encoding technique for live video, dynamically adjusting the encoding configuration according to the complexity of the video content. By acquiring a preset number of frame images from the image set, encoding that preset number of frames to obtain coding features, and determining them as the coding features of the whole image set, the encoding speed can be significantly increased. The scheme performs well for video encoding with real-time requirements while reducing the amount of computation, achieves content-adaptive encoding, and better balances video smoothness and clarity. It can be applied to real-time live video scenarios with good encoding results.
FIG. 4 is a flowchart of another content-adaptive video coding method provided by an embodiment of the present application, which presents a method of outputting the rate control parameter through a machine learning model in one embodiment, where the machine learning model is a joint model composed of a first training model and a second training model. As shown in FIG. 4, the method specifically includes:
Step S301: Acquire video data to be encoded, and divide the video data into a plurality of image sets containing consecutive frame images.
Step S302: Determine the coding features of the image set, input the coding features and the set video picture evaluation parameters into the first training model and the second training model respectively, and obtain a first rate control parameter output by the first training model and a second rate control parameter output by the second training model.
在一个实施例中,该第一训练模型是XGBoost模型,第二训练模型是LightGBM模型,二者均为基于决策树的机器学习算法。示例性的,第一训练模型输出的第一码率控制参数记为CRF1,第二训练模型输出的第二码率控制参数记为CRF2。In one embodiment, the first training model is an XGBoost model, and the second training model is a LightGBM model, both of which are decision tree-based machine learning algorithms. Exemplarily, the first rate control parameter output by the first training model is denoted as CRF1, and the second rate control parameter output by the second training model is denoted as CRF2.
步骤S303、对所述第一码率控制参数和所述第二码率控制参数进行加权平均计算得到码率控制参数。Step S303, performing weighted average calculation on the first rate control parameter and the second rate control parameter to obtain a rate control parameter.
The finally calculated rate control parameter is denoted CRF3. Optionally, it is computed by the formula CRF3 = λ1·CRF1 + λ2·CRF2, where λ1 + λ2 = 1, λ1 ∈ [0, 1], and λ2 ∈ [0, 1].
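The weighted combination of the two models' outputs can be written directly from the formula above. The default weight λ1 = 0.5 is an assumed value; the embodiment leaves the weights as tunable constants subject only to λ1 + λ2 = 1:

```python
def combine_crf(crf1, crf2, lam1=0.5):
    """Weighted average of the two models' rate-control outputs.

    Implements CRF3 = lam1*CRF1 + lam2*CRF2 with lam2 = 1 - lam1 and
    lam1 in [0, 1]. The default lam1 = 0.5 is an assumption.
    """
    if not 0.0 <= lam1 <= 1.0:
        raise ValueError("lam1 must lie in [0, 1]")
    lam2 = 1.0 - lam1
    return lam1 * crf1 + lam2 * crf2

crf3 = combine_crf(23.0, 27.0)                     # equal weights -> 25.0
crf3_biased = combine_crf(23.0, 27.0, lam1=0.75)   # favors the first model -> 24.0
```

Constraining the weights to sum to one keeps CRF3 inside the interval spanned by CRF1 and CRF2, so the combined parameter stays in the valid CRF range whenever both model outputs do.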
Step S304: Encode the image set according to the encoding features and the rate control parameter.
As can be seen from the above, when the rate control parameter is output through the machine learning model, two different decision-tree-based models each output a rate control parameter, and the final parameter is obtained by weighted averaging. This yields a more accurate rate control parameter and, in turn, a better final video encoding result.
In one embodiment, before the encoding features and the set video picture evaluation parameter are input into the first and second training models, the method further includes: acquiring video sample data of different scene types and different resolutions; dividing the video sample data into training-set, test-set and validation-set samples; and inputting them into the first training model and the second training model respectively for training. During model training, this scheme first distinguishes the scene type of the video picture, for example dynamic versus static scenes, and trains separately on video pictures of different resolutions as sample data. Dividing the sample data into training, test and validation sets yields a final trained model with good predictive performance.
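A minimal sketch of the sample-set division, assuming a 70/15/15 split (the embodiment does not specify the proportions, the seed, or the helper name):

```python
import random

def split_samples(samples, train=0.7, test=0.15, seed=0):
    """Shuffle and split video sample data into training-set,
    test-set and validation-set samples.

    The 70/15/15 ratio and the fixed seed are assumed choices made
    for reproducibility of this sketch.
    """
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_test = int(len(shuffled) * test)
    return (shuffled[:n_train],                       # training set
            shuffled[n_train:n_train + n_test],       # test set
            shuffled[n_train + n_test:])              # validation set

train_set, test_set, val_set = split_samples(list(range(100)))
# 70 / 15 / 15 samples, covering all 100 inputs exactly once
```

In practice the same split would be applied per scene type and per resolution, since the embodiment trains on those subsets separately.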
Fig. 5 is a structural block diagram of a content-adaptive video encoding apparatus provided by an embodiment of the present application. The apparatus is configured to execute the content-adaptive video encoding method provided by the above embodiments, and has the corresponding functional modules and beneficial effects. As shown in Fig. 5, the apparatus includes an image set determination module 101, a rate parameter determination module 102 and an encoding module 103, where:
the image set determination module 101 is configured to acquire video data to be encoded and divide the video data into multiple image sets each containing consecutive frame images;
the rate parameter determination module 102 is configured to determine the encoding features of the image set and input the encoding features together with the set video picture evaluation parameter into a pre-trained machine learning model, which outputs a rate control parameter;
the encoding module 103 is configured to encode the image set according to the encoding features and the rate control parameter.
From the above scheme, when performing video encoding, the video is first divided into image sets; a first-pass encoding is performed to obtain encoding features; a trained machine learning model then outputs an accurate rate control parameter; and the image set is encoded a second time based on the rate control parameter and the features obtained in the first pass to produce the final encoding result. This approach uses a content-adaptive encoding technique for live video based on two-pass encoding and machine learning: by combining HEVC multi-pass encoding with a machine-learning prediction model, the encoding configuration is dynamically adjusted according to the complexity of the video content, achieving content-adaptive encoding and a better balance between video fluency and clarity. It can be applied to real-time live-streaming scenarios with good encoding results.
In a possible embodiment, the rate parameter determination module 102 is specifically configured to:
acquire a preset number of frame images from the image set;
encode the preset number of frame images to obtain encoding features, and determine those features as the encoding features of the image set.
In a possible embodiment, the machine learning model is a joint model composed of a first training model and a second training model, and the rate parameter determination module 102 is specifically configured to:
input the encoding features and the set video picture evaluation parameter into the first training model and the second training model respectively, obtaining a first rate control parameter output by the first training model and a second rate control parameter output by the second training model;
perform a weighted average of the first rate control parameter and the second rate control parameter to obtain the rate control parameter.
In a possible embodiment, the rate parameter determination module 102 is further configured to:
before the encoding features and the set video picture evaluation parameter are input into the first and second training models, acquire video sample data of different scene types and different resolutions;
divide the video sample data into training-set, test-set and validation-set samples, and input them into the first training model and the second training model respectively for training.
In a possible embodiment, the encoding module 103 is specifically configured to:
determine frame type information and scene information according to the encoding features;
perform prediction analysis according to the frame type information, the scene information and the rate control parameter to obtain encoding parameters;
encode the image set based on the encoding parameters.
In a possible embodiment, the encoding module 103 is specifically configured to:
perform prediction analysis to obtain a first encoding parameter;
determine a second encoding parameter based on the first encoding parameter, encoding feedback information, buffer information, the frame type information and the scene information.
In a possible embodiment, the encoding module 103 is specifically configured to:
adjust a quantization offset parameter according to the first encoding parameter;
encode the image set according to the second encoding parameter and the adjusted quantization offset parameter, so as to output bitstream data.
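One plausible form of the quantization-offset adjustment is to shift the offset in proportion to how far the first-pass rate-control parameter deviates from a quality target, then clamp it to a typical QP-offset range. Every name, the target value, the gain, and the [-12, 12] clamp are assumptions; the embodiment states only that the offset is adjusted according to the first encoding parameter:

```python
def adjust_qp_offset(base_offset, first_pass_crf, target_crf=26.0, gain=0.5):
    """Hypothetical quantization-offset adjustment: move the offset
    toward coarser quantization when the first-pass CRF exceeds the
    target, toward finer quantization when it falls below, and clamp
    to an assumed [-12, 12] QP-offset range.
    """
    offset = base_offset + gain * (first_pass_crf - target_crf)
    return max(-12.0, min(12.0, offset))

adjusted = adjust_qp_offset(0.0, 30.0)  # first pass coarser than target -> 2.0
```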
Fig. 6 is a schematic structural diagram of a content-adaptive video encoding device provided by an embodiment of the present application. As shown in Fig. 6, the device includes a processor 201, a memory 202, an input apparatus 203 and an output apparatus 204. There may be one or more processors 201 in the device; one processor 201 is taken as an example in Fig. 6. The processor 201, memory 202, input apparatus 203 and output apparatus 204 in the device may be connected by a bus or in other ways; a bus connection is taken as an example in Fig. 6. The memory 202, as a computer-readable storage medium, may be configured to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the content-adaptive video encoding method in the embodiments of the present application. The processor 201 runs the software programs, instructions and modules stored in the memory 202 to execute the various functional applications and data processing of the device, that is, to implement the above content-adaptive video encoding method. The input apparatus 203 may be configured to receive input digit or character information and to generate key signal inputs related to user settings and function control of the device. The output apparatus 204 may include a display device such as a display screen.
An embodiment of the present application further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are configured to execute the content-adaptive video encoding method described in the above embodiments, specifically including:
acquiring video data to be encoded, and dividing the video data into multiple image sets each containing consecutive frame images;
determining the encoding features of the image set, and inputting the encoding features and the set video picture evaluation parameter into a pre-trained machine learning model, which outputs a rate control parameter;
encoding the image set according to the encoding features and the rate control parameter.
It should be noted that, in the above embodiment of the content-adaptive video encoding apparatus, the units and modules are divided merely according to functional logic; the division is not limited thereto as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only for the convenience of distinguishing them from one another and are not intended to limit the protection scope of the embodiments of the present application.
In some possible implementations, aspects of the method provided by the present application may also be implemented in the form of a program product, which includes program code. When the program product runs on a computer device, the program code is configured to cause the computer device to execute the steps of the methods described above according to the various exemplary implementations of the present application; for example, the computer device may execute the content-adaptive video encoding method described in the embodiments of the present application. The program product may be implemented using any combination of one or more readable media.

Claims (10)

  1. A content-adaptive video encoding method, comprising:
    acquiring video data to be encoded, and dividing the video data into multiple image sets each containing consecutive frame images;
    determining encoding features of the image set, and inputting the encoding features and a set video picture evaluation parameter into a pre-trained machine learning model, which outputs a rate control parameter;
    encoding the image set according to the encoding features and the rate control parameter.
  2. The content-adaptive video encoding method according to claim 1, wherein determining the encoding features of the image set comprises:
    acquiring a preset number of frame images from the image set;
    encoding the preset number of frame images to obtain encoding features, and determining those features as the encoding features of the image set.
  3. The content-adaptive video encoding method according to claim 1 or 2, wherein the machine learning model comprises a joint model composed of a first training model and a second training model, and inputting the encoding features and the set video picture evaluation parameter into the pre-trained machine learning model to output the rate control parameter comprises:
    inputting the encoding features and the set video picture evaluation parameter into the first training model and the second training model respectively, to obtain a first rate control parameter output by the first training model and a second rate control parameter output by the second training model;
    performing a weighted average of the first rate control parameter and the second rate control parameter to obtain the rate control parameter.
  4. The content-adaptive video encoding method according to claim 3, before inputting the encoding features and the set video picture evaluation parameter into the first training model and the second training model respectively, further comprising:
    acquiring video sample data of different scene types and different resolutions;
    dividing the video sample data into training-set samples, test-set samples and validation-set samples, and inputting them into the first training model and the second training model respectively for training.
  5. The content-adaptive video encoding method according to any one of claims 1-4, wherein encoding the image set according to the encoding features and the rate control parameter comprises:
    determining frame type information and scene information according to the encoding features;
    performing prediction analysis according to the frame type information, the scene information and the rate control parameter to obtain encoding parameters;
    encoding the image set based on the encoding parameters.
  6. The content-adaptive video encoding method according to claim 5, wherein performing prediction analysis to obtain encoding parameters comprises:
    performing prediction analysis to obtain a first encoding parameter;
    determining a second encoding parameter based on the first encoding parameter, encoding feedback information, buffer information, the frame type information and the scene information.
  7. The content-adaptive video encoding method according to claim 5 or 6, wherein encoding the image set based on the encoding parameters comprises:
    adjusting a quantization offset parameter according to the first encoding parameter;
    encoding the image set according to the second encoding parameter and the adjusted quantization offset parameter, so as to output bitstream data.
  8. A content-adaptive video encoding apparatus, comprising:
    an image set determination module configured to acquire video data to be encoded and divide the video data into multiple image sets each containing consecutive frame images;
    a rate parameter determination module configured to determine encoding features of the image set and input the encoding features and a set video picture evaluation parameter into a pre-trained machine learning model, which outputs a rate control parameter;
    an encoding module configured to encode the image set according to the encoding features and the rate control parameter.
  9. A content-adaptive video encoding device, comprising: one or more processors; and a storage apparatus configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the content-adaptive video encoding method according to any one of claims 1-7.
  10. A storage medium storing computer-executable instructions which, when executed by a computer processor, are configured to execute the content-adaptive video encoding method according to any one of claims 1-7.
PCT/CN2023/070555 2022-01-14 2023-01-04 Content adaptive video coding method and apparatus, device and storage medium WO2023134523A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210043241.9 2022-01-14
CN202210043241.9A CN114554211A (en) 2022-01-14 2022-01-14 Content adaptive video coding method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2023134523A1 true WO2023134523A1 (en) 2023-07-20

Family

ID=81671210

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/070555 WO2023134523A1 (en) 2022-01-14 2023-01-04 Content adaptive video coding method and apparatus, device and storage medium

Country Status (2)

Country Link
CN (1) CN114554211A (en)
WO (1) WO2023134523A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114554211A (en) * 2022-01-14 2022-05-27 百果园技术(新加坡)有限公司 Content adaptive video coding method, device, equipment and storage medium
CN116320429B (en) * 2023-04-12 2024-02-02 瀚博半导体(上海)有限公司 Video encoding method, apparatus, computer device, and computer-readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111083473A (en) * 2019-12-28 2020-04-28 杭州当虹科技股份有限公司 Content self-adaptive video coding method based on machine learning
CN112383777A (en) * 2020-09-28 2021-02-19 北京达佳互联信息技术有限公司 Video coding method and device, electronic equipment and storage medium
US20210168408A1 (en) * 2018-08-14 2021-06-03 Huawei Technologies Co., Ltd. Machine-Learning-Based Adaptation of Coding Parameters for Video Encoding Using Motion and Object Detection
CN114554211A (en) * 2022-01-14 2022-05-27 百果园技术(新加坡)有限公司 Content adaptive video coding method, device, equipment and storage medium


Also Published As

Publication number Publication date
CN114554211A (en) 2022-05-27

Similar Documents

Publication Publication Date Title
WO2023134523A1 (en) Content adaptive video coding method and apparatus, device and storage medium
TWI743919B (en) Video processing apparatus and processing method of video stream
WO2021068598A1 (en) Encoding method and device for screen sharing, and storage medium and electronic equipment
US8718145B1 (en) Relative quality score for video transcoding
WO2018234860A1 (en) Real-time screen sharing
WO2021129007A1 (en) Method and device for determining video bitrate, computer apparatus, and storage medium
WO2023016155A1 (en) Image processing method and apparatus, medium, and electronic device
US20140254688A1 (en) Perceptual Quality Of Content In Video Collaboration
CN116440501B (en) Self-adaptive cloud game video picture rendering method and system
JP2022500901A (en) Data processing methods, devices, and computer programs to be coded
WO2022000298A1 (en) Reinforcement learning based rate control
CN111385577B (en) Video transcoding method, device, computer equipment and computer readable storage medium
CN112437301B (en) Code rate control method and device for visual analysis, storage medium and terminal
WO2024017106A1 (en) Code table updating method, apparatus, and device, and storage medium
CN116471262A (en) Video quality evaluation method, apparatus, device, storage medium, and program product
CN110740316A (en) Data coding method and device
TWI749676B (en) Image quality assessment apparatus and image quality assessment method thereof
Li et al. Perceptual quality assessment of face video compression: A benchmark and an effective method
Huang et al. Semantic video adaptation using a preprocessing method for mobile environment
JP2018514133A (en) Data processing method and apparatus
US10848772B2 (en) Histogram-based edge/text detection
WO2024051299A1 (en) Encoding method and apparatus, and decoding method and apparatus
CN112073724B (en) Video information processing method and device, electronic equipment and storage medium
US11272185B2 (en) Hierarchical measurement of spatial activity for text/edge detection
CN114430501B (en) Content adaptive coding method and system for file transcoding

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23739867

Country of ref document: EP

Kind code of ref document: A1