WO2024124911A1 - Video encoding method, device, electronic device and storage medium - Google Patents

Video encoding method, device, electronic device and storage medium

Publication number
WO2024124911A1
Authority
WO
WIPO (PCT)
Prior art keywords
bit rate
video
frame
training
video image
Application number
PCT/CN2023/109379
Inventor
曲建峰
Original Assignee
书行科技(北京)有限公司
Application filed by 书行科技(北京)有限公司
Publication of WO2024124911A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation

Definitions

  • the present application relates to the field of video encoding technology, and in particular to a video encoding method, device, electronic device and storage medium.
  • bit rate control is a method of controlling the size of the video file and the quality of the video image by determining how many bits are allocated to each frame of the image. It generally includes constant rate factor (CRF), constant bit rate (CBR), variable bit rate (VBR), constant quantization parameter (CQP), etc.
  • the CRF bitrate control method keeps the image quality constant and lets the bitrate (the number of bits transmitted per unit time during data transmission) vary.
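As an illustration of the four bit rate control modes named above, the following minimal sketch builds encoder arguments for each mode. It assumes the ffmpeg command-line tool with the libx264 encoder, which the application does not mention; the function name `rate_control_args` and all numeric values are hypothetical, and the CBR branch only approximates a constant bit rate by pinning the minimum and maximum rates to the target.

```python
def rate_control_args(mode, bitrate_kbps=2000, quality=23):
    """Build ffmpeg/libx264 arguments for one rate-control mode.

    Assumption: ffmpeg + libx264 are used purely for illustration;
    the numeric defaults are arbitrary example values.
    """
    base = ["-c:v", "libx264"]
    if mode == "crf":
        # Constant rate factor: quality is held roughly constant, bitrate varies.
        return base + ["-crf", str(quality)]
    if mode == "cqp":
        # Constant quantization parameter.
        return base + ["-qp", str(quality)]
    rate = f"{bitrate_kbps}k"
    if mode == "cbr":
        # Approximate constant bitrate: min, max and target rates are equal.
        return base + ["-b:v", rate, "-minrate", rate, "-maxrate", rate,
                       "-bufsize", f"{2 * bitrate_kbps}k"]
    if mode == "vbr":
        # Variable bitrate around an average target, capped by maxrate.
        return base + ["-b:v", rate, "-maxrate", f"{2 * bitrate_kbps}k",
                       "-bufsize", f"{4 * bitrate_kbps}k"]
    raise ValueError(f"unknown rate-control mode: {mode}")
```

The returned list is meant to be spliced into a full ffmpeg command between the input and output arguments.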
  • CRF mainly selects image quality parameters and an image resolution (also known as the selected original code point) based on the average bitrate and average image quality of the video, and encodes the video accordingly. A smaller bitrate means less video data and easier transmission. However, when multiple resolutions are available, the image resolution of the original code point does not necessarily give the optimal bitrate at the video quality indicated by the image quality parameter. Therefore, how to determine the optimal bitrate for each frame of the video while ensuring the image quality of the video is a problem that needs to be solved urgently.
  • the embodiments of the present application provide a video encoding method, device, electronic device and storage medium, which can determine the optimal encoding bit rate for video encoding while ensuring image quality, thereby reducing the data amount of the encoded video, which is beneficial to the storage and transmission of the video.
  • an embodiment of the present application provides a video encoding method, the method comprising:
  • the video to be encoded is encoded according to the optimal bit rate of each frame of the video image.
  • parsing each video frame in at least one video frame to determine an optimal bit rate range for each video frame includes:
  • the compressed domain information of each frame of video image is input into the second logistic regression model to obtain the second predicted bit rate, wherein the second prediction model is trained based on a second set bit rate, where the second set bit rate is higher than an original bit rate of each frame of the video image;
  • the first predicted bit rate and the second predicted bit rate are used as endpoints of the optimal bit rate range to determine the optimal bit rate range.
  • the method further includes:
  • the first training set comprising at least one first training video and at least one first predicted bit rate, wherein an initial bit rate of each first training video in the at least one first training video is a first set bit rate, and the at least one first training video corresponds to the at least one first predicted bit rate in one-to-one correspondence;
  • the second training set comprising at least one second training video and at least one second predicted bit rate, wherein an initial bit rate of each second training video in the at least one second training video is a second set bit rate, and the at least one second training video corresponds to the at least one second predicted bit rate in one-to-one correspondence;
  • calling at least one regression predictor in the initial logistic regression model to parse each first training video to obtain at least one first training bit rate corresponding to each first training video, wherein the at least one first training bit rate corresponds to the at least one regression predictor in a one-to-one manner;
  • calling the at least one regression predictor to parse each second training video to obtain at least one second training bit rate corresponding to each second training video, wherein the at least one second training bit rate corresponds to the at least one regression predictor in a one-to-one manner;
  • the initial logistic regression model is trained according to at least one second training bit rate corresponding to each second training video and the second predicted bit rate of each second training video to obtain a second logistic regression model.
  • training an initial logistic regression model according to at least one first training bit rate corresponding to each first training video and a first predicted bit rate of each first training video to obtain a first logistic regression model includes:
  • a corresponding regression predictor is trained and adjusted to obtain a first logistic regression model.
  • determining the optimal bit rate of each frame of video image according to the original bit rate of each frame of video image and the optimal bit rate range of each frame of video image includes:
  • the compression domain information of each frame of video image is input into the third logistic regression model to obtain the optimal bit rate, wherein the third prediction model is trained based on the third set bit rate, and the third set bit rate is determined by the relative position of the original bit rate of each frame of video image and the optimal bit rate range.
  • inputting the compression domain information of each frame of video image into a third logistic regression model to obtain an optimal bit rate includes:
  • calling at least one third regression predictor in the third logistic regression model to perform prediction processing on the compressed domain information of each frame of the video image to obtain at least one third predicted bit rate, wherein the at least one third predicted bit rate corresponds to the at least one third regression predictor in a one-to-one manner;
  • according to the regression weight of each third regression predictor in the at least one third regression predictor, weighting the at least one third predicted bit rate to obtain the optimal bit rate.
  • a corresponding third logistic regression model is obtained, including:
  • the third set bit rate is determined to be the fifth set bit rate, and a third logistic regression model is obtained according to the fifth set bit rate, wherein the fifth set bit rate is less than the original bit rate of each frame of video image.
  • an embodiment of the present application provides a video encoding device, including:
  • a frame segmentation module used for performing frame segmentation processing on the video to be encoded to obtain at least one frame of video image;
  • a parsing module used to parse each video frame in at least one video frame to determine an optimal bit rate range for each video frame, and determine an optimal bit rate for each video frame according to an original bit rate of each video frame and the optimal bit rate range for each video frame;
  • the encoding module is used to encode the video to be encoded according to the optimal bit rate of each frame of video image.
  • an embodiment of the present application provides an electronic device, comprising: a processor, the processor is connected to a memory, the memory is used to store a computer program, and the processor is used to execute the computer program stored in the memory, so that the electronic device performs the method of the first aspect.
  • an embodiment of the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program enables a computer to execute the method of the first aspect.
  • an embodiment of the present application provides a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program, and is computer-operable to cause the computer to execute the method of the first aspect.
  • each frame image is analyzed to obtain its optimal bit rate range, and then the original bit rate is compared with the optimal bit rate range, and the optimal bit rate of each frame image is obtained according to the comparison result to perform video encoding on each frame image. Then, while ensuring the encoding quality of the video, the optimal encoding bit rate is determined as much as possible for video encoding, and then the data volume of the encoded video is reduced, which is beneficial to the storage and transmission of the video.
  • FIG. 1 is a schematic diagram of the hardware structure of a video encoding device provided in an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a flow chart of a video encoding method provided in an embodiment of the present application.
  • FIG. 3 is a flow chart of a method for determining an optimal bit rate range for each frame of video image through compression domain information of each frame of video image provided by an embodiment of the present application.
  • FIG. 4 is a block diagram of functional modules of a video encoding device provided in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present application.
  • the video encoding device 100 includes at least one processor 101 , a communication line 102 , a memory 103 and at least one communication interface 104 .
  • the processor 101 can be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the program of the present application.
  • the communication line 102 may include a path to transmit information between the above components.
  • the communication interface 104 can be any transceiver-like device (such as an antenna, etc.) used to communicate with other devices or communication networks, such as Ethernet, RAN, wireless local area networks (WLAN), etc.
  • the memory 103 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store the desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited to these.
  • the memory 103 can exist independently and be connected to the processor 101 via the communication line 102.
  • the memory 103 can also be integrated with the processor 101.
  • the memory 103 provided in the embodiment of the present application can generally be non-volatile.
  • the memory 103 is used to store computer-executable instructions for executing the scheme of the present application, and the execution is controlled by the processor 101.
  • the processor 101 is used to execute the computer-executable instructions stored in the memory 103, thereby implementing the method provided in the following embodiment of the present application.
  • the computer-executable instructions may also be referred to as application code, which is not specifically limited in this application.
  • the processor 101 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 1 .
  • the video encoding apparatus 100 may include multiple processors, such as the processor 101 and the processor 107 in FIG. 1. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor.
  • a processor may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
  • the video encoding device 100 is a server, for example, it can be an independent server, or it can be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.
  • the video encoding device 100 may also include an output device 105 and an input device 106.
  • the output device 105 communicates with the processor 101 and can display information in a variety of ways.
  • the output device 105 can be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector.
  • the input device 106 communicates with the processor 101 and can receive user input in a variety of ways.
  • the input device 106 can be a mouse, a keyboard, a touch screen device, or a sensor device.
  • the video encoding device 100 may be a general device or a dedicated device.
  • the embodiment of the present application does not limit the type of the video encoding device 100.
  • FIG. 2 is a flow chart of a video encoding method provided in an embodiment of the present application.
  • the video encoding method includes the following steps:
  • the optimal bit rate range of each frame of video image can be determined by the compression domain information of each frame of video image.
  • the method includes:
  • the compressed domain information is information that is necessarily generated when the video is decoded, and may include the mean, variance, entropy, skewness, etc. at the temporal ID level. Therefore, in this embodiment, the compressed domain information generated during decoding can be obtained directly and matched to each frame of the video image, which yields the compressed domain information of each frame of the video image.
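The statistics listed above can be sketched as follows. The application does not say which decoded quantities the statistics are computed over, so the input here is assumed to be a hypothetical list of per-block values for one frame (for example quantization parameters); the function name `compressed_domain_stats` is likewise an assumption, and entropy is taken as the Shannon entropy of the empirical value distribution.

```python
import math
from collections import Counter

def compressed_domain_stats(values):
    """Mean, variance, entropy and skewness of one frame's decoded values.

    Assumption: `values` is a hypothetical list of per-block numbers
    (e.g. quantization parameters) collected while decoding one frame.
    """
    n = len(values)
    if n == 0:
        raise ValueError("values must be non-empty")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n  # population variance
    std = math.sqrt(variance)
    # Shannon entropy (in bits) of the empirical distribution of values.
    counts = Counter(values)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # Population skewness; defined as 0 for a constant input.
    skewness = 0.0 if std == 0 else sum((v - mean) ** 3 for v in values) / (n * std ** 3)
    return {"mean": mean, "variance": variance, "entropy": entropy, "skewness": skewness}
```

The resulting dictionary can serve as the feature vector that the prediction models described below take as input.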
  • the first prediction model is trained based on a first set bit rate, and the first set bit rate is lower than the original bit rate of each frame of the video image.
  • the video encoding method under multiple resolutions will be described below using 1080p, 720p, and 540p as examples.
  • the original bit rate of the video to be encoded is set to 720p.
  • the first set bit rate is 540p, which is lower than the source bit rate of 720p, and then the first prediction model is trained based on the bit rate of 540p.
  • a first training set can be obtained.
  • the first training set includes at least one first training video and at least one first prediction bit rate, wherein the initial bit rate of each first training video in the at least one first training video is 540p, and at least one first training video corresponds to at least one first prediction bit rate. That is, the first training set is composed of at least one first training video of 540p, and each first training video has a corresponding first prediction bit rate, and the first prediction bit rate can be the optimal bit rate of the first training video.
  • At least one regression predictor in the initial logistic regression model is called to parse each first training video to obtain at least one first training bit rate corresponding to each first training video, wherein at least one first training bit rate corresponds to at least one regression predictor.
  • the initial logistic regression model is trained according to the at least one first training bit rate corresponding to each first training video and the first prediction bit rate of each first training video, to obtain a first logistic regression model.
  • at least one bit rate residual can be determined according to at least one first training bit rate and the first prediction bit rate of each first training video, wherein the at least one bit rate residual corresponds to at least one regression predictor. Then, the corresponding regression predictor is trained and adjusted by each bit rate residual in the at least one bit rate residual to obtain the first logistic regression model.
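The residual-driven adjustment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's actual training procedure: each regression predictor is modeled as a plain linear function of the compression domain features, the residual is the labeled bit rate minus the predictor's output, and each predictor is adjusted by a gradient step proportional to its own residual. The function names, learning rate, and epoch count are all hypothetical.

```python
def predict(predictor, x):
    """Output of one linear regression predictor for feature vector x."""
    return sum(w * xi for w, xi in zip(predictor["w"], x)) + predictor["b"]

def train_predictors(features, targets, n_predictors=3, lr=0.05, epochs=2000):
    """Adjust each predictor by its own bit rate residual.

    Assumptions: linear predictors and squared-error gradient steps;
    predictors start from different biases only so that they are distinct.
    """
    dim = len(features[0])
    predictors = [{"w": [0.0] * dim, "b": float(k)} for k in range(n_predictors)]
    for _ in range(epochs):
        for x, y in zip(features, targets):
            for p in predictors:
                residual = y - predict(p, x)  # labeled rate minus training rate
                p["w"] = [w + lr * residual * xi for w, xi in zip(p["w"], x)]
                p["b"] += lr * residual
    return predictors
```

In practice the features would need to be scaled so that the fixed learning rate remains stable; the sketch leaves that to the caller.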
  • a first logistic regression model can be obtained that can predict the optimal code point of a video with a bit rate of 540p through the compression domain information of the video. Therefore, by inversely applying the model, the first predicted bit rate can be obtained through the source bit rate and compression domain information of the video to be encoded.
  • the first predicted bit rate can be understood as the bit rate corresponding to the inflection point from 720p to 540p, that is, the bit rate at the intersection of the bit rate-VMAF curve of the 720p video to be processed and the bit rate-VMAF curve of 540p.
  • the first predicted bit rate is the minimum value of the optimal bit rate range under the original bit rate, that is, the left endpoint.
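To make the curve intersection concrete, the following sketch locates the bit rate at which two sampled bit rate-VMAF curves cross, using linear interpolation between samples. The application obtains this point from a trained model rather than from measured curves, so this is only an illustration of what the predicted value represents; the function name `crossover_bitrate` and the sample numbers in the usage note are made up.

```python
def crossover_bitrate(bitrates, quality_a, quality_b):
    """Bitrate at which curve a and curve b intersect, or None if they do not.

    Assumption: both curves are sampled at the same ascending bitrates,
    and the crossing is located by linear interpolation between samples.
    """
    diffs = [a - b for a, b in zip(quality_a, quality_b)]
    for i in range(len(diffs) - 1):
        d0, d1 = diffs[i], diffs[i + 1]
        if d0 == 0:
            return float(bitrates[i])
        if d0 * d1 < 0:
            # Sign change: interpolate the zero of the difference.
            t = d0 / (d0 - d1)
            return bitrates[i] + t * (bitrates[i + 1] - bitrates[i])
    if diffs and diffs[-1] == 0:
        return float(bitrates[-1])
    return None
```

For example, with sample points at 500, 1000, 1500 and 2000 kbps, a 720p curve of 50, 68, 79, 86 and a 540p curve of 60, 70, 75, 78 cross between 1000 and 1500 kbps, which is the kind of left endpoint the first predicted bit rate stands for.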
  • the second prediction model is obtained by training based on a second set bit rate, and the second set bit rate is higher than the original bit rate of each frame of the video image.
  • the second set bit rate is 1080p, which is higher than the source bit rate of 720p, and then the second prediction model is obtained by training based on the bit rate of 1080p.
  • a second training set can be obtained.
  • the second training set includes at least one second training video and at least one second prediction bit rate, wherein the initial bit rate of each second training video in the at least one second training video is 1080p, and at least one second training video corresponds to at least one second prediction bit rate. That is, the second training set is composed of at least one 1080p second training video, and each second training video has a corresponding second prediction bit rate, which can be the optimal bit rate of the second training video.
  • At least one regression predictor in the initial logistic regression model is called to parse each second training video to obtain at least one second training bit rate corresponding to each second training video, wherein at least one second training bit rate corresponds to at least one regression predictor.
  • the initial logistic regression model is trained to obtain a second logistic regression model.
  • at least one bit rate residual can be determined according to at least one second training bit rate and the second predicted bit rate of each second training video, wherein at least one bit rate residual corresponds to at least one regression predictor. Then, the corresponding regression predictor is trained and adjusted by each bit rate residual in the at least one bit rate residual to obtain a second logistic regression model.
  • a second logistic regression model can be obtained that can predict the optimal code point of a video with a bit rate of 1080p through the compression domain information of the video.
  • the second predicted bit rate can be obtained through the source bit rate and compression domain information of the video to be encoded.
  • the second predicted bit rate can be understood as the bit rate corresponding to the inflection point from 720p to 1080p, that is, the bit rate at the intersection of the bit rate-VMAF curve of the 720p video to be processed and the bit rate-VMAF curve of 1080p.
  • the second predicted bit rate is the maximum value of the optimal bit rate range under the original bit rate, that is, the right endpoint.
  • the optimal bit rate range can be obtained by taking the first predicted bit rate as the left end point of the optimal bit rate range and the second predicted bit rate as the right end point of the optimal bit rate range.
  • when the original bit rate of each frame of video image is within the optimal bit rate range, it means that the amount of data after video encoding at the original bit rate is less than the amount of data after video encoding at the first set bit rate and the second set bit rate. Therefore, it can be determined that the original bit rate of each frame of video image is the optimal bit rate.
  • when the original bit rate of each frame of video image is outside the optimal bit rate range, it means that the amount of data after video encoding at the original bit rate is greater than the amount of data after video encoding at the first set bit rate and the second set bit rate.
  • the third prediction model is obtained based on the third set bit rate training, and the third set bit rate is determined by the relative position of the original bit rate of each frame of video image and the optimal bit rate range.
  • the relative position may include: the original bit rate is greater than the maximum value of the optimal bit rate range, that is, the original bit rate is located on the right side of the optimal bit rate range; and the original bit rate is less than the minimum value of the optimal bit rate range, that is, the original bit rate is located on the left side of the optimal bit rate range.
  • a fourth set bit rate greater than the original bit rate can be selected as the third set bit rate, and the fourth set bit rate can be selected from the bit rates given in the case of multiple bit rates; similarly, when the original bit rate of each frame of video image is less than the minimum value of the optimal bit rate range, a fifth set bit rate less than the original bit rate can be selected as the third set bit rate, and the fifth set bit rate can also be selected from the bit rates given in the case of multiple bit rates.
  • the fourth set bit rate can be equal to the second set bit rate in step 303, and the fifth set bit rate can be equal to the first set bit rate in step 302.
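The selection of the third set bit rate from the relative position can be sketched as follows. The function names and the idea of picking the nearest neighbouring entry from a list of available bit rates are assumptions made for illustration; the application only states that a larger (fourth) or smaller (fifth) set bit rate is chosen from the given bit rates.

```python
from bisect import bisect_left, bisect_right

def relative_position(original, low, high):
    """Where the original bit rate lies relative to the range [low, high]."""
    if original < low:
        return "left"
    if original > high:
        return "right"
    return "inside"

def third_set_bit_rate(original, low, high, ladder):
    """Pick the set bit rate used to train the third model, or None.

    Assumption: `ladder` is a hypothetical list of available bit rates and
    the nearest entry above or below the original bit rate is chosen.
    """
    position = relative_position(original, low, high)
    if position == "inside":
        return None  # the original bit rate is already optimal
    rungs = sorted(ladder)
    if position == "right":
        # Original exceeds the range maximum: fourth set bit rate (> original).
        idx = bisect_right(rungs, original)
        return rungs[idx] if idx < len(rungs) else None
    # Original is below the range minimum: fifth set bit rate (< original).
    idx = bisect_left(rungs, original)
    return rungs[idx - 1] if idx > 0 else None
```

Returning `None` when no suitable entry exists leaves the fallback policy to the caller, since the application does not describe that case.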
  • At least one third regression predictor in the third logistic regression model can be called to perform prediction processing on the compression domain information of each frame of the video image to obtain at least one third predicted bit rate, wherein at least one third predicted bit rate corresponds to at least one third regression predictor. Then, according to the regression weight of each third regression predictor in at least one third regression predictor, weighted processing can be performed on at least one third predicted bit rate to obtain the optimal bit rate.
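The weighting step can be sketched as a weighted average of the predictors' outputs. Normalizing by the sum of the regression weights is an assumption; the application does not specify whether the weights already sum to one. The function name `weighted_bit_rate` is hypothetical.

```python
def weighted_bit_rate(predictions, weights):
    """Combine per-predictor bit rates using their regression weights.

    Assumption: the result is normalized by the sum of the weights.
    """
    if not predictions or len(predictions) != len(weights):
        raise ValueError("predictions and weights must be non-empty and the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(p * w for p, w in zip(predictions, weights)) / total
```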
  • the training method of the third logistic regression model is similar to the training method of the first logistic regression model in step 302 and the second logistic regression model in step 303, and will not be repeated here.
  • the optimal bit rate range of each frame image is determined according to the compression domain information of each frame image.
  • the original bit rate of each frame image is within the range, it means that the original bit rate is the optimal bit rate for video encoding under the video quality; when the original bit rate of each frame image is not within the range, it means that there is a coding bit rate that is better than the source bit rate.
  • the optimal bit rate of each frame image can be re-obtained according to the compression domain information of each frame image, and each frame image can be encoded using this rate. In this way, while ensuring the encoding quality of the video, the optimal encoding bit rate can be determined as much as possible for video encoding, thereby reducing the amount of data of the encoded video, which is beneficial to the storage and transmission of the video.
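The per-frame decision summarized above can be sketched end to end as follows. The two callables stand in for the trained models and are hypothetical placeholders, as is the function name `choose_frame_bit_rates`; the sketch only shows the control flow of keeping the original bit rate when it lies inside the range and re-predicting it otherwise.

```python
def choose_frame_bit_rates(frames, predict_range, predict_optimal):
    """Select an encoding bit rate for every frame.

    frames: iterable of (compression_domain_info, original_bit_rate) pairs.
    predict_range(info) -> (low, high): stand-in for the first and second models.
    predict_optimal(info, original, (low, high)) -> bit rate: stand-in for the third model.
    """
    chosen = []
    for info, original in frames:
        low, high = predict_range(info)
        if low <= original <= high:
            # Inside the optimal range: keep the original bit rate.
            chosen.append(original)
        else:
            # Outside the range: ask the third model for a better bit rate.
            chosen.append(predict_optimal(info, original, (low, high)))
    return chosen
```

The returned list has one entry per frame and would be handed to the encoder as the per-frame target.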
  • the video encoding device 400 includes:
  • a frame segmentation module 401 used for performing frame segmentation processing on the video to be encoded to obtain at least one frame of video image;
  • the parsing module 402 is used to parse each video frame of at least one video frame to determine an optimal bit rate range for each video frame, and determine an optimal bit rate for each video frame according to an original bit rate of each video frame and an optimal bit rate range for each video frame;
  • the encoding module 403 is used to encode the video to be encoded according to the optimal bit rate of each frame of the video image.
  • the parsing module 402 in parsing each video frame in at least one video frame to determine the optimal bit rate range of each video frame, is specifically used to:
  • the first predicted bit rate and the second predicted bit rate are used as endpoints of the optimal bit rate range to determine the optimal bit rate range.
  • the video encoding apparatus 400 may further include a training module, which is specifically used to:
  • the first training set comprising at least one first training video and at least one first predicted bit rate, wherein an initial bit rate of each first training video in the at least one first training video is a first set bit rate, and the at least one first training video corresponds to the at least one first predicted bit rate in one-to-one correspondence;
  • the second training set comprising at least one second training video and at least one second predicted bit rate, wherein an initial bit rate of each second training video in the at least one second training video is a second set bit rate, and the at least one second training video corresponds to the at least one second predicted bit rate in one-to-one correspondence;
  • calling at least one regression predictor in the initial logistic regression model to parse each first training video to obtain at least one first training bit rate corresponding to each first training video, wherein the at least one first training bit rate corresponds to the at least one regression predictor in a one-to-one manner;
  • calling the at least one regression predictor to parse each second training video to obtain at least one second training bit rate corresponding to each second training video, wherein the at least one second training bit rate corresponds to the at least one regression predictor in a one-to-one manner;
  • the initial logistic regression model is trained according to at least one second training bit rate corresponding to each second training video and the second predicted bit rate of each second training video to obtain a second logistic regression model.
  • the training module in terms of training the initial logistic regression model according to at least one first training bit rate corresponding to each first training video and the first predicted bit rate of each first training video to obtain the first logistic regression model, is specifically used to:
  • a corresponding regression predictor is trained and adjusted to obtain a first logistic regression model.
  • the parsing module 402 in determining the optimal bit rate of each frame of video image according to the original bit rate of each frame of video image and the optimal bit rate range of each frame of video image, is specifically used to:
  • the compression domain information of each frame of video image is input into the third logistic regression model to obtain the optimal bit rate, wherein the third prediction model is trained based on the third set bit rate, and the third set bit rate is determined by the relative position of the original bit rate of each frame of video image and the optimal bit rate range.
  • in inputting the compressed domain information of each frame of video image into the third logistic regression model to obtain the optimal bit rate, the parsing module 402 is specifically used to:
  • call at least one third regression predictor in the third logistic regression model to perform prediction processing on the compressed domain information of each frame of the video image to obtain at least one third predicted bit rate, wherein the at least one third predicted bit rate corresponds to the at least one third regression predictor in a one-to-one manner;
  • weighted processing is performed on the at least one third predicted bit rate to obtain an optimal bit rate.
  • the parsing module 402 is specifically used to:
  • the third set bit rate is determined to be the fifth set bit rate, and a third logistic regression model is obtained according to the fifth set bit rate, wherein the fifth set bit rate is less than the original bit rate of each frame of video image.
  • the electronic device 500 includes a transceiver 501, a processor 502, and a memory 503. They are connected via a bus 504.
  • the memory 503 is used to store computer programs and data, and can transmit the data stored in the memory 503 to the processor 502.
  • the processor 502 is used to read the computer program in the memory 503 and perform the following operations:
  • the video to be encoded is encoded according to the optimal bit rate of each frame of the video image.
  • the processor 502 in parsing each video frame in at least one video frame to determine an optimal bit rate range for each video frame, is specifically configured to perform the following operations:
  • the first predicted bit rate and the second predicted bit rate are used as endpoints of the optimal bit rate range to determine the optimal bit rate range.
  • the processor 502 is further configured to perform the following operations:
  • the first training set comprising at least one first training video and at least one first predicted bit rate, wherein an initial bit rate of each first training video in the at least one first training video is a first set bit rate, and the at least one first training video corresponds to the at least one first predicted bit rate in one-to-one correspondence;
  • the second training set comprising at least one second training video and at least one second predicted bit rate, wherein an initial bit rate of each second training video in the at least one second training video is a second set bit rate, and the at least one second training video corresponds to the at least one second predicted bit rate in one-to-one correspondence;
  • at least one regression predictor in the initial logistic regression model is called to parse each first training video to obtain at least one first training bit rate corresponding to each first training video, wherein the at least one first training bit rate corresponds to the at least one regression predictor in a one-to-one manner;
  • the at least one regression predictor is called to parse each second training video to obtain at least one second training bit rate corresponding to each second training video, wherein the at least one second training bit rate corresponds to the at least one regression predictor in a one-to-one manner;
  • the initial logistic regression model is trained according to at least one second training bit rate corresponding to each second training video and the second predicted bit rate of each second training video to obtain a second logistic regression model.
  • in terms of training the initial logistic regression model according to at least one first training bit rate corresponding to each first training video and the first predicted bit rate of each first training video to obtain the first logistic regression model, the processor 502 is specifically configured to perform the following operations:
  • a corresponding regression predictor is trained and adjusted to obtain a first logistic regression model.
  • in determining the optimal bit rate of each frame of video image according to the original bit rate of each frame of video image and the optimal bit rate range of each frame of video image, the processor 502 is specifically configured to perform the following operations:
  • the compressed domain information of each frame of video image is input into the third logistic regression model to obtain the optimal bit rate, wherein the third prediction model is trained based on the third set bit rate, and the third set bit rate is determined by the relative position of the original bit rate of each frame of video image and the optimal bit rate range.
  • in terms of inputting the compressed domain information of each frame of video image into the third logistic regression model to obtain the optimal bit rate, the processor 502 is specifically configured to perform the following operations:
  • at least one third regression predictor in the third logistic regression model is called to perform prediction processing on the compressed domain information of each frame of the video image to obtain at least one third predicted bit rate, wherein the at least one third predicted bit rate corresponds to the at least one third regression predictor in a one-to-one manner;
  • weighted processing is performed on the at least one third predicted bit rate to obtain an optimal bit rate.
  • in terms of obtaining the corresponding third logistic regression model according to the relative position between the original bit rate of each frame of the video image and the optimal bit rate range, the processor 502 is specifically configured to perform the following operations:
  • the third set bit rate is determined to be the fifth set bit rate, and a third logistic regression model is obtained according to the fifth set bit rate, wherein the fifth set bit rate is less than the original bit rate of each frame of video image.
  • the video encoding device in the present application may include a smartphone (such as an Android phone, an iOS phone, or a Windows Phone), a tablet computer, a PDA, a laptop computer, a mobile Internet device (MID), a robot, a wearable device, and the like.
  • the above video encoding devices are only examples, not exhaustive, and include but are not limited to the above video encoding devices. In practical applications, the above video encoding devices may also include: intelligent vehicle terminals, computer equipment, etc.
  • the present application also provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement some or all of the steps of any one of the video encoding methods described in the above method implementation.
  • the storage medium may include a hard disk, a floppy disk, an optical disk, a tape, a disk, a USB flash drive, a flash memory, etc.
  • the present application also provides a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute part or all of the steps of any one of the video encoding methods described in the above method implementation.
  • the disclosed device can be implemented in other ways.
  • the device implementation described above is only schematic, such as the division of the units, which is only a logical function division. There may be other division methods in actual implementation, such as multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed.
  • Another point is that the mutual coupling or direct coupling or communication connection shown or discussed can be through some interfaces, and the indirect coupling or communication connection of devices or units can be electrical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit may be implemented in the form of hardware or in the form of a software program module.
  • if the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it can be stored in a computer-readable memory.
  • the technical solution of the present application in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a memory and includes several instructions for enabling a computer device (which can be a personal computer, server or network device, etc.) to execute all or part of the steps of the method described in each embodiment of the present application.
  • the aforementioned memory includes: a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

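Embodiments of the present application provide a video encoding method, an apparatus, an electronic device, and a storage medium. The method includes: splitting a video to be encoded into frames to obtain at least one frame of video image; parsing each frame of video image in the at least one frame of video image to determine the optimal bit rate range of each frame of video image; determining the optimal bit rate of each frame of video image according to the original bit rate of each frame of video image and the optimal bit rate range of each frame of video image; and encoding the video to be encoded according to the optimal bit rate of each frame of video image. With this method, the optimal encoding bit rate can be determined while image quality is maintained, which reduces the data volume of the encoded video and facilitates video storage and transmission.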

Description

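Video encoding method, apparatus, electronic device, and storage medium
This application claims priority to Chinese patent application No. 202211625113.1, entitled "Video encoding method, apparatus, electronic device, and storage medium", filed with the China Patent Office on December 16, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of video encoding, and in particular to a video encoding method, an apparatus, an electronic device, and a storage medium.
Background
In video encoding, the video resolution is configured together with rate control. Specifically, rate control is a method of controlling the size of a video file and the quality of its images by deciding how many bits to allocate to each frame. Common modes include constant rate factor (Constant Rate Factor, CRF), constant bit rate (Constant Bit Rate, CBR), variable bit rate (Variable Bit Rate, VBR), and constant quantization parameter (Constant Quantization Parameter, CQP).
At present, CRF is the most widely used rate control mode in the short-video and live-streaming fields: image quality is held constant while the bit rate (the number of bits transmitted per unit time) varies. CRF mainly selects an image quality parameter and an image resolution (also referred to as selecting the original code point) according to the average bit rate and average image quality of the video, and encodes the video accordingly. With video quality guaranteed, a lower bit rate means a smaller data volume and easier transmission. In a multi-resolution setting, however, the bit rate that the resolution of the original code point yields at the video quality indicated by the image quality parameter is not necessarily the optimal bit rate. How to decide the optimal bit rate of each frame of a video for encoding, while guaranteeing the image quality of the video, is therefore a problem in urgent need of a solution.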
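Summary
To solve the above problems in the prior art, embodiments of the present application provide a video encoding method, an apparatus, an electronic device, and a storage medium, which can determine the optimal encoding bit rate for video encoding while maintaining image quality, thereby reducing the data volume of the encoded video and facilitating video storage and transmission.
In a first aspect, an embodiment of the present application provides a video encoding method, the method including:
splitting a video to be encoded into frames to obtain at least one frame of video image;
parsing each frame of video image in the at least one frame of video image to determine the optimal bit rate range of each frame of video image;
determining the optimal bit rate of each frame of video image according to the original bit rate of each frame of video image and the optimal bit rate range of each frame of video image;
encoding the video to be encoded according to the optimal bit rate of each frame of video image.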
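In a possible implementation, parsing each frame of video image in the at least one frame of video image to determine the optimal bit rate range of each frame of video image includes:
parsing each frame of video image to obtain compressed-domain information of each frame of video image;
inputting the compressed-domain information of each frame of video image into a first logistic regression model to obtain a first predicted bit rate, where the first prediction model is trained based on a first set bit rate, and the first set bit rate is lower than the original bit rate of each frame of video image;
inputting the compressed-domain information of each frame of video image into a second logistic regression model to obtain a second predicted bit rate, where the second prediction model is trained based on a second set bit rate, and the second set bit rate is higher than the original bit rate of each frame of video image;
taking the first predicted bit rate and the second predicted bit rate as the endpoints of the optimal bit rate range to determine the optimal bit rate range.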
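In a possible implementation, the method further includes:
obtaining a first training set, the first training set including at least one first training video and at least one first predicted bit rate, where the initial bit rate of each first training video in the at least one first training video is the first set bit rate, and the at least one first training video corresponds one-to-one to the at least one first predicted bit rate;
obtaining a second training set, the second training set including at least one second training video and at least one second predicted bit rate, where the initial bit rate of each second training video in the at least one second training video is the second set bit rate, and the at least one second training video corresponds one-to-one to the at least one second predicted bit rate;
calling at least one regression predictor in an initial logistic regression model to parse each first training video, obtaining at least one first training bit rate corresponding to each first training video, where the at least one first training bit rate corresponds one-to-one to the at least one regression predictor;
training the initial logistic regression model according to the at least one first training bit rate corresponding to each first training video and the first predicted bit rate of each first training video, to obtain the first logistic regression model;
calling the at least one regression predictor to parse each second training video, obtaining at least one second training bit rate corresponding to each second training video, where the at least one second training bit rate corresponds one-to-one to the at least one regression predictor;
training the initial logistic regression model according to the at least one second training bit rate corresponding to each second training video and the second predicted bit rate of each second training video, to obtain the second logistic regression model.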
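In a possible implementation, training the initial logistic regression model according to the at least one first training bit rate corresponding to each first training video and the first predicted bit rate of each first training video, to obtain the first logistic regression model, includes:
determining at least one bit rate residual according to the at least one first training bit rate and the first predicted bit rate of each first training video, where the at least one bit rate residual corresponds one-to-one to the at least one regression predictor;
training and adjusting the corresponding regression predictor with each bit rate residual in the at least one bit rate residual, to obtain the first logistic regression model.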
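In a possible implementation, determining the optimal bit rate of each frame of video image according to the original bit rate of each frame of video image and the optimal bit rate range of each frame of video image includes:
when the original bit rate of each frame of video image is within the optimal bit rate range, determining that the original bit rate of each frame of video image is the optimal bit rate;
when the original bit rate of each frame of video image is outside the optimal bit rate range, obtaining a corresponding third logistic regression model according to the relative position of the original bit rate of each frame of video image and the optimal bit rate range;
inputting the compressed-domain information of each frame of video image into the third logistic regression model to obtain the optimal bit rate, where the third prediction model is trained based on a third set bit rate, and the third set bit rate is determined by the relative position of the original bit rate of each frame of video image and the optimal bit rate range.
In a possible implementation, inputting the compressed-domain information of each frame of video image into the third logistic regression model to obtain the optimal bit rate includes:
calling at least one third regression predictor in the third logistic regression model to perform prediction on the compressed-domain information of each frame of video image, obtaining at least one third predicted bit rate, where the at least one third predicted bit rate corresponds one-to-one to the at least one third regression predictor;
weighting the at least one third predicted bit rate according to the regression weight of each third regression predictor in the at least one third regression predictor, to obtain the optimal bit rate.
In a possible implementation, obtaining the corresponding third logistic regression model according to the relative position of the original bit rate of each frame of video image and the optimal bit rate range includes:
when the original bit rate of each frame of video image is greater than the maximum of the optimal bit rate range, determining that the third set bit rate is a fourth set bit rate, and obtaining the third logistic regression model according to the fourth set bit rate, where the fourth set bit rate is greater than the original bit rate of each frame of video image;
when the original bit rate of each frame of video image is less than the minimum of the optimal bit rate range, determining that the third set bit rate is a fifth set bit rate, and obtaining the third logistic regression model according to the fifth set bit rate, where the fifth set bit rate is less than the original bit rate of each frame of video image.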
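In a second aspect, an embodiment of the present application provides a video encoding apparatus, including:
a frame splitting module, configured to split a video to be encoded into frames to obtain at least one frame of video image;
a parsing module, configured to parse each frame of video image in the at least one frame of video image to determine the optimal bit rate range of each frame of video image, and to determine the optimal bit rate of each frame of video image according to the original bit rate of each frame of video image and the optimal bit rate range of each frame of video image;
an encoding module, configured to encode the video to be encoded according to the optimal bit rate of each frame of video image.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor connected to a memory, where the memory is configured to store a computer program and the processor is configured to execute the computer program stored in the memory, so that the electronic device performs the method of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program that causes a computer to perform the method of the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product including a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform the method of the first aspect.
Implementing the embodiments of the present application has the following beneficial effects:
In the embodiments of the present application, each frame is parsed to obtain its optimal bit rate range, the original bit rate is compared with that range, and the optimal bit rate of each frame is obtained according to the comparison result and used to encode the frame. In this way, the optimal encoding bit rate is determined as far as possible while the encoding quality of the video is guaranteed, which reduces the data volume of the encoded video and facilitates video storage and transmission.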
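Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of the present application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of the hardware structure of a video encoding apparatus provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of a video encoding method provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of a method, provided by an embodiment of the present application, for determining the optimal bit rate range of each frame of video image from the compressed-domain information of each frame of video image;
FIG. 4 is a block diagram of the functional modules of a video encoding apparatus provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.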
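Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the scope of protection of the present application.
The terms "first", "second", "third", "fourth", and the like in the specification, claims, and drawings of the present application are used to distinguish different objects, not to describe a particular order. In addition, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product, or device.
Reference herein to an "embodiment" means that a particular feature, result, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.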
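Referring to FIG. 1, FIG. 1 is a schematic diagram of the hardware structure of a video encoding apparatus provided by an embodiment of the present application. The video encoding apparatus 100 includes at least one processor 101, a communication line 102, a memory 103, and at least one communication interface 104.
In this embodiment, the processor 101 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling execution of the programs of the solution of the present application.
The communication line 102 may include a path for transferring information between the above components.
The communication interface 104 may be any transceiver-type device (such as an antenna) used to communicate with other devices or communication networks, such as Ethernet, a RAN, or a wireless local area network (WLAN).
The memory 103 may be a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions, or may be an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
In this embodiment, the memory 103 may exist independently and be connected to the processor 101 through the communication line 102. The memory 103 may also be integrated with the processor 101. The memory 103 provided in the embodiments of the present application is typically non-volatile. The memory 103 stores the computer-executable instructions for carrying out the solution of the present application, and their execution is controlled by the processor 101. The processor 101 executes the computer-executable instructions stored in the memory 103 to implement the methods provided in the following embodiments of the present application.
In an optional embodiment, the computer-executable instructions may also be referred to as application program code, which is not specifically limited in the present application.
In an optional embodiment, the processor 101 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 1.
In an optional embodiment, the video encoding apparatus 100 may include multiple processors, such as the processor 101 and the processor 107 in FIG. 1. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (such as computer program instructions).
In an optional embodiment, the video encoding apparatus 100 may be a server, for example an independent server, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and big data and artificial intelligence platforms. In that case, the video encoding apparatus 100 may further include an output device 105 and an input device 106. The output device 105 communicates with the processor 101 and can display information in a variety of ways. For example, the output device 105 may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector. The input device 106 communicates with the processor 101 and can receive user input in a variety of ways. For example, the input device 106 may be a mouse, a keyboard, a touchscreen device, or a sensing device.
The video encoding apparatus 100 may be a general-purpose device or a special-purpose device. The embodiments of the present application do not limit the type of the video encoding apparatus 100.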
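The video encoding method disclosed in the present application is described below.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of a video encoding method provided by an embodiment of the present application. The video encoding method includes the following steps:
201: Split the video to be encoded into frames to obtain at least one frame of video image.
202: Parse each frame of video image in the at least one frame of video image to determine the optimal bit rate range of each frame of video image.
In this embodiment, the optimal bit rate range of each frame of video image can be determined from the compressed-domain information of that frame. For example, as shown in FIG. 3, the method includes:
301: Parse each frame of video image to obtain the compressed-domain information of each frame of video image.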
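Compressed-domain information is information that is necessarily produced when a video is decoded, and may include the mean, variance, entropy, skewness, and the like at the temporal ID level. In this embodiment, the compressed-domain information produced during decoding can therefore be obtained directly when the video is decoded and matched to the individual frames, yielding the compressed-domain information of each frame of video image.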
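As a rough illustration of the statistics listed above, the following Python sketch summarizes one frame's compressed-domain values as mean, variance, entropy, and skewness. The input list (for example, per-block values collected while decoding) and the function name `compressed_domain_features` are assumptions made for this example; the application does not specify how the values are collected or how the entropy is defined.

```python
import math
from collections import Counter


def compressed_domain_features(values):
    """Summarize one frame's values (hypothetical input) as four statistics."""
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    mean = sum(values) / n
    # Population variance of the values.
    variance = sum((v - mean) ** 2 for v in values) / n
    # Shannon entropy (in bits) of the empirical distribution of the values.
    counts = Counter(values)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # Skewness is the third standardized moment; defined as 0 for constant input.
    std = math.sqrt(variance)
    if std == 0:
        skewness = 0.0
    else:
        skewness = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
    return {"mean": mean, "variance": variance,
            "entropy": entropy, "skewness": skewness}
```

The resulting dictionary can be flattened into a feature vector and passed to whichever regression model is used in the later steps.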
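302: Input the compressed-domain information of each frame of video image into the first logistic regression model to obtain the first predicted bit rate.
In this embodiment, the first prediction model (that is, the first logistic regression model) is trained based on the first set bit rate, and the first set bit rate is lower than the original bit rate of each frame of video image. For ease of understanding, the video encoding method in a multi-resolution setting is described below using 1080p, 720p, and 540p as an example, in which the original bit rate of the video to be encoded is set to 720p.
In this example, the first set bit rate is 540p, which is lower than the original bit rate of 720p, and the first prediction model is accordingly trained at the 540p bit rate. Specifically, a first training set is obtained first. The first training set includes at least one first training video and at least one first predicted bit rate, where the initial bit rate of each first training video in the at least one first training video is 540p, and the at least one first training video corresponds one-to-one to the at least one first predicted bit rate. That is, the first training set consists of at least one 540p first training video, and each first training video has a corresponding first predicted bit rate, which may be the optimal bit rate of that first training video.
Then, at least one regression predictor in the initial logistic regression model is called to parse each first training video, obtaining at least one first training bit rate corresponding to each first training video, where the at least one first training bit rate corresponds one-to-one to the at least one regression predictor.
Finally, the initial logistic regression model is trained according to the at least one first training bit rate corresponding to each first training video and the first predicted bit rate of each first training video, to obtain the first logistic regression model. Specifically, at least one bit rate residual can be determined according to the at least one first training bit rate and the first predicted bit rate of each first training video, where the at least one bit rate residual corresponds one-to-one to the at least one regression predictor. Each bit rate residual in the at least one bit rate residual is then used to train and adjust the corresponding regression predictor, yielding the first logistic regression model.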
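The residual-driven adjustment described above can be sketched as follows. This is a minimal illustration, not the application's training procedure: it assumes each regression predictor is a linear function of the compressed-domain features and is nudged by a gradient step proportional to its own bit rate residual. The names `predict`, `train_predictors`, `lr`, and `epochs` are hypothetical.

```python
def predict(predictor, features):
    """Training bit rate produced by one (assumed linear) regression predictor."""
    return sum(w * x for w, x in zip(predictor["w"], features)) + predictor["b"]


def train_predictors(predictors, samples, lr=0.05, epochs=2000):
    """Adjust each predictor using its own bit rate residual.

    predictors: list of {"w": [float, ...], "b": float} (assumed linear form).
    samples: list of (features, labeled_bitrate) pairs, where labeled_bitrate
             plays the role of a training video's labeled predicted bit rate.
    """
    for _ in range(epochs):
        for features, labeled_bitrate in samples:
            for p in predictors:
                # One residual per regression predictor, as in the text.
                residual = labeled_bitrate - predict(p, features)
                # Gradient step on the squared residual (an assumed update rule).
                p["w"] = [w + lr * residual * x for w, x in zip(p["w"], features)]
                p["b"] += lr * residual
    return predictors
```

In practice the features would be scaled before training, since the step size `lr` chosen here only suits small feature values.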
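Thus, after training on the first training set, a first logistic regression model is obtained that can predict, from the compressed-domain information of a video, the optimal code point of a video whose bit rate is 540p. By applying this model in reverse, the first predicted bit rate can be obtained from the original bit rate and the compressed-domain information of the video to be encoded. The first predicted bit rate can be understood as the bit rate at the inflection point from 720p to 540p, that is, the bit rate at the intersection of the 720p bit rate-VMAF curve and the 540p bit rate-VMAF curve of the video to be processed. The bit rate-VMAF curves show that beyond this inflection point, at the same video quality, the data volume of the video encoded at 720p is better than the data volume of the video encoded at 540p. The first predicted bit rate is therefore the minimum of the optimal bit rate range at the original bit rate, in other words its left endpoint.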
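To make the "intersection of two bit rate-VMAF curves" concrete, the sketch below locates the crossover by linear interpolation. It assumes both curves were measured at the same, increasing list of bit rates; the function name `crossover_bitrate` and the sample numbers in the usage note are invented for illustration and are not data from the application.

```python
def crossover_bitrate(bitrates, vmaf_low_res, vmaf_high_res):
    """Bit rate at which the higher-resolution curve overtakes the lower one.

    bitrates: increasing bit rates at which both curves were measured.
    vmaf_low_res / vmaf_high_res: VMAF scores at those bit rates.
    Returns None when the curves do not cross in the measured range.
    """
    diffs = [h - l for h, l in zip(vmaf_high_res, vmaf_low_res)]
    for i in range(1, len(bitrates)):
        d0, d1 = diffs[i - 1], diffs[i]
        if d0 < 0 <= d1:
            # Linear interpolation of the zero crossing between two samples.
            t = -d0 / (d1 - d0)
            return bitrates[i - 1] + t * (bitrates[i] - bitrates[i - 1])
    return None
```

With made-up scores, `crossover_bitrate([500, 1000, 1500, 2000], [70, 80, 85, 88], [60, 78, 87, 92])` returns `1250.0`: the higher-resolution curve trails by 2 points at 1000 and leads by 2 points at 1500, so the interpolated crossing lies halfway between them.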
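303: Input the compressed-domain information of each frame of video image into the second logistic regression model to obtain the second predicted bit rate.
In this embodiment, the second prediction model (that is, the second logistic regression model) is trained based on the second set bit rate, and the second set bit rate is higher than the original bit rate of each frame of video image. Continuing the example in step 302, the second set bit rate is 1080p, which is higher than the original bit rate of 720p, and the second prediction model is accordingly trained at the 1080p bit rate. Specifically, a second training set is obtained first. The second training set includes at least one second training video and at least one second predicted bit rate, where the initial bit rate of each second training video in the at least one second training video is 1080p, and the at least one second training video corresponds one-to-one to the at least one second predicted bit rate. That is, the second training set consists of at least one 1080p second training video, and each second training video has a corresponding second predicted bit rate, which may be the optimal bit rate of that second training video.
Then, at least one regression predictor in the initial logistic regression model is called to parse each second training video, obtaining at least one second training bit rate corresponding to each second training video, where the at least one second training bit rate corresponds one-to-one to the at least one regression predictor.
Finally, the initial logistic regression model is trained according to the at least one second training bit rate corresponding to each second training video and the second predicted bit rate of each second training video, to obtain the second logistic regression model. Specifically, at least one bit rate residual can be determined according to the at least one second training bit rate and the second predicted bit rate of each second training video, where the at least one bit rate residual corresponds one-to-one to the at least one regression predictor. Each bit rate residual in the at least one bit rate residual is then used to train and adjust the corresponding regression predictor, yielding the second logistic regression model.
Thus, after training on the second training set, a second logistic regression model is obtained that can predict, from the compressed-domain information of a video, the optimal code point of a video whose bit rate is 1080p. By applying this model in reverse, the second predicted bit rate can be obtained from the original bit rate and the compressed-domain information of the video to be encoded. The second predicted bit rate can be understood as the bit rate at the inflection point from 720p to 1080p, that is, the bit rate at the intersection of the 720p bit rate-VMAF curve and the 1080p bit rate-VMAF curve of the video to be processed. The bit rate-VMAF curves show that beyond this inflection point, at the same video quality, the data volume of the video encoded at 1080p is better than the data volume of the video encoded at 720p. The second predicted bit rate is therefore the maximum of the optimal bit rate range at the original bit rate, in other words its right endpoint.
304: Take the first predicted bit rate and the second predicted bit rate as the endpoints of the optimal bit rate range to determine the optimal bit rate range.
As described in steps 302 and 303, taking the first predicted bit rate as the left endpoint of the optimal bit rate range and the second predicted bit rate as its right endpoint gives the optimal bit rate range.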
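203: Determine the optimal bit rate of each frame of video image according to the original bit rate of each frame of video image and the optimal bit rate range of each frame of video image.
In this embodiment, when the original bit rate of each frame of video image is within the optimal bit rate range, the data volume of the video encoded at the original bit rate is smaller than the data volume of the video encoded at the first set bit rate or the second set bit rate. The original bit rate of each frame of video image can therefore be determined to be the optimal bit rate.
When the original bit rate of each frame of video image is outside the optimal bit rate range, the data volume of the video encoded at the original bit rate is larger than the data volume of the video encoded at the first set bit rate or the second set bit rate. In this case, a corresponding third logistic regression model needs to be obtained according to the relative position of the original bit rate of each frame of video image and the optimal bit rate range, and used to predict the corresponding optimal bit rate. Specifically, the third prediction model (that is, the third logistic regression model) is trained based on a third set bit rate, and the third set bit rate is determined by the relative position of the original bit rate of each frame of video image and the optimal bit rate range.
In this embodiment, the relative position may be either of two cases: the original bit rate is greater than the maximum of the optimal bit rate range, that is, it lies to the right of the range; or the original bit rate is less than the minimum of the optimal bit rate range, that is, it lies to the left of the range. When the original bit rate of each frame of video image is greater than the maximum of the optimal bit rate range, a fourth set bit rate greater than the original bit rate can be selected as the third set bit rate, and the fourth set bit rate can be chosen from the bit rates available in the multi-bit-rate setting. Likewise, when the original bit rate of each frame of video image is less than the minimum of the optimal bit rate range, a fifth set bit rate less than the original bit rate can be selected as the third set bit rate, and the fifth set bit rate can also be chosen from the bit rates available in the multi-bit-rate setting. In an optional embodiment, the fourth set bit rate may be equal to the second set bit rate in step 303, and the fifth set bit rate may be equal to the first set bit rate in step 302.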
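The case analysis above can be sketched as a small selection function. The sketch assumes the available bit rates are given as a list (`ladder`) and, where several candidates qualify, picks the one nearest to the original bit rate; that tie-breaking rule and the names `select_third_set_rate`, `"keep"`, `"fourth"`, and `"fifth"` are assumptions of this example, since the text only requires the fourth set bit rate to be greater and the fifth to be less than the original.

```python
def select_third_set_rate(original, optimal_range, ladder):
    """Decide which set bit rate the third model should be based on.

    original: original bit rate of the frame.
    optimal_range: (minimum, maximum) of the optimal bit rate range.
    ladder: bit rates available in the multi-bit-rate setting.
    Returns ("keep", original) when no third model is needed.
    """
    low, high = optimal_range
    if low <= original <= high:
        # Inside the range: the original bit rate is already optimal.
        return ("keep", original)
    if original > high:
        # Right of the range: pick a fourth set bit rate above the original.
        candidates = [r for r in ladder if r > original]
        if not candidates:
            raise ValueError("no available bit rate above the original")
        return ("fourth", min(candidates))
    # Left of the range: pick a fifth set bit rate below the original.
    candidates = [r for r in ladder if r < original]
    if not candidates:
        raise ValueError("no available bit rate below the original")
    return ("fifth", max(candidates))
```

For instance, with `ladder = [800, 1500, 3000]` and an original bit rate of 1500, a range of `(1000, 2000)` keeps the original, a range of `(900, 1200)` selects 3000, and a range of `(1800, 2500)` selects 800.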
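In this embodiment, after the third set bit rate is determined, at least one third regression predictor in the third logistic regression model can be called to perform prediction on the compressed-domain information of each frame of video image, obtaining at least one third predicted bit rate, where the at least one third predicted bit rate corresponds one-to-one to the at least one third regression predictor. The at least one third predicted bit rate is then weighted according to the regression weight of each third regression predictor in the at least one third regression predictor, to obtain the optimal bit rate. The third logistic regression model is trained in a manner similar to the first logistic regression model in step 302 and the second logistic regression model in step 303, and the details are not repeated here.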
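The weighting step can be illustrated with a short sketch. It assumes the weighted result is the weight-normalized average of the third predicted bit rates; the application does not say whether the regression weights are normalized, so that choice and the name `weighted_bitrate` are assumptions.

```python
def weighted_bitrate(predicted_rates, weights):
    """Combine per-predictor bit rates using their regression weights.

    predicted_rates: one third predicted bit rate per regression predictor.
    weights: the regression weight of each predictor, in the same order.
    """
    if not predicted_rates or len(predicted_rates) != len(weights):
        raise ValueError("need one weight per predicted bit rate")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    # Weight-normalized average (normalization is an assumption of this sketch).
    return sum(r * w for r, w in zip(predicted_rates, weights)) / total
```

For example, `weighted_bitrate([1000.0, 2000.0], [1.0, 3.0])` gives `1750.0`, because the second predictor carries three times the weight of the first.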
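204: Encode the video to be encoded according to the optimal bit rate of each frame of video image.
In summary, in the video encoding method provided by the present invention, each frame is parsed to obtain its compressed-domain information, and the optimal bit rate range of each frame is then determined from that information. When the original bit rate of a frame lies within this range, the original bit rate is the optimal bit rate for encoding the video at that video quality; when it does not, an encoding bit rate better than the original bit rate exists. In that case, the optimal bit rate of each frame can be re-derived from its compressed-domain information, and each frame is encoded at that bit rate. In this way, the optimal encoding bit rate can be determined as far as possible while the encoding quality of the video is guaranteed, which reduces the data volume of the encoded video and facilitates video storage and transmission.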
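Putting steps 202 and 203 together, the per-frame decision can be sketched as below. The three models are passed in as plain callables that map a feature vector to a bit rate; how they are built is outside this sketch, and the names `optimal_bitrate_for_frame`, `plan_encoding`, `"above"`, and `"below"` are hypothetical. The sketch also assumes the first model's output does not exceed the second model's output.

```python
def optimal_bitrate_for_frame(features, original, first_model, second_model,
                              third_models):
    """Return the bit rate at which one frame should be encoded.

    first_model / second_model: callables giving the left and right
        endpoints of the optimal bit rate range (steps 302 and 303).
    third_models: {"above": callable, "below": callable}, the third model
        to use when the original bit rate lies right or left of the range.
    """
    low = first_model(features)
    high = second_model(features)
    if low <= original <= high:
        return original  # the original bit rate is already optimal
    side = "above" if original > high else "below"
    return third_models[side](features)


def plan_encoding(frames, first_model, second_model, third_models):
    """frames: list of (features, original_bitrate); returns one rate per frame."""
    return [
        optimal_bitrate_for_frame(f, o, first_model, second_model, third_models)
        for f, o in frames
    ]
```

The list returned by `plan_encoding` would then be handed to the encoder in step 204; the encoder call itself is not shown because the application does not name one.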
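Referring to FIG. 4, FIG. 4 is a block diagram of the functional modules of a video encoding apparatus provided by an embodiment of the present application. As shown in FIG. 4, the video encoding apparatus 400 includes:
a frame splitting module 401, configured to split a video to be encoded into frames to obtain at least one frame of video image;
a parsing module 402, configured to parse each frame of video image in the at least one frame of video image to determine the optimal bit rate range of each frame of video image, and to determine the optimal bit rate of each frame of video image according to the original bit rate of each frame of video image and the optimal bit rate range of each frame of video image;
an encoding module 403, configured to encode the video to be encoded according to the optimal bit rate of each frame of video image.
In an embodiment of the present invention, in terms of parsing each frame of video image in the at least one frame of video image to determine the optimal bit rate range of each frame of video image, the parsing module 402 is specifically configured to:
parse each frame of video image to obtain compressed-domain information of each frame of video image;
input the compressed-domain information of each frame of video image into a first logistic regression model to obtain a first predicted bit rate, where the first prediction model is trained based on a first set bit rate, and the first set bit rate is lower than the original bit rate of each frame of video image;
input the compressed-domain information of each frame of video image into a second logistic regression model to obtain a second predicted bit rate, where the second prediction model is trained based on a second set bit rate, and the second set bit rate is higher than the original bit rate of each frame of video image;
take the first predicted bit rate and the second predicted bit rate as the endpoints of the optimal bit rate range to determine the optimal bit rate range.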
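In an embodiment of the present invention, the video encoding apparatus 400 may further include a training module, which is specifically configured to:
obtain a first training set, the first training set including at least one first training video and at least one first predicted bit rate, where the initial bit rate of each first training video in the at least one first training video is the first set bit rate, and the at least one first training video corresponds one-to-one to the at least one first predicted bit rate;
obtain a second training set, the second training set including at least one second training video and at least one second predicted bit rate, where the initial bit rate of each second training video in the at least one second training video is the second set bit rate, and the at least one second training video corresponds one-to-one to the at least one second predicted bit rate;
call at least one regression predictor in an initial logistic regression model to parse each first training video, obtaining at least one first training bit rate corresponding to each first training video, where the at least one first training bit rate corresponds one-to-one to the at least one regression predictor;
train the initial logistic regression model according to the at least one first training bit rate corresponding to each first training video and the first predicted bit rate of each first training video, to obtain the first logistic regression model;
call the at least one regression predictor to parse each second training video, obtaining at least one second training bit rate corresponding to each second training video, where the at least one second training bit rate corresponds one-to-one to the at least one regression predictor;
train the initial logistic regression model according to the at least one second training bit rate corresponding to each second training video and the second predicted bit rate of each second training video, to obtain the second logistic regression model.
In an embodiment of the present invention, in terms of training the initial logistic regression model according to the at least one first training bit rate corresponding to each first training video and the first predicted bit rate of each first training video to obtain the first logistic regression model, the training module is specifically configured to:
determine at least one bit rate residual according to the at least one first training bit rate and the first predicted bit rate of each first training video, where the at least one bit rate residual corresponds one-to-one to the at least one regression predictor;
train and adjust the corresponding regression predictor with each bit rate residual in the at least one bit rate residual, to obtain the first logistic regression model.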
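In an embodiment of the present invention, in terms of determining the optimal bit rate of each frame of video image according to the original bit rate of each frame of video image and the optimal bit rate range of each frame of video image, the parsing module 402 is specifically configured to:
when the original bit rate of each frame of video image is within the optimal bit rate range, determine that the original bit rate of each frame of video image is the optimal bit rate;
when the original bit rate of each frame of video image is outside the optimal bit rate range, obtain a corresponding third logistic regression model according to the relative position of the original bit rate of each frame of video image and the optimal bit rate range;
input the compressed-domain information of each frame of video image into the third logistic regression model to obtain the optimal bit rate, where the third prediction model is trained based on a third set bit rate, and the third set bit rate is determined by the relative position of the original bit rate of each frame of video image and the optimal bit rate range.
In an embodiment of the present invention, in terms of inputting the compressed-domain information of each frame of video image into the third logistic regression model to obtain the optimal bit rate, the parsing module 402 is specifically configured to:
call at least one third regression predictor in the third logistic regression model to perform prediction on the compressed-domain information of each frame of video image, obtaining at least one third predicted bit rate, where the at least one third predicted bit rate corresponds one-to-one to the at least one third regression predictor;
weight the at least one third predicted bit rate according to the regression weight of each third regression predictor in the at least one third regression predictor, to obtain the optimal bit rate.
In an embodiment of the present invention, in terms of obtaining the corresponding third logistic regression model according to the relative position of the original bit rate of each frame of video image and the optimal bit rate range, the parsing module 402 is specifically configured to:
when the original bit rate of each frame of video image is greater than the maximum of the optimal bit rate range, determine that the third set bit rate is a fourth set bit rate, and obtain the third logistic regression model according to the fourth set bit rate, where the fourth set bit rate is greater than the original bit rate of each frame of video image;
when the original bit rate of each frame of video image is less than the minimum of the optimal bit rate range, determine that the third set bit rate is a fifth set bit rate, and obtain the third logistic regression model according to the fifth set bit rate, where the fifth set bit rate is less than the original bit rate of each frame of video image.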
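Referring to FIG. 5, FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present application. As shown in FIG. 5, the electronic device 500 includes a transceiver 501, a processor 502, and a memory 503, which are connected through a bus 504. The memory 503 is configured to store computer programs and data, and can transfer the stored data to the processor 502.
The processor 502 is configured to read the computer program in the memory 503 and perform the following operations:
splitting a video to be encoded into frames to obtain at least one frame of video image;
parsing each frame of video image in the at least one frame of video image to determine the optimal bit rate range of each frame of video image;
determining the optimal bit rate of each frame of video image according to the original bit rate of each frame of video image and the optimal bit rate range of each frame of video image;
encoding the video to be encoded according to the optimal bit rate of each frame of video image.
In an embodiment of the present invention, in terms of parsing each frame of video image in the at least one frame of video image to determine the optimal bit rate range of each frame of video image, the processor 502 is specifically configured to perform the following operations:
parsing each frame of video image to obtain compressed-domain information of each frame of video image;
inputting the compressed-domain information of each frame of video image into a first logistic regression model to obtain a first predicted bit rate, where the first prediction model is trained based on a first set bit rate, and the first set bit rate is lower than the original bit rate of each frame of video image;
inputting the compressed-domain information of each frame of video image into a second logistic regression model to obtain a second predicted bit rate, where the second prediction model is trained based on a second set bit rate, and the second set bit rate is higher than the original bit rate of each frame of video image;
taking the first predicted bit rate and the second predicted bit rate as the endpoints of the optimal bit rate range to determine the optimal bit rate range.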
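In an embodiment of the present invention, the processor 502 is further configured to perform the following operations:
obtaining a first training set, the first training set including at least one first training video and at least one first predicted bit rate, where the initial bit rate of each first training video in the at least one first training video is the first set bit rate, and the at least one first training video corresponds one-to-one to the at least one first predicted bit rate;
obtaining a second training set, the second training set including at least one second training video and at least one second predicted bit rate, where the initial bit rate of each second training video in the at least one second training video is the second set bit rate, and the at least one second training video corresponds one-to-one to the at least one second predicted bit rate;
calling at least one regression predictor in an initial logistic regression model to parse each first training video, obtaining at least one first training bit rate corresponding to each first training video, where the at least one first training bit rate corresponds one-to-one to the at least one regression predictor;
training the initial logistic regression model according to the at least one first training bit rate corresponding to each first training video and the first predicted bit rate of each first training video, to obtain the first logistic regression model;
calling the at least one regression predictor to parse each second training video, obtaining at least one second training bit rate corresponding to each second training video, where the at least one second training bit rate corresponds one-to-one to the at least one regression predictor;
training the initial logistic regression model according to the at least one second training bit rate corresponding to each second training video and the second predicted bit rate of each second training video, to obtain the second logistic regression model.
In an embodiment of the present invention, in terms of training the initial logistic regression model according to the at least one first training bit rate corresponding to each first training video and the first predicted bit rate of each first training video to obtain the first logistic regression model, the processor 502 is specifically configured to perform the following operations:
determining at least one bit rate residual according to the at least one first training bit rate and the first predicted bit rate of each first training video, where the at least one bit rate residual corresponds one-to-one to the at least one regression predictor;
training and adjusting the corresponding regression predictor with each bit rate residual in the at least one bit rate residual, to obtain the first logistic regression model.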
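In an embodiment of the present invention, in terms of determining the optimal bit rate of each frame of video image according to the original bit rate of each frame of video image and the optimal bit rate range of each frame of video image, the processor 502 is specifically configured to perform the following operations:
when the original bit rate of each frame of video image is within the optimal bit rate range, determining that the original bit rate of each frame of video image is the optimal bit rate;
when the original bit rate of each frame of video image is outside the optimal bit rate range, obtaining a corresponding third logistic regression model according to the relative position of the original bit rate of each frame of video image and the optimal bit rate range;
inputting the compressed-domain information of each frame of video image into the third logistic regression model to obtain the optimal bit rate, where the third prediction model is trained based on a third set bit rate, and the third set bit rate is determined by the relative position of the original bit rate of each frame of video image and the optimal bit rate range.
In an embodiment of the present invention, in terms of inputting the compressed-domain information of each frame of video image into the third logistic regression model to obtain the optimal bit rate, the processor 502 is specifically configured to perform the following operations:
calling at least one third regression predictor in the third logistic regression model to perform prediction on the compressed-domain information of each frame of video image, obtaining at least one third predicted bit rate, where the at least one third predicted bit rate corresponds one-to-one to the at least one third regression predictor;
weighting the at least one third predicted bit rate according to the regression weight of each third regression predictor in the at least one third regression predictor, to obtain the optimal bit rate.
In an embodiment of the present invention, in terms of obtaining the corresponding third logistic regression model according to the relative position of the original bit rate of each frame of video image and the optimal bit rate range, the processor 502 is specifically configured to perform the following operations:
when the original bit rate of each frame of video image is greater than the maximum of the optimal bit rate range, determining that the third set bit rate is a fourth set bit rate, and obtaining the third logistic regression model according to the fourth set bit rate, where the fourth set bit rate is greater than the original bit rate of each frame of video image;
when the original bit rate of each frame of video image is less than the minimum of the optimal bit rate range, determining that the third set bit rate is a fifth set bit rate, and obtaining the third logistic regression model according to the fifth set bit rate, where the fifth set bit rate is less than the original bit rate of each frame of video image.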
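It should be understood that the video encoding apparatus in the present application may include a smartphone (such as an Android phone, an iOS phone, or a Windows Phone), a tablet computer, a palmtop computer, a laptop computer, a mobile Internet device (Mobile Internet Devices, MID), a robot, a wearable device, and the like. The above video encoding apparatuses are only examples and are not exhaustive; the video encoding apparatus includes but is not limited to them. In practical applications, the video encoding apparatus may also include an intelligent in-vehicle terminal, a computer device, and the like.
From the description of the above embodiments, those skilled in the art can clearly understand that the present invention can be implemented by software combined with a hardware platform. Based on this understanding, all or part of the contribution that the technical solution of the present invention makes over the background art can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention or in certain parts of the embodiments.
Accordingly, an embodiment of the present application further provides a computer-readable storage medium storing a computer program, the computer program being executed by a processor to implement some or all of the steps of any of the video encoding methods described in the above method embodiments. For example, the storage medium may include a hard disk, a floppy disk, an optical disc, a magnetic tape, a magnetic disk, a USB flash drive, a flash memory, and the like.
An embodiment of the present application further provides a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to execute some or all of the steps of any of the video encoding methods described in the above method embodiments.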
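It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of action combinations. Those skilled in the art should understand, however, that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software program module.
If the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application in essence, or the part that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
A person of ordinary skill in the art can understand that all or some of the steps in the methods of the above embodiments may be completed by a program instructing related hardware. The program may be stored in a computer-readable memory, and the memory may include a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
The embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method of the present application and its core idea. At the same time, a person of ordinary skill in the art may, based on the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the present application.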

Claims (10)

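  1. A video encoding method, characterized in that the method comprises:
    splitting a video to be encoded into frames to obtain at least one frame of video image;
    parsing each frame of video image in the at least one frame of video image to determine an optimal bit rate range of each frame of video image;
    determining an optimal bit rate of each frame of video image according to an original bit rate of each frame of video image and the optimal bit rate range of each frame of video image;
    encoding the video to be encoded according to the optimal bit rate of each frame of video image.
  2. The method according to claim 1, characterized in that parsing each frame of video image in the at least one frame of video image to determine the optimal bit rate range of each frame of video image comprises:
    parsing each frame of video image to obtain compressed-domain information of each frame of video image;
    inputting the compressed-domain information of each frame of video image into a first logistic regression model to obtain a first predicted bit rate, wherein the first prediction model is trained based on a first set bit rate, and the first set bit rate is lower than the original bit rate of each frame of video image;
    inputting the compressed-domain information of each frame of video image into a second logistic regression model to obtain a second predicted bit rate, wherein the second prediction model is trained based on a second set bit rate, and the second set bit rate is higher than the original bit rate of each frame of video image;
    taking the first predicted bit rate and the second predicted bit rate as the endpoints of the optimal bit rate range to determine the optimal bit rate range.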
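  3. The method according to claim 2, characterized in that the method further comprises:
    obtaining a first training set, the first training set comprising at least one first training video and at least one first predicted bit rate, wherein an initial bit rate of each first training video in the at least one first training video is the first set bit rate, and the at least one first training video corresponds one-to-one to the at least one first predicted bit rate;
    obtaining a second training set, the second training set comprising at least one second training video and at least one second predicted bit rate, wherein an initial bit rate of each second training video in the at least one second training video is the second set bit rate, and the at least one second training video corresponds one-to-one to the at least one second predicted bit rate;
    calling at least one regression predictor in an initial logistic regression model to parse each first training video, obtaining at least one first training bit rate corresponding to each first training video, wherein the at least one first training bit rate corresponds one-to-one to the at least one regression predictor;
    training the initial logistic regression model according to the at least one first training bit rate corresponding to each first training video and the first predicted bit rate of each first training video, to obtain the first logistic regression model;
    calling the at least one regression predictor to parse each second training video, obtaining at least one second training bit rate corresponding to each second training video, wherein the at least one second training bit rate corresponds one-to-one to the at least one regression predictor;
    training the initial logistic regression model according to the at least one second training bit rate corresponding to each second training video and the second predicted bit rate of each second training video, to obtain the second logistic regression model.
  4. The method according to claim 3, characterized in that training the initial logistic regression model according to the at least one first training bit rate corresponding to each first training video and the first predicted bit rate of each first training video, to obtain the first logistic regression model, comprises:
    determining at least one bit rate residual according to the at least one first training bit rate and the first predicted bit rate of each first training video, wherein the at least one bit rate residual corresponds one-to-one to the at least one regression predictor;
    training and adjusting the corresponding regression predictor with each bit rate residual in the at least one bit rate residual, to obtain the first logistic regression model.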
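  5. The method according to claim 1, characterized in that determining the optimal bit rate of each frame of video image according to the original bit rate of each frame of video image and the optimal bit rate range of each frame of video image comprises:
    when the original bit rate of each frame of video image is within the optimal bit rate range, determining that the original bit rate of each frame of video image is the optimal bit rate;
    when the original bit rate of each frame of video image is outside the optimal bit rate range, obtaining a corresponding third logistic regression model according to the relative position of the original bit rate of each frame of video image and the optimal bit rate range;
    inputting the compressed-domain information of each frame of video image into the third logistic regression model to obtain the optimal bit rate, wherein the third prediction model is trained based on a third set bit rate, and the third set bit rate is determined by the relative position of the original bit rate of each frame of video image and the optimal bit rate range.
  6. The method according to claim 5, characterized in that inputting the compressed-domain information of each frame of video image into the third logistic regression model to obtain the optimal bit rate comprises:
    calling at least one third regression predictor in the third logistic regression model to perform prediction on the compressed-domain information of each frame of video image, obtaining at least one third predicted bit rate, wherein the at least one third predicted bit rate corresponds one-to-one to the at least one third regression predictor;
    weighting the at least one third predicted bit rate according to the regression weight of each third regression predictor in the at least one third regression predictor, to obtain the optimal bit rate.
  7. The method according to claim 5 or 6, characterized in that obtaining the corresponding third logistic regression model according to the relative position of the original bit rate of each frame of video image and the optimal bit rate range comprises:
    when the original bit rate of each frame of video image is greater than the maximum of the optimal bit rate range, determining that the third set bit rate is a fourth set bit rate, and obtaining the third logistic regression model according to the fourth set bit rate, wherein the fourth set bit rate is greater than the original bit rate of each frame of video image;
    when the original bit rate of each frame of video image is less than the minimum of the optimal bit rate range, determining that the third set bit rate is a fifth set bit rate, and obtaining the third logistic regression model according to the fifth set bit rate, wherein the fifth set bit rate is less than the original bit rate of each frame of video image.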
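  8. A video encoding apparatus, characterized in that the apparatus comprises:
    a frame splitting module, configured to split a video to be encoded into frames to obtain at least one frame of video image;
    a parsing module, configured to parse each frame of video image in the at least one frame of video image to determine an optimal bit rate range of each frame of video image, and to determine an optimal bit rate of each frame of video image according to an original bit rate of each frame of video image and the optimal bit rate range of each frame of video image;
    an encoding module, configured to encode the video to be encoded according to the optimal bit rate of each frame of video image.
  9. An electronic device, characterized by comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, and the one or more programs include instructions for performing the steps of the method according to any one of claims 1-7.
  10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement the method according to any one of claims 1-7.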
PCT/CN2023/109379 2022-12-16 2023-07-26 Video encoding method, apparatus, electronic device, and storage medium WO2024124911A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211625113.1 2022-12-16
CN202211625113.1A CN117750018A (zh) 2022-12-16 2022-12-16 Video encoding method, apparatus, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2024124911A1 true WO2024124911A1 (zh) 2024-06-20

Family

ID=90257884

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/109379 WO2024124911A1 (zh) 2022-12-16 2023-07-26 Video encoding method, apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN117750018A (zh)
WO (1) WO2024124911A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170264902A1 (en) * 2016-03-09 2017-09-14 Sony Corporation System and method for video processing based on quantization parameter
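CN112399176A (zh) * 2020-11-17 2021-02-23 深圳大学 Video encoding method, apparatus, computer device, and storage medium
CN113949874A (zh) * 2021-10-18 2022-01-18 北京金山云网络技术有限公司 Video encoding method, apparatus, and electronic device
CN114554211A (zh) * 2022-01-14 2022-05-27 百果园技术(新加坡)有限公司 Content-adaptive video encoding method, apparatus, device, and storage medium
CN114866772A (zh) * 2022-05-23 2022-08-05 普联技术有限公司 Encoding method, encoding apparatus, and electronic device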

Also Published As

Publication number Publication date
CN117750018A (zh) 2024-03-22

Similar Documents

Publication Publication Date Title
US10897620B2 (en) Method and apparatus for processing a video
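WO2021068598A1 (zh) Encoding method and apparatus for a shared screen, storage medium, and electronic device
WO2022057789A1 (zh) Video definition recognition method, electronic device, and storage medium
CN115205925A (zh) Expression coefficient determination method, apparatus, electronic device, and storage medium
CN112104867B (zh) Video processing method, video processing apparatus, smart device, and storage medium
CN112437301B (zh) Rate control method and apparatus for visual analysis, storage medium, and terminal
WO2022000298A1 (en) Reinforcement learning based rate control
US12069249B2 (en) Coding mode selection method and apparatus, and electronic device and computer-readable medium
CN112101543A (zh) Neural network model determination method, apparatus, electronic device, and readable storage medium
US20190335186A1 (en) Image transcoding method and apparatus
CN114245175A (zh) Video transcoding method, apparatus, electronic device, and storage medium
WO2024124911A1 (zh) Video encoding method, apparatus, electronic device, and storage medium
CN113411587B (zh) Video compression method, apparatus, and computer-readable storage medium
CN115767149A (zh) Video data transmission method and apparatus
CN111510715B (zh) Video processing method, system, computer device, and storage medium
CN114339252A (zh) Data compression method and apparatus
WO2024124914A1 (zh) Face region recognition method, apparatus, electronic device, and storage medium
WO2024120396A1 (zh) Video encoding method, apparatus, electronic device, and storage medium
CN117176962B (zh) Video encoding and decoding method, apparatus, and related device
US11546597B2 (en) Block-based spatial activity measures for pictures
WO2024109138A1 (zh) Video encoding method, apparatus, and storage medium
CN117998165A (zh) Video recording export method, apparatus, device, and storage medium
CN117459719A (zh) Reference frame selection method, apparatus, electronic device, and storage medium
WO2023022717A1 (en) Adjustments of remote access applications based on workloads
CN116996681A (zh) Method, apparatus, device, and storage medium for determining an encoding bit rate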

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 23902121; Country of ref document: EP; Kind code of ref document: A1)