WO2024041365A1 - Video decision code rate determination method and apparatus, storage medium and electronic apparatus - Google Patents
Video decision code rate determination method and apparatus, storage medium and electronic apparatus
- Publication number
- WO2024041365A1 (PCT/CN2023/111567)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video
- information
- code rate
- prediction
- network
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N17/00—Diagnosis, testing or measuring for television systems or their details
- H04N17/004—Diagnosis, testing or measuring for television systems or their details for digital television systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N17/00—Diagnosis, testing or measuring for television systems or their details
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/164—Feedback from the receiver or from the transmission channel
- H04N19/166—Feedback from the receiver or from the transmission channel concerning the amount of transmission errors, e.g. bit error rate [BER]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/573—Motion compensation with multiple frame prediction using two or more reference frames in a given prediction direction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/435—Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/435—Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
- H04N21/4355—Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream involving reformatting operations of additional data, e.g. HTML pages on a television screen
- H04N21/4356—Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream involving reformatting operations of additional data, e.g. HTML pages on a television screen by altering the spatial resolution, e.g. to reformat additional data on a handheld device, attached to the STB
Definitions
- the present application relates to the field of video transmission, specifically, to a video decision code rate determination method, device, storage medium and electronic device.
- In the related art, the code rate is usually selected on the client side, for example by an adaptive bitrate (ABR) model, which combines throughput prediction, buffer occupancy, bandwidth prediction and cache information so that the code rate played by the user matches the current network bandwidth as closely as possible, thereby improving the user's quality of experience (QoE).
- FIG. 3 compares the set encoding code rate with the actual encoding code rate for different video contents in the related art. As shown in Figure 3, HowTo is a practical-knowledge short video, Game is a game video, and Lecture is a lecture video. The actual encoding code rate always differs from the set encoding code rate, and different types of video reach different actual encoding code rates under the same set encoding code rate.
- Code rate decisions made on this basis become too aggressive when the difference is large, causing video stuttering, frame loss and similar problems; when the difference is small, the decision becomes too conservative and the bandwidth is not fully utilized. Neither situation can guarantee the user's quality of experience.
- Embodiments of the present application provide a video decision code rate determination method, device, storage medium and electronic device, so as to at least solve the problem in the related art that a difference always exists between the actual encoding code rate and the set encoding code rate, which leads to erroneous code rate decisions.
- A video decision code rate determination method includes: obtaining video complexity information and video coding information of the transmitted video; determining a code rate prediction value and a confidence interval of the code rate prediction value based on the video complexity information and the video coding information; obtaining network status information during video transmission; and determining the decision code rate for video transmission based on the code rate prediction value, the confidence interval and the network status information.
- A video decision code rate determination device includes: a first acquisition module for acquiring video complexity information and video coding information of the transmitted video; a code rate prediction module for determining a code rate prediction value and a confidence interval of the code rate prediction value based on the video complexity information and the video coding information; a second acquisition module for obtaining network status information during video transmission; and a code rate decision module for determining the decision code rate for video transmission based on the code rate prediction value, the confidence interval and the network status information.
- a computer-readable storage medium is also provided.
- a computer program is stored in the storage medium, wherein the computer program executes the steps in any of the above method embodiments when run by a processor.
- an electronic device including a memory and a processor.
- A computer program is stored in the memory, and the processor is configured to run the computer program to perform the steps in any of the above method embodiments.
- Figure 1 is a hardware structure block diagram of the video decision code rate determination method according to the embodiment of the present application.
- Figure 2 is a flow chart of a video decision code rate determination method according to an embodiment of the present application.
- Figure 3 is a comparison diagram between the set encoding bit rate and the actual encoding bit rate of different video contents in related technologies
- Figure 4 is a schematic diagram of the parameter structure of video prediction input information according to an embodiment of the present application.
- Figure 5 is a schematic structural diagram of the code rate prediction neural network in the embodiment of the present application.
- Figure 6 is a test structure diagram in the adaptive code rate decision network according to an embodiment of the present application.
- Figure 7 is a flow chart of code rate prediction and code rate decision-making according to an embodiment of the present application.
- Figure 8 is a block diagram of a video decision code rate determination device according to an embodiment of the present application.
- Figure 9 is a schematic diagram of the operating environment of a video decision code rate determination device according to another embodiment of the present application.
- Figure 1 is a hardware structure block diagram of the video decision code rate determination method according to the embodiment of the present application.
- The hardware board may include one or more processors 102 (only one is shown in Figure 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA) and a memory 104 for storing data. The above-mentioned mobile terminal may also include a transmission device 106 for communication functions and an input/output device 108.
- the structure shown in Figure 1 is only illustrative, and it does not limit the structure of the above-mentioned mobile terminal.
- the mobile terminal may also include more or fewer components than shown in FIG. 1 , or have a different configuration than shown in FIG. 1 .
- The memory 104 can be used to store computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the video decision code rate determination method in the embodiment of the present application.
- The processor 102 executes various functional applications and the video decision code rate determination processing by running the computer program stored in the memory 104, that is, it implements the above-mentioned method.
- Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
- the memory 104 may further include memory located remotely relative to the processor 102, and these remote memories may be connected to the mobile terminal through a network. Examples of the above-mentioned networks include but are not limited to the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
- Transmission device 106 is used to receive or send data via a network.
- Specific examples of the above-mentioned network may include wireless networks provided by communication providers.
- the transmission device 106 includes a network adapter (Network Interface Controller, NIC for short), which can be connected to other network devices through a base station to communicate with the Internet.
- the transmission device 106 may be a radio frequency (Radio Frequency, RF for short) module, which is used to communicate with the Internet wirelessly.
- FIG. 2 is a flow chart of a method for determining a video decision code rate according to an embodiment of the present application. As shown in Figure 2, the process includes the following steps:
- Step S202 obtain the video complexity information and video coding information of the transmitted video
- Step S204 determine the code rate prediction value and the confidence interval of the code rate prediction value according to the video complexity information and video coding information;
- Step S206 obtain network status information during video transmission
- Step S208 Determine the decision code rate for video transmission based on the code rate prediction value, confidence interval and network status information.
- step S202 may specifically include the following steps:
- Step S2022 Obtain a preset amount of spatial domain perception information and time domain perception information, where the video complexity information includes: the spatial domain perception information and the time domain perception information;
- Step S2024 Obtain a preset number of historical bit rate differences, the current set bit rate and the current I-frame information, where the video encoding information includes: the historical bit rate difference, the current set bit rate and the current I-frame information.
- step S2022 may specifically include:
- Every first preset time interval, extract a video image group within the first preset time from the transmitted video;
- extract video frames from the video image group and perform temporal down-sampling and spatial down-sampling;
- determine the temporal perception information according to the sampling result of the temporal down-sampling;
- determine the spatial perception information according to the sampling result of the spatial down-sampling.
- Recommendation ITU-R BT.1788 provides the calculation of the temporal perception information TI and the spatial perception information SI. SI is obtained by filtering each video frame with a Sobel filter and then computing the standard deviation; TI is based on the picture change between two frames and is obtained from the differences between co-located pixels of the two frames.
- The original definitions of SI and TI require computation over all pixels of all video frames in a video, which takes considerable calculation time and memory and cannot be used in scenarios with high real-time requirements.
- This embodiment therefore performs separate temporal and spatial down-sampling on top of the original definitions of SI and TI: a certain number of video frames are extracted from the video image group contained in the previous second to achieve temporal down-sampling, and pixel values are then sub-sampled from these frames to achieve spatial down-sampling.
- For example, if the preset number is 4 and the first preset time is 1/3 second, the SI and TI values of 4 video frames within the previous 4/3 seconds need to be obtained, and each calculated SI/TI value is the complexity characteristic value of one randomly sampled frame within a 1/3-second window.
- Because the TI value needs the data of two consecutive frames, the feature values of 4 frames require 8 frames of data in total. A video frame with a resolution of 1920*1080 is down-sampled to a 192*108 matrix, the pixel values use the Y component of the YUV components, and the eight 192*108 pictures obtained by down-sampling are used to calculate the SI and TI values. This application also supports other down-sampling methods.
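- The following is a minimal, non-authoritative Python sketch of the down-sampled SI/TI computation described above. It assumes NumPy and SciPy are available; the pixel-picking scheme, the function names and the reuse of the 192*108 example size are illustrative assumptions rather than details taken from the patent.

```python
import numpy as np
from scipy.ndimage import sobel

def downsample_y_plane(y_plane: np.ndarray, out_h: int = 108, out_w: int = 192) -> np.ndarray:
    """Spatial down-sampling by simple pixel picking (e.g. 1920*1080 -> 192*108)."""
    h, w = y_plane.shape
    rows = np.linspace(0, h - 1, out_h).astype(int)
    cols = np.linspace(0, w - 1, out_w).astype(int)
    return y_plane[np.ix_(rows, cols)].astype(np.float64)

def si_value(y_small: np.ndarray) -> float:
    """SI: standard deviation of the Sobel-filtered luma picture (ITU-R BT.1788 style)."""
    grad = np.hypot(sobel(y_small, axis=0), sobel(y_small, axis=1))
    return float(grad.std())

def ti_value(y_prev: np.ndarray, y_curr: np.ndarray) -> float:
    """TI: standard deviation of the difference between co-located pixels of two frames."""
    return float((y_curr - y_prev).std())

def complexity_features(y_frames):
    """Take the temporally down-sampled Y planes in (previous, current) pairs and return one
    (SI, TI) pair per frame pair, so 4 feature pairs consume 8 sampled frames as described above."""
    small = [downsample_y_plane(f) for f in y_frames]
    pairs = zip(small[0::2], small[1::2])
    return [(si_value(curr), ti_value(prev, curr)) for prev, curr in pairs]
```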
- step S2024 may specifically include:
- Every second preset time interval, obtain the historical set code rate and the historical actual code rate within the second preset time;
- determine the difference between the historical actual code rate and the historical set code rate as the historical code rate difference;
- every second preset time interval, obtain the currently set code rate and the current I-frame information, where the current I-frame information is used to identify whether the video frames within the second preset time period include an I frame.
- For example, if the preset number is 4 and the second preset time is 1 second, the set code rate Set_Bitrate and the actual code rate Real_Bitrate of each of the previous 4 seconds need to be obtained, and the per-second code rate difference within the previous 4 seconds is then calculated as dif_Bitrate = Real_Bitrate - Set_Bitrate.
- The current set code rate New_Bitrate of the encoder and a flag bit It, which identifies whether the current video frame is an I frame, are also considered. The video frames following the currently transmitted video frame are examined: if the video frames within the second preset time contain an I frame, the flag is set to 1; if they do not, the flag is set to 0.
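- As a hedged illustration of collecting the coding features described above, the sketch below computes dif_Bitrate per one-second window and assembles the current set code rate and I-frame flag; the CodingSample container and function names are hypothetical helpers, not structures defined by the patent.

```python
from dataclasses import dataclass

@dataclass
class CodingSample:
    set_bitrate: float    # Set_Bitrate configured for this one-second window (kbps)
    real_bitrate: float   # Real_Bitrate actually produced by the encoder (kbps)
    has_i_frame: bool     # whether this window contains an I frame

def coding_features(history, new_bitrate, next_window_has_i):
    """history: list of the last N (here 4) CodingSample windows, oldest first.
    Returns the per-window dif values, the set code rates (last three historical windows plus
    the current setting New_Bitrate) and the I-frame flags, mirroring the Figure 4 layout."""
    dif = [s.real_bitrate - s.set_bitrate for s in history]          # dif_Bitrate = Real - Set
    set_rates = [s.set_bitrate for s in history[1:]] + [new_bitrate]
    i_flags = [1 if s.has_i_frame else 0 for s in history[1:]] + [1 if next_window_has_i else 0]
    return dif, set_rates, i_flags
```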
- step S204 may specifically include the following steps:
- Step S2042 combine a preset number of the video complexity information and the video coding information to obtain video prediction input information
- Step S2044 Enter the video prediction input information into a pre-trained code rate prediction neural network to obtain the code rate prediction value and the confidence interval output by the code rate prediction neural network.
- The above step S2042 may specifically include: combining a preset number of pieces of temporal perception information, spatial perception information, historical code rate differences, the current set code rate and the current I-frame information to obtain the video prediction input information, where the video complexity information includes the spatial perception information and the temporal perception information, the video coding information includes the historical code rate difference, the current set code rate and the current I-frame information, the video prediction input information is an M×N matrix, M is the number of information types in the video prediction input information, and N is the preset number.
- Figure 4 is a schematic parameter structure diagram of video prediction input information according to an embodiment of the present application.
- As shown in Figure 4, the State matrix of the video prediction input information has size (5, 4), and the matrix contains five types of information: the spatial perception information SI, the temporal perception information TI, the historical code rate difference dif, the current set code rate b and the current I-frame information I, with four values of each type.
- The different numerical subscripts denote information obtained at different times: m-1, m-2, m-3 and m-4 denote the information obtained at each first preset time interval, so if the first preset time is 1/3 second, the SI and TI values within the previous 4/3 seconds are required; t denotes the current time, and t-1, t-2, t-3 and t-4 denote the information obtained at each second preset time interval, so if the second preset time is 1 second, the code rate differences within the previous 4 seconds, the I-frame information and set code rates within the previous 3 seconds, and the current I-frame information and current set code rate are required.
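- As a concrete illustration, the following minimal NumPy sketch stacks the five feature rows into the (5, 4) State matrix in the Figure 4 row order; the helper name and the use of NumPy are assumptions for illustration only.

```python
import numpy as np

def build_state(si, ti, dif, set_bitrates, i_flags) -> np.ndarray:
    """Stack the five feature types into the M x N (here 5 x 4) State matrix.
    Row order follows Figure 4: SI, TI, dif, set code rate b, I-frame flag I."""
    n = 4  # preset number of historical samples per feature type
    rows = [si, ti, dif, set_bitrates, i_flags]
    assert all(len(r) == n for r in rows), "each feature type needs exactly N samples"
    return np.asarray(rows, dtype=np.float32)   # shape (5, 4)
```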
- The above-mentioned step S2044 may specifically include: inputting the video prediction input information into the prediction sub-network to obtain the code rate prediction value output by the prediction sub-network; and inputting the video prediction input information into the error sub-network to obtain the confidence interval output by the error sub-network.
- The code rate prediction neural network includes the prediction sub-network and the error sub-network.
- Specifically, in step S2044 the prediction sub-network (Pre network) and the error sub-network (Err network) of the code rate prediction neural network (Pre-Err network) share network layers.
- FIG. 5 is a schematic structural diagram of the code rate prediction neural network in the embodiment of the present application.
- As shown in Figure 5, the structure of the code rate prediction neural network may specifically include: gated recurrent units (Gate Recurrent Unit, GRU), a convolutional neural network (Convolutional Neural Network, CNN), a concatenation layer (Concat) and fully connected layers (Fully Connected Layers, FC). The input Input is the video prediction input information, and the output results are the code rate prediction value Pre_Bitrate and the confidence interval Pre_Error of the code rate prediction value.
- The CNN layer is a one-dimensional convolutional neural network (1D-CNN).
- the prediction sub-network includes: a GRU layer composed of two layers of gated recurrent units GRU, a first convolutional neural network CNN layer, a first connected Concat layer, and a first fully connected FC layer.
- inputting the video prediction input information into the prediction sub-network to obtain the code rate prediction value output by the prediction sub-network includes the following steps:
- Step S1 Input the video prediction input information into the GRU layer to obtain the first output result output by the GRU layer, where the video prediction input information is an M×N matrix;
- Step S2 Divide the video prediction input information into M one-dimensional vectors according to the information type M of the video prediction input information, where the length of the one-dimensional vector is N;
- Step S3 input the M one-dimensional vectors into the first CNN layer respectively to obtain the second output result output by the first CNN layer;
- Step S4 combine the first output result and the second output result through the first connection layer to obtain the first target feature vector
- Step S5 Input the first target feature vector into the first fully connected layer to obtain the code rate prediction value output by the first fully connected layer.
- In step S1, the GRU layer performs an overall computation on the State matrix and can learn the relationships between different types of data; in step S3, separate convolution operations are performed on the different types of data, which learns the characteristics of each individual data type; in step S4, the results of the GRU layer and the CNN layer are combined in the Concat layer to obtain a vector describing the characteristics of the video content.
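- The sketch below is a PyTorch-style rendering of the prediction sub-network in steps S1 to S5; it assumes PyTorch is available, and the hidden size, channel count and kernel width are illustrative assumptions, since the patent does not specify them.

```python
import torch
import torch.nn as nn

class PreSubNetwork(nn.Module):
    """Prediction sub-network sketch: 2-layer GRU over the whole State matrix, a shared 1D-CNN
    applied to each feature row separately, Concat, then FC layers producing Pre_Bitrate."""

    def __init__(self, m: int = 5, n: int = 4, hidden: int = 32, cnn_ch: int = 16):
        super().__init__()
        self.gru = nn.GRU(input_size=m, hidden_size=hidden, num_layers=2, batch_first=True)
        self.cnn = nn.Conv1d(in_channels=1, out_channels=cnn_ch, kernel_size=3, padding=1)
        self.fc = nn.Sequential(nn.Linear(hidden + m * cnn_ch * n, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, state: torch.Tensor):
        # state: (batch, M, N) -- M feature types, N historical samples each
        b, m, n = state.shape
        gru_out, _ = self.gru(state.transpose(1, 2))          # treat the N samples as time steps
        gru_feat = gru_out[:, -1, :]                           # first output result (last step)
        rows = state.reshape(b * m, 1, n)                      # split into M one-dimensional vectors
        cnn_feat = torch.relu(self.cnn(rows)).reshape(b, -1)   # second output result
        feat = torch.cat([gru_feat, cnn_feat], dim=1)          # first target feature vector (Concat)
        return self.fc(feat), feat                             # Pre_Bitrate and the shared feature
```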
- the code rate prediction value loss function in the prediction sub-network is:
- Pre_Bitrate_Loss = MSE(Pre_Bitrate, Real_Bitrate);
- Pre_Bitrate_Loss is the code rate prediction value loss function
- Pre_Bitrate is the code rate prediction value
- Real_Bitrate is the true value of the code rate prediction value.
- the error subnetwork includes: a slice layer, a second convolutional neural network CNN layer, a second connected Concat layer, and a second fully connected FC layer.
- inputting the video prediction input information into the error subnetwork to obtain the confidence interval output by the error subnetwork includes the following steps:
- Step S6 Extract the one-dimensional vector of historical bit rate differences from the video prediction input information through the slice layer;
- Step S7 Input the one-dimensional vector of the historical code rate difference into the second CNN layer to obtain the third output result output by the second CNN layer;
- Step S8 combine the third output result and the first target feature vector through the second connection layer to obtain the second target feature vector;
- Step S9 Input the second target feature vector into the second fully connected layer to obtain the confidence interval output by the second fully connected layer.
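- A companion sketch of the error sub-network in steps S6 to S9 is given below; it pairs with the PreSubNetwork sketch above (the feature dimension must match it), and the row index of the dif values and the layer sizes are assumptions chosen only to make the example runnable.

```python
import torch
import torch.nn as nn

class ErrSubNetwork(nn.Module):
    """Error sub-network sketch: slice out the dif row, run a 1D-CNN over it, concatenate with
    the prediction sub-network's feature vector, then FC layers producing Pre_Error."""

    def __init__(self, n: int = 4, cnn_ch: int = 16, pre_feat_dim: int = 32 + 5 * 16 * 4, dif_row: int = 2):
        super().__init__()
        self.dif_row = dif_row                                # row of the State matrix holding dif (Figure 4 order)
        self.cnn = nn.Conv1d(1, cnn_ch, kernel_size=3, padding=1)
        self.fc = nn.Sequential(nn.Linear(pre_feat_dim + cnn_ch * n, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, state: torch.Tensor, pre_feature: torch.Tensor):
        dif = state[:, self.dif_row:self.dif_row + 1, :]      # slice layer: (batch, 1, N)
        cnn_feat = torch.relu(self.cnn(dif)).flatten(1)       # third output result
        feat = torch.cat([cnn_feat, pre_feature], dim=1)      # second target feature vector
        return self.fc(feat)                                   # Pre_Error (confidence interval)
```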
- the confidence interval loss function in the error subnetwork is:
- Pre_Error_Loss = MSE(Pre_Error, Real_Error * f(New_Bitrate));
- Real_Error = |Pre_Bitrate - Real_Bitrate|;
- Pre_Error_Loss is the confidence interval loss function
- Pre_Error is the confidence interval
- Real_Error is the actual code rate difference
- f(New_Bitrate) is a mapping function based on the change of New_Bitrate
- Pre_Bitrate is the bit rate prediction value
- Real_Bitrate is the true value of the bit rate prediction value
- New_Bitrate is the current set bit rate.
- Because the video encoding code rate difference clearly grows as the encoding code rate increases, a linear function is used for the fitting, and the linear mapping function may be f(x) = k*(x + b). If Pre_Error were also trained directly with an MSE loss, Pre_Error would tend to always output 0, which would simply repeat the prediction task of the Pre network predicting Pre_Bitrate. After mapping by the function f(x), Real_Error is effectively multiplied by a scaling factor that changes with New_Bitrate; this scaling factor grows as the set code rate increases, which matches the actual characteristics of the encoding difference.
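- A minimal sketch of the two losses, assuming PyTorch; the constants k and b of the linear mapping are left as parameters because their concrete values are not given, and detaching the prediction when forming Real_Error is an implementation choice, not something the patent specifies.

```python
import torch
import torch.nn.functional as F

def pre_err_losses(pre_bitrate, pre_error, real_bitrate, new_bitrate, k=1.0, b=0.0):
    """Pre_Bitrate_Loss = MSE(Pre_Bitrate, Real_Bitrate)
    Pre_Error_Loss   = MSE(Pre_Error, Real_Error * f(New_Bitrate)), Real_Error = |Pre_Bitrate - Real_Bitrate|
    with f(x) = k * (x + b) as the linear mapping; k and b are fitting parameters (assumed here)."""
    pre_bitrate_loss = F.mse_loss(pre_bitrate, real_bitrate)
    real_error = (pre_bitrate.detach() - real_bitrate).abs()   # error target; detach is an assumption
    scale = k * (new_bitrate + b)                               # scaling factor grows with the set code rate
    pre_error_loss = F.mse_loss(pre_error, real_error * scale)
    return pre_bitrate_loss, pre_error_loss
```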
- the network status information obtained in step S206 includes: sending code rate, buffer area occupation, receiving code rate, delay, actual packet loss rate, and number of retransmitted packets.
- The above step S208 may specifically include: combining the code rate prediction value, the confidence interval and the network status information to obtain a target state set; and inputting the target state set into a pre-trained code rate decision network model used for code rate selection decisions, to obtain the decision code rate output by the code rate decision network model.
- When combining the code rate prediction value Pre_Bitrate, the confidence interval Pre_Error and the network status information Net_Info, the respective timestamps of the three values need to be considered and aligned in time. The code rate decision network model may be an adaptive bitrate (ABR) model.
- After the above step S208, the method further includes setting appropriate encoding parameters, such as resolution, frame rate and quantization parameters, according to the decision code rate. Specifically, whether the encoding parameters need to be adjusted is determined based on the decision code rate and the encoder, and when adjustment is needed, the specific resolution, frame rate, quantization parameters and so on of the encoder are determined.
- Setting the encoding parameters of the encoder in this way not only ensures smooth video transmission but also makes full use of the network bandwidth and improves the user's quality of experience.
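- The placeholder below only illustrates the idea of mapping a decision code rate to encoder parameters; the thresholds and the returned resolutions/frame rates are invented for the example and are not values given by the patent.

```python
def encoder_params_for(decision_bitrate_kbps: float) -> dict:
    """Illustrative mapping from the decision code rate to resolution and frame rate."""
    if decision_bitrate_kbps >= 4000:
        return {"resolution": (1920, 1080), "fps": 30}
    if decision_bitrate_kbps >= 1500:
        return {"resolution": (1280, 720), "fps": 30}
    return {"resolution": (854, 480), "fps": 24}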
- FIG. 6 is a test structure diagram in an adaptive code rate decision network according to an embodiment of the present application. As shown in Figure 6, the following three types of adaptive code rate decision networks are tested:
- the original decision-making network net_raw only refers to the network status information Net_Info to make code rate decisions
- The full-information decision network net_all directly refers to all the network status information Net_Info, the temporal perception information TI, the spatial perception information SI and the coding difference dif to make code rate decisions;
- the decision-making network net_ours of this application is based on the code rate prediction neural network Pre-Err and makes code rate decisions with reference to the code rate prediction value Pre_Bitrate, the confidence interval Pre_Error and the network status information Net_Info.
- The three network structures are tested on the same video in the same network environment, and the test results are as follows:
- net_raw does not take the difference between the set encoding code rate and the actual code rate into account, so the actual code rate easily exceeds the network bandwidth by a large margin, which may cause video stuttering and delay for the user and degrade the user's quality of experience (QoE);
- net_all uses a low code rate for long periods, so although no stuttering or delay occurs, it does not make full use of the bandwidth environment and cannot improve the user's QoE;
- net_ours combines the advantages of both: it avoids the impact caused by the coding difference while also guaranteeing a certain bandwidth utilization, and thus guarantees the user's QoE.
- In another embodiment, besides applying the prediction results of the code rate prediction neural network Pre-Err to the adaptive code rate decision network, and since the Pre-Err network provides a more accurate code rate prediction value, technicians can also apply the prediction results to traditional heuristic code rate control methods, which likewise avoids the network congestion caused by the coding difference.
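- One possible (non-authoritative) shape of such a heuristic controller is sketched below: it keeps, raises or lowers the set code rate so that the predicted actual code rate plus its confidence margin stays within the estimated bandwidth. The step size and the 90% headroom threshold are assumptions made purely for illustration.

```python
def heuristic_decision(current_set: float, pre_bitrate: float, pre_error: float,
                       est_bandwidth: float, step: float = 250.0) -> float:
    """Adjust the set code rate (kbps) using Pre_Bitrate and Pre_Error from the Pre-Err network."""
    upper = pre_bitrate + pre_error          # predicted worst-case actual code rate
    if upper > est_bandwidth:                # predicted overshoot -> back off to avoid congestion
        return max(current_set - step, step)
    if upper < 0.9 * est_bandwidth:          # comfortable headroom -> probe upward
        return current_set + step
    return current_set
```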
- Figure 7 is a flow chart of code rate prediction and code rate decision-making according to an embodiment of the present application. As shown in Figure 7, the flow chart is divided into two parts: actual video code rate prediction and video transmission code rate guidance, and specifically includes the following steps :
- Step S701 calculate SI and TI
- Step S702 collect video coding information
- Step S703 input the information in steps S701 and S702 into the Pre-Err network
- Step S704 Pre-Err network outputs the predicted value
- Step S705 collect network status information
- Step S706, combine state sets
- Step S707 input the above state set into the code rate decision network
- Step S708 The code rate decision network outputs the decision code rate.
- SI is the spatial domain sensing information and TI is the time domain sensing information.
- The SI and TI values can represent the complexity characteristics of the video frames; the video coding information is used to represent the video coding characteristics; the code rate prediction network (Pre-Err network) includes a prediction sub-network and an error sub-network, which predict the actual encoding code rate and the fluctuation range of the actual encoding code rate, respectively.
- In steps S705 to S708, the network status information and the predicted range of the actual code rate are referenced, and the code rate decision network is used to determine the video decision code rate, which then guides the encoder's sending code rate. This both makes full use of the network bandwidth and guarantees the quality of video transmission, avoids the erroneous code rate decisions caused by the ever-present difference between the actual code rate and the set code rate, and improves the quality of the user experience.
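- The glue sketch below strings the earlier helper sketches together along the Figure 7 flow; it is not self-contained (it reuses complexity_features, coding_features and build_state from above), and pre_err_net and abr_net are stand-ins assumed to return a (Pre_Bitrate, Pre_Error) pair and to accept a state dictionary, respectively.

```python
import torch

def decide_bitrate(y_frames, coding_history, new_bitrate, next_has_i, net_info,
                   pre_err_net, abr_net):
    """Figure 7 flow: y_frames holds 8 sampled Y planes, coding_history holds 4 CodingSample windows."""
    si_ti = complexity_features(y_frames)                                   # S701: SI/TI per frame pair
    dif, set_rates, i_flags = coding_features(coding_history, new_bitrate, next_has_i)  # S702
    state = build_state([s for s, _ in si_ti], [t for _, t in si_ti], dif, set_rates, i_flags)
    pre_bitrate, pre_error = pre_err_net(torch.from_numpy(state).unsqueeze(0))  # S703-S704
    target_state = dict(net_info, pre_bitrate=float(pre_bitrate), pre_error=float(pre_error))  # S705-S706
    return abr_net(target_state)                                            # S707-S708: decision code rate
```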
- According to another aspect of the embodiments of the present application, a video decision code rate determination device is also provided. FIG. 8 is a block diagram of the video decision code rate determination device according to the embodiment of the present application.
- As shown in Figure 8, the video decision code rate determination device includes:
- the first acquisition module 802 is used to acquire the video complexity information and video coding information of the transmitted video
- Code rate prediction module 804 configured to determine a code rate prediction value and a confidence interval of the code rate prediction value according to the video complexity information and the video coding information;
- the second acquisition module 806 is used to acquire network status information during video transmission
- the code rate decision module 808 is used to determine the decision code rate for video transmission based on the code rate prediction value, the confidence interval and the network status information.
- the first acquisition module 802 includes:
- a first acquisition unit configured to acquire a preset amount of spatial domain perception information and temporal domain perception information, wherein the video complexity information includes: the spatial domain perception information and the temporal domain perception information;
- The second acquisition unit is used to acquire a preset number of historical code rate differences, the current set code rate and the current I-frame information, where the video coding information includes: the historical code rate difference, the current set code rate and the current I-frame information.
- the first acquisition unit is further configured to extract a video image group within a first preset time period from the transmitted video at a first preset time interval; extract video frames from the video image group Perform time down-sampling and spatial down-sampling; determine the time-domain perception information according to the sampling result of the time down-sampling; determine the spatial domain perception information according to the sampling result of the spatial down-sampling.
- The second acquisition unit is further configured to: at every second preset time interval, acquire the historical set code rate and the historical actual code rate within the second preset time, and determine the difference between the historical actual code rate and the historical set code rate as the historical code rate difference; and at every second preset time interval, acquire the current set code rate and the current I-frame information, where the current I-frame information is used to identify whether the video frames within the second preset time contain an I frame.
- the code rate prediction module 804 also includes:
- a first combining unit configured to combine a preset number of the video complexity information and the video coding information to obtain video prediction input information
- a prediction unit configured to input the video prediction input information into a pre-trained code rate prediction neural network, and obtain the code rate prediction value and the confidence interval output by the code rate prediction neural network.
- The first combining unit is also used to combine a preset number of pieces of temporal perception information, spatial perception information, historical code rate differences, the current set code rate and the current I-frame information to obtain the video prediction input information, where the video complexity information includes the spatial perception information and the temporal perception information, the video coding information includes the historical code rate difference, the current set code rate and the current I-frame information, the video prediction input information is an M×N matrix, M is the number of information types in the video prediction input information, and N is the preset number.
- the prediction unit further includes:
- a first prediction unit configured to input the video prediction input information into the prediction sub-network and obtain the code rate prediction value output by the prediction sub-network;
- the second prediction unit is used to input the video prediction input information into the error sub-network to obtain the confidence interval output by the error sub-network.
- the code rate decision module 808 also includes:
- a second combining unit configured to combine the code rate prediction value, the confidence interval and the network status information to obtain a target status set
- a decision-making unit configured to input the target state set into a pre-trained code rate decision network to obtain the decision code rate output by the code rate decision network.
- FIG. 9 is a schematic diagram of the operating environment of the video decision-making code rate determination device according to another embodiment of the present application.
- the video decision-making code rate determination device includes a data acquisition part and a data processing part:
- the data collection part includes:
- the video picture feature extraction module 901 is used to extract video picture features from the real-time video transmission process
- the video coding feature collection module 902 is used to obtain historical video coding information, the current set coding rate, and the current I frame information;
- the network status feature collection module 903 is used to collect network status information based on network transmission conditions, including: sending code rate, buffer area occupation, receiving code rate, delay, actual packet loss rate, and number of retransmitted packets;
- the data processing part includes:
- the complexity feature value calculation module 904 is used to calculate the spatial perception information SI and the temporal perception information TI according to the extracted video picture features;
- the code rate prediction network module 905 is used to determine the code rate prediction value and the confidence interval of the code rate prediction value
- the state combination module 906 is used to combine network state information, code rate prediction values and confidence intervals to obtain a new state set
- Adaptive code rate decision network 907 is used to determine the decision code rate based on the new state set
- the code rate decision implementation module 908 is used to determine appropriate coding parameters, such as resolution, frame rate, quantization parameters, etc., based on the decision code rate.
- Embodiments of the present application also provide a computer-readable storage medium that stores a computer program, wherein the computer program executes the steps in any of the above method embodiments when run by a processor.
- The above-mentioned computer-readable storage medium may include, but is not limited to, a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other media capable of storing a computer program.
- An embodiment of the present application also provides an electronic device, including a memory and a processor.
- a computer program is stored in the memory, and the processor is configured to run the computer program to perform the steps in any of the above method embodiments.
- the above-mentioned electronic device may further include a transmission device and an input-output device, wherein the transmission device is connected to the above-mentioned processor, and the input-output device is connected to the above-mentioned processor.
- Obviously, those skilled in the art should understand that the above-mentioned modules or steps of the present application can be implemented with general-purpose computing devices; they can be concentrated on a single computing device or distributed over a network composed of multiple computing devices; they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device, and in some cases the steps shown or described can be executed in an order different from the one here; alternatively, they can be made into individual integrated circuit modules, or multiple modules or steps among them can be made into a single integrated circuit module. In this way, the present application is not limited to any specific combination of hardware and software.
Abstract
The present application provides a video decision code rate determination method, device, storage medium and electronic device. The method includes: obtaining video complexity information and video coding information of the transmitted video; determining a code rate prediction value and a confidence interval of the code rate prediction value according to the video complexity information and the video coding information; obtaining network status information during video transmission; and determining the decision code rate for video transmission according to the code rate prediction value, the confidence interval and the network status information.
Description
相关申请的交叉引用
本公开基于2022年8月25日提交的发明名称为“一种视频决策码率确定方法、装置、存储介质及电子装置”的中国专利申请CN202211026597.8,并且要求该专利申请的优先权,通过引用将其所公开的内容全部并入本公开。
本申请涉及视频传输领域,具体而言,涉及一种视频决策码率确定方法、装置、存储介质及电子装置。
随着网络视频的需求量快速增加,出现了越来越多的实时音视频传输场景,例如视频通话、屏幕共享、远程桌面访问、云游戏等。在视频传输的过程中,往往将编码码率作为重要的编码参数,以此控制视频大小与可用网络带宽之间的关系。在有限的网络带宽资源下让用户得到更好的观看体验,一直是视频传输方面的热点问题。
现有技术中,通常会在客户端做码率的选择,比如自适应比特率(Adaptive bitrate,简称为ABR)模型,就是通过吞吐量预测、缓冲区、综合利用带宽预测以及缓存信息等方法,使得用户所播放的视频块码率尽可能匹配当前网络带宽,从而提升用户的体验质量(Quality of Experience,简称为QoE)。
然而,现有技术中的自适应方法通常是在计算出设定编码码率后交给视频编码器实现码率控制,但编码器并不是总能很好地达到设定的编码码率,实际的编码码率大小会与设定值之间有一定的差值。这样的现象在不同的视频内容之间尤其明显,画面简单的视频达不到很高的实际码率,画面复杂的视频也很难压缩到很低的实际码率上。图3是相关技术中不同视频内容设定编码码率与实际编码码率对比图,如图3所示,HowTo是实用知识类短视频,Game是游戏视频,Lecture是演讲视频,实际编码码率与设定编码码率之间总是存在差值,且不同类型视频在同一设定编码码率下的实际编码码率也不相同。
设定编码码率与实际编码码率之间存在的差值对于视频传输来说是一个巨大的隐患,在此基础上做出的码率决策会在该差值较大时造成决策过于激进,伴随着视频的卡顿、丢帧等情况发生;同时也会在该值较小的时候造成决策过于保守,伴随着带宽不能充分利用的情况发生,两种情况都不能很好地保证用户的体验质量。
发明内容
本申请实施例提供了一种视频决策码率确定方法、装置、存储介质及电子装置,以至少解决相关技术中实际编码码率与设定编码码率之间总是存在一定差值,导致码率决策失误的问题。
根据本申请的一个实施例,提供了一种视频决策码率确定方法,该方法包括:获取传输视频的视频复杂度信息和视频编码信息;根据视频复杂度信息和视频编码信息确定码率预测值和码率预测值的置信区间;获取视频传输过程中的网络状态信息;根据码率预测值、置信区间及网络状态信息确定视频传输的决策码率。
根据本申请的另一个实施例,提供了一种视频决策码率确定装置,该装置包括:第一获取模块,用于获取传输视频的视频复杂度信息和视频编码信息;码率预测模块,用于根据视频复杂度信息和视频编码信息确定码率预测值和码率预测值的置信区间;第二获取模块,用于获取视频传输过程中的网络状态信息;码率决策模块,用于根据码率预测值、置信区间及网络状态信息确定视频传输的决策码率。
根据本申请的又一个实施例,还提供了一种计算机可读的存储介质,存储介质中存储有计算机程序,其中,计算机程序被处理器运行时执行上述任一项方法实施例中的步骤。
根据本申请的又一个实施例,还提供了一种电子装置,包括存储器和处理器,存储器中存储有计算机程序,处理器被设置为运行计算机程序以执行上述任一项方法实施例中的步骤。
图1是本申请实施例的视频决策码率确定方法的硬件结构框图;
图2是根据本申请实施例的视频决策码率确定方法的流程图;
图3是相关技术中不同视频内容设定编码码率与实际编码码率对比图;
图4是根据本申请实施例的视频预测输入信息的参数结构示意图;
图5是本申请实施例中的码率预测神经网络的结构示意图;
图6是根据本申请实施例在自适应码率决策网络中的测试结构图;
图7是根据本申请实施例的码率预测和码率决策的流程图;
图8是根据本申请实施例的视频决策码率确定装置的框图;
图9是根据本申请另一实施例的视频决策码率确定装置的运行环境示意图。
下文中将参考附图并结合实施例来详细说明本申请的实施例。
需要说明的是,本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。
本申请实施例中所提供的方法实施例可以在移动终端、计算机终端、云服务器或者类似的运算装置中执行。以运行在计算机终端上为例,图1是本申请实施例的视频决策码率确定方法的硬件结构框图,如图1所示,硬件单板可以包括一个或多个(图1中仅示出一个)处理器102(处理器102可以包括但不限于微处理器MCU或可编程逻辑器件FPGA等的处理装置)和用于存储数据的存储器104,其中,上述移动终端还可以包括用于通信功能的传输设备106以及输入输出设备108。本领域普通技术人员可以理解,图1所示的结构仅为示意,其并不对上述移动终端的结构造成限定。例如,移动终端还可包括比图1中所示更多或者更少的组件,或者具有与图1所示不同的配置。
存储器104可用于存储计算机程序,例如,应用软件的软件程序以及模块,如本申请实施例中的视频决策码率确定方法对应的计算机程序,处理器102通过运行存储在存储器104
内的计算机程序,从而执行各种功能应用以及视频决策码率确定处理,即实现上述的方法。存储器104可包括高速随机存储器,还可包括非易失性存储器,如一个或者多个磁性存储装置、闪存、或者其他非易失性固态存储器。在一些实例中,存储器104可进一步包括相对于处理器102远程设置的存储器,这些远程存储器可以通过网络连接至移动终端。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。
传输设备106用于经由一个网络接收或者发送数据。上述的网络具体实例可包括通信供应商提供的无线网络。在一个实例中,传输设备106包括一个网络适配器(Network Interface Controller,简称为NIC),其可通过基站与其他网络设备相连从而可与互联网进行通讯。在一个实例中,传输设备106可以为射频(Radio Frequency,简称为RF)模块,其用于通过无线方式与互联网进行通讯。
在本实施例中提供了一种视频决策码率确定方法,图2是根据本申请实施例的视频决策码率确定方法的流程图,如图2所示,该流程包括如下步骤:
步骤S202,获取传输视频的视频复杂度信息和视频编码信息;
步骤S204,根据视频复杂度信息和视频编码信息确定码率预测值和码率预测值的置信区间;
步骤S206,获取视频传输过程中的网络状态信息;
步骤S208,根据码率预测值、置信区间及网络状态信息确定视频传输的决策码率。
在本实施例中,上述步骤S202具体可以包括如下步骤:
步骤S2022,获取预设数量的空域感知信息和时域感知信息,其中,所述视频复杂度信息包括:所述空域感知信息和所述时域感知信息;
步骤S2024,获取预设数量的历史码率差值、当前设定码率以及当前I帧信息,其中,所述视频编码信息包括:所述历史码率差值、所述当前设定码率以及所述当前I帧信息。
在本实施例中,上述步骤S2022具体可以包括:
每间隔第一预设时间,从所述传输视频中提取第一预设时间内的视频图像组;
从所述视频图像组中抽取视频帧进行时间下采样与空间下采样;
根据所述时间下采样的采样结果确定所述时域感知信息;
根据所述空间下采样的采样结果确定所述空域感知信息。
具体的,ITU-R BT.1788建议书中提供了时域感知信息TI和空域感知信息SI的计算方法,SI是将视频帧画面通过Sobel滤波器进行滤波,然后计算标准差得到,TI则是基于两帧之间的画面变化,两帧之间相同位置上各像素之间做差值得到。但是,SI和TI的原始定义需要对一段视频内的所有视频帧的所有像素点做计算,需要大量的计算时间和内存要求,不能用于实时性要求高的场景,而本实施例在SI、TI原始定义的基础上做了时间和空间上各自的下采样。从上一秒包含的视频图像组中抽取一定数量的视频帧实现时间上的下采样,然后对其做抽值实现空间上的下采样。
在本实施例中,若预设数量为4,第一预设时间为1/3秒,则需要获取前4/3秒内4帧视频帧的SI和TI值,计算得到的SI和TI值为1/3秒内随机一帧的复杂度特征值。
由于TI值需要前后两帧的数据信息,所以4帧的特征值一共需要8帧的数据量才能计算,分辨率为1920*1080的视频帧通过下采样可以得到192*108大小的矩阵,像素值采用YUV分量中的Y分量,通过下采样后得到8张192*108的画面计算SI和TI值。本申请也支持其他
下采样方式。
在本实施例中,上述步骤S2024具体可以包括:
每间隔第二预设时间,获取第二预设时间内的历史设定码率与历史实际码率;
将所述历史实际码率与所述历史设定码率的差值确定为所述历史码率差值;
每间隔第二预设时间,获取当前设定码率和当前I帧信息,其中,所述当前I帧信息用于标识第二预设时间内的视频帧是否包含I帧。
在本实施例中,若预设数量为4,第二预设时间为1秒,则需要获取前4秒时间内每一秒各自的设定码率Set_Bitrate和实际码率Real_Bitrate,进而计算前4秒时间内每一秒各自的码率差值dif_Bitrate,具体的,dif_Bitrate=Real_Bitrate-Set_Bitrate。
在本实施例中,还需要考虑编码器的当前设定码率New_Bitrate和用来标识当前视频帧是否为I帧的标志位It,对当前传输视频帧之后的视频帧进行判断,若第二预设时间内的视频帧包含I帧,则标识为1,若第二预设时间内的视频帧没有I帧,则标识为0。
在本实施例中,上述步骤S204具体可以包括如下步骤:
步骤S2042,将预设数量的所述视频复杂度信息和所述视频编码信息组合得到视频预测输入信息;
步骤S2044,将所述视频预测输入信息输入预先训练好的码率预测神经网络,得到所述码率预测神经网络输出的所述码率预测值和所述置信区间。
在本实施例中,上述步骤S2042具体可以包括:将预设数量的时域感知信息、空域感知信息、历史码率差值、当前设定码率及当前I帧信息组合得到所述视频预测输入信息,其中,所述视频复杂度信息包括所述空域感知信息和所述时域感知信息,所述视频编码信息包括所述历史码率差值、所述当前设定码率以及所述当前I帧信息,所述视频预测输入信息为M×N的矩阵,M为所述视频预测输入信息中的信息种类,N为所述预设数量。
图4是根据本申请实施例的视频预测输入信息的参数结构示意图,如图4所示,视频预测输入信息State矩阵的大小为(5,4),矩阵中包含时域感知信息SI、空域感知信息TI、历史码率差值dif、当前设定码率b及当前I帧信息I共五种信息,每种信息数量为4。
具体的,不同的数字下标表示在不同时刻获取的信息,m-1、m-2、m-3、m-4分别表示每间隔第一预设时间获取的信息,若第一预设时间为1/3秒,则需要获取前4/3秒内的SI和TI值;t表示当前时刻,t-1、t-2、t-3、t-4分别表示每间隔第二预设时间获取的信息,若第二预设时间为1秒,则需要获取前4秒内的码率差值,前3秒内的I帧信息、设定码率,以及当前I帧信息、当前设定码率。
在本实施例中,上述步骤S2044具体可以包括:将所述视频预测输入信息输入所述预测子网络,得到所述预测子网络输出的所述码率预测值;将所述视频预测输入信息输入所述误差子网络,得到所述误差子网络输出的所述置信区间;码率预测神经网络包括预测子网络和误差子网络。
具体的,步骤S2044中的码率预测神经网络Pre-Err网络中的预测子网络Pre网络和误差子网络Err网络共用网络层。
图5是本申请实施例中的码率预测神经网络的结构示意图,如图5所示,码率预测神经网络的结构具体可以包括:门控循环单元(Gate Recurrent Unit,简称为GRU),卷积神经网络(Convolutional Neural Network,简称为CNN),连接层(Concat),全连接层(Fully
Connected Layers,简称为FC),输入Input为视频预测输入信息,输出结果为Pre_Bitrate码率预测值和Pre_Error码率预测值的置信区间,其中的CNN层为一维卷积神经网络1D-CNN。
在本实施例中,预测子网络包括:由两层门控循环单元GRU组成的GRU层,第一卷积神经网络CNN层,第一连接Concat层,以及第一全连接FC层。
具体地,将所述视频预测输入信息输入所述预测子网络,得到所述预测子网络输出的所述码率预测值,包括如下步骤:
步骤S1,将所述视频预测输入信息输入所述GRU层,得到所述GRU层输出的第一输出结果,其中,所述视频预测输入信息为M×N的矩阵;
步骤S2,根据所述视频预测输入信息的信息种类M将所述视频预测输入信息分为M个一维向量,其中,所述一维向量长度为N;
步骤S3,将所述M个一维向量分别输入第一CNN层,得到所述第一CNN层输出的第二输出结果;
步骤S4,通过第一连接层将所述第一输出结果和所述第二输出结果进行组合,得到第一目标特征向量;
步骤S5,将所述第一目标特征向量输入第一全连接层,得到所述第一全连接层输出的所述码率预测值。
在本实施例中,步骤S1通过GRU层对State矩阵进行整体计算,可以学习到不同种类数据之间的联系;步骤S3对不同种类数据单独进行卷积运算,可以学习到单个种类数据自己的特征;步骤S4,将GRU层和CNN层的结果在Concat层结合,可以得到描述视频内容特征的向量。
在本实施例中,预测子网络中码率预测值损失函数为:
Pre_Bitrate_Loss=MSE(Pre_Bitrate,Real_Bitrate);
其中,Pre_Bitrate_Loss为码率预测值损失函数,Pre_Bitrate为码率预测值,Real_Bitrate为码率预测值的真值。
在本实施例中,误差子网络包括:切片层,第二卷积神经网络CNN层,第二连接Concat层,以及第二全连接FC层。
具体地,将所述视频预测输入信息输入所述误差子网络,得到所述误差子网络输出的所述置信区间,包括如下步骤:
步骤S6,通过切片层从视频预测输入信息中提取历史码率差值的一维向量;
步骤S7,将历史码率差值的一维向量输入第二CNN层,得到第二CNN层输出的第三输出结果;
步骤S8,通过第二连接层将第三输出结果和第一目标特征向量进行组合,得到第二目标特征向量;
步骤S9,将第二目标特征向量输入第二全连接层,得到第二全连接层输出的置信区间。
在本实施例中,误差子网络中的置信区间损失函数为:
Pre_Error_Loss=MSE(Pre_Error,Real_Error*f(New_Bitrate));
Real_Error=|Pre_Bitrate-Real_Bitrate|;
其中,Pre_Error_Loss为所述置信区间损失函数,Pre_Error为置信区间,Real_Error为实际码率差值,f(New_Bitrate)为根据New_Bitrate变化的一次映射函数,Pre_Bitrate
为码率预测值,Real_Bitrate为码率预测值的真值,New_Bitrate为当前设定码率。
具体的,因为视频编码码率差值有着明显的随编码码率升高而变大的特征,所以使用一次函数做拟合,一次映射函数可以是f(x)=k*(x+b)。
在本实施例中,如果Pre_Error也直接使用MSE计算损失函数,Pre_Error趋于一直输出0值,即重复Pre网络预测Pre_Bitrate的预测任务。而经函数f(x)映射后,相当于使Real_Error乘上了一个随New_Bitrate变化的比例因子,比例因子的变化趋势是随着设定码率升高而变大,这也符合编码差值的实际特性。
在本实施例中,上述步骤S206中获取的网络状态信息包括:发送码率、缓存区占用、接收码率、延迟、实际丢包率及重传包数量。
在本实施例中,上述步骤S208具体可以包括:将所述码率预测值、所述置信区间及所述网络状态信息组合得到目标状态集;将所述目标状态集输入预先训练好的用于进行码率选择决策的码率决策网络模型,得到所述码率决策网络模型输出的所述决策码率。
具体的,将码率预测值Pre_Bitrate、置信区间Pre_Error及网络状态信息Net_Info组合时需要考虑三个值各自的时间戳,进行时间上的对齐工作,码率决策网络模型可以是自适应比特率(Adaptive bitrate,简称为ABR)模型。
在本实施例中,在上述步骤S208之后,所述方法还包括根据决策码率设置合适的编码参数,如分辨率、帧率、量化参数等。具体的,还需要根据决策码率和编码器确定是否需要对编码参数进行调整,在需要调整编码参数时确定编码器具体的分辨率、帧率、量化参数等。
在本实施例中,通过上述步骤S202至S208,可以实现对实际编码码率和码率波动范围的精准预测,进而参考码率预测值和码率预测值的置信区间进行码率决策,确定编码器的编码参数,既保证了视频传输的流畅,又实现了网络带宽的充分利用,提高了用户的体验质量。
图6是根据本申请实施例在自适应码率决策网络中的测试结构图,如图6所示,对以下三种自适应码率决策网络进行测试:
原始决策网络net_raw,仅参考网络状态信息Net_Info进行码率决策;
全信息决策网络net_all,直接参考全部网络状态信息Net_Info、时域感知信息TI、空域感知信息SI以及编码差值dif进行码率决策;
本申请决策网络net_ours,在码率预测神经网络Pre-Err的基础上,参考码率预测值Pre_Bitrate、置信区间Pre_Error及网络状态信息Net_Info进行码率决策。
具体的,将三种网络结构在同一视频、同一网络环境下进行测试,测试结果如下:
net_raw因为没有考虑编码码率与实际码率差值的情况,容易造成实际码率远远超出网络带宽的情况,这可能会给用户带来视频卡顿、延迟的现象,造成用户的体验质量(Quality of Experience,简称为QoE)下降;
net_all因为长时间使用低码率的决策,虽然不会出现卡顿、延迟,但却没有充分利用带宽环境,不能很好地提升用户的QoE;
net_ours兼顾了两者的优点,既避免了因为编码差值带来的影响,同时也保障了一定的带宽利用率,从而保障了用户的QoE。
在另一实施例中,除了将码率预测神经网络Pre-Err的预测结果应用于自适应码率决策网络以外,由于Pre-Err网络提供了更准确的码率预测值,技术人员还可以将该预测结果应
用在传统启发式的码率控制方法中,该方法也可以避免由于差值带来网络拥塞现象的发生。
图7是根据本申请实施例的码率预测和码率决策的流程图,如图7所示,该流程图分为实际视频码率预测和视频传输码率指导两部分,并具体包括以下步骤:
步骤S701,计算SI、TI;
步骤S702,收集视频编码信息;
步骤S703,将步骤S701和S702中的信息输入Pre-Err网络;
步骤S704,Pre-Err网络输出预测值;
步骤S705,收集网络状态信息;
步骤S706,组合状态集;
步骤S707,将上述状态集输入码率决策网络;
步骤S708,码率决策网络输出决策码率。
在上述步骤S701至S704中,SI为空域感知信息、TI为时域感知信息,通过SI、TI值可以表示视频帧的复杂度特征;视频编码信息用于表示视频编码特征;码率预测网络Pre-Err网络包括预测子网络和误差子网络两部分,可以分别对实际编码码率和实际编码码率的波动范围进行预测。
在上述步骤S705至S708中,参考网络状态信息和预测的实际码率范围,使用码率决策网络确定视频决策码率,进而对编码器的发送码率进行指导,既实现了对网络带宽的充分利用,又保证了视频传输质量,避免了因实际码率与设定码率之间总存在差值导致的码率决策失误的问题,提升了用户体验质量。
根据本申请实施例的另一方面,还提供了一种视频决策码率确定装置,图8是根据本申请实施例的视频决策码率确定装置的框图,如图8所示,视频决策码率确定装置包括:
第一获取模块802,用于获取传输视频的视频复杂度信息和视频编码信息;
码率预测模块804,用于根据所述视频复杂度信息和所述视频编码信息确定码率预测值和所述码率预测值的置信区间;
第二获取模块806,用于获取视频传输过程中的网络状态信息;
码率决策模块808,用于根据所述码率预测值、所述置信区间及所述网络状态信息确定视频传输的决策码率。
在一实施例中,第一获取模块802包括:
第一获取单元,用于获取预设数量的空域感知信息和时域感知信息,其中,所述视频复杂度信息包括:所述空域感知信息和所述时域感知信息;
第二获取单元,用于获取预设数量的历史码率差值、当前设定码率以及当前I帧信息,其中,所述视频编码信息包括:所述历史码率差值、所述当前设定码率以及所述当前I帧信息。
在一实施例中,第一获取单元,还用于每间隔第一预设时间,从所述传输视频中提取第一预设时间内的视频图像组;从所述视频图像组中抽取视频帧进行时间下采样与空间下采样;根据所述时间下采样的采样结果确定所述时域感知信息;根据所述空间下采样的采样结果确定所述空域感知信息。
在一实施例中,第二获取单元,还用于每间隔第二预设时间,获取第二预设时间内的历
史设定码率与历史实际码率;将所述历史实际码率与所述历史设定码率的差值确定为所述历史码率差值;每间隔第二预设时间,获取当前设定码率和当前I帧信息,其中,所述当前I帧信息用于标识第二预设时间内的视频帧是否包含I帧。
在一实施例中,码率预测模块804,还包括:
第一组合单元,用于将预设数量的所述视频复杂度信息和所述视频编码信息组合得到视频预测输入信息;
预测单元,用于将所述视频预测输入信息输入预先训练好的码率预测神经网络,得到所述码率预测神经网络输出的所述码率预测值和所述置信区间。
在一实施例中,组合单元,还用于将预设数量的时域感知信息、空域感知信息、历史码率差值、当前设定码率及当前I帧信息组合得到所述视频预测输入信息,其中,所述视频复杂度信息包括所述空域感知信息和所述时域感知信息,所述视频编码信息包括所述历史码率差值、所述当前设定码率以及所述当前I帧信息,所述视频预测输入信息为M×N的矩阵,M为所述视频预测输入信息中的信息种类,N为所述预设数量。
在一实施例中,预测单元,还包括:
第一预测单元,用于将所述视频预测输入信息输入所述预测子网络,得到所述预测子网络输出的所述码率预测值;
第二预测单元,用于将所述视频预测输入信息输入所述误差子网络,得到所述误差子网络输出的所述置信区间。
在一实施例中,码率决策模块808,还包括:
第二组合单元,用于将所述码率预测值、所述置信区间及所述网络状态信息组合得到目标状态集;
决策单元,用于将所述目标状态集输入预先训练好的码率决策网络,得到所述码率决策网络输出的所述决策码率。
图9是根据本申请另一实施例的视频决策码率确定装置的运行环境示意图,如图9所示,视频决策码率确定装置中包括数据采集部分和数据处理部分:
在本实施例中,数据采集部分包括:
视频画面特征提取模块901,用于从实时视频传输过程中提取视频画面特征;
视频编码特征采集模块902,用于获取历史的视频编码信息、当前的设定编码码率及当前I帧信息;
网络状态特征采集模块903,用于根据网络传输情况采集网络状态信息,包括:发送码率、缓存区占用、接收码率、延迟、实际丢包率、重传包数量;
在本实施例中,数据处理部分包括:
复杂度特征值计算模块904,用于根据提取的视频画面特征计算空域感知信息SI和时域感知信息TI;
码率预测网络模块905,用于确定码率预测值和码率预测值的置信区间;
状态结合模块906,用于将网络状态信息、码率预测值及置信区间结合得到新的状态集;
自适应码率决策网络907,用于根据新的状态集确定决策码率;
码率决策实施模块908,用于根据决策码率确定合适的编码参数,如分辨率、帧率、量化参数等。
本申请的实施例还提供了一种计算机可读存储介质,该计算机可读存储介质中存储有计算机程序,其中,该计算机程序被处理器运行时执行上述任一项方法实施例中的步骤。
在一个示例性实施例中,上述计算机可读存储介质可以包括但不限于:U盘、只读存储器(Read-Only Memory,简称为ROM)、随机存取存储器(Random Access Memory,简称为RAM)、移动硬盘、磁碟或者光盘等各种可以存储计算机程序的介质。
本申请的实施例还提供了一种电子装置,包括存储器和处理器,该存储器中存储有计算机程序,该处理器被设置为运行计算机程序以执行上述任一项方法实施例中的步骤。
在一个示例性实施例中,上述电子装置还可以包括传输设备以及输入输出设备,其中,该传输设备和上述处理器连接,该输入输出设备和上述处理器连接。
本实施例中的具体示例可以参考上述实施例及示例性实施方式中所描述的示例,本实施例在此不再赘述。
显然,本领域的技术人员应该明白,上述的本申请的各模块或各步骤可以用通用的计算装置来实现,它们可以集中在单个的计算装置上,或者分布在多个计算装置所组成的网络上,它们可以用计算装置可执行的程序代码来实现,从而,可以将它们存储在存储装置中由计算装置来执行,并且在某些情况下,可以以不同于此处的顺序执行所示出或描述的步骤,或者将它们分别制作成各个集成电路模块,或者将它们中的多个模块或步骤制作成单个集成电路模块来实现。这样,本申请不限制于任何特定的硬件和软件结合。
以上所述仅为本申请的示例性实施例而已,并不用于限制本申请,对于本领域的技术人员来说,本申请可以有各种更改和变化。凡在本申请的原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。
Claims (14)
- A video decision code rate determination method, the method comprising: obtaining video complexity information and video coding information of a transmitted video; determining a code rate prediction value and a confidence interval of the code rate prediction value according to the video complexity information and the video coding information; obtaining network status information during video transmission; and determining a decision code rate for video transmission according to the code rate prediction value, the confidence interval and the network status information.
- The method according to claim 1, wherein obtaining the video complexity information and the video coding information of the transmitted video comprises: obtaining a preset number of pieces of spatial perception information and temporal perception information, wherein the video complexity information comprises the spatial perception information and the temporal perception information; and obtaining a preset number of historical code rate differences, a current set code rate and current I-frame information, wherein the video coding information comprises the historical code rate difference, the current set code rate and the current I-frame information.
- The method according to claim 2, wherein obtaining the preset number of pieces of spatial perception information and temporal perception information comprises: extracting, at every first preset time interval, a video image group within the first preset time from the transmitted video; extracting video frames from the video image group and performing temporal down-sampling and spatial down-sampling; determining the temporal perception information according to a sampling result of the temporal down-sampling; and determining the spatial perception information according to a sampling result of the spatial down-sampling.
- The method according to claim 2, wherein obtaining the preset number of historical code rate differences, the set code rate and the I-frame information comprises: obtaining, at every second preset time interval, a historical set code rate and a historical actual code rate within the second preset time; determining the difference between the historical actual code rate and the historical set code rate as the historical code rate difference; and obtaining, at every second preset time interval, the current set code rate and the current I-frame information, wherein the current I-frame information is used to identify whether the video frames within the second preset time contain an I frame.
- The method according to claim 1, wherein determining the code rate prediction value and the confidence interval of the code rate prediction value according to the video complexity information and the video coding information comprises: combining a preset number of pieces of the video complexity information and the video coding information to obtain video prediction input information; and inputting the video prediction input information into a pre-trained code rate prediction neural network to obtain the code rate prediction value and the confidence interval output by the code rate prediction neural network.
- The method according to claim 5, wherein combining the preset number of video complexity characteristic values and the video coding information to obtain the video prediction input information comprises: combining a preset number of pieces of temporal perception information, spatial perception information, historical code rate differences, the current set code rate and the current I-frame information to obtain the video prediction input information, wherein the video complexity information comprises the spatial perception information and the temporal perception information, the video coding information comprises the historical code rate difference, the current set code rate and the current I-frame information, the video prediction input information is an M×N matrix, M is the number of information types in the video prediction input information, and N is the preset number.
- The method according to claim 6, wherein inputting the video prediction input information into the pre-trained code rate prediction neural network to obtain the code rate prediction value and the confidence interval output by the code rate prediction neural network comprises: the code rate prediction neural network comprises a prediction sub-network and an error sub-network; inputting the video prediction input information into the prediction sub-network to obtain the code rate prediction value output by the prediction sub-network; and inputting the video prediction input information into the error sub-network to obtain the confidence interval output by the error sub-network.
- The method according to claim 7, wherein inputting the video prediction input information into the prediction sub-network to obtain the code rate prediction value output by the prediction sub-network comprises: the prediction sub-network comprises a GRU layer composed of two layers of gated recurrent units (GRU), a first convolutional neural network (CNN) layer, a first concatenation (Concat) layer and a first fully connected (FC) layer; inputting the video prediction input information into the GRU layer to obtain a first output result output by the GRU layer, wherein the video prediction input information is an M×N matrix; dividing the video prediction input information into M one-dimensional vectors according to the number of information types M of the video prediction input information, wherein the length of each one-dimensional vector is N; inputting the M one-dimensional vectors into the first CNN layer respectively to obtain a second output result output by the first CNN layer; combining the first output result and the second output result through the first concatenation layer to obtain a first target feature vector; and inputting the first target feature vector into the first fully connected layer to obtain the code rate prediction value output by the first fully connected layer.
- The method according to claim 7, wherein inputting the video prediction input information into the error sub-network to obtain the confidence interval output by the error sub-network comprises: the error sub-network comprises a slice layer, a second convolutional neural network (CNN) layer, a second concatenation (Concat) layer and a second fully connected (FC) layer; extracting a one-dimensional vector of the historical code rate differences from the video prediction input information through the slice layer; inputting the one-dimensional vector of the historical code rate differences into the second CNN layer to obtain a third output result output by the second CNN layer; combining the third output result and the first target feature vector through the second concatenation layer to obtain a second target feature vector; and inputting the second target feature vector into the second fully connected layer to obtain the confidence interval output by the second fully connected layer.
- The method according to claim 9, wherein the loss function of the confidence interval in the error sub-network is: Pre_Error_Loss = MSE(Pre_Error, Real_Error * f(New_Bitrate)); Real_Error = |Pre_Bitrate - Real_Bitrate|; wherein Pre_Error_Loss is the confidence interval loss function, Pre_Error is the confidence interval, Real_Error is the actual code rate difference, f(New_Bitrate) is a linear mapping function that varies with New_Bitrate, Pre_Bitrate is the code rate prediction value, Real_Bitrate is the true value of the code rate prediction value, and New_Bitrate is the current set code rate.
- The method according to claim 1, wherein determining the decision code rate for video transmission according to the code rate prediction value, the confidence interval and the network status information comprises: combining the code rate prediction value, the confidence interval and the network status information to obtain a target state set; and inputting the target state set into a pre-trained code rate decision network model used for code rate selection decisions, to obtain the decision code rate output by the code rate decision network model.
- A video decision code rate determination device, the device comprising: a first acquisition module configured to acquire video complexity information and video coding information of a transmitted video; a code rate prediction module configured to determine a code rate prediction value and a confidence interval of the code rate prediction value according to the video complexity information and the video coding information; a second acquisition module configured to acquire network status information during video transmission; and a code rate decision module configured to determine a decision code rate for video transmission according to the code rate prediction value, the confidence interval and the network status information.
- A computer-readable storage medium having a computer program stored therein, wherein the computer program, when run by a processor, performs the method according to any one of claims 1 to 11.
- An electronic device comprising a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to run the computer program to perform the method according to any one of claims 1 to 11.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211026597.8A CN117640920A (zh) | 2022-08-25 | 2022-08-25 | 一种视频决策码率确定方法、装置、存储介质及电子装置 |
CN202211026597.8 | 2022-08-25 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024041365A1 true WO2024041365A1 (zh) | 2024-02-29 |
Family
ID=90012458
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2023/111567 WO2024041365A1 (zh) | 2022-08-25 | 2023-08-07 | 一种视频决策码率确定方法、装置、存储介质及电子装置 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN117640920A (zh) |
WO (1) | WO2024041365A1 (zh) |
-
2022
- 2022-08-25 CN CN202211026597.8A patent/CN117640920A/zh active Pending
-
2023
- 2023-08-07 WO PCT/CN2023/111567 patent/WO2024041365A1/zh unknown
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090323798A1 (en) * | 2008-06-25 | 2009-12-31 | International Business Machines Corporation | Method and system for low-complexity slepian-wolf rate estimation in wyner-ziv video encoding |
CN110324621A (zh) * | 2019-07-04 | 2019-10-11 | 北京达佳互联信息技术有限公司 | 视频编码方法、装置、电子设备和存储介质 |
CN110996125A (zh) * | 2019-11-18 | 2020-04-10 | 腾讯科技(深圳)有限公司 | 一种视频流的生成方法、装置、电子设备及存储介质 |
CN112291620A (zh) * | 2020-09-22 | 2021-01-29 | 北京邮电大学 | 视频播放方法、装置、电子设备及存储介质 |
CN113242469A (zh) * | 2021-04-21 | 2021-08-10 | 南京大学 | 一种自适应视频传输配置方法和系统 |
CN114885167A (zh) * | 2022-04-29 | 2022-08-09 | 上海哔哩哔哩科技有限公司 | 视频编码方法及装置 |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118102005A (zh) * | 2024-04-28 | 2024-05-28 | 腾讯科技(深圳)有限公司 | 视频数据处理方法、装置、设备以及介质 |
Also Published As
Publication number | Publication date |
---|---|
CN117640920A (zh) | 2024-03-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Sun et al. | A two-tier system for on-demand streaming of 360 degree video over dynamic networks | |
CN113923441B (zh) | 视频质量的评估方法、装置及电子设备 | |
CN112001274B (zh) | 人群密度确定方法、装置、存储介质和处理器 | |
WO2024041365A1 (zh) | 一种视频决策码率确定方法、装置、存储介质及电子装置 | |
JP2002325094A (ja) | ネットワークにおけるビットストリーム転送中のネットワーク資源の動的割当て方法およびシステム | |
CN105430383A (zh) | 一种视频流媒体业务的体验质量评估方法 | |
Bentaleb et al. | Data-driven bandwidth prediction models and automated model selection for low latency | |
CN109714557A (zh) | 视频通话的质量评估方法、装置、电子设备和存储介质 | |
CN110248189B (zh) | 一种视频质量预测方法、装置、介质和电子设备 | |
Aguayo et al. | DASH adaptation algorithm based on adaptive forgetting factor estimation | |
WO2022000298A1 (en) | Reinforcement learning based rate control | |
CN106713901B (zh) | 一种视频质量评价方法及装置 | |
US20220337831A1 (en) | Viewport-based transcoding for immersive visual streams | |
US20220408097A1 (en) | Adaptively encoding video frames using content and network analysis | |
CN111726656A (zh) | 一种直播视频的转码方法、装置、服务器和存储介质 | |
Li et al. | Improving adaptive real-time video communication via cross-layer optimization | |
CN108810468B (zh) | 一种优化显示效果的视频传输装置及方法 | |
CN114374841A (zh) | 视频编码码率控制的优化方法、装置及电子设备 | |
WO2024120134A1 (zh) | 视频传输方法、装置、设备及存储介质 | |
WO2024041268A1 (zh) | 视频质量评估方法、装置、计算机设备、计算机存储介质及计算机程序产品 | |
Arun Raj et al. | Adaptive video streaming over HTTP through 4G wireless networks based on buffer analysis | |
CN115174919B (zh) | 一种视频处理方法、装置、设备及介质 | |
US11871061B1 (en) | Automated adaptive bitrate encoding | |
US11445200B2 (en) | Method and system for processing video content | |
CN115499657A (zh) | 视频码率自适应网络的训练方法、应用方法、装置及设备 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 23856455 Country of ref document: EP Kind code of ref document: A1 |