WO2024017106A1 - Code table updating method, apparatus, device and storage medium - Google Patents
Code table updating method, apparatus, device and storage medium
- Publication number
- WO2024017106A1 (application PCT/CN2023/106919)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video
- code table
- code
- preference
- bit rate
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/80—Responding to QoS
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/45—Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
- H04N21/466—Learning process for intelligent management, e.g. learning user preferences for recommending movies
- H04N21/4662—Learning process for intelligent management, e.g. learning user preferences for recommending movies characterized by learning algorithms
- H04N21/4666—Learning process for intelligent management, e.g. learning user preferences for recommending movies characterized by learning algorithms using neural networks, e.g. processing the feedback provided by the user
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/45—Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
- H04N21/466—Learning process for intelligent management, e.g. learning user preferences for recommending movies
- H04N21/4668—Learning process for intelligent management, e.g. learning user preferences for recommending movies for recommending content, e.g. movies
Definitions
- The embodiments of the present application relate to the field of video processing technology, and in particular to a code table updating method, apparatus, device and storage medium.
- Video sources are often transcoded to meet the needs of different user groups.
- Video transcoding refers to the process of recompressing an already compressed and encoded video stream into another (or several) encoding formats. Transcoding involves changes to parameters such as video resolution, bit rate, frame rate and frame structure, so that the video can meet the requirements of different user groups.
- Traditional video transcoding usually adopts a "one-to-many" approach: the source video is transcoded into a series of videos with different bit rates and resolutions according to a fixed code table, and videos of the corresponding grade are then distributed according to each user's terminal device and network conditions. A fixed code table can ensure that most video content obtains good quality under a limited bit rate, but the fixed code table solution has low flexibility, which easily leads to a mismatch between bit rate and resolution and degrades the user's video playback experience.
- Embodiments of the present application provide a code table updating method, apparatus, device and storage medium, to solve the technical problem in the related art that the fixed code table solution has low flexibility, which leads to unsuitable bit rate and resolution combinations and degrades the user's video playback experience, thereby improving the flexibility of code table determination and the user's video playback experience.
- In a first aspect, embodiments of the present application provide a code table updating method, including:
- acquiring source stream video data;
- inputting the source stream video data into a trained preference perception model, which analyzes the source stream video data and outputs a video quality distribution table recording the predicted preference probability corresponding to each combination of a second video bit rate and a second video resolution; and
- updating a code table based on the video quality distribution table, where the code table records the combination of a first video bit rate and a first video resolution for each video gear (quality tier).
- Embodiments of the present application further provide a code table updating apparatus, including a data acquisition module, a data analysis module and a code table updating module, wherein:
- the data acquisition module is configured to acquire source stream video data
- the data analysis module is configured to input the source stream video data into a trained preference perception model, which analyzes the source stream video data and outputs a video quality distribution table recording the predicted preference probabilities corresponding to different combinations of second video bit rate and second video resolution;
- the code table updating module is configured to update the code table based on the video quality distribution table, where the code table records the combination of first video bit rate and first video resolution for each video gear.
- embodiments of the present application provide a code table updating device, including: a memory and one or more processors;
- the memory is used to store one or more programs
- when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the code table updating method described in the first aspect.
- embodiments of the present application provide a storage medium that stores computer-executable instructions, which when executed by a computer processor are used to perform the code table updating method as described in the first aspect.
- Embodiments of the present application provide a computer program product.
- the computer program product includes a computer program.
- the computer program is stored in a computer-readable storage medium.
- At least one processor of the device reads the computer program from the computer-readable storage medium.
- the at least one processor executes the computer program, causing the device to perform the code table updating method described in the first aspect.
- In these embodiments, the source stream video data is input into the preference perception model, which analyzes it to obtain a video quality distribution table. Based on the predicted preference probabilities that the table records for the different combinations of second video bit rate and second video resolution, the combinations of first video bit rate and first video resolution recorded in the code table for the different video gears are updated. This yields bit rate and resolution combinations better suited to the current source stream video data, improves the flexibility of code table determination, and effectively safeguards the user's video playback experience.
- Figure 1 is a flow chart of a code table updating method provided by an embodiment of the present application.
- Figure 2 is a schematic diagram of a residual structure provided by an embodiment of the present application.
- Figure 3 is a schematic network structure diagram of a preference perception model provided by an embodiment of the present application.
- Figure 4 is a flow chart of another code table update method provided by an embodiment of the present application.
- Figure 5 is a schematic flowchart of a code rate row determination process provided by an embodiment of the present application.
- Figure 6 is a schematic structural diagram of a code table updating apparatus provided by an embodiment of the present application.
- Figure 7 is a schematic structural diagram of a code table updating device provided by an embodiment of the present application.
- Figure 1 shows a flow chart of a code table updating method provided by an embodiment of the present application.
- The code table updating method provided by an embodiment of the present application can be executed by a code table updating apparatus, which can be implemented in hardware and/or software and integrated into a code table updating device (e.g., an encoding server).
- the code table update method includes:
- S101: Acquire source stream video data. The source stream video data is the video data before transcoding.
- Transcoding yields multiple video gears, where different video gears correspond to different combinations of video bit rate and video resolution.
- The original code table can be issued by the codec server (CS) according to the user's region.
- Video transcoding can be understood as the process of re-compressing an already compressed and encoded video stream according to another (or multiple) encoding formats, such as transcoding an H.264 format stream into the HEVC format.
- Video transcoding also involves changing parameters such as video bit rate and video resolution so that the video meets different playback requirements. For example, the resolution of the source video can be reduced, transcoding high-definition video into low-definition video to suit the decoding capability of low-end mobile phones; or the bit rate of the source video can be reduced, shrinking the video stream to suit transmission scenarios with limited network bandwidth.
- In the related art, source stream video data is transcoded based on a fixed code table: it is transcoded into transcoded video data at multiple fixed gears according to the fixed bit rate and resolution combinations in the code table, and the video file distributed to each user is then determined from the user's terminal device and network conditions.
- Fixed code tables are often formulated by developers based on experience and can ensure that most video content can obtain better video quality under limited bit rates.
- However, fixed code tables are heavily constrained by the application scenario, and video data transcoded with a fixed code table cannot meet the viewing needs of different users.
- Another encoding scheme obtains the code table through exhaustive multi-pass encoding. It is not suitable for services with strict real-time requirements (such as live transcoding): because such services demand timeliness, the code table cannot be obtained exhaustively, so the scheme has poor operability.
- In addition, that encoding scheme uses peak signal-to-noise ratio (PSNR) as the video quality metric, and PSNR cannot accurately reflect the user's subjective quality experience, making it difficult to guarantee the user's viewing experience.
- This solution provides a content-dependent preference perception model that updates the code table dynamically, reducing reliance on the fixed bit rate and resolution pairings of the fixed code table solution, so that the result better matches the video content and the user experience and safeguards the user's viewing experience.
- A deep-learning preference perception model predicts the optimal video resolution at different video bit rates, so the code table can be updated in real time.
- The code table no longer has to be obtained through exhaustive multi-pass encoding, which makes the solution more practical. Moreover, the model predicts, from the user's subjective experience of video quality, the preferred video resolution at different bit rates for the given video content, so code table updates better reflect the user's actual viewing experience.
- The codec server provides the initial code table to the video server (VS).
- The code table updating apparatus in the transcoding server obtains the code table from the video server and, according to the code table, transcodes in real time the source stream video data provided by the anchor in a terminal-device scenario (such as a live video broadcast) to obtain transcoded video data at multiple video gears (such as ultra-clear, high-definition and full high-definition).
- The code table updating apparatus acquires source stream video data according to a set period, taking only source stream video data that meets a resolution requirement (for example, reaching 720p).
- The acquired source stream video data is provided to the preference perception model for analysis to determine the optimal video resolution for each video gear.
- Because this solution acquires source stream video data periodically to update the code table, in scenarios with strict real-time requirements the code table can be updated promptly when the user switches scenes (for example, when a live video switches scenes), safeguarding the user's viewing experience.
- S102 Input the source stream video data into the trained preference perception model.
- the preference perception model analyzes the source stream video data and outputs a video quality distribution table.
- The video quality distribution table records the predicted preference probability corresponding to each combination of second video bit rate and second video resolution.
- the preference perception model provided by this solution is used to analyze the input video data and output the corresponding video quality distribution table.
- The video quality distribution table records the predicted preference probabilities corresponding to different combinations of second video bit rate and second video resolution, where a predicted preference probability can be understood as the predicted value of users' preference probability for the input video data under a given combination of video bit rate and video resolution.
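To make the table's shape concrete, here is a minimal Python sketch assuming the video quality distribution table is stored as a mapping from second video bit rate (in kbps) to a row of {second video resolution: predicted preference probability}. The type alias, function names and numbers are hypothetical illustrations, not values from the patent; the row-sum check reflects the description's statement that the probabilities in one bit rate row sum to a set value such as 1.

```python
from typing import Dict

# Hypothetical layout: second video bit rate (kbps) -> {second video resolution: predicted preference probability}
VideoQualityTable = Dict[int, Dict[str, float]]


def rows_sum_to(table: VideoQualityTable, total: float = 1.0, tol: float = 1e-6) -> bool:
    """Return True if every bit rate row's probabilities sum to `total` (within `tol`)."""
    return all(abs(sum(row.values()) - total) <= tol for row in table.values())


def most_preferred_resolution(table: VideoQualityTable, bitrate_kbps: int) -> str:
    """Return the resolution with the highest predicted preference probability in one row."""
    row = table[bitrate_kbps]
    return max(row, key=row.get)


# Illustrative, made-up values (not data from the patent).
example_table: VideoQualityTable = {
    800: {"540p": 0.6, "720p": 0.3, "1080p": 0.1},
    2500: {"540p": 0.1, "720p": 0.5, "1080p": 0.4},
    6000: {"540p": 0.05, "720p": 0.15, "1080p": 0.8},
}

print(rows_sum_to(example_table))                      # True
print(most_preferred_resolution(example_table, 2500))  # 720p
```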
- The preference perception model can extract video features through a convolutional neural network. Sample data are collected (such as the preference probabilities of different users for transcoded videos with different combinations of video bit rate and video resolution under different video contents), the preference perception model is trained on them, and the trained model is configured in the code table updating apparatus.
- the source stream video data is input into the trained preference awareness model.
- After receiving the source stream video data, the preference perception model analyzes it and outputs a video quality distribution table.
- The video quality distribution table reflects the predicted values of users' preference probabilities for the different combinations of second video bit rate and second video resolution.
- The preference perception model provided by this solution is trained on perception data sets corresponding to sample video data of different content types; that is, the model is trained with the sample video data in a perception data set as input and the corresponding video quality distribution (sample preference probability distribution) as output.
- The perception data set is constructed from the distribution of the human eye's preference, in the resolution dimension, for videos of different content types, and records the sample preference probability of the sample video data for each combination of third video bit rate and third video resolution. The content types can include indoor live broadcasts, outdoor live broadcasts, game live broadcasts, screen-recorded text, animation, and meaningless videos (such as still scenes, or videos with a single tone and little detail).
- Test users may be, for example, ordinary video users (non-professionals).
- SCACJ: Stimulus Comparison Adjectival Categorical Judgment.
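- The test users select, from each parameter group $\{B_i, R_0 \sim R_N\}$ (the transcoded versions at third video bit rate $B_i$ with third video resolutions $R_0$ to $R_N$), the transcoded video (or videos) with the best quality, and the corresponding video resolution is recorded.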
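One plausible way to turn these selections into the sample preference probabilities stored in the perception data set is to count, for each bit rate B_i, how often each resolution was picked and normalize by the number of picks. The description does not spell out the aggregation, so the vote-share normalization, the function name and the data below are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, List


def sample_preference_distribution(
    votes: Dict[int, List[str]], resolutions: List[str]
) -> Dict[int, Dict[str, float]]:
    """Build per-bit-rate sample preference probabilities from test-user picks.

    votes maps each bit rate B_i to the resolutions that test users picked as
    best quality within the parameter group {B_i, R_0 ~ R_N}. Each pick counts
    as one vote (a user who picks several videos contributes several votes).
    """
    table: Dict[int, Dict[str, float]] = {}
    for bitrate, picks in votes.items():
        if not picks:
            raise ValueError(f"no picks recorded for bit rate {bitrate}")
        counts = Counter(picks)
        n = len(picks)
        # Unpicked resolutions get 0.0; if every pick is in `resolutions`, the row sums to 1.
        table[bitrate] = {r: counts[r] / n for r in resolutions}
    return table


# Made-up picks from four test users at two bit rates.
votes = {
    500: ["360p", "360p", "540p", "360p"],
    3000: ["1080p", "720p", "1080p", "1080p"],
}
dist = sample_preference_distribution(votes, ["360p", "540p", "720p", "1080p"])
print(dist[500])  # {'360p': 0.75, '540p': 0.25, '720p': 0.0, '1080p': 0.0}
```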
- The sample preference probability distribution obtained in this way shows that, as the video bit rate increases, the user's preferred video level shifts from low definition to high definition; for meaningless live broadcasts, users show no obvious preference for any video level at the various bit rates; and for screen-recorded live broadcasts containing a lot of text, users prefer high-definition video.
- The preference perception model provided by this solution is built on a residual structure. The residual structure can extract deeper features of the video data, making the predicted preference probabilities for the different combinations of third video bit rate and third video resolution more accurate. In addition, the fully connected layer of the preference perception model maps the video feature vector of the sample video data to a set size, which is determined by the size of the distribution table of the sample video data over the combinations of third video bit rate and third video resolution.
- The residual structure adds a "short-circuit" (shortcut) connection around a stack of convolutional layers and sums the stack's input and output, so that the network of the preference perception model learns the residual features of the video data.
- Learning residual features is easier than learning the original features of the video data directly, which improves the training efficiency of the preference perception model.
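The shortcut idea can be illustrated without a deep-learning framework. The sketch below assumes the common ResNet form y = relu(F(x) + x), where F stands in for the stacked convolution layers; the 1-D convolution used as F and all names are illustrative assumptions, not the model's actual layers. If F outputs zeros, the block reduces to an identity mapping (for non-negative inputs), which is one way to see why learning a residual tends to be easier than learning the full mapping.

```python
from typing import Callable, List

Vector = List[float]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def conv1d_same(kernel: Vector) -> Callable[[Vector], Vector]:
    """Stand-in for F: 1-D cross-correlation (as in deep-learning conv layers), zero 'same' padding."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = len(kernel) // 2

    def f(x: Vector) -> Vector:
        padded = [0.0] * pad + list(x) + [0.0] * pad
        return [
            sum(k * padded[i + t] for t, k in enumerate(kernel))
            for i in range(len(x))
        ]

    return f


def residual_block(x: Vector, f: Callable[[Vector], Vector]) -> Vector:
    """Shortcut connection: add the block input to F(x), then apply relu."""
    fx = f(x)
    if len(fx) != len(x):
        raise ValueError("shortcut needs F(x) and x to have the same shape")
    return relu([a + b for a, b in zip(fx, x)])


x = [1.0, 2.0, 3.0]
print(residual_block(x, conv1d_same([0.0, 0.0, 0.0])))  # [1.0, 2.0, 3.0]: F = 0 gives identity
print(residual_block(x, conv1d_same([0.0, 1.0, 0.0])))  # [2.0, 4.0, 6.0]
```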
- For a series of input sample video data, the preference perception model first uses a convolution layer (convolution function conv + batch normalization function bn + activation function relu) to mine the shallow features of each sample video frame (sample video frames can be extracted from the sample video data at set time intervals for feature extraction), followed by a pooling operation in a max pooling layer (max pool). Several stacked residual structures (residual structures 0 to N) then extract deeper features of the sample video frames, and a pooling operation in an average pooling layer (avgpool) yields a 512-dimensional feature vector.
- the convolutional neural network of the preference perception model extracts semantic features in the two-dimensional space of the video.
- The feature vectors are then averaged along the frame dimension in a mean calculation layer (mean by frame), and the average is used as the feature vector of the entire video clip corresponding to the sample video data.
- This feature vector is mapped by a fully connected layer (FC layer) into a vector of length (M+1)×(N+1), corresponding to (M+1) third video bit rates and (N+1) third video resolutions.
- The vector is rearranged and mapped through a softmax operation to obtain the predicted video quality distribution table.
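The output head described above (mean over frames, fully connected layer to (M+1)×(N+1) values, rearrangement, softmax) can be sketched in plain Python. Two assumptions are made: the rearrangement is a row-major reshape into one row per third video bit rate, and the softmax is applied within each bit rate row so that every row sums to 1. The tiny feature size stands in for the 512-dimensional vector, and the weights are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def mean_by_frame(frame_features: Matrix) -> List[float]:
    """Average per-frame feature vectors into one clip-level feature vector."""
    n = len(frame_features)
    return [sum(col) / n for col in zip(*frame_features)]


def fully_connected(x: List[float], weights: Matrix, bias: List[float]) -> List[float]:
    """y = W x + b, with one weight row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(v: List[float]) -> List[float]:
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in v]
    s = sum(exps)
    return [e / s for e in exps]


def quality_table_head(
    frame_features: Matrix, weights: Matrix, bias: List[float], m_plus_1: int, n_plus_1: int
) -> Matrix:
    clip = mean_by_frame(frame_features)
    logits = fully_connected(clip, weights, bias)
    if len(logits) != m_plus_1 * n_plus_1:
        raise ValueError("FC output length must be (M+1) x (N+1)")
    # Row-major reshape: row i holds the N+1 resolution scores for bit rate B_i.
    rows = [logits[i * n_plus_1:(i + 1) * n_plus_1] for i in range(m_plus_1)]
    return [softmax(row) for row in rows]


# 2 frames with 2-dim features (stand-in for 512), M+1 = 2 bit rates, N+1 = 3 resolutions.
frames = [[1.0, 2.0], [3.0, 4.0]]
weights = [[0.0, 0.0] for _ in range(6)]
bias = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
table = quality_table_head(frames, weights, bias, 2, 3)
print([round(p, 3) for p in table[1]])  # [0.333, 0.333, 0.333]
```

With zero weights the logits equal the bias, so the second row is uniform and the first row favours its third resolution.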
- When training the preference perception model, this solution uses the KL divergence loss as the model loss function and optimizes the preference perception model with the stochastic gradient descent algorithm.
- this scheme uses KL divergence to measure the difference between the predicted distribution and the true distribution.
- Minimizing the KL divergence brings the preference perception model's predicted video quality distribution table closer to the true values. For example, suppose that under the third video bit rate $B_i$ the video quality distribution predicted by the model is $Q_i = (q_{i,0}, \dots, q_{i,N})$ and the real video quality distribution is $P_i = (p_{i,0}, \dots, p_{i,N})$. Then the KL divergence is:
$$D_{KL}(P_i \,\|\, Q_i) = \sum_{j=0}^{N} p_{i,j} \log \frac{p_{i,j}}{q_{i,j}}$$
- the total loss of the preference perception model is:
$$L = \sum_{i=0}^{M} D_{KL}(P_i \,\|\, Q_i)$$
- where $q_{i,j}$ is the predicted preference probability of the $j$-th third video resolution at the third video bit rate $B_i$, and $p_{i,j}$ is the real preference probability of the $j$-th third video resolution at the same bit rate.
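A minimal sketch of the KL-divergence loss follows, computing D_KL(P‖Q) = Σ_j p_j log(p_j / q_j) between the real distribution P and the predicted distribution Q for one third video bit rate, plus a total over all bit rates. Summing (rather than averaging) over bit rates is an assumption, and the eps clamp is an illustrative guard against division by zero; in practice a framework's built-in KL loss and SGD optimizer would be used.

```python
import math
from typing import List


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """D_KL(P || Q) = sum_j p_j * log(p_j / q_j).

    p is the real preference distribution over resolutions at one bit rate,
    q the predicted one. Terms with p_j == 0 contribute 0 by convention;
    q_j is clamped to eps to avoid division by zero.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(pj * math.log(pj / max(qj, eps)) for pj, qj in zip(p, q) if pj > 0)


def total_loss(real: List[List[float]], predicted: List[List[float]]) -> float:
    """Sum the per-bit-rate KL divergences (one row per third video bit rate B_i)."""
    return sum(kl_divergence(p, q) for p, q in zip(real, predicted))


real = [[1.0, 0.0], [0.5, 0.5]]
pred = [[0.5, 0.5], [0.5, 0.5]]
print(total_loss(real, pred))  # log(2), about 0.6931
```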
- S103 Update the code table based on the video quality distribution table.
- the code table records the combination of the first video bit rate and the first video resolution of different video gears.
- Updating the code table according to the video quality distribution table means updating the combinations of first video bit rate and first video resolution recorded in the code table for the different video gears.
- The preference perception model predicts the optimal video resolution for the video bit rate of each video gear. If that resolution differs from the first video resolution in the original code table, the first video resolution corresponding to that first video bit rate in the original code table is replaced, and the encoder is restarted to transcode with the new first video resolution. Otherwise, the original code table is kept and returned, and the next time source stream video data is acquired and the preference perception model outputs a video quality distribution table, it is again determined whether the code table needs to be updated.
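Step S103 as described here can be sketched as follows, assuming the code table is a list of gears and the video quality distribution table is a dict from bit rate to a {resolution: probability} row. The `Gear` type, the function name and the returned `changed` flag (which a caller would use to decide whether to restart the encoder) are hypothetical; gears whose bit rate has no matching row are simply left unchanged in this sketch.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Gear:
    name: str          # video gear, e.g. "HD"
    bitrate_kbps: int  # first video bit rate
    resolution: str    # first video resolution


def update_code_table(
    code_table: List[Gear], quality_table: Dict[int, Dict[str, float]]
) -> Tuple[List[Gear], bool]:
    """Replace each gear's resolution with the most preferred one predicted for its bit rate."""
    updated: List[Gear] = []
    changed = False
    for gear in code_table:
        row = quality_table.get(gear.bitrate_kbps)
        if not row:
            updated.append(gear)  # no prediction for this bit rate: keep the original entry
            continue
        best = max(row, key=row.get)
        if best != gear.resolution:
            changed = True
            gear = replace(gear, resolution=best)
        updated.append(gear)
    return updated, changed


code_table = [Gear("SD", 800, "540p"), Gear("HD", 2500, "720p")]
quality_table = {
    800: {"540p": 0.2, "720p": 0.7, "1080p": 0.1},
    2500: {"720p": 0.6, "1080p": 0.4},
}
new_table, changed = update_code_table(code_table, quality_table)
print(changed, [g.resolution for g in new_table])  # True ['720p', '720p']
```

When `changed` is False the caller keeps the original table and waits for the next acquisition period.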
- In this solution, the preference perception model analyzes the source stream video data to obtain a video quality distribution table, and the combinations of first video bit rate and first video resolution recorded in the code table for the different video gears are updated according to the predicted preference probabilities that the table records for the different combinations of second video bit rate and second video resolution. This yields a code table whose bit rate and resolution combinations better suit the current source stream video data, improves the flexibility of code table determination, and effectively safeguards the user's video playback experience.
- The preference perception model is trained on perception data sets corresponding to sample video data of different content types, yielding a content-adaptive preference perception model that can flexibly output the corresponding video quality distribution table for different video content and accurately predict users' subjective perception of videos of different content types, so the code table update better matches users' actual experience.
- FIG. 4 shows a flow chart of another code table update method provided by the embodiment of the present application.
- This code table update method is a specific embodiment of the above code table update method.
- the code table update method includes:
- S202 Input the source stream video data into the trained preference perception model.
- the preference perception model analyzes the source stream video data and outputs a video quality distribution table.
- The video quality distribution table records the predicted preference probability corresponding to each combination of second video bit rate and second video resolution.
- S203: Traverse each video gear in the code table and determine, in the video quality distribution table, the bit rate row corresponding to each video gear.
- The bit rate row contains the predicted preference probabilities of the different second video resolutions at its second video bit rate.
- The code table provided by this solution records the combination of first video bit rate and first video resolution for each video gear. For example, after the video quality distribution table output by the preference perception model is obtained, each video gear in the current code table is traversed, and the bit rate row corresponding to the first video bit rate of each video gear is determined in the video quality distribution table.
- The bit rate row records the predicted preference probabilities of the different second video resolutions at a second video bit rate in the video quality distribution table, and the predicted preference probabilities of all second video resolutions in the same bit rate row sum to a set value (such as 1 or 100%).
- Determining the bit rate row corresponding to each video gear in the code table may consist of finding, in the video quality distribution table, the second video bit rate consistent with the first video bit rate, or else the second video bit rate closest to the first video bit rate. As shown in the schematic flowchart of the bit rate row determination process in Figure 5, determining, in the video quality distribution table, the bit rate row corresponding to each video gear in the code table includes:
- S2031 Determine whether there is a second video bit rate in the video quality distribution table that is consistent with the first video bit rate in the code table.
- After the video quality distribution table output by the preference perception model is obtained, the first video bit rate of each video gear in the current code table is traversed, and it is determined whether the video quality distribution table contains a second video bit rate consistent with it.
- If it does, the bit rate row of that second video bit rate is directly determined as the bit rate row corresponding to the video gear.
- If the video quality distribution table contains no second video bit rate consistent with the first video bit rate, the bit rate row of the second video bit rate closest to the first video bit rate of the video gear is determined as the bit rate row corresponding to that video gear.
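The row lookup in this flow (exact match first, otherwise the closest second video bit rate) can be sketched as below, assuming the table is a dict keyed by bit rate in kbps. The function name is hypothetical, and breaking distance ties toward the lower bit rate is an assumption, since the description does not say how ties are handled.

```python
from typing import Dict


def find_bitrate_row(first_bitrate_kbps: int, quality_table: Dict[int, Dict[str, float]]) -> int:
    """Return the second video bit rate whose row is used for a gear.

    Exact match first; otherwise the closest second bit rate. Ties are broken
    toward the lower bit rate (an assumption; the text does not specify).
    """
    if not quality_table:
        raise ValueError("video quality distribution table is empty")
    if first_bitrate_kbps in quality_table:
        return first_bitrate_kbps
    return min(quality_table, key=lambda b: (abs(b - first_bitrate_kbps), b))


# Row contents are omitted here; only the bit rate keys matter for the lookup.
table: Dict[int, Dict[str, float]] = {500: {}, 1000: {}, 2000: {}, 4000: {}}
print(find_bitrate_row(1000, table))  # 1000 (exact match)
print(find_bitrate_row(1400, table))  # 1000 (closest)
print(find_bitrate_row(3500, table))  # 4000
```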
- The combinations of first video bit rate and first video resolution of the different video gears in the code table are then updated according to the predicted preference probabilities of the second video resolutions in the bit rate row. For example, the second video resolution with the maximum predicted preference probability in the bit rate row replaces the corresponding first video resolution in the code table, so that the combination of first video bit rate and first video resolution for that video gear better matches the user's viewing experience.
- The first video resolution can thus be updated based on the dominant predicted preference probability in the bit rate row, giving a first video resolution that better matches the user's subjective experience.
- this solution includes: for different video gears in the code table, determine the corresponding Whether there is a second video resolution in the code rate line that meets the encoding parameter update condition; in response to whether there is a second video resolution that meets the encoding parameter update condition The second video resolution of the condition is used to update the first video resolution of the corresponding video gear in the code table using the second video resolution.
- the code rate row For each video gear in the current code table, it is determined in the corresponding code rate row whether there is a second video resolution that meets the encoding parameter update condition. Among them, whether there is a second video resolution in the code rate row that meets the encoding parameter update condition can be determined based on the maximum predicted preference probability in the code rate row, that is, determined from the predicted preference probability of each second video resolution in the code rate row. Maximum predicted preference probability, and determine whether the maximum predicted preference probability reaches the set probability threshold. When the maximum predicted preference probability reaches the set probability threshold, it can be determined that the code rate line has a second video resolution that meets the coding parameter update condition.
- This solution controls how conservative the code table update algorithm is through the set probability threshold, so it can adapt flexibly to a variety of application scenarios.
- the second video resolution with the maximum predicted preference probability in the bitrate row is used to update the first video resolution of the corresponding video tier in the code table.
- when doing so, it is first checked whether the second video resolution with the maximum predicted preference probability is the same as the first video resolution of the corresponding video tier. If it is, the first video resolution is not modified; if it is not, the first video resolution of that video tier is changed to this second video resolution.
- the preference perception model analyzes the source stream video data to obtain a video quality distribution table, and the combinations of first video bitrate and first video resolution recorded in the code table for different video tiers are updated according to the predicted preference probabilities that the table records for different combinations of second video bitrate and second video resolution, yielding a code table whose bitrate-resolution combinations better fit the current source stream video data.
- this improves the flexibility of code table determination and effectively ensures the user's video playback experience.
- the bitrate row corresponding to each first video bitrate in the code table is determined flexibly, and the first video resolution is dynamically updated according to that bitrate row.
- the combination of first video bitrate and first video resolution for each video tier is thus determined dynamically at a finer granularity, which improves the flexibility of video transcoding and the subjective quality of the video.
- comparing the maximum predicted preference probability with the set probability threshold determines the second video resolution that meets the encoding parameter update condition, realizing an optimal encoding resolution decision and optimizing the resolution parameters in the live transcoding code table.
- Figure 6 is a schematic structural diagram of a code table update apparatus provided by an embodiment of the present application.
- the code table update apparatus includes a data acquisition module 61, a data analysis module 62, and a code table update module 63.
- the data acquisition module 61 is configured to acquire source stream video data;
- the data analysis module 62 is configured to input the source stream video data into a trained preference perception model, which analyzes the source stream video data and outputs a video quality distribution table recording the predicted preference probabilities corresponding to different combinations of second video bitrate and second video resolution;
- the code table update module 63 is configured to update the code table based on the video quality distribution table, the code table recording the combinations of first video bitrate and first video resolution for different video tiers.
- the code table update module 63 is specifically configured to traverse the video tiers in the code table and determine, in the video quality distribution table, the bitrate row corresponding to each video tier;
- the bitrate row includes the predicted preference probabilities of the different second video resolutions at that second video bitrate;
- the combination of the first video bitrate and the first video resolution of each video tier in the code table is updated based on the corresponding bitrate row.
- when the code table update module 63 determines the bitrate row corresponding to each video tier in the code table, it is configured as follows:
- if the video quality distribution table contains a second video bitrate equal to the first video bitrate of the video tier, the bitrate row of that second video bitrate is used; otherwise, the bitrate row of the second video bitrate closest to the first video bitrate of the video tier is determined as the bitrate row corresponding to that video tier.
- when the code table update module 63 updates the combination of the first video bitrate and the first video resolution of each video tier in the code table based on the corresponding bitrate row, it is configured as follows:
- if the corresponding bitrate row contains a second video resolution that meets the encoding parameter update condition, that second video resolution is used to update the first video resolution of the corresponding video tier in the code table.
- the preference perception model is trained based on perception data sets corresponding to sample video data of different content types.
- the perception data set records the sample preference probabilities of the corresponding sample video data for different combinations of third video bitrate and third video resolution.
- the preference perception model is built on a residual structure.
- the preference perception model uses the KL divergence loss function as its loss function and is optimized with the stochastic gradient descent algorithm.
- the fully connected layer of the preference perception model maps the video feature vector of the sample video data according to a set size, which is determined by the size of the distribution table of the sample video data over the different combinations of third video bitrate and third video resolution.
- the embodiment of the present application also provides a code table update device, into which the code table update apparatus provided by the embodiment of the present application can be integrated.
- Figure 7 is a schematic structural diagram of a code table updating device provided by an embodiment of the present application.
- the code table update device includes an input apparatus 73, an output apparatus 74, a memory 72, and one or more processors 71; the memory 72 is configured to store one or more programs; when the one or more programs are executed by the one or more processors 71, the one or more processors 71 implement the code table update method provided in the above embodiment.
- the code table update apparatus, device, and computer provided above can be used to perform the code table update method provided in any of the above embodiments, and have the corresponding functions and beneficial effects.
- Embodiments of the present application also provide a storage medium that stores computer-executable instructions. When executed by a computer processor, the computer-executable instructions are used to perform the code table updating method provided in the above embodiments.
- the computer-executable instructions stored in the storage medium provided by the embodiments of the present application are not limited to the code table update method described above; they can also perform related operations in the code table update method provided by any embodiment of the present application.
- the code table update apparatus, device, and storage medium provided in the above embodiments can perform the code table update method provided by any embodiment of this application.
- for technical details not described in detail in the above embodiments, refer to the code table update method provided by any embodiment of this application.
- various aspects of the method provided by the present disclosure can also be implemented in the form of a program product that includes program code.
- when the program product runs on a computer device, the program code causes the computer device to perform the steps of the methods according to the various exemplary embodiments of the present disclosure described above in this specification.
- the computer device can perform the code table updating method described in the embodiments of the present disclosure.
- the program product can use any combination of one or more readable media.
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Computer Networks & Wireless Communication (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
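Embodiments of the present application provide a code table update method, apparatus, device, and storage medium. In the technical solution provided by the embodiments, source stream video data is input into a preference perception model, which analyzes the data to obtain a video quality distribution table. Based on the predicted preference probabilities recorded in the table for different combinations of second video bitrate and second video resolution, the combinations of first video bitrate and first video resolution recorded in the code table for different video tiers are updated. This yields a code table whose bitrate-resolution combinations better fit the current source stream video data, makes code table determination more flexible, and effectively ensures the user's video playback experience.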
Description
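This application claims priority to Chinese Patent Application No. 202210855856.1, filed with the Chinese Patent Office on July 19, 2022, the entire contents of which are incorporated herein by reference.
The embodiments of the present application relate to the technical field of video processing, and in particular to a code table update method, apparatus, device, and storage medium.
With the rapid development of information technology, video applications such as live streaming, video on demand, and short video have become deeply embedded in people's lives. Different video sources may use different encoding standards, and video service providers have very large user bases whose terminals (such as mobile phones and set-top boxes) differ in processing capability and network bandwidth. Video service providers therefore often transcode video sources to meet the needs of different user groups.
Video transcoding refers to the process of re-compressing an already compressed video stream into another (or several other) encoding formats. Transcoding involves changing parameters such as video resolution, bitrate, frame rate, and frame structure so that the video meets the requirements of different user groups. Conventional video transcoding usually takes a "one-to-many" approach: the source video is transcoded into a series of videos at different bitrates and resolutions according to a fixed code table, and the video of the appropriate tier is then delivered according to the user's terminal device and network conditions. A fixed code table can ensure good quality for most video content under a bitrate constraint, but the fixed code table scheme is inflexible and easily leads to mismatches between bitrate and resolution, which degrades the user's video playback experience.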
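Summary of the Invention
The embodiments of the present application provide a code table update method, apparatus, device, and storage medium to solve the technical problem in the related art that the fixed code table scheme is inflexible, causing mismatches between bitrate and resolution and degrading the user's video playback experience, thereby making code table determination more flexible and improving the user's video playback experience.
In a first aspect, an embodiment of the present application provides a code table update method, including:
acquiring source stream video data;
inputting the source stream video data into a trained preference perception model, which analyzes the source stream video data and outputs a video quality distribution table, the video quality distribution table recording predicted preference probabilities corresponding to different combinations of second video bitrate and second video resolution; and
updating a code table based on the video quality distribution table, the code table recording combinations of first video bitrate and first video resolution for different video tiers.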
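In a second aspect, an embodiment of the present application provides a code table update apparatus, including a data acquisition module, a data analysis module, and a code table update module, wherein:
the data acquisition module is configured to acquire source stream video data;
the data analysis module is configured to input the source stream video data into a trained preference perception model, which analyzes the source stream video data and outputs a video quality distribution table recording predicted preference probabilities corresponding to different combinations of second video bitrate and second video resolution; and
the code table update module is configured to update a code table based on the video quality distribution table, the code table recording combinations of first video bitrate and first video resolution for different video tiers.
In a third aspect, an embodiment of the present application provides a code table update device, including a memory and one or more processors;
the memory is configured to store one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the code table update method according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a storage medium storing computer-executable instructions which, when executed by a computer processor, are used to perform the code table update method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product including a computer program stored in a computer-readable storage medium; at least one processor of a device reads the computer program from the computer-readable storage medium and executes it, causing the device to perform the code table update method according to the first aspect.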
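In the embodiments of the present application, source stream video data is input into the preference perception model, which analyzes it to obtain a video quality distribution table. Based on the predicted preference probabilities recorded in the table for different combinations of second video bitrate and second video resolution, the combinations of first video bitrate and first video resolution recorded in the code table for different video tiers are updated, yielding a code table whose bitrate-resolution combinations better fit the current source stream video data. This makes code table determination more flexible and effectively ensures the user's video playback experience.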
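Figure 1 is a flowchart of a code table update method according to an embodiment of the present application;
Figure 2 is a schematic diagram of a residual structure according to an embodiment of the present application;
Figure 3 is a schematic diagram of the network structure of a preference perception model according to an embodiment of the present application;
Figure 4 is a flowchart of another code table update method according to an embodiment of the present application;
Figure 5 is a schematic flowchart of bitrate row determination according to an embodiment of the present application;
Figure 6 is a schematic structural diagram of a code table update apparatus according to an embodiment of the present application;
Figure 7 is a schematic structural diagram of a code table update device according to an embodiment of the present application.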
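To make the objectives, technical solutions, and advantages of the present application clearer, specific embodiments are described in further detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here only explain the present application and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present application rather than all of it. Before discussing the exemplary embodiments in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently, or simultaneously, and the order of the operations can be rearranged. A process may be terminated when its operations are completed, but it may also have additional steps not included in the drawings. A process may correspond to a method, function, procedure, subroutine, subprogram, and so on.
Figure 1 shows a flowchart of a code table update method according to an embodiment of the present application. The method can be performed by a code table update apparatus, which can be implemented in hardware and/or software and integrated into a code table update device (for example, an encoding server).
The following description takes the code table update apparatus performing the method as an example. Referring to Figure 1, the code table update method includes: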
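S101: Acquire source stream video data.
Here, the source stream video data is the video data before transcoding. After it is transcoded according to the code table, transcoded video data for multiple video tiers is obtained (different video tiers correspond to different combinations of video bitrate and video resolution). The original code table can be delivered by a codec server (CS) according to the region where the user is located.
Video transcoding can be understood as the process of re-compressing an already compressed video stream into another (or several other) encoding formats, for example transcoding an H.264 stream into HEVC. Video transcoding also involves changing parameters such as video bitrate and video resolution so that the video meets different playback requirements: for example, lowering the resolution of the source stream to transcode high-definition video into low-definition video that suits the decoding capability of low-end mobile phones, or lowering the bitrate of the source stream to shrink the video stream for transmission scenarios with limited network bandwidth.
In the related art, source stream video data is transcoded according to a fixed code table: the source stream is transcoded into transcoded video data for several fixed video tiers using the fixed bitrate-resolution combinations in the code table, and the tier delivered to the user is then chosen according to the user's terminal device and network conditions. A fixed code table is usually designed empirically by developers and can ensure good video quality for most video content under a bitrate constraint. However, a fixed code table is strongly constrained by the application scenario, and video transcoded with it cannot meet the viewing needs of different users. For example, a high-definition video with severe film-grain noise still shows blocking artifacts when transcoded at a relatively high bitrate (for example, 5800k), so the viewing experience remains poor. Conversely, simple video (for example, cartoon animation) can be encoded at very high quality (for example, 1080p) without a high bitrate, and using a high bitrate wastes a lot of bitrate. The related art also includes a per-title encoding scheme that exhaustively computes rate-distortion (RD) curves at a series of predetermined resolutions and derives the optimal code table from the envelope of those curves. For services with high real-time requirements (such as live transcoding), however, the strict latency constraints rule out obtaining the code table by exhaustive search, so the scheme has poor operability. Moreover, this scheme uses peak signal-to-noise ratio (PSNR) as the video quality metric, and PSNR does not accurately reflect the user's subjective quality perception, making it difficult to ensure the user's viewing experience.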
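To solve the above technical problems, this solution provides a content-dependent preference perception model that dynamically updates the code table, moving away from the fixed bitrate-resolution pattern of the fixed code table scheme so that the code table better matches the video content and user perception, ensuring the user's viewing experience. At the same time, because a deep-learning preference perception model predicts the optimal video resolution at different video bitrates, the code table can be updated in real time without repeated exhaustive encoding, which improves operability. In addition, the preference for video resolution at each bitrate for the given video content is predicted from users' subjective perception of video quality, so the code table update better matches users' actual viewing experience.
For example, the codec server provides the initial code table to a video server (VS); the code table update apparatus in the transcoding server obtains the code table from the video server and uses it to transcode, in real time, the source stream video data provided by a terminal device (for example, the streamer's client in a live streaming scenario), producing transcoded video data for multiple video tiers (for example, super definition, high definition, and full high definition). The code table update apparatus acquires source stream video data at a set period (only source stream video data that meets a resolution requirement, for example reaching 720p), and the acquired data is provided to the preference perception model for analysis to determine the optimal video resolution for each video tier. Because this solution acquires source stream video data periodically to update the code table, the code table can be updated dynamically and promptly when the user switches scenes (for example, a scene change in a live stream) in high-real-time scenarios, ensuring the user's viewing experience.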
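S102: Input the source stream video data into the trained preference perception model, which analyzes the source stream video data and outputs a video quality distribution table recording the predicted preference probabilities corresponding to different combinations of second video bitrate and second video resolution.
The preference perception model provided by this solution analyzes input video data and outputs the corresponding video quality distribution table, which records predicted preference probabilities for different combinations of second video bitrate and second video resolution. A predicted preference probability can be understood as the predicted probability that users prefer the input video data under a given combination of video bitrate and video resolution.
The preference perception model can extract video features with a convolutional neural network and be trained on collected sample data (for example, the preference probabilities of different users, for different video content, for transcoded videos at different combinations of video bitrate and video resolution); the trained model is then deployed in the code table update apparatus.
For example, after the source stream video data is acquired at the set period, it is input into the trained preference perception model. The model analyzes the data and outputs the video quality distribution table, which reflects, for source stream video data of the current content type, the predicted probabilities that users prefer each combination of second video bitrate and second video resolution.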
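In a possible embodiment, the preference perception model provided by this solution is trained on perception datasets corresponding to sample video data of different content types; that is, the model is trained with the sample video data in the perception dataset as input and the corresponding video quality distribution (sample preference probability distribution) as output. The perception dataset is constructed from the distribution of human visual preference along the resolution dimension for videos of different content types, and records the sample preference probabilities of the corresponding sample video data for different combinations of third video bitrate and third video resolution. The content types of the sample video data can be indoor live streaming, outdoor live streaming, game live streaming, screen recording with text, animation, meaningless video (for example, video of a static scene, with a single tone, or lacking detail), and so on.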
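For example, define a video bitrate gradient $B$ and a video resolution gradient $R$, i.e. $B=\{B_0,B_1,\dots,B_M\}$ and $R=\{R_0,R_1,\dots,R_N\}$. Obtain a video source set $V$ covering multiple content types (for example, live-content video sources). Each video source in $V$ is transcoded with encoding parameters of video bitrate $B_i\in B$ and video resolution $R_j\in R$, traversing all possible parameter pairs $\{B_i,R_j\}$ to obtain the transcoded video set $V_T$. Multiple test users (for example, ordinary video users and non-experts) then subjectively evaluate the transcoded videos in $V_T$ using the stimulus-comparison method (SCACJ, Stimulus Comparison Adjectival Categorical Judgement): for each parameter group $\{B_i,R_{0\sim N}\}$, each test user picks the one (or more) transcoded videos with the best quality, and the corresponding video resolution is recorded. For example, with the third video bitrate gradient $B=\{200\text{ kbps},300\text{ kbps},\dots,1800\text{ kbps}\}$ and the third video resolution gradient $R=\{360p,480p,540p,720p\}$, the sample preference probability distribution is shown in the table below: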
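Finally, the evaluation results of all test users are aggregated, and for each video bitrate $B_i$ the preference frequency with which each video resolution is selected is computed, which determines the test users' subjective quality perception distribution for this group of video sources. The larger the preference frequency (the closer to 1), the more the test users prefer the transcoded video at that resolution. Therefore, for any video source $V_k\in V$, the label of the video quality distribution (sample preference probability distribution) in the final perception dataset is the $(M+1)\times(N+1)$ table $P^{(k)}=[p_{i,j}]$, where $p_{i,j}$ is the preference frequency of resolution $R_j$ at bitrate $B_i$. For transcoded videos of different content types, users' subjective perception of different resolutions at different bitrates also differs. For example, for indoor live streams, as the bitrate increases from low to high, the video tier users prefer shifts from low definition to high definition; for meaningless live streams, users show no clear tier preference at most bitrates; and for screen-recording live streams with a large amount of text, users prefer the high-definition tier.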
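The aggregation step can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes each test-user pick is logged as a hypothetical `(bitrate, resolution)` pair, that a user who picks several "best" videos at one bitrate contributes several pairs, and that frequencies are normalized by the total number of picks at each bitrate so that each row sums to 1. The names `parameter_grid` and `preference_table` are illustrative.

```python
from collections import Counter
from itertools import product


def parameter_grid(bitrates, resolutions):
    """Every (B_i, R_j) encoding-parameter pair; each source is transcoded once per pair."""
    return list(product(bitrates, resolutions))


def preference_table(votes, bitrates, resolutions):
    """Aggregate subjective picks into per-bitrate preference frequencies.

    votes: iterable of (bitrate, chosen_resolution) pairs, one entry per pick
           (assumed format: a user picking two "best" videos adds two entries).
    Returns {bitrate: {resolution: frequency}}; each row sums to 1, or is all
    zeros if that bitrate received no picks.
    """
    votes = list(votes)
    table = {}
    for b in bitrates:
        # Count how often each resolution was picked as best at bitrate b
        counts = Counter(r for vb, r in votes if vb == b)
        total = sum(counts.values())
        table[b] = {r: (counts[r] / total if total else 0.0) for r in resolutions}
    return table
```

For example, if three of four picks at 200 kbps are 360p and one is 480p, the 200 kbps row becomes 0.75 for 360p, 0.25 for 480p, and 0.0 elsewhere.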
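In a possible embodiment, the preference perception model provided by this solution is built on a residual structure, which extracts deeper features from the video data and makes the prediction of preference probabilities under different combinations of third video bitrate and third video resolution more accurate. The fully connected layer of the model maps the video feature vector of the sample video data according to a set size, which is determined by the size of the distribution table of the sample video data over the different combinations of third video bitrate and third video resolution. As shown in the schematic diagram of a residual structure in Figure 2, the residual structure builds on a stack of convolutional layers and uses a "shortcut connection" to sum the input and output, so that during training the network learns the residual features of the video data. Learning residual features is easier than directly learning the original features of the video data, which improves the training efficiency of the model. As shown in the network structure diagram of the preference perception model in Figure 3, the model first passes a series of input sample video data through a convolutional layer (convolution conv + batch normalization bn + activation relu) to extract shallow features of each sample video frame (frames extracted from the sample video data at a set time interval for feature extraction), followed by pooling in a max pooling layer (max pool). The frames then pass through several stacked residual structures (residual structures 0 to N) to extract deeper features, and an average pooling layer (avgpool) produces a 512-dimensional feature vector. Up to this point, the convolutional neural network of the model extracts semantic features in the two-dimensional spatial domain of the video. To further capture temporal features, the feature vectors are averaged along the frame dimension in a mean calculation layer (mean by frame), and the mean serves as the feature vector of the entire video clip corresponding to the sample video data. Finally, the fully connected layer (FC layer) maps this feature vector to a vector of length $(M+1)\times(N+1)$, corresponding exactly to the combinations of $(M+1)$ third video bitrates and $(N+1)$ third video resolutions; rearranging the vector and applying a mapping operation (softmax) yields the predicted video quality distribution table.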
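The tail of the network (mean by frame, FC layer, rearrangement, softmax) can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions: the convolutional backbone (conv/bn/relu, max pool, residual stages, avgpool) is omitted and the input is already a list of per-frame feature vectors; the FC output is assumed to be laid out row-major (one block of $N+1$ logits per bitrate); and softmax is assumed to be applied per bitrate row, consistent with each bitrate row summing to a set value as described for S203. The function names are hypothetical; a real model would use a deep-learning framework.

```python
import math


def softmax(xs):
    """Numerically stable softmax over one list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def predict_quality_table(frame_features, weights, bias, num_bitrates, num_resolutions):
    """Model head: mean by frame -> FC layer -> reshape -> per-row softmax.

    frame_features: per-frame feature vectors (e.g. 512-d after avgpool).
    weights/bias:   FC parameters with num_bitrates * num_resolutions output units.
    Returns a num_bitrates x num_resolutions table; each bitrate row sums to 1.
    """
    n_frames, dim = len(frame_features), len(frame_features[0])
    # "mean by frame": collapse the frame axis into one clip-level feature vector
    clip = [sum(f[d] for f in frame_features) / n_frames for d in range(dim)]
    # FC layer: one logit per (bitrate, resolution) combination
    logits = [sum(w * x for w, x in zip(row, clip)) + b for row, b in zip(weights, bias)]
    if len(logits) != num_bitrates * num_resolutions:
        raise ValueError("FC output size must equal (M+1) * (N+1)")
    # Rearrange row-major into bitrate rows and normalize each row separately
    return [
        softmax(logits[i * num_resolutions:(i + 1) * num_resolutions])
        for i in range(num_bitrates)
    ]
```

Normalizing per row rather than over all $(M+1)\times(N+1)$ entries makes each row directly readable as "which resolution is preferred at this bitrate", which is how the table is consumed when the code table is updated.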
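In a possible embodiment, the preference perception model is trained with the KL divergence loss function as the model loss function and is optimized with the stochastic gradient descent algorithm. Whereas general classification networks use cross-entropy as the loss function, this solution uses KL divergence to measure the difference between the predicted distribution and the true distribution, so the model's prediction of the video quality distribution table is closer to the true values. For example, assume that at third video bitrate $B_i$ the video quality distribution predicted by the network is $Q_i=\{q_0,q_1,\dots,q_N\}$ and the true video quality distribution is $P_i=\{p_0,p_1,\dots,p_N\}$; the KL divergence is then:
$$D_{KL}(P_i \,\|\, Q_i)=\sum_{j=0}^{N} p_j \log\frac{p_j}{q_j}$$
Correspondingly, the total loss of the preference perception model is:
$$\mathcal{L}=\sum_{i=0}^{M} D_{KL}(P_i \,\|\, Q_i)$$
where $q_j$ is the predicted preference probability of the $j$-th third video resolution at third video bitrate $B_i$, and $p_j$ is the true preference probability of the $j$-th video resolution at that bitrate. When the total loss of the model reaches the set loss threshold, training is considered complete, and the model is deployed in the code table update apparatus.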
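The loss arithmetic can be sketched as below. This is a minimal sketch under assumptions not stated in the original: natural logarithm, terms with $p_j=0$ contributing 0 (the usual $0\log 0=0$ convention), a small epsilon guarding against $q_j=0$, and per-row divergences summed into the total loss (the source does not show whether rows are summed or averaged). The function names are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) for one bitrate row: sum_j p_j * log(p_j / q_j).

    p is the true (sample) preference distribution, q the predicted one.
    Terms with p_j == 0 contribute 0; eps guards against q_j == 0.
    """
    return sum(pj * math.log(pj / max(qj, eps)) for pj, qj in zip(p, q) if pj > 0)


def total_kl_loss(true_table, pred_table):
    """Total loss, assumed here to be the sum of per-row KL divergences."""
    return sum(kl_divergence(p, q) for p, q in zip(true_table, pred_table))
```

For instance, a true row of `[1.0, 0.0]` against a predicted `[0.5, 0.5]` gives a divergence of $\ln 2$. In a framework such as PyTorch, `torch.nn.KLDivLoss` computes the same quantity but expects the prediction as log-probabilities.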
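S103: Update the code table based on the video quality distribution table, the code table recording combinations of first video bitrate and first video resolution for different video tiers.
For example, after the video quality distribution table corresponding to the source stream video data is determined, the code table is updated according to it, that is, the combinations of first video bitrate and first video resolution recorded in the code table for different video tiers are updated.
In a possible embodiment, after the code table is updated, if its combinations of first video bitrate and first video resolution have changed compared with the previous code table, the encoder in the transcoding server is restarted and video transcoding is performed with the updated code table. That is, the preference perception model predicts the optimal video resolution at the video bitrate of each video tier; if that resolution differs from the first video resolution in the original code table, it replaces the first video resolution corresponding to the first video bitrate in the original code table, and the encoder is restarted to transcode with the new first video resolution. Otherwise, the original code table is kept and the process returns to wait for the next acquisition of source stream video data and the next video quality distribution table output by the model, at which point it again determines whether the code table needs to be updated.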
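As described above, source stream video data is input into the preference perception model, which analyzes it to obtain a video quality distribution table. Based on the predicted preference probabilities recorded in the table for different combinations of second video bitrate and second video resolution, the combinations of first video bitrate and first video resolution recorded in the code table for different video tiers are updated, yielding a code table whose bitrate-resolution combinations better fit the current source stream video data. This makes code table determination more flexible and effectively ensures the user's video playback experience. In addition, training the preference perception model on perception datasets corresponding to sample video data of different content types produces a content-adaptive model that can flexibly output the corresponding video quality distribution table for different video content and accurately predict users' subjective perception of videos of different content types, so the code table update better matches users' actual experience.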
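On the basis of the above embodiment, Figure 4 shows a flowchart of another code table update method according to an embodiment of the present application, which is a concrete implementation of the code table update method described above. Referring to Figure 4, the code table update method includes:
S201: Acquire source stream video data.
S202: Input the source stream video data into the trained preference perception model, which analyzes the source stream video data and outputs a video quality distribution table recording the predicted preference probabilities corresponding to different combinations of second video bitrate and second video resolution.
S203: Traverse the video tiers in the code table and determine, in the video quality distribution table, the bitrate row corresponding to each video tier in the code table, the bitrate row containing the predicted preference probabilities of the different second video resolutions at a second video bitrate.
The code table provided by this solution records combinations of first video bitrate and first video resolution for different video tiers. For example, after the video quality distribution table output by the model is obtained, the video tiers in the current code table are traversed, and for each tier the bitrate row corresponding to its first video bitrate is determined in the video quality distribution table.
A bitrate row records, in the video quality distribution table, the predicted preference probabilities of the different second video resolutions at the corresponding second video bitrate, and the predicted preference probabilities of all second video resolutions in the same bitrate row sum to a set value (for example, 1 or 100%).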
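In a possible embodiment, the bitrate row corresponding to each video tier in the code table can be determined by finding the second video bitrate equal to the tier's first video bitrate or, failing that, the second video bitrate closest to it. On this basis, as shown in the schematic flowchart of bitrate row determination in Figure 5, determining, in the video quality distribution table, the bitrate row corresponding to each video tier in the code table includes:
S2031: Determine whether the video quality distribution table contains a second video bitrate equal to the first video bitrate in the code table.
S2032: When the video quality distribution table contains a second video bitrate equal to the first video bitrate in the code table, determine the bitrate row of that second video bitrate as the bitrate row corresponding to the video tier.
S2033: When the video quality distribution table does not contain a second video bitrate equal to the first video bitrate in the code table, determine the bitrate row of the second video bitrate closest to the first video bitrate of the video tier as the bitrate row corresponding to the video tier.
For example, after the video quality distribution table output by the model is obtained, the first video bitrates of the video tiers in the current code table are traversed, and for each one it is determined whether the video quality distribution table contains an equal second video bitrate.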
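For a given video tier, if the video quality distribution table contains a second video bitrate equal to the first video bitrate, the bitrate row of that second video bitrate is directly determined as the bitrate row of the tier. If not, the bitrate row of the second video bitrate closest to the tier's first video bitrate is determined as the bitrate row of the tier. That is, for the first video bitrate and first video resolution $\{B_{old},R_{old}\}$ of each video tier in the current code table, the corresponding bitrate row of the video quality distribution table is selected as follows: if $B_{old}\in B$, take the row at $B_i=B_{old}$; if $B_{old}\notin B$, take the row of the bitrate in $B$ closest to $B_{old}$, i.e. $i=\arg\min_i \lvert B_i-B_{old}\rvert$.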
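The row lookup in S2031 to S2033 can be sketched as follows. This is an illustrative sketch, assuming the second video bitrates of the table are held in a plain list in row order; the function name `select_bitrate_row` is hypothetical, and the tie-breaking rule (the lower-index, i.e. lower, bitrate wins when two rows are equally close) is an assumption the original does not specify.

```python
def select_bitrate_row(table_bitrates, tier_bitrate):
    """Return the index of the bitrate row used for one code-table tier.

    S2031/S2032: if a second video bitrate equals the tier's first video
    bitrate, use that row.
    S2033: otherwise use the row whose bitrate is closest; on a tie, min()
    keeps the first (lower-index) row, an assumed tie-break.
    """
    if tier_bitrate in table_bitrates:
        return table_bitrates.index(tier_bitrate)
    return min(range(len(table_bitrates)),
               key=lambda i: abs(table_bitrates[i] - tier_bitrate))
```

With a hypothetical bitrate list `[200, 300, 500, 800, 1200, 1800]`, a tier at 800 kbps maps to row 3 by exact match, while a tier at 2500 kbps maps to the 1800 kbps row.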
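S204: Update the combinations of first video bitrate and first video resolution for the different video tiers in the code table based on the corresponding bitrate rows.
For example, after the bitrate rows corresponding to the different video tiers in the code table are determined, the combinations of first video bitrate and first video resolution for those tiers are updated according to the predicted preference probabilities of the second video resolutions in each bitrate row. For instance, the first video resolution in the code table is replaced with the second video resolution that has the maximum predicted preference probability in the bitrate row, so that the bitrate-resolution combination of the corresponding video tier better matches the user's viewing experience.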
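In a possible embodiment, the first video resolution can be updated based on the dominant predicted preference probability in the bitrate row, yielding a first video resolution that better matches the user's subjective perception. On this basis, updating the combinations of first video bitrate and first video resolution for the different video tiers based on the corresponding bitrate rows includes: for each video tier in the code table, determining whether the corresponding bitrate row contains a second video resolution that meets the encoding parameter update condition; and, in response to such a second video resolution existing, updating the first video resolution of the corresponding video tier in the code table with that second video resolution.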
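For example, for each video tier in the current code table, it is determined whether the corresponding bitrate row contains a second video resolution that meets the encoding parameter update condition. This can be judged from the maximum predicted preference probability in the bitrate row: the maximum is taken over the predicted preference probabilities of all second video resolutions in the row, and it is checked whether it reaches the set probability threshold. When it does, the bitrate row is determined to contain a second video resolution that meets the encoding parameter update condition. Through the set probability threshold, this solution controls how conservative the code table update algorithm is, so it can adapt flexibly to a variety of application scenarios.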
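When the bitrate row contains a second video resolution that meets the encoding parameter update condition, the first video resolution of the corresponding video tier in the code table is updated with the second video resolution that has the maximum predicted preference probability in that row. In doing so, it is first checked whether that second video resolution is the same as the tier's first video resolution: if it is, the first video resolution is not modified; if it is not, the tier's first video resolution is changed to that second video resolution. When the bitrate row contains no second video resolution that meets the encoding parameter update condition, the tier's first video bitrate and first video resolution are left unchanged. After all video tiers in the code table have been processed, if any first video resolution was modified, the encoder is restarted and the source stream video data is transcoded with the updated code table; if none was modified, the current encoder and code table are kept, and the process waits for the next acquisition of source stream video data to analyze whether the code table needs to be updated.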
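The per-tier update and the "restart only if something changed" decision can be sketched as below. This is a minimal sketch, not the applicant's code: the code table is assumed to be a list of dicts with hypothetical `"bitrate"` and `"resolution"` keys, the quality table a list of rows aligned with `table_bitrates` (rows) and `resolutions` (columns), and "reaches the threshold" is read as `>=`. Using the minimum absolute bitrate difference covers both the exact-match and closest-bitrate cases, since an exact match has distance 0. The function name `update_code_table` is illustrative.

```python
def update_code_table(code_table, table_bitrates, resolutions, quality_table, threshold):
    """Apply the threshold-gated resolution update to every video tier.

    code_table:    list of {"bitrate": kbps, "resolution": str}, one per tier.
    quality_table: rows of predicted preference probabilities, aligned with
                   table_bitrates (rows) and resolutions (columns).
    Returns (new_code_table, changed); the input table is not mutated.
    """
    new_table, changed = [], False
    for tier in code_table:
        # Bitrate row: exact match (distance 0) or else the closest bitrate
        row_idx = min(range(len(table_bitrates)),
                      key=lambda i: abs(table_bitrates[i] - tier["bitrate"]))
        row = quality_table[row_idx]
        best = max(range(len(row)), key=lambda j: row[j])
        new_tier = dict(tier)
        # Update condition: the maximum predicted preference probability
        # reaches the threshold and differs from the current resolution
        if row[best] >= threshold and resolutions[best] != tier["resolution"]:
            new_tier["resolution"] = resolutions[best]
            changed = True
        new_table.append(new_tier)
    return new_table, changed
```

A caller would restart the encoder with `new_code_table` only when `changed` is true and otherwise keep the current encoder until the next acquisition period. Raising `threshold` makes updates rarer, which is the conservativeness control described above.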
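As described above, source stream video data is input into the preference perception model, which analyzes it to obtain a video quality distribution table. Based on the predicted preference probabilities recorded in the table for different combinations of second video bitrate and second video resolution, the combinations of first video bitrate and first video resolution recorded in the code table for different video tiers are updated, yielding a code table whose bitrate-resolution combinations better fit the current source stream video data. This makes code table determination more flexible and effectively ensures the user's video playback experience. In addition, the bitrate row corresponding to each first video bitrate in the code table is determined flexibly, and the first video resolution is dynamically updated according to that row, so the combination of first video bitrate and first video resolution for each video tier is determined dynamically at a finer granularity, improving the flexibility of video transcoding and the subjective quality of the video. Furthermore, the second video resolution that meets the encoding parameter update condition is determined by comparing the maximum predicted preference probability with the set probability threshold, which achieves an optimal encoding resolution decision, optimizes the resolution parameters in the live transcoding code table, improves the subjective quality of online video without increasing bandwidth, and gives users a better viewing experience.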
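Figure 6 is a schematic structural diagram of a code table update apparatus according to an embodiment of the present application. Referring to Figure 6, the code table update apparatus includes a data acquisition module 61, a data analysis module 62, and a code table update module 63.
The data acquisition module 61 is configured to acquire source stream video data; the data analysis module 62 is configured to input the source stream video data into a trained preference perception model, which analyzes the source stream video data and outputs a video quality distribution table recording predicted preference probabilities corresponding to different combinations of second video bitrate and second video resolution; and the code table update module 63 is configured to update the code table based on the video quality distribution table, the code table recording combinations of first video bitrate and first video resolution for different video tiers.
As described above, source stream video data is input into the preference perception model, which analyzes it to obtain a video quality distribution table. Based on the predicted preference probabilities recorded in the table for different combinations of second video bitrate and second video resolution, the combinations of first video bitrate and first video resolution recorded in the code table for different video tiers are updated, yielding a code table whose bitrate-resolution combinations better fit the current source stream video data. This makes code table determination more flexible and effectively ensures the user's video playback experience.
In a possible embodiment, the code table update module 63 is specifically configured to:
traverse the video tiers in the code table and determine, in the video quality distribution table, the bitrate row corresponding to each video tier, the bitrate row containing the predicted preference probabilities of the different second video resolutions at a second video bitrate; and
update the combinations of first video bitrate and first video resolution for the different video tiers in the code table based on the corresponding bitrate rows.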
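In a possible embodiment, when determining, in the video quality distribution table, the bitrate row corresponding to each video tier in the code table, the code table update module 63 is configured to:
when the video quality distribution table contains a second video bitrate equal to the first video bitrate in the code table, determine the bitrate row of that second video bitrate as the bitrate row corresponding to the video tier; and
when the video quality distribution table does not contain a second video bitrate equal to the first video bitrate in the code table, determine the bitrate row of the second video bitrate closest to the first video bitrate of the video tier as the bitrate row corresponding to the video tier.
In a possible embodiment, when updating the combinations of first video bitrate and first video resolution for the different video tiers in the code table based on the corresponding bitrate rows, the code table update module 63 is configured to:
for each video tier in the code table, determine whether the corresponding bitrate row contains a second video resolution that meets the encoding parameter update condition; and
in response to such a second video resolution existing, update the first video resolution of the corresponding video tier in the code table with that second video resolution.
In a possible embodiment, when the maximum predicted preference probability in the bitrate row reaches the set probability threshold, the bitrate row contains a second video resolution that meets the encoding parameter update condition.
In a possible embodiment, the preference perception model is trained on perception datasets corresponding to sample video data of different content types, the perception datasets recording the sample preference probabilities of the corresponding sample video data for different combinations of third video bitrate and third video resolution.
In a possible embodiment, the preference perception model is built on a residual structure, uses the KL divergence loss function as the model loss function, and is optimized with the stochastic gradient descent algorithm; the fully connected layer of the model maps the video feature vector of the sample video data according to a set size, which is determined by the size of the distribution table of the sample video data over the different combinations of third video bitrate and third video resolution.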
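It is worth noting that, in the above embodiment of the code table update apparatus, the units and modules are divided only according to functional logic; the division is not limited to the above as long as the corresponding functions can be implemented. In addition, the specific names of the functional units are only for ease of distinguishing them from each other and are not intended to limit the protection scope of the embodiments of the present application.
An embodiment of the present application further provides a code table update device, into which the code table update apparatus provided by the embodiments of the present application can be integrated. Figure 7 is a schematic structural diagram of a code table update device according to an embodiment of the present application. Referring to Figure 7, the code table update device includes an input apparatus 73, an output apparatus 74, a memory 72, and one or more processors 71; the memory 72 is configured to store one or more programs; when the one or more programs are executed by the one or more processors 71, the one or more processors 71 implement the code table update method provided in the above embodiments. The code table update apparatus, device, and computer provided above can be used to perform the code table update method provided in any of the above embodiments, and have the corresponding functions and beneficial effects.
An embodiment of the present application further provides a storage medium storing computer-executable instructions which, when executed by a computer processor, are used to perform the code table update method provided in the above embodiments.
Of course, for the storage medium storing computer-executable instructions provided by the embodiments of the present application, the computer-executable instructions are not limited to the code table update method described above and can also perform related operations in the code table update method provided by any embodiment of the present application. The code table update apparatus, device, and storage medium provided in the above embodiments can perform the code table update method provided by any embodiment of the present application; for technical details not described in detail in the above embodiments, refer to the code table update method provided by any embodiment of the present application.
In some possible implementations, aspects of the method provided by the present disclosure can also be implemented in the form of a program product that includes program code. When the program product runs on a computer device, the program code causes the computer device to perform the steps of the methods according to the various exemplary implementations of the present disclosure described above in this specification; for example, the computer device can perform the code table update method described in the embodiments of the present disclosure. The program product can use any combination of one or more readable media.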
Claims (11)
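- A code table update method, comprising: acquiring source stream video data; inputting the source stream video data into a trained preference perception model, the preference perception model analyzing the source stream video data and outputting a video quality distribution table, the video quality distribution table recording predicted preference probabilities corresponding to different combinations of second video bitrate and second video resolution; and updating a code table based on the video quality distribution table, the code table recording combinations of first video bitrate and first video resolution for different video tiers.
- The code table update method according to claim 1, wherein updating the code table based on the video quality distribution table comprises: traversing the video tiers in the code table, and determining, in the video quality distribution table, a bitrate row corresponding to each video tier in the code table, the bitrate row comprising the predicted preference probabilities of different second video resolutions at a second video bitrate; and updating the combinations of first video bitrate and first video resolution for the different video tiers in the code table based on the corresponding bitrate rows.
- The code table update method according to claim 2, wherein determining, in the video quality distribution table, the bitrate row corresponding to each video tier in the code table comprises: when the video quality distribution table contains a second video bitrate equal to the first video bitrate in the code table, determining the bitrate row of the second video bitrate as the bitrate row corresponding to the video tier; and when the video quality distribution table does not contain a second video bitrate equal to the first video bitrate in the code table, determining the bitrate row of the second video bitrate closest to the first video bitrate corresponding to the video tier as the bitrate row corresponding to the video tier.
- The code table update method according to claim 2, wherein updating the combinations of first video bitrate and first video resolution for the different video tiers in the code table based on the corresponding bitrate rows comprises: for each video tier in the code table, determining whether the corresponding bitrate row contains a second video resolution meeting an encoding parameter update condition; and, in response to a second video resolution meeting the encoding parameter update condition existing, updating the first video resolution of the corresponding video tier in the code table with the second video resolution.
- The code table update method according to claim 4, wherein, when the maximum predicted preference probability in the bitrate row reaches a set probability threshold, the bitrate row contains a second video resolution meeting the encoding parameter update condition.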
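- The code table update method according to claim 1, wherein the preference perception model is trained on perception datasets corresponding to sample video data of different content types, the perception datasets recording sample preference probabilities of the corresponding sample video data for different combinations of third video bitrate and third video resolution.
- The code table update method according to claim 6, wherein the preference perception model is built on a residual structure, uses a KL divergence loss function as the model loss function, and is optimized based on a stochastic gradient descent algorithm; a fully connected layer of the preference perception model maps video feature vectors of the sample video data according to a set size, the set size being determined based on the size of the distribution table of the sample video data for the different combinations of third video bitrate and third video resolution.
- A code table update apparatus, comprising a data acquisition module, a data analysis module, and a code table update module, wherein: the data acquisition module is configured to acquire source stream video data; the data analysis module is configured to input the source stream video data into a trained preference perception model, the preference perception model analyzing the source stream video data and outputting a video quality distribution table, the video quality distribution table recording predicted preference probabilities corresponding to different combinations of second video bitrate and second video resolution; and the code table update module is configured to update a code table based on the video quality distribution table, the code table recording combinations of first video bitrate and first video resolution for different video tiers.
- A code table update device, comprising: a memory and one or more processors; the memory being configured to store one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the code table update method according to any one of claims 1-7.
- A storage medium storing computer-executable instructions, wherein the computer-executable instructions, when executed by a computer processor, are used to perform the code table update method according to any one of claims 1-7.
- A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the code table update method according to any one of claims 1-7.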
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210855856.1 | 2022-07-19 | ||
CN202210855856.1A CN115379291B (zh) | 2022-07-19 | 2022-07-19 | Code table update method, apparatus, device, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024017106A1 (zh) | 2024-01-25 |
Family
ID=84062724
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2023/106919 WO2024017106A1 (zh) | 2022-07-19 | 2023-07-12 | Code table update method, apparatus, device, and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115379291B (zh) |
WO (1) | WO2024017106A1 (zh) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115379291B (zh) * | 2022-07-19 | 2023-12-26 | 百果园技术(新加坡)有限公司 | Code table update method, apparatus, device, and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150036757A1 (en) * | 2013-07-31 | 2015-02-05 | Divx, Llc | Systems and methods for adaptively applying a deblocking filter |
CN109286825A (zh) * | 2018-12-14 | 2019-01-29 | 北京百度网讯科技有限公司 | Method and apparatus for processing video |
CN109660807A (zh) * | 2017-10-10 | 2019-04-19 | 优酷网络技术(北京)有限公司 | Video image transcoding method and apparatus |
CN110719457A (zh) * | 2019-09-17 | 2020-01-21 | 北京达佳互联信息技术有限公司 | Video encoding method, apparatus, electronic device, and storage medium |
CN111327865A (zh) * | 2019-11-05 | 2020-06-23 | 杭州海康威视系统技术有限公司 | Video transmission method, apparatus, and device |
CN115379291A (zh) * | 2022-07-19 | 2022-11-22 | 百果园技术(新加坡)有限公司 | Code table update method, apparatus, device, and storage medium |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101175210B (zh) * | 2006-10-30 | 2010-08-11 | 中国科学院计算技术研究所 | Entropy decoding method and entropy decoding apparatus for decoding video prediction residual coefficients |
EP2429190A1 (en) * | 2010-09-13 | 2012-03-14 | NTT DoCoMo, Inc. | Method and apparatus for transferring a video stream |
EP2533538A3 (en) * | 2011-06-10 | 2013-03-20 | Research In Motion Limited | Method and system to reduce modelling overhead for data compression |
KR101549183B1 (ko) * | 2014-04-30 | 2015-09-02 | 서울대학교산학협력단 | Content-based TV program recommendation system, apparatus, and method |
CN104093072B (zh) * | 2014-06-30 | 2017-06-16 | 京东方科技集团股份有限公司 | Video information playback system and method |
US10516909B2 (en) * | 2016-07-09 | 2019-12-24 | N. Dilip Venkatraman | Method and system for recommending dynamic, adaptive and non-sequentially assembled videos |
CN110300003B (zh) * | 2018-03-21 | 2021-01-12 | 华为技术有限公司 | Data processing method and client |
CN110958446B (zh) * | 2018-09-27 | 2022-08-12 | 中兴通讯股份有限公司 | Video service quality evaluation method, apparatus, device, and readable storage medium |
CN111107395B (zh) * | 2019-12-31 | 2021-08-03 | 广州市百果园网络科技有限公司 | Video transcoding method, apparatus, server, and storage medium |
US20210360233A1 (en) * | 2020-05-12 | 2021-11-18 | Comcast Cable Communications, Llc | Artificial intelligence based optimal bit rate prediction for video coding |
CN111669627B (zh) * | 2020-06-30 | 2022-02-15 | 广州市百果园信息技术有限公司 | Method, apparatus, server, and storage medium for determining video bitrate |
- 2022-07-19: CN application CN202210855856.1A, patent CN115379291B (zh), status Active
- 2023-07-12: WO application PCT/CN2023/106919, publication WO2024017106A1 (zh), status unknown
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150036757A1 (en) * | 2013-07-31 | 2015-02-05 | Divx, Llc | Systems and methods for adaptively applying a deblocking filter |
CN109660807A (zh) * | 2017-10-10 | 2019-04-19 | 优酷网络技术(北京)有限公司 | Video image transcoding method and apparatus |
CN109286825A (zh) * | 2018-12-14 | 2019-01-29 | 北京百度网讯科技有限公司 | Method and apparatus for processing video |
CN110719457A (zh) * | 2019-09-17 | 2020-01-21 | 北京达佳互联信息技术有限公司 | Video encoding method, apparatus, electronic device, and storage medium |
CN111327865A (zh) * | 2019-11-05 | 2020-06-23 | 杭州海康威视系统技术有限公司 | Video transmission method, apparatus, and device |
CN115379291A (zh) * | 2022-07-19 | 2022-11-22 | 百果园技术(新加坡)有限公司 | Code table update method, apparatus, device, and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN115379291A (zh) | 2022-11-22 |
CN115379291B (zh) | 2023-12-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
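KR102235590B1 (ko) | | Method and apparatus for processing video |
WO2023134523A1 (zh) | | Content-adaptive video encoding method, apparatus, device, and storage medium |
WO2021135983A1 (zh) | | Video transcoding method, apparatus, server, and storage medium |
US20220030244A1 | | Content adaptation for streaming |
CN110072119B (zh) | | Content-aware adaptive video transmission method based on a deep learning network |
US9445136B2 | | Signaling characteristics of segments for network streaming of media data |
CN111277826B (zh) | | Video data processing method, apparatus, and storage medium |
CN113438501B (zh) | | Video compression method, apparatus, computer device, and storage medium |
TW202207708A (zh) | | Video processing apparatus and video stream processing method |
CN113115067A (zh) | | Live streaming system, video processing method, and related apparatus |
US20200204804A1 | | Transcoding Media Content Using An Aggregated Quality Score |
WO2024017106A1 (zh) | | Code table update method, apparatus, device, and storage medium |
KR20170135069A (ko) | | QoE analysis-based video frame management method and apparatus therefor |
US20150117545A1 | | Layered Video Encoding and Decoding |
CN114245209B (zh) | | Video resolution determination, model training, and video encoding method and apparatus |
CN104918077B (zh) | | Video transmission method, apparatus, and system |
WO2023134524A1 (zh) | | Video encoding configuration method, system, device, and storage medium |
Micó-Enguídanos et al. | | Per-title and per-segment CRF estimation using DNNs for quality-based video coding |
CN113452996A (zh) | | Video encoding and decoding method and apparatus |
CN111818338B (zh) | | Abnormal display detection method, apparatus, device, and medium |
Qin et al. | | Content adaptive downsampling for low bitrate video coding |
CN116491115A (zh) | | Rate control machine learning model with feedback control for video encoding |
CN114025190A (zh) | | Multi-bitrate scheduling method and multi-bitrate scheduling apparatus |
CN114641793A (zh) | | Image providing device and image providing method thereof, and display device and display method thereof |
US20110235700A1 | | Apparatus and method for generalized fgs truncation of svc video with user preference |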
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23842171; Country of ref document: EP; Kind code of ref document: A1 |