WO2021129435A1 - Video definition evaluation model training method, video recommendation method, and related devices - Google Patents

Video definition evaluation model training method, video recommendation method, and related devices

Info

Publication number
WO2021129435A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
training
definition
original
videos
Prior art date
Application number
PCT/CN2020/135998
Other languages
English (en)
French (fr)
Inventor
陈建强
刘汇川
刘运
Original Assignee
百果园技术(新加坡)有限公司
陈建强
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 百果园技术(新加坡)有限公司 and 陈建强
Publication of WO2021129435A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/251Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/466Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/4668Learning process for intelligent management, e.g. learning user preferences for recommending movies for recommending content, e.g. movies

Definitions

  • the embodiments of the present application relate to the technical field of video recommendation, for example, to a video definition evaluation model training method, a video definition evaluation model training device, a video recommendation method, a video recommendation device, equipment, and a storage medium.
  • no-reference video definition evaluation technology has made great progress, but no-reference video definition evaluation methods based on neural networks require a large amount of manually annotated video data when training the neural network, and the definition of each video must be given an explicit score.
  • as the neural network becomes deeper and the number of neural network parameters increases sharply, a large amount of manual labeling of the training video data is required, which is very labor-intensive.
  • the embodiments of this application provide a video definition evaluation model training method, a video definition evaluation model training device, a video recommendation method, a video recommendation device, equipment, and a storage medium, so as to solve the problem in the related art that training a video definition evaluation model requires a large amount of manual annotation of video data.
  • the embodiment of the present application provides a method for training a video definition evaluation model, including: acquiring multiple original videos; obtaining, based on the multiple original videos, training video pairs that include videos with different definitions; labeling the videos in the training video pairs to obtain labels of the training video pairs; and training a model with the training video pairs and the labels to obtain a video definition evaluation model.
  • the embodiment of the application provides a video recommendation method, including: acquiring multiple videos to be recommended; inputting the multiple videos to be recommended into a video definition evaluation model to obtain a definition score of each video to be recommended; determining a target video from the multiple videos to be recommended based on the definition score of each video to be recommended; and pushing the target video to the user.
  • the video definition evaluation model is trained by the video definition evaluation model training method described in any embodiment of the present application.
  • the embodiment of the present application provides a video definition evaluation model training device, including:
  • an original video acquisition module, configured to acquire multiple original videos;
  • a training video pair obtaining module configured to obtain training video pairs including videos with different definitions based on the multiple original videos
  • a label labeling module configured to label the videos in the training video pair to obtain the label of the training video pair
  • the model training module is configured to train the model using the training video pair and the label to obtain a video definition evaluation model.
  • An embodiment of the present application provides a video recommendation device, including:
  • a to-be-recommended video acquisition module, configured to acquire multiple videos to be recommended;
  • the model prediction module is configured to input multiple to-be-recommended videos into the video definition evaluation model to obtain the definition score of each to-be-recommended video;
  • a target video determining module configured to determine a target video from the multiple to-be-recommended videos based on the definition score of each to-be-recommended video
  • a video push module configured to push the target video to the user
  • the video definition evaluation model is trained by the video definition evaluation model training method described in any embodiment of the present application.
  • An embodiment of the present application provides a device, and the device includes:
  • one or more processors;
  • a storage device, configured to store one or more programs;
  • when the one or more programs are executed by the one or more processors, the one or more processors implement the video definition evaluation model training method and/or the video recommendation method according to any embodiment of the present application.
  • the embodiment of the present application provides a computer-readable storage medium on which a computer program is stored.
  • when the program is executed by a processor, the video definition evaluation model training method and/or the video recommendation method described in any of the embodiments of the present application is implemented.
  • FIG. 1 is a flow chart of the steps of a method for training a video definition evaluation model provided by Embodiment 1 of the present application;
  • FIG. 2 is a flowchart of the steps of a method for training a video definition evaluation model provided by the second embodiment of the present application;
  • FIG. 3 is a flow chart of the steps of a method for training a video definition evaluation model provided by Embodiment 3 of the present application;
  • FIG. 4 is a flowchart of steps of a method for training a video definition evaluation model provided by Embodiment 4 of the present application;
  • FIG. 5 is a flowchart of steps of a video recommendation method provided by Embodiment 5 of the present application.
  • FIG. 6 is a structural block diagram of a video clarity evaluation model training device provided by Embodiment 6 of the present application.
  • FIG. 7 is a structural block diagram of a video recommendation device provided by Embodiment 7 of the present application.
  • FIG. 8 is a structural block diagram of a device provided in Embodiment 8 of the present application.
  • FIG. 1 is a flowchart of the steps of a method for training a video definition evaluation model provided by Embodiment 1 of this application.
  • the embodiment of this application is applicable to the case of training a video definition evaluation model.
  • the method can be executed by the video definition evaluation model training device implemented in this application, and the video definition evaluation model training device can be implemented by hardware or software and integrated into the equipment provided in the embodiment of the present application.
  • the video definition evaluation model training method of the embodiment of the present application may include the following steps:
  • S101 Acquire multiple original videos.
  • the original video may be a short video.
  • the original video may be a short video captured from multiple types of live broadcast platforms and short video platforms.
  • the original video can also be a video on a multi-type movie playback platform.
  • the format of the original video can be rm, rmvb, mp4 and other formats. The embodiments of this application do not impose restrictions on the source and format of the original video.
  • S102 Obtain training video pairs including videos with different definitions based on the multiple original videos.
  • the training video pair is used to train the video definition evaluation model, and the training video pair includes two videos with different definitions.
  • in the embodiment of the present application, the image quality evaluation parameters of the multiple original videos can be obtained, the multiple original videos are then divided into multiple quality grades according to the image quality evaluation parameters, and one original video is extracted from each quality grade to form a video group.
  • the image quality evaluation parameters of the multiple original videos in the video group differ, that is, their definitions differ, so for any video group, any two original videos in the video group can form a training video pair that includes videos with different definitions.
  • alternatively, for each original video, the original video can be transcoded, blurred, or otherwise processed to obtain a processed video with a lower definition than the original video, and the original video and the processed video form a training video pair.
  • since the training video pair includes two videos whose definitions differ, during manual labeling the higher-definition video and the lower-definition video in the training video pair can be labeled.
  • a high-definition video can be labeled with a high-definition label
  • a low-definition video can be labeled with a low-definition label, so as to obtain the label of the training video pair.
  • for example, for the training video pair (A, B), if video A is the higher-definition video and video B is the lower-definition video, video A can be labeled 1 and video B labeled -1, so the training video pair (A, B) has the label (1, -1).
  • for a video group formed by extracting one original video from each quality grade, the higher-definition video in a training video pair can be labeled according to the quality grade; for a training video pair consisting of an original video and its processed version, the original video is clearly the higher-definition video and the processed video is the lower-definition video.
  • the training video pair and the labels of the training video pair can be used as training data, and the training data can be used to train the model.
  • the model can be a variety of neural networks, and the trained model is the video definition evaluation model.
  • the video definition evaluation model can perform definition evaluation of the video to be evaluated, for example, to evaluate the definition score of the video to be evaluated.
  • the original video with higher definition is marked in the training video pair, so that during manual labeling it is only necessary to determine which original video in the training video pair has the higher definition; there is no need to score the definition of each original video, which improves the efficiency of manual labeling and saves the cost of manually labeling training data.
  • FIG. 2 is a flow chart of the steps of a method for training a video definition evaluation model provided by the second embodiment of the application.
  • the embodiment of the present application is described on the basis of the foregoing first embodiment.
  • the video definition evaluation model training method of the embodiment of the present application may include the following steps:
  • S202 Perform image processing on the multiple original videos to obtain processed videos corresponding to the multiple original videos.
  • the original video may be transcoded to obtain a transcoded video, and the definition of the transcoded video is lower than the definition of the original video.
  • Transcoding can be a transformation of the original video encoding format.
  • the encoding format of the video can be H264, H265, etc., and the original video can be re-encoded into, for example, the H263 format to obtain the transcoded video.
  • the definition of the transcoded video after transcoding may be set to be lower than the definition of the original video.
  • the original video may be blurred to obtain a blurred video, so that the definition of the blurred video is lower than the definition of the original video.
  • the original video and the transcoded video corresponding to the original video can be determined as a training video pair, or, after the original video is blurred, the original video and the blurred video can form a training video pair; in either case, the definition of the original video in the training video pair is higher than that of the transcoded video or the blurred video.
  • the embodiment of the present application forms a training video pair from the original video and the video obtained by processing the original video, which reduces the number of original videos that need to be obtained and reduces the difficulty of obtaining training data.
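  • as a concrete illustration of the transcoding and blurring described above, the following is a minimal Python sketch that calls the ffmpeg command-line tool to produce a lower-definition transcoded copy and a blurred copy of an original video; the output file names, the lowered bit rate (instead of a codec change such as H263) and the blur strength are illustrative assumptions, not settings specified by this application.

```python
import subprocess

def make_degraded_versions(original_path: str) -> tuple[str, str]:
    """Produce a low-bit-rate transcode and a blurred copy of an original video.

    The output paths, target bit rate and blur radius below are illustrative
    assumptions; any setting that yields a clearly lower-definition video works.
    """
    transcoded_path = original_path + ".transcoded.mp4"
    blurred_path = original_path + ".blurred.mp4"

    # Re-encode at a much lower bit rate so the transcoded video has lower definition.
    subprocess.run(
        ["ffmpeg", "-y", "-i", original_path,
         "-c:v", "libx264", "-b:v", "300k", transcoded_path],
        check=True,
    )

    # Apply a box blur so the blurred video has lower definition than the original.
    subprocess.run(
        ["ffmpeg", "-y", "-i", original_path,
         "-vf", "boxblur=2:1", "-c:v", "libx264", blurred_path],
        check=True,
    )
    return transcoded_path, blurred_path

# Each (original, degraded) file pair can then be used directly as a training video pair.
```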
  • the original video in each training video pair is marked as a high-definition video, and the processed video in each training video pair is marked as a low-definition video, to obtain the label of each training video pair.
  • the original video in the training video pair can be determined to have a higher definition
  • the original video in the training video pair can be marked with a high-definition label, and the video obtained after processing the original video can be marked with a low-definition label, so as to obtain the label of the training video pair.
  • for example, the label of the original video in the training video pair can be set to 1 and the label of the video obtained after processing the original video to 0, so (1, 0) is the label of the training video pair; alternatively, the original video in the training video pair can be directly assigned a definition of 10 and the processed video a definition of 5, so (10, 5) is the label of the training video pair.
  • the embodiments of this application use video pairs as training data; labeling is only performed per training video pair, the original video is compared with the processed video, and the original video is directly labeled as the high-definition video, which improves the efficiency of manual labeling and saves the cost of manual labeling.
  • the encoding information may be the resolution, bit rate, frame rate, and other information set when each video was encoded.
  • for example, ffmpeg (Fast Forward Moving Picture Experts Group) can be used to extract the encoding information of each video.
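  • since ffmpeg is named here as the extraction tool, the following is a minimal Python sketch that pulls the resolution, bit rate and frame rate of a video with ffprobe (the probing utility shipped with ffmpeg); the JSON field names are those produced by ffprobe itself, while the helper name is an assumption.

```python
import json
import subprocess

def extract_encoding_info(video_path: str) -> dict:
    """Return resolution, bit rate and frame rate of the first video stream."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=width,height,bit_rate,avg_frame_rate",
         "-of", "json", video_path],
        capture_output=True, text=True, check=True,
    )
    stream = json.loads(result.stdout)["streams"][0]
    num, den = stream["avg_frame_rate"].split("/")
    return {
        "width": int(stream["width"]),
        "height": int(stream["height"]),
        "bit_rate": int(stream.get("bit_rate", 0)),   # may be absent for some containers
        "frame_rate": float(num) / float(den) if float(den) else 0.0,
    }
```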
  • the model parameters of the video definition evaluation model can be initialized.
  • the initialized video definition evaluation model can include a convolutional layer and a fully connected layer.
  • a training video pair is randomly extracted, and the training video pair is input to the initial video definition evaluation model.
  • video features are extracted by the convolutional layer, and the video features and the encoding information of each video in the one training video pair are input into the fully connected layer to obtain the definition score of each video in the one training video pair, and the loss rate is calculated using the definition scores and the label of the one training video pair.
  • if the loss rate does not meet the preset condition, the loss rate is used to calculate the gradient, the gradient is used to adjust the model parameters, and the process returns to randomly extracting a training video pair and inputting it into the convolutional layer of the initial video definition evaluation model to extract video features, iterating the model again until the loss rate meets the preset condition.
  • in practice, a deep learning network can be constructed as the model; for example, a network containing J three-dimensional (3D) convolutional layers and K fully connected layers, which finally outputs the video definition score through a sigmoid, can be constructed as the initialized video definition evaluation model.
  • in each round of iterative training, a training video pair consisting of the original video and the transcoded video (assuming the training video pair is composed of the original video and the transcoded video) is passed to the convolutional layers to extract video features; the extracted video features are then combined with the resolution, bit rate, and frame rate of the training video pair and fed into the fully connected layers to obtain the definition scores of the original video and the transcoded video under the current model, respectively.
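  • to make the architecture just described concrete, here is a minimal PyTorch sketch of a scoring network with J 3D convolutional layers and K fully connected layers whose output passes through a sigmoid; the layer widths, J = 3, K = 2, and the way the four encoding values are concatenated to the pooled video features are illustrative assumptions rather than the exact network of this application.

```python
import torch
import torch.nn as nn

class DefinitionScorer(nn.Module):
    """3D-CNN mapping a video clip plus its encoding info to a definition score in (0, 1)."""

    def __init__(self, channels=(16, 32, 64), fc_dim=128, num_encoding_feats=4):
        super().__init__()
        convs, in_ch = [], 3  # RGB input
        for out_ch in channels:                     # J = len(channels) 3D conv layers
            convs += [nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
                      nn.ReLU(inplace=True),
                      nn.MaxPool3d(kernel_size=2)]
            in_ch = out_ch
        self.conv = nn.Sequential(*convs)
        self.pool = nn.AdaptiveAvgPool3d(1)         # collapse (T, H, W) to a feature vector
        # K = 2 fully connected layers; encoding info (e.g. width, height, bit rate,
        # frame rate) is concatenated to the pooled video features before the first FC layer.
        self.fc = nn.Sequential(
            nn.Linear(channels[-1] + num_encoding_feats, fc_dim),
            nn.ReLU(inplace=True),
            nn.Linear(fc_dim, 1),
        )

    def forward(self, clip, encoding_info):
        # clip: (batch, 3, frames, height, width); encoding_info: (batch, num_encoding_feats)
        feats = self.pool(self.conv(clip)).flatten(1)
        score = self.fc(torch.cat([feats, encoding_info], dim=1))
        return torch.sigmoid(score).squeeze(1)      # one definition score per video
```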
  • the loss rate is then calculated based on a loss formula, in which:
  • L(x1, x2; θ) is the loss rate;
  • x1 and x2 are the two videos in the training video pair;
  • f(x2; θ) is the definition score of video x2 under the current model parameter θ;
  • f(x1; θ) is the definition score of video x1 under the current model parameter θ;
  • the remaining term in the formula is a constant.
  • if the loss rate does not meet the preset condition, the gradient of the loss with respect to the model parameter θ can be calculated by a gradient calculation formula; the preset condition can be that the loss rate converges or is less than a preset value.
  • after the model parameters are adjusted using the gradient, the process returns to the step of randomly extracting a training video pair and extracting video features through the convolutional layer of the initial video definition evaluation model, and the model continues to iterate until the loss rate meets the preset condition.
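  • the loss and gradient step can be sketched as follows; since the exact formula is not reproduced above, the sketch assumes a margin-based pairwise hinge loss, max(0, f(x2; θ) − f(x1; θ) + ε), which is consistent with the terms listed (x1 the higher-definition video, ε a constant margin) but may differ from the exact formula of this application.

```python
import torch

def pairwise_definition_loss(score_high, score_low, margin=0.1):
    """Hinge-style pairwise loss: penalize the model whenever the lower-definition
    video is not scored at least `margin` below the higher-definition one.
    (Assumed form; the exact loss of this application may differ.)"""
    return torch.clamp(score_low - score_high + margin, min=0).mean()

def train_step(model, optimizer, clip_high, info_high, clip_low, info_low):
    """One iteration on a single training video pair (original vs. degraded)."""
    optimizer.zero_grad()
    score_high = model(clip_high, info_high)   # f(x1; theta), higher-definition video
    score_low = model(clip_low, info_low)      # f(x2; theta), lower-definition video
    loss = pairwise_definition_loss(score_high, score_low)
    loss.backward()                            # gradient of the loss w.r.t. theta
    optimizer.step()                           # adjust the model parameters
    return loss.item()
```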
  • in addition to the above network, the model can also be a CNN (Convolutional Neural Network), an RNN (Recurrent Neural Network), an SVM (Support Vector Machine), or another model.
  • other loss functions and gradient algorithms can also be used to train to obtain the video definition evaluation model.
  • the embodiment of the present application does not limit the manner of training the video definition evaluation model.
  • in the embodiment of the present application, the original video is processed by image processing to obtain the processed video, the original video and the processed video form a training video pair, the original video in the training video pair is labeled to obtain the label of the training video pair, and the training video pair and the label are used to train the model to obtain a video definition evaluation model.
  • labeling is only performed for each training video pair: the original video is compared with the processed video and directly labeled as the higher-definition video, without giving each video a definition score, which improves the efficiency of manual labeling and saves the cost of manual labeling.
  • FIG. 3 is a flow chart of the steps of a method for training a video definition evaluation model provided in the third embodiment of the application.
  • the embodiment of the present application is described on the basis of the foregoing first embodiment.
  • the video definition evaluation model training method of the embodiment of the present application may include the following steps:
  • the image quality evaluation parameters of the original videos can be obtained through the Natural Image Quality Evaluator (NIQE) algorithm.
  • the NIQE quality evaluation model does not require subjective evaluation scores for the original images; instead, it extracts image features from an original image library and then models them with a multivariate Gaussian model, so that image quality evaluation parameters can be obtained.
  • the range of image quality evaluation parameters for each quality grade can be set, and the quality grade to which each original video belongs is determined according to the image quality evaluation parameters of each original video, thereby classifying multiple original videos into multiple quality grades.
  • each quality grade can include multiple original videos; for example, there can be n quality grades numbered 1 to n in total, each quality grade can include m original videos, and the image quality evaluation parameter ranges of any two quality grades are disjoint, for example, the image quality evaluation parameter range of grade 1 is 15-30, that of grade 2 is 31-55, and so on, where n and m are positive integers greater than 1.
  • assuming the original videos are divided into n quality grades numbered 1 to n, and each quality grade includes m original videos, one original video can be extracted from each of grades 1 to n to form a video group.
  • the video group then includes n original videos.
  • the original videos in the video group come from different quality grades, so for each video group, any two original videos can form a training video pair in which one original video has a higher definition than the other.
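  • the binning and grouping just described can be sketched as follows; `niqe_score` is a hypothetical helper standing in for whatever NIQE implementation is used, and the grade boundaries are illustrative assumptions.

```python
import random

def assign_quality_grades(videos, niqe_score, grade_bounds):
    """Bucket each video into a quality grade by its image quality evaluation parameter.

    `niqe_score(video)` is a stand-in for an actual NIQE implementation, and
    `grade_bounds` is a list of (low, high) parameter ranges, one disjoint range
    per grade, e.g. [(15, 30), (31, 55), ...].
    """
    grades = {g: [] for g in range(1, len(grade_bounds) + 1)}
    for video in videos:
        param = niqe_score(video)
        for g, (low, high) in enumerate(grade_bounds, start=1):
            if low <= param <= high:
                grades[g].append(video)
                break
    return grades

def sample_video_group(grades):
    """Extract one original video from every quality grade to form a video group;
    any two videos in the group then form a training video pair with differing definition."""
    return [random.choice(videos) for g, videos in sorted(grades.items()) if videos]
```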
  • when forming video groups, the quality grade to which each original video belongs can be recorded, and for a training video pair, the higher-definition original video can be determined based on the quality grades of the two original videos in the pair.
  • for example, for a training video pair (A11, A18), A11 indicates that the original video A11 is an original video in the first video group and comes from the first quality grade, and A18 indicates that the original video A18 is an original video in the first video group and comes from the eighth quality grade.
  • if the image quality evaluation parameter range of the first quality grade is (80-90) and that of the eighth quality grade is (10-20), the original video A11 has the higher definition.
  • the original video A11 is therefore marked with a high-definition label, and the original video A18, being the lower-definition original video, is marked with a low-definition label, so that the label of the training video pair (A11, A18) is obtained.
  • the encoding information can be the resolution, bit rate, frame rate, and other encoding information set when each video was encoded; for example, ffmpeg (Fast Forward Moving Picture Experts Group) can be used to extract the resolution, bit rate, frame rate, and other encoding information of each video.
  • the model parameters of the video definition evaluation model may be initialized, and the initialized video definition evaluation model may include a convolutional layer and a fully connected layer; a training video pair is randomly extracted, and the training video pair is input into the convolutional layer of the initial video definition evaluation model.
  • video features are extracted by the convolutional layer, and the video features and the encoding information of the training video pair are input into the fully connected layer to obtain the definition score of each video in the training video pair.
  • the loss rate is calculated using the definition scores and the label of the training video pair; if the loss rate does not meet the preset condition, the loss rate is used to calculate the gradient, the gradient is used to adjust the model parameters, and the process returns to randomly extracting a training video pair, inputting it into the convolutional layer of the initial video definition evaluation model to extract video features, and iterating the model again until the loss rate meets the preset condition.
  • the training process of the video definition evaluation model can refer to S206, which will not be described in detail here.
  • the multiple original videos are divided into multiple quality grades according to the image quality evaluation parameters, and the operation of extracting an original video from each quality grade to form a video group is performed multiple times.
  • every two original videos are extracted from each video group to form multiple training video pairs, the higher-definition video of each training video pair can be labeled according to the quality grade, and the video definition evaluation model can be trained with the training video pairs and the labels.
  • since annotation is only performed per video pair and the quality grade of the original video is used to determine the higher-definition video in the training video pair, the higher-definition original video is labeled directly, without giving each original video a definition score, which improves the efficiency of manual labeling and saves the cost of manual labeling.
  • FIG. 4 is a flow chart of the steps of a video definition evaluation model training method provided in the fourth embodiment of the application.
  • the embodiment of the present application is described on the basis of the third embodiment.
  • the video definition evaluation model training method of the embodiment of the present application may include the following steps:
  • S404 Perform, multiple times, the operation of extracting one original video from the original videos of each quality grade to obtain a video group.
  • S406 Label the original video with high definition and the original video with low definition based on the quality grade to which the original video in each training video pair belongs, to obtain a label of each training video pair.
  • S401-S408 can refer to S301-S308 in the third embodiment, which will not be described in detail here.
  • video groups can be randomly extracted and input into the video definition evaluation model to fine-tune the model parameters of the video definition evaluation model.
  • a video group can be randomly extracted, and the video group can be input into the video definition evaluation model to obtain the first definition score of each original video in the video group.
  • the second definition score of each original video in the video group is calculated based on the labels of the training video pairs obtained by extracting every two original videos from the video group, and the second definition score and the first definition score of each original video in the video group are used to calculate the loss rate. When the loss rate does not meet the preset condition, the loss rate is used to adjust the video definition evaluation model, and the process returns to the step of randomly extracting a video group and inputting it into the video definition evaluation model to obtain the first definition score of each original video in the video group, until the loss rate meets the preset condition.
  • the label of a training video pair can be regarded as a vote of that training video pair for the higher-definition original video; for each original video in the video group, the number of votes obtained by that original video can be counted, the total number of votes of the training video pairs in the video group can be obtained, and the ratio of the number of votes to the total number of votes is calculated as the second definition score of the original video.
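  • a minimal sketch of this vote-counting scheme follows; the pair-label representation (each pair label naming the winning, i.e. higher-definition, video) is an assumption about the data layout, not a format mandated by this application.

```python
from collections import Counter

def second_definition_scores(video_ids, pair_winners):
    """Compute the second definition score of every original video in a video group.

    `video_ids` lists the videos in the group; `pair_winners` lists, for every
    training video pair drawn from the group, the id of the higher-definition
    video (i.e. the video that pair's label 'votes' for).
    """
    votes = Counter(pair_winners)
    total_votes = len(pair_winners)            # total number of votes cast by the pairs
    return {vid: votes.get(vid, 0) / total_votes for vid in video_ids}

# Example: a group of 3 videos yields 3 pairs; if A wins twice and B once,
# the scores are {"A": 2/3, "B": 1/3, "C": 0.0}.
print(second_definition_scores(["A", "B", "C"], ["A", "A", "B"]))
```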
  • the loss rate can be calculated by a loss formula, in which:
  • L(y(i), z(i)) represents the loss rate;
  • y(i) is the set of second definition scores of the original videos in the video group, calculated after manual annotation;
  • z(i) is the set of first definition scores of the original videos in the video group, obtained by inputting the video group into the video definition evaluation model;
  • for the set y(i), the probability of each original video j being ranked first in the video group is calculated based on the second definition scores, and for the set z(i), the probability of each original video j being ranked first in the video group is calculated based on the first definition scores.
  • in the formula for the probability that an original video j is ranked first in the video group, s_j is the definition score of original video j and n is the number of original videos in the video group; through this formula, the probability that original video j is ranked first under the manually annotated second definition score set y(i), or under the first definition score set z(i) output by the video definition evaluation model, can be calculated.
  • after the loss rate is calculated in each iteration, the loss rate can be used to calculate the gradient, the gradient can be used to fine-tune the parameters of the video definition evaluation model, and the process returns to randomly extracting a video group and inputting it into the video definition evaluation model to obtain the first definition score of each original video in that video group, iterating the model again until the loss rate meets a preset condition, where the preset condition can be that the loss rate is less than a threshold or that the gradient calculated from the loss rate is constant.
  • the embodiment of the application randomly extracts multiple video groups and adjusts the model parameters of the video definition evaluation model according to the multiple randomly extracted video groups. In each video group, the second definition score of each original video is calculated based on the labels of the training video pairs obtained by extracting every two original videos from the video group, and the second definition score of each original video and the first definition score predicted by the video definition evaluation model are used to calculate the loss rate. When the preset condition is not met, the loss rate is used to adjust the video definition evaluation model, which realizes manual intervention to fine-tune the model training, improves the accuracy with which the video definition evaluation model evaluates the definition score of a video, and makes the video definition evaluation model more robust.
  • after fine-tuning the model parameters of the video definition evaluation model, a video group can be randomly extracted and input into the video definition evaluation model to output the third definition score of each original video in the video group; for example, a video group that was not used during the fine-tuning of the model can be extracted and input into the video definition evaluation model.
  • S411 Calculate the order preservation rate of the video definition evaluation model based on the third definition score and the fourth definition score of each original video in the one video group.
  • the fourth definition score is the definition score of each original video in the video group calculated using the labels of the training video pairs obtained by extracting every two original videos from the video group; its calculation process is the same as that of the second definition score in S409 and will not be described in detail here.
  • the original videos in a video group may be sorted based on the fourth definition score of each original video in the video group to obtain a first ranking, and sorted based on the third definition score of each original video in the video group to obtain a second ranking. Taking the first ranking as the benchmark, the number of original videos that are incorrectly ordered in the second ranking is counted, the ratio of the number of ordering errors to the total number of ordered cases is calculated, the difference between 1 and the ratio is calculated, and the difference is used as the order preservation rate.
  • for example, the video group can be sorted in descending order of definition score. Assume the video group contains 4 videos: original video A, original video B, original video C, and original video D, and assume the first ranking is ABCD and the second ranking is ACBD. Then A has no ordering error (no bad case), C appearing before B is a bad case, and D has no bad case; all bad cases and the total number of cases (all cases) are counted in this way, and the order preservation rate is 1 - (bad cases / all cases).
  • the order preservation rate expresses the accuracy of sorting multiple videos by inputting them into the video definition evaluation model to obtain their definition scores and sorting according to those scores; the order preservation rate thus reflects the generalization accuracy of the video definition evaluation model.
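  • the order preservation rate computation can be sketched directly from the ABCD example above; counting a "bad case" as a pair of videos whose relative order is inverted between the two rankings is one reasonable reading of that example, stated here as an assumption.

```python
from itertools import combinations

def order_preservation_rate(first_ranking, second_ranking):
    """Order preservation rate = 1 - (bad cases / all cases).

    `first_ranking` is the annotation-derived ranking (fourth definition scores),
    `second_ranking` the model's ranking (third definition scores), both as lists
    of video ids from highest to lowest definition. A 'bad case' is counted here
    as a pair of videos whose relative order is inverted between the two rankings.
    """
    pos = {vid: i for i, vid in enumerate(second_ranking)}
    pairs = list(combinations(first_ranking, 2))          # all cases
    bad = sum(1 for a, b in pairs if pos[a] > pos[b])     # inverted pairs
    return 1 - bad / len(pairs)

# ABCD vs ACBD: only the (B, C) pair is inverted.
print(order_preservation_rate(list("ABCD"), list("ACBD")))  # 1 - 1/6, about 0.83
```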
  • S412 Determine whether the sequence preservation rate is greater than a preset threshold.
  • after the order preservation rate is calculated, it can be judged whether the order preservation rate is greater than the preset threshold. If it is greater than the preset threshold, the trained video definition evaluation model predicts definition scores with high accuracy, and S413 is performed to end the model training; if the order preservation rate is not greater than the preset threshold, the process returns to S409 to continue fine-tuning the video definition evaluation model until the order preservation rate is greater than the preset threshold.
  • the multiple original videos are divided into multiple quality grades according to the image quality evaluation parameters, and one original video is extracted from each quality grade to form a video group.
  • every two original videos are extracted to form training video pairs, the higher-definition video of each training video pair can be labeled according to the quality grade, and the video definition evaluation model can be trained with the training video pairs and the labels. Since labeling is only performed per video pair and the quality grade of the original video is used to determine the higher-definition video of the training video pair, the higher-definition original video is labeled directly, which improves the efficiency of manual labeling and saves the cost of manual labeling.
  • furthermore, video groups are randomly extracted and input into the video definition evaluation model; the labels of the training video pairs obtained by extracting every two original videos from each video group are used to calculate the second definition score of each original video, and the second definition score and the first definition score of each original video are used to calculate the loss rate.
  • when the loss rate does not meet the preset condition, the loss rate is used to adjust the video definition evaluation model, which realizes manual intervention in the model training, improves the accuracy with which the video definition evaluation model evaluates the definition score of a video, and makes the video definition evaluation model more robust.
  • in addition, the order preservation rate of the video definition evaluation model is calculated based on the third definition score and the fourth definition score; when the order preservation rate is less than the preset threshold, video groups continue to be randomly extracted to adjust the video definition evaluation model, and the trained video definition evaluation model is verified through the order preservation rate, which improves the robustness and generalization accuracy of the video definition evaluation model.
  • FIG. 5 is a flowchart of the steps of a video recommendation method provided in the fifth embodiment of the application.
  • the embodiment of this application is applicable to the situation of recommending videos to users.
  • the method can be executed by the video recommendation apparatus implemented in this application.
  • the device may be implemented by hardware or software, and integrated into the device provided in the embodiment of the present application.
  • the video recommendation method of the embodiment of the present application may include the following steps:
  • the embodiment of the application may obtain multiple videos to be recommended when a video recommendation event is detected.
  • the video recommendation event may be a preset event.
  • the preset event may be detecting that a user logs into a live broadcast platform or a short video platform, detecting that the user browses a video list, detecting that the user enters a keyword to search for videos, the current time reaching a preset time, etc.
  • when the preset event is detected, multiple videos to be recommended can be obtained; for example, multiple videos similar to the user's historically played videos can be obtained based on the user's playback history, or multiple videos can be recalled based on the search keywords entered by the user.
  • the embodiment of the present application does not impose restrictions on the triggering event of acquiring multiple to-be-recommended videos and how to acquire multiple to-be-recommended videos.
  • S502 Input multiple to-be-recommended videos into a video definition evaluation model to obtain a definition score of each to-be-recommended video.
  • the video definition evaluation model of the embodiment of this application can be trained by the video definition evaluation model training method provided in any one of Embodiments 1 to 4.
  • S503 Determine a target video from the multiple to-be-recommended videos based on the definition score of each to-be-recommended video.
  • in the embodiment of the present application, the multiple to-be-recommended videos may be sorted in descending order of their definition scores, and a certain range of the ranking is determined as the target videos based on the network quality of the user to be recommended. For example, if the network quality of the user is good, the top N videos can be determined as the target videos, and if the network quality of the user is poor, lower-ranked videos are determined as the target videos.
  • alternatively, the multiple videos to be recommended can be divided into different grades according to the definition scores, each grade is associated with corresponding network quality parameters, and multiple videos in the corresponding grade can be selected as the target videos according to the network quality parameters of the user to be recommended.
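  • a minimal sketch of the score- and network-quality-based selection just described follows; the two-way good/poor split, the top-N cutoff and the threshold values are illustrative assumptions rather than settings fixed by this application.

```python
def pick_target_videos(candidates, network_quality, top_n=10):
    """Select target videos from candidates scored by the definition evaluation model.

    `candidates` is a list of (video_id, definition_score); `network_quality` is
    'good' or 'poor'. The split and top_n are illustrative assumptions.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)  # descending by score
    if network_quality == "good":
        return [vid for vid, _ in ranked[:top_n]]        # highest-definition videos
    return [vid for vid, _ in ranked[top_n:]]            # lower-ranked videos for poor networks

# Usage example with hypothetical ids and scores:
targets = pick_target_videos([("v1", 0.92), ("v2", 0.61), ("v3", 0.33)], "good", top_n=2)
```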
  • the way of determining the target video can also be chosen according to the actual business scenario; for example, for a service that annotates low-quality videos, multiple videos that are ranked low or whose definition score is below a preset threshold can be determined as the target videos and marked with a low-quality label. The embodiment of the present application does not impose restrictions on the manner of determining the target video.
  • the target video can be pushed to the client used by the user to display the title, thumbnail, etc. of the target video on the client, so that the user can browse the target video.
  • in the embodiment of the present application, the plurality of videos to be recommended are input into the video definition evaluation model to obtain the definition score of each video to be recommended, the target video is determined from the plurality of videos to be recommended based on the definition scores, and the target video is pushed to the user. Because the video definition evaluation model is used to evaluate the definition scores of the videos to be recommended, the problem of subjective influence on manual video definition scoring is avoided, a unified video definition scoring standard is established, and the definition scores obtained are objective and accurate, which improves the accuracy of video recommendation.
  • Fig. 6 is a structural block diagram of a video definition evaluation model training device provided in the sixth embodiment of the present application.
  • the video definition evaluation model training device of the embodiment of the present application may include the following modules: an original video acquisition module 601, configured to acquire multiple original videos; a training video pair acquisition module 602, configured to obtain, based on the multiple original videos, training video pairs including videos with different definitions; a label labeling module 603, configured to label the videos in the training video pair to obtain the label of the training video pair; and a model training module 604, configured to train the model using the training video pair and the label to obtain a video definition evaluation model.
  • in one embodiment, the training video pair acquisition module 602 includes: an image quality evaluation parameter acquisition submodule, configured to acquire image quality evaluation parameters of the multiple original videos; a binning submodule, configured to divide the multiple original videos into multiple quality grades according to the image quality evaluation parameters; a video group generation submodule, configured to perform, multiple times, the operation of extracting one original video from the original videos of each quality grade to obtain a video group; and a training video pair extraction submodule, configured to extract every two original videos from each video group to obtain multiple training video pairs.
  • the label labeling module 603 includes: a first labeling sub-module, configured to label the higher-definition original video and the lower-definition original video based on the quality grade to which each original video in each training video pair belongs, to obtain the label of each training video pair.
  • in another embodiment, the training video pair acquisition module 602 includes: a video processing sub-module, configured to perform image processing on each original video to obtain a processed video corresponding to each original video; and a training video pair generation sub-module, configured to form a training video pair from each original video and the video obtained by performing image processing on that original video.
  • the video processing sub-module includes: a transcoding unit, configured to perform transcoding processing on each original video to obtain a transcoded video, where the definition of the transcoded video is lower than the definition of each original video; or a blur processing unit, configured to perform blur processing on each original video to obtain a blurred video, where the definition of the blurred video is lower than the definition of each original video.
  • in this case, the label labeling module 603 includes: a first labeling submodule, configured to label the original video in each training video pair as the high-definition video and the processed video in each training video pair as the low-definition video, to obtain the label of each training video pair.
  • the model training module 604 includes: an encoding information extraction sub-module, configured to extract the encoding information of each video in the training video pair; and a model training sub-module, configured to train the model using the training video pair, the encoding information, and the label to obtain a video definition evaluation model.
  • the number of training video pairs is multiple;
  • the model training submodule includes: an initialization model unit, configured to initialize the model parameters of the video definition evaluation model, where the video definition evaluation model includes a convolutional layer and a fully connected layer;
  • a training video pair input unit, configured to randomly extract a training video pair, input the training video pair into the convolutional layer of the initial video definition evaluation model to extract video features, and input the video features and the encoding information of the training video pair into the fully connected layer to obtain the definition score of each video in the training video pair;
  • a loss rate calculation unit, configured to calculate the loss rate using the definition scores and the label of the training video pair;
  • a gradient calculation unit, configured to use the loss rate to calculate the gradient if the loss rate does not meet the preset condition;
  • a model parameter adjustment unit, configured to use the gradient to adjust the model parameters and return to the training video pair input unit until the loss rate meets the preset condition.
  • the device further includes: an adjustment module configured to randomly extract multiple video groups, and adjust the model parameters of the video definition evaluation model according to the multiple randomly extracted video groups.
  • the adjustment module includes: a first definition score evaluation sub-module, configured to randomly extract a video group and input the video group into the video definition evaluation model to obtain the first definition score of each original video in the video group; a second definition score calculation sub-module, configured to calculate the second definition score of each original video in the video group based on the labels of the training video pairs obtained by extracting every two original videos from the video group; a loss rate calculation sub-module, configured to calculate the loss rate using the second definition score and the first definition score of each original video in the video group; and a model adjustment sub-module, configured to use the loss rate to adjust the video definition evaluation model when the loss rate does not meet the preset condition, and return to the first definition score evaluation sub-module until the loss rate meets the preset condition.
  • the label of a training video pair is a vote of the training video pair for the higher-definition original video, and the second definition score calculation submodule includes: a vote counting unit, configured to count the number of votes obtained by each original video based on the labels of the training video pairs obtained from the video group; a total vote number obtaining unit, configured to obtain the total number of votes of the training video pairs in the video group; and a second definition score calculation unit, configured to calculate the ratio of the number of votes to the total number of votes as the second definition score of each original video.
  • the above-mentioned device further includes: a third definition score evaluation module, configured to randomly extract a video group and input the video group into the video definition evaluation model to obtain the third definition score of each original video in the video group; and an order preservation rate calculation module, configured to calculate the order preservation rate of the video definition evaluation model based on the third definition score and the fourth definition score of each original video in the video group, where the fourth definition score is the definition score of each original video in the video group calculated using the labels of the training video pairs obtained by extracting every two original videos from the video group.
  • an order preservation rate judgment module is configured to determine whether the order preservation rate is greater than a preset threshold, and when the order preservation rate is less than or equal to the preset threshold, return to the third definition score evaluation module.
  • the order preservation rate calculation module includes: a first sorting sub-module, configured to sort the original videos based on the fourth definition score of each original video in the video group to obtain a first ranking; a second sorting sub-module, configured to sort the original videos based on the third definition score of each original video in the video group to obtain a second ranking; a sorting error counting sub-module, configured to count, with the first ranking as the benchmark, the number of original videos that are incorrectly ordered in the second ranking; a ratio calculation sub-module, configured to calculate the ratio of the number of ordering errors to the total number of ordered cases; and an order preservation rate calculation sub-module, configured to calculate the difference between 1 and the ratio and use the difference as the order preservation rate.
  • the video clarity evaluation model training device provided by the embodiment of the present application can execute the video clarity evaluation model training method described in any one of Embodiment 1 to Embodiment 4 of this application, and has functional modules corresponding to the execution method.
  • FIG. 7 is a structural block diagram of a video recommendation device provided in the seventh embodiment of the present application.
  • the video recommendation device of the embodiment of the present application may include the following modules:
  • the to-be-recommended video acquisition module 701 is configured to acquire multiple to-be-recommended videos; the model prediction module 702 is configured to input multiple to-be-recommended videos into the video definition evaluation model to obtain the definition score of each to-be-recommended video; the target video is determined The module 703 is configured to determine a target video from the multiple to-be-recommended videos based on the definition score of each video to be recommended; the video push module 704 is configured to push the target video to the user; wherein, the video The definition evaluation model is trained by the video definition evaluation model training method in the above embodiment.
  • the video recommendation device provided by the embodiment of the present application can execute the video recommendation method described in the fifth embodiment of the present application, and has functional modules corresponding to the execution method.
  • the device may include: a processor 80, a memory 81, a display screen 82 with a touch function, an input device 83, an output device 84, and a communication device 85.
  • the number of processors 80 in the device may be one or more.
  • One processor 80 is taken as an example in FIG. 8.
  • the processor 80, the memory 81, the display screen 82, the input device 83, the output device 84, and the communication device 85 of the device may be connected by a bus or other means. In FIG. 8, the connection by a bus is taken as an example.
  • the memory 81 can be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the video definition evaluation model training methods described in Embodiment 1 to Embodiment 4 of the present application (for example, the original video acquisition module 501, the training video pair acquisition module 502, the label annotation module 503, and the model training module 504 in the video definition evaluation model training device of the fifth embodiment above), or the program instructions/modules corresponding to the video recommendation method described in the fifth embodiment of the present application (for example, the to-be-recommended video acquisition module 601, the model prediction module 602, the target video determination module 603, and the video push module 604 in the video recommendation device of the sixth embodiment above).
  • the memory 81 may mainly include a storage program area and a storage data area.
  • the storage program area may store an operating device and an application program required for at least one function; the storage data area may store data created according to the use of the device, and the like.
  • the memory 81 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage devices.
  • the memory 81 may include memory remotely provided with respect to the processor 80, and these remote memories may be connected to the device through a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
  • the display screen 82 is a display screen 82 with a touch function, which may be a capacitive screen, an electromagnetic screen or an infrared screen.
  • the display screen 82 is set to display data according to instructions of the processor 80, and is also set to receive touch operations on the display screen 82 and send corresponding signals to the processor 80 or other devices.
  • the display screen 82 is an infrared screen, it also includes an infrared touch frame.
  • the infrared touch frame is arranged around the display screen 82 and can also be used to receive infrared signals and send the infrared signals to the processor 80 or other devices.
  • the communication device 85 is configured to establish a communication connection with other devices, and it may be a wired communication device and/or a wireless communication device.
  • the input device 83 may be configured to receive input digital or character information, and to generate key signal input related to user settings and function control of the device, and may also be a camera set to obtain images and a sound pickup device to obtain audio data.
  • the output device 84 may include audio equipment such as a speaker. It should be noted that the composition of the input device 83 and the output device 84 can be set according to actual conditions.
  • the processor 80 executes various functional applications and data processing of the device by running the software programs, instructions, and modules stored in the memory 81, that is, realizes the above-mentioned video definition evaluation model training method and/or video recommendation method.
  • the processor 80 when the processor 80 executes one or more programs stored in the memory 81, it implements the video definition evaluation model training method and/or the video recommendation method provided in the embodiment of the present application.
  • the embodiment of the present application also provides a computer-readable storage medium; when the instructions stored therein are executed, the device can execute the video definition evaluation model training method and/or the video recommendation method described in the above method embodiments.
  • in the above embodiments, the units and modules included are only divided according to functional logic, but the division is not limited to the above as long as the corresponding functions can be realized; in addition, the names of the functional units are only for the convenience of distinguishing them from each other and are not intended to limit the scope of protection of the present application.
  • multiple parts of this application can be implemented by hardware, software, firmware, or a combination thereof.
  • multiple steps or methods can be implemented by software or firmware stored in a memory and executed by a suitable instruction execution device.
  • for example, if it is implemented by hardware, as in another embodiment, it can be implemented by any one or a combination of the following technologies known in the art: discrete logic circuits, application-specific integrated circuits (ASICs) with suitable combinational logic gate circuits, Programmable Gate Arrays (PGA), Field Programmable Gate Arrays (FPGA), etc.

Abstract

The embodiments of the present application disclose a video definition evaluation model training method, a video recommendation method, and related apparatuses. The video definition evaluation model training method includes: acquiring multiple original videos; obtaining, based on the multiple original videos, training video pairs that each comprise videos of different definition; annotating the videos in the training video pairs to obtain labels of the training video pairs; and training a model with the training video pairs and the labels to obtain a video definition evaluation model.

Description

视频清晰度评估模型训练方法、视频推荐方法及相关装置
本申请要求在2019年12月27日提交中国专利局、申请号为201911380270.9的中国专利申请的优先权,该申请的全部内容通过引用结合在本申请中。
技术领域
本申请实施例涉及视频推荐技术领域,例如涉及一种视频清晰度评估模型训练方法、视频清晰度评估模型训练装置、视频推荐方法、视频推荐装置、设备和存储介质。
背景技术
随着网络技术的发展,如今短视频在人们的生活中无处不在,然而,短视频在拍摄、传输或者存储的过程中清晰度会受到不同程度的损伤,因此,如何评估视频的清晰度一直以来是一个较棘手的问题。
伴随着神经网络的兴起,无参考视频清晰度评估技术得到了长足进步,但是基于神经网络的无参考视频清晰度评估方法在训练神经网络时,需要大量的人工标注视频数据,需要对每个视频数据的清晰度进行明确的打分,在神经网络加深、神经网络参数量急剧增加的情况下,需要人工对训练用的视频数据进行大量的标注,非常消耗人力。
发明内容
本申请实施例提供一种视频清晰度评估模型训练方法、视频清晰度评估模型训练装置、视频推荐方法、视频推荐装置、设备和存储介质,以解决相关技术中训练视频清晰度评估模型时需要大量人力标注视频数据的问题。
本申请实施例提供了一种视频清晰度评估模型训练方法,包括:
获取多个原始视频;
基于所述多个原始视频获得包括清晰度不同的视频的训练视频对;
对所述训练视频对中的视频进行标注,得到所述训练视频对的标签;
采用所述训练视频对和所述标签对模型进行训练,得到视频清晰度评估模型。
本申请实施例提供了一种视频推荐方法,包括:
获取多个待推荐视频;
将多个待推荐视频输入视频清晰度评估模型中获得每个待推荐视频的清晰度得分;
基于每个待推荐视频的清晰度得分从所述多个待推荐视频中确定出目标视频;
将所述目标视频推送至所述用户;
其中,所述视频清晰度评估模型通过本申请任一实施例所述的视频清晰度评估模型训练方法所训练。
本申请实施例提供了一种视频清晰度评估模型训练装置,包括:
原始视频获取模块,设置为获取多个原始视频;
训练视频对获取模块,设置为基于所述多个原始视频获得包括清晰度不同的视频的训练视频对;
标签标注模块,设置为对所述训练视频对中的视频进行标注,得到所述训练视频对的标签;
模型训练模块,设置为采用所述训练视频对和所述标签对模型进行训练,得到视频清晰度评估模型。
本申请实施例提供了一种视频推荐装置,包括:
待推荐视频获取模块,设置为获取多个待推荐视频;
模型预测模块,设置为将多个待推荐视频输入视频清晰度评估模型中以获得每个待推荐视频的清晰度得分;
目标视频确定模块,设置为基于每个待推荐视频的清晰度得分从所述多个待推荐视频中确定出目标视频;
视频推送模块,设置为将所述目标视频推送至所述用户;
其中,所述视频清晰度评估模型通过本申请任一实施例所述的视频清晰度评估模型训练方法所训练。
本申请实施例提供了一种设备,所述设备包括:
一个或多个处理器;
存储装置,设置为存储一个或多个程序,
当所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现本申请任一实施例所述的视频清晰度评估模型训练方法和/或视频推荐方法。
本申请实施例提供了一种计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时实现本申请任一实施例所述的视频清晰度评估模型训练方法和/或视频推荐方法。
附图说明
图1是本申请实施例一提供的一种视频清晰度评估模型训练方法的步骤流程图;
图2是本申请实施例二提供的一种视频清晰度评估模型训练方法的步骤流程图;
图3是本申请实施例三提供的一种视频清晰度评估模型训练方法的步骤流程图;
图4是本申请实施例四提供的一种视频清晰度评估模型训练方法的步骤流程图;
图5是本申请实施例五提供的一种视频推荐方法的步骤流程图;
图6是本申请实施例六提供的一种视频清晰度评估模型训练装置的结构框图;
图7是本申请实施例七提供的一种视频推荐装置的结构框图;
图8是本申请实施例八提供的一种设备的结构框图。
具体实施方式
下面结合附图和实施例对本申请进行说明。可以理解的是,此处所描述的实施例仅仅用于解释本申请,而非对本申请的限定。为了便于描述,附图中仅示出了与本申请相关的部分而非全部结构。
实施例一
图1为本申请实施例一提供的一种视频清晰度评估模型训练方法的步骤流程图,本申请实施例可适用于训练视频清晰度评估模型的情况,该方法可以由本申请实施的视频清晰度评估模型训练装置来执行,该视频清晰度评估模型训练装置可以由硬件或软件来实现,并集成在本申请实施例所提供的设备中。如图1所示,本申请实施例的视频清晰度评估模型训练方法可以包括如下步骤:
S101、获取多个原始视频。
在本申请实施例中，原始视频可以是短视频，例如，原始视频可以是从多类直播平台、短视频平台上抓取的短视频。原始视频还可以是多类影片播放平台上的视频。原始视频的格式可以为rm、rmvb、mp4等格式。本申请实施例对原始视频的来源和格式均不加以限制。
S102、基于所述多个原始视频获得包括清晰度不同的视频的训练视频对。
训练视频对用于训练视频清晰度评估模型,训练视频对包括清晰度不相同的两个视频。在本申请实施例中,可以获取多个原始视频的图像质量评价参数,然后按照图像质量评价参数将多个原始视频分为多个质量档次,从每个质量档次抽取一个原始视频组成一个视频组,该视频组中多个原始视频的图像质量评价参数不相同,即清晰度不相同,则对于任意一个视频组,该视频组中任意两个原始视频均可组成包括清晰度不相同的视频的训练视频对。或者对于每个原始视频,对该原始视频进行转码、模糊化等处理得到清晰度低于原始视频的处理后的视频,将该原始视频和处理后的视频组成一个训练视频对。
S103、对所述训练视频对中的视频进行标注,得到所述训练视频对的标签。
本申请实施例中,由于训练视频对中包括两个视频,并且两个视频的清晰度不相同,在人工标注时,可以对训练视频对的两个视频中清晰度比较高的视频和清晰度低的视频进行标注。可选地,可以对清晰度高的视频标注清晰度高的标签,以及对清晰度低的视频标注清晰度低的标签,从而得到训练视频对的标签,例如,对于训练视频对(A,B),视频A为清晰度高的视频,视频B为清晰度低的视频,则可以对视频A标注标签1,对视频B标注标签-1,得到训练视频对(A,B)的标签为(1,-1)。
可选地,对于从不同质量档次均抽取一个原始视频组成的视频组,在从该视频组中任意两个原始视频组成的训练视频对中,可以根据质量档次标注训练视频对中清晰度高的原始视频和清晰度低的原始视频,而对于由原始视频和原始视频处理后的视频组成的训练视频对,原始视频显然为清晰度高的视频,原始视频处理后的视频为清晰度低的视频,从而使得人工标注时仅需要确定训练视频对中清晰度更高的原始视频,无需对每个原始视频的清晰度打分,提高了人工标注的效率,节省了人工标注训练数据的成本。
S104、采用所述训练视频对和所述标签对模型进行训练,得到视频清晰度评估模型。
在本申请实施例中可以将训练视频对以及训练视频对的标签作为训练数据,采用该训练数据来训练模型,其中,模型可以为多种神经网络,训练好的模型即为视频清晰度评估模型,该视频清晰度评估模型可以对待评估的视频进行清晰度评估,例如评估出待评估视频的清晰度得分。
本申请实施例中，基于原始视频生成用于模型训练的训练视频对后，在训练视频对中标注出清晰度较高的原始视频，使得在人工标注时仅需要确定训练视频对中清晰度更高的原始视频，无需对每个原始视频的清晰度打分，提高了人工标注的效率，节省人工标注训练数据的成本。
实施例二
图2为本申请实施例二提供的一种视频清晰度评估模型训练方法的步骤流程图,本申请实施例在前述实施例一的基础上进行说明。如图2所示,本申请实施例的视频清晰度评估模型训练方法可以包括如下步骤:
S201、获取多个原始视频。
S202、对所述多个原始视频进行图像处理,得到所述多个原始视频对应的处理后的视频。
在本申请的可选实施例中,可以对原始视频进行转码处理得到转码后的视频,转码后的视频的清晰度低于原始视频的清晰度。转码可以为原始视频编码格式的转变,例如,视频的编码格式可以为H264、H265等,当原始视频的编码格式为H264时,可以将原始视频编码为格式为H263的视频得到转码后的转码视频,在本申请实施例中,在对原始视频进行转码时,可以设置转码后的转码视频的清晰度低于原始视频的清晰度。
在本申请的另一可选实施例中,还可以对原始视频进行模糊处理得到模糊处理后的视频,使得模糊处理后的视频的清晰度低于原始视频的清晰度。
还可以对原始视频进行重编码等其他图像处理,使得处理后得到的视频的清晰度低于原始视频的清晰度,本申请实施例对原始视频的图像处理方式不加以限制。
S203、采用每个所述原始视频以及对所述每个原始视频进行图像处理后得到的视频组成一个训练视频对。
例如,对原始视频进行转码得到转码视频后,可以将原始视频以及该原始视频对应的转码视频确定为一个训练视频对,又或者对原始视频进行模糊处理后,可以将原始视频以及模糊处理后的视频组成一个训练视频对,在组成的训练视频对中原始视频的清晰度高于转码视频或者模糊处理化后的视频。本申请实施例基于原始视频以及对该原始视频处理后的视频组成训练视频对,减少了获取原始视频的数量,降低了获得训练数据的难度。
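As a rough illustration of this step, the transcoded and blurred copies can be produced with the ffmpeg command-line tool. The snippet below is a minimal sketch, not the patent's own implementation; the output file names and the bitrate value are illustrative assumptions.

```python
import subprocess

def make_degraded_copies(src_path: str, bitrate: str = "300k"):
    """Create two lower-definition versions of src_path with ffmpeg:
    a low-bitrate H.264 transcode and a box-blurred copy."""
    transcoded = src_path.rsplit(".", 1)[0] + "_transcoded.mp4"
    blurred = src_path.rsplit(".", 1)[0] + "_blurred.mp4"

    # Re-encode at a reduced bitrate so the transcoded copy is less sharp.
    subprocess.run(
        ["ffmpeg", "-y", "-i", src_path, "-c:v", "libx264", "-b:v", bitrate, transcoded],
        check=True,
    )
    # Apply a box blur filter to lower perceived definition.
    subprocess.run(
        ["ffmpeg", "-y", "-i", src_path, "-vf", "boxblur=2:1", "-c:v", "libx264", blurred],
        check=True,
    )
    # Each (original, degraded) file pair can serve as one training video pair.
    return [(src_path, transcoded), (src_path, blurred)]
```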
S204、将每个训练视频对中的原始视频标注为清晰度高的视频,所述每个训练视频对中的原始视频处理后的视频标注为清晰度低的视频,得到所述训练视频对的标签。
在本申请实施例中,由于对原始视频处理后得到的视频的清晰度低于原始视频的清晰度,对于每个训练视频对,可以将该训练视频对中的原始视频确定为清晰度较高的视频,可以对该训练视频对中的原始视频进行标注清晰度高的标签,对原始视频处理后得到的视频标注清晰度低的标签,从而得到训练视频对的标签。例如,可以赋予训练视频对中原始视频的标签为1,原始视频处理后得到的视频的标签为0,则(1,0)即为训练视频对的标签,或者直接赋予训练视频对中原始视频的清晰度为10,原始视频处理后得到的清晰度为5,则(10,5)即为训练视频对的标签。本申请实施例基于视频对作为训练数据,标注时仅针对每个训练视频对,并且采用原始视频与处理后的视频进行比较,直接标注原始视频为清晰度高的视频,提高了人工标注的效率,节省了人工标注的成本。
S205、提取所述训练视频对中每个视频的编码信息。
在本申请实施例中,编码信息可以是每个视频在编码时设定的分辨率、码率、比特(bit)率等信息,在实际应用中对于训练视频对中的每个视频,可以通过ffmpeg(Fast Forward moving pictures expert group)提取每个视频的分辨率、码率、比特率等编码信息。
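A small sketch of how the encoding information (resolution, bit rate) can be read, assuming ffprobe (part of the FFmpeg suite mentioned above) is available on the system; this is an illustration rather than the patent's own code.

```python
import json
import subprocess

def get_encoding_info(video_path: str) -> dict:
    """Return width, height and bit rate of the first video stream via ffprobe."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=width,height,bit_rate",
        "-of", "json", video_path,
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    stream = json.loads(out)["streams"][0]
    return {
        "width": int(stream["width"]),
        "height": int(stream["height"]),
        "bit_rate": int(stream.get("bit_rate", 0)),  # may be absent for some containers
    }
```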
S206、采用所述训练视频对、所述编码信息以及所述标签训练模型得到视频清晰度评估模型。
可以初始化视频清晰度评估模型的模型参数,初始化的视频清晰度评估模型可以包括卷积层和全连接层,随机提取一个训练视频对,将所述一个训练视频对输入初始视频清晰度评估模型的卷积层中提取视频特征,将视频特征和所述一个训练视频对中每个视频的编码信息输入全连接层中得到所述一个训练视频对中每个视频的清晰度得分,采用清晰度得分和所述一个训练视频对的标签计算损失率,如果损失率未满足预设条件,则采用损失率计算梯度;采用梯度调整模型参数,返回随机提取一个训练视频对,将所述一个训练视频对输入初始视频清晰度评估模型的卷积层中提取视频特征,重新对模型进行迭代,直到损失率满足预设条件。
在本申请的一个示例中,可以构建深度学习网络作为模型,例如构造包含J个三维(Three Dimensions,3D)卷积层和K个全连接层,最后通过sigmoid输出视频清晰度得分的网络作为初始化的视频清晰度评估模型,当每轮模型迭代训练时,将原始视频与转码后的视频(假设训练视频对由原始视频和转码视频组成)构成的一个训练视频对传送到卷积层提取视频特征,然后将提取的视频特征和训练视频对的分辨率、码率、bit率整合后传送到全连接层中,分别得到原始视频和转码后的视频在当前模型下的清晰度得分,并基于以下公式计算损失率:
L(x1,x2;θ)=max(0,f(x2;θ)-f(x1;θ)+ε)        (1)
上述公式中L(x1,x2;θ)为损失率,x1和x2为训练视频对中的两个视频,f(x2;θ)为在当前模型参数θ下视频x2的清晰度得分,f(x1;θ)为在当前模型参数θ下视频x1的清晰度得分,ε为常数。
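Formula (1) is a pairwise ranking hinge loss. Below is a minimal PyTorch sketch of it, assuming `score_high` and `score_low` are the model outputs f(x1;θ) and f(x2;θ) for the sharper and the blurrier video of a pair; the margin value stands in for the constant ε and is illustrative.

```python
import torch

def pairwise_hinge_loss(score_high: torch.Tensor,
                        score_low: torch.Tensor,
                        margin: float = 0.2) -> torch.Tensor:
    """L = max(0, f(x2) - f(x1) + eps): penalize the model whenever the
    blurrier video is not scored at least `margin` below the sharper one."""
    return torch.clamp(score_low - score_high + margin, min=0).mean()
```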
如果损失率未满足预设条件,则可以通过以下梯度计算公式计算梯度:
∇_θL(x1,x2;θ)        (2)
其中，预设条件可以为损失率收敛或者小于预设值等，上述公式(2)中的∇_θ表示对模型参数θ求梯度。
如果损失率未满足预设条件,例如,在损失率下对模型参数θ求得的梯度未收敛或者损失率未达到预设值等,则通过该梯度对模型参数θ进行调整,并返回随机提取训练视频对,将随机提取的训练视频对输入初始视频清晰度评估模型的卷积层中提取视频特征的步骤,继续对调整模型参数后的模型进行迭代直到损失率满足预设条件为止。
在实际应用中还可以通过卷积神经网络(Convolutional Neural Network,CNN)、循环神经网络(Recurrent Neural Network,RNN)等其他神经网络,或者支持向量机(Support Vector Machine,SVM)训练模型,在训练模型过程中还可以采用其他损失函数和梯度算法训练得到视频清晰度评估模型,本申请实施例对训练视频清晰度评估模型的方式不加以限制。
本申请实施例对原始视频进行图像处理得到处理后的视频,采用原始视频和处理后的视频组成一个训练视频对,对训练视频对中的原始视频进行标注作为训练视频对的标签,采用训练视频对和标签对模型训练,得到视频清晰度评估模型,在标注时仅针对训练视频对,并且采用原始视频与处理后的视频进行比较,直接标注原始视频为清晰度较高的视频,无需对每个视频给出清晰度得分,提高了人工标注的效率,节省了人工标注的成本。
将训练视频对中视频的分辨率、码率、bit率等编码信息结合神经网络提取的视频特征输入模型的全连接层进行训练,实现了神经网络提取的特征和人工提取的特征的结合,实现了采用人工获取到的视频的客观的编码信息干预模型训练,训练得到的视频清晰度评估模型的鲁棒性更高。
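The paragraph above describes concatenating the features extracted by the convolutional layers with the hand-extracted encoding information (resolution, bit rate, etc.) before the fully connected layers, with a sigmoid output. The following is a simplified PyTorch sketch of such a network; the layer sizes, the use of 2D instead of 3D convolutions, and the three-dimensional encoding-info vector are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class ClarityScorer(nn.Module):
    """Scores a video clip: CNN features + encoding info -> sigmoid score in (0, 1)."""
    def __init__(self, enc_dim: int = 3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                      # -> (B, 16, 1, 1)
        )
        self.fc = nn.Sequential(
            nn.Linear(16 + enc_dim, 32), nn.ReLU(),
            nn.Linear(32, 1), nn.Sigmoid(),
        )

    def forward(self, frames: torch.Tensor, enc_info: torch.Tensor) -> torch.Tensor:
        # frames: (B, 3, H, W) sampled frame(s); enc_info: (B, enc_dim), e.g. [w, h, bitrate]
        feat = self.conv(frames).flatten(1)               # (B, 16)
        return self.fc(torch.cat([feat, enc_info], dim=1)).squeeze(1)
```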
实施例三
图3为本申请实施例三提供的一种视频清晰度评估模型训练方法的步骤流程图,本申请实施例在前述实施例一的基础上进行说明。如图3所示,本申请实施例的视频清晰度评估模型训练方法可以包括如下步骤:
S301、获取多个原始视频。
S302、获取所述多个原始视频的图像质量评价参数。
可以通过图像质量评估(Natural image quality evaluator,NIQE)算法获得原始视频的图像质量评价参数,NIQE质量评价模型不需要原始图像的主观评价分数,而是在原始图像库中提取图像特征,再利用多元高斯模型进行建模,从而可以得到图像质量评价参数,图像质量评价参数越大,说明原始视频的清晰度越高。
S303、基于所述多个原始视频的图像质量评价参数将多个原始视频分为多个质量档次的原始视频。
可以设置每个质量档次的图像质量评价参数的范围,根据每个原始视频的图像质量评价参数确定每个原始视频所属的质量档次,从而将多个原始视频分类到多个质量档次中,每个质量档次可以包括多个原始视频,例如,可以包括档次1-n共n个质量档次的原始视频,每个质量档次n可以包括m个原始视频,任意两个质量档次的图像质量评价参数的范围不相交,示例性地,档次1的图像质量评价参数范围为15-30,档次2的图像质量评价参数范围为31-55等,以此类推,其中,n,m为大于1的正整数。
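A small sketch of this binning step: each video's quality parameter (obtained here from an assumed helper `compute_quality(path)`, since the text does not fix a particular NIQE library) is mapped to a tier by score range. The document treats a larger quality parameter as higher definition, so `compute_quality` is assumed to follow that convention; the tier boundaries below are illustrative.

```python
from bisect import bisect_right

def assign_tier(quality_score: float, boundaries=(30, 55, 70, 85)) -> int:
    """Map a quality score to tier 1..len(boundaries)+1 (higher tier = higher quality)."""
    return bisect_right(boundaries, quality_score) + 1

def bin_videos(video_paths, compute_quality):
    """Group videos into {tier: [path, ...]} using a caller-supplied quality function."""
    tiers = {}
    for path in video_paths:
        tiers.setdefault(assign_tier(compute_quality(path)), []).append(path)
    return tiers
```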
S304、多次执行从每个质量档次的原始视频中提取一个原始视频得到一个视频组的操作。
示例性地,原始视频共分为档次1-n共n个质量档次,每个质量档次n可以包括m个原始视频,则可以从档次1-n中均抽取出一个原始视频组成一个视频组,该视频组包括n个原始视频。
S305、从所述每个视频组中提取每两个原始视频得到多个训练视频对。
视频组中的原始视频来源于不同质量档次的视频,对于每个视频组,可以任意两个原始视频组成一个训练视频对,使得该视频对中一个原始视频的清晰度高于另一个原始视频的清晰度。
S306、基于每个训练视频对中所述原始视频所属的质量档次对清晰度高的原始视频和清晰度低的原始视频进行标注,得到所述每个训练视频对的标签。
对于每个视频组中的原始视频可以附带该原始视频所属的质量档次，则对于一个训练视频对，基于该训练视频对中的两个原始视频所附带的质量档次可以确定清晰度高的原始视频，例如对于训练视频对（A11,A18），A11表示原始视频A11为第一视频组中的原始视频，其来源于第一质量档次，A18表示原始视频A18为第一视频组中的原始视频，其来源于第八质量档次，假设第一质量档次的图像质量评价参数范围为(80-90)，第八质量档次的图像质量评价参数范围为(10-20)，则原始视频A11为清晰度高的原始视频，为原始视频A11标注清晰度高的标签，原始视频A18为清晰度低的原始视频，为原始视频A18标注清晰度低的标签，从而得到训练视频对（A11,A18）的标签。
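Continuing the binning sketch above, one video can be sampled from each tier to form a group, every two videos of a group form a training pair, and the pair label follows directly from the tier numbers (the video from the higher tier is marked as the higher-definition one). The (1, -1) label convention is borrowed from the example in Embodiment 1; this is a sketch, not the patent's own code.

```python
import random
from itertools import combinations

def sample_group(tiers: dict) -> list:
    """Pick one (path, tier) per quality tier to form a video group."""
    return [(random.choice(paths), tier) for tier, paths in sorted(tiers.items())]

def pairs_with_labels(group: list):
    """Yield ((video_a, video_b), (label_a, label_b)) for every two videos in the group."""
    for (path_a, tier_a), (path_b, tier_b) in combinations(group, 2):
        if tier_a > tier_b:        # higher tier -> higher definition
            yield (path_a, path_b), (1, -1)
        else:
            yield (path_a, path_b), (-1, 1)
```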
S307、提取所述训练视频对中每个视频的编码信息。
编码信息可以是每个视频在编码时设定的分辨率、码率、比特率等编码信息,在实际应用中对于训练视频对中的每个视频,可以通过ffmpeg(Fast Forward moving pictures expert group)提取每个视频的分辨率、码率、比特率等编码信息。
S308、采用所述训练视频对、所述编码信息以及所述标签对模型进行训练得到视频清晰度评估模型。
在本申请的可选实施例中,可以初始化视频清晰度评估模型的模型参数,初始化的视频清晰度评估模型可以包括卷积层和全连接层,随机提取一个训练视频对,将所述一个训练视频对输入初始视频清晰度评估模型的卷积层中提取视频特征,将视频特征和所述一个训练视频对的编码信息输入全连接层中得到所述一个训练视频对中每个视频的清晰度得分,采用清晰度得分和所述一个训练视频对的标签计算损失率,如果损失率未满足预设条件,则采用损失率计算梯度,采用梯度调整模型参数,返回随机提取一个训练视频对,将所述一个训练视频对输入初始视频清晰度评估模型的卷积层中提取视频特征,重新对模型进行迭代,直到损失率满足预设条件。
视频清晰度评估模型的训练过程可参考S206,在此不再详述。
本申请实施例获取原始视频的图像质量评价参数后,根据图像质量评价参数将多个原始视频分为多个质量档次,多次执行从每个质量档次抽取一个原始视频组成一个视频组的操作,从每个视频组中抽取每两个原始视频组成多个训练视频对,并能够根据质量档次标注出训练视频对中清晰度高的视频得到标签,通过训练视频对以及标签训练视频清晰度评估模型。由于在标注时仅针对视频对,并且采用原始视频所属的质量档次进行比较确定训练视频对中清晰度高的视频,直接标注出清晰度高的原始视频,无需给出每个原始视频的清晰度得分,提高了人工标注的效率,节省了人工标注的成本。
实施例四
图4为本申请实施例四提供的一种视频清晰度评估模型训练方法的步骤流程图,本申请实施例在前述实施例三的基础上进行说明。如图4所示,本申请实施例的视频清晰度评估模型训练方法可以包括如下步骤:
S401、获取多个原始视频。
S402、获取所述多个原始视频的图像质量评价参数。
S403、基于所述多个原始视频的图像质量评价参数将多个原始视频分为多个质量档次的原始视频。
S404、多次执行从每个质量档次的原始视频中提取一个原始视频得到一个视频组。
S405、从每个视频组中提取每两个原始视频得到多个训练视频对。
S406、基于每个训练视频对中所述原始视频所属的质量档次对清晰度高的原始视频和清晰度低的原始视频进行标注,得到所述每个训练视频对的标签。
S407、提取所述训练视频对中每个视频的编码信息。
S408、采用所述训练视频对、所述编码信息以及所述标签对模型进行训练,得到视频清晰度评估模型。
在本申请实施例中,S401-S408可以参考实施例三中的S301-S308,在此不再详述。
S409、随机提取多个视频组,并根据随机提取的多个视频组对所述视频清晰度评估模型的模型参数进行调整。
在将多个原始视频分为多个视频组以及基于每个视频组生成训练视频对后,可以随机提取一个视频组中的多个原始视频输入到视频清晰度评估模型中以对视频清晰度评估模型的模型参数进行微调。
在本申请的可选实施例中,可以随机提取一个视频组,并将该一个视频组输入视频清晰度评估模型中得到该一个视频组中每个原始视频的第一清晰度得分,针对输入至视频清晰度评估模型的视频组,基于从该视频组提取每两个原始视频得到的训练视频对的标签计算该视频组每个原始视频的第二清晰度得分,采用该视频组中的每个原始视频的第二清晰度得分和第一清晰度得分计算损失率,在损失率未满足预设条件时,采用损失率对视频清晰度评估模型进行调整,并返回随机提取一个视频组,并将该一个视频组输入视频清晰度评估模型中得到视频组中每个原始视频的第一清晰度得分的步骤,直到损失率满足预设条件。
训练视频对的标签可以为训练视频对中清晰度高的原始视频的投票,则可以针对视频组内的每个原始视频,统计该原始视频所获得的投票数,并获取视频组内对训练视频对的总投票数,计算投票数和总投票数的比值作为原始视频的第二清晰度得分。
例如，对于每个视频组所生成的训练视频对，该训练视频对中原始视频被标注为清晰度高的视频时，该原始视频的投票数为1，如此累加得到该原始视频的投票数，例如，视频组生成的训练视频对包括(A,B)、(A,C)、(A,D)、(B,C)、(B,D)以及(C,D)，其中，在(A,B)、(A,C)中原始视频A被标注为清晰度高的视频，则原始视频A的投票数为2，由于每个训练视频对进行一次投票，总投票数为视频组中所生成的训练视频对的数量，则第一视频组的总投票数为6，由此可以计算原始视频A的第二清晰度得分为2/6=0.33。
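The vote-share computation described above can be written compactly as follows; `winners` is assumed to list, for each training pair of the group, the video voted as the higher-definition one, matching the (A, B) ... (C, D) example.

```python
from collections import Counter

def second_clarity_scores(videos, winners):
    """videos: all videos of one group; winners: the voted (higher-definition)
    video of each training pair formed from that group.
    Returns {video: votes / total_votes}."""
    votes = Counter(winners)
    total = len(winners)            # one vote per training pair
    return {v: votes[v] / total for v in videos}

# Example from the text: pairs (A,B),(A,C),(A,D),(B,C),(B,D),(C,D);
# A wins (A,B) and (A,C) -> score(A) = 2 / 6 ≈ 0.33
```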
在将一个视频组中的原始视频输入视频清晰度评估模型中得到视频组中每个原始视频的第一清晰度得分后,可以通过以下公式计算损失率:
L(y^(i),z^(i)) = -∑_{j=1}^{n} P_{y^(i)}(j)·log(P_{z^(i)}(j))        (3)
其中，L(y^(i),z^(i))表示损失率，y^(i)为人工标注后计算的视频组中原始视频i的第二清晰度得分的集合，z^(i)为将视频组输入视频清晰度评估模型中得到视频组中原始视频i的第一清晰度得分的集合，P_{y^(i)}(j)为基于第二清晰度得分的集合y^(i)计算原始视频j在视频组中排在最前(top one)的概率，P_{z^(i)}(j)为基于第一清晰度得分的集合z^(i)计算原始视频j在视频组中排在最前(top one)的概率。i=1,2,…,n；j=1,2,…,n；n为视频组中原始视频的个数。
可选地，视频组中原始视频j在视频组中排在最前的概率计算公式如下：
P_s(j) = exp(s_j)/∑_{k=1}^{n} exp(s_k)        (4)
上述公式(4)中，s_j为原始视频j在视频组中的清晰度得分，s_k是视频组中原始视频k的清晰度得分，其中k=1,2,…,n，n为视频组中原始视频的个数，通过上述公式(4)可以计算出一个原始视频j在人工标注的第二清晰度得分的集合y^(i)下的排在最前的概率P_{y^(i)}(j)，或者计算出一个原始视频j在视频清晰度评估模型输出的第一清晰度得分的集合z^(i)下的排在最前的概率P_{z^(i)}(j)。
在计算得到每轮迭代后的损失率后，可以采用损失率计算梯度，采用梯度微调视频清晰度评估模型的参数，并返回随机提取一个视频组，并将该一个视频组输入视频清晰度评估模型中得到该一个视频组中每个原始视频的第一清晰度得分，以对模型重新迭代直到损失率满足预设条件，其中预设条件可以为损失率小于阈值，或者通过损失率计算的梯度为常量。
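Under the assumption that formula (4) is the softmax form of the top-one probability, the fine-tuning loss of formulas (3)-(4) can be sketched in PyTorch as a ListNet-style cross-entropy between the annotation-derived scores and the model scores. This is an illustrative reading of the formulas, not the patent's own code.

```python
import torch
import torch.nn.functional as F

def listwise_finetune_loss(model_scores: torch.Tensor,
                           label_scores: torch.Tensor) -> torch.Tensor:
    """model_scores, label_scores: (n,) clarity scores of one video group.
    Cross-entropy between the two top-one probability distributions."""
    p_label = F.softmax(label_scores, dim=0)           # P_y(j), formula (4) on label scores
    log_p_model = F.log_softmax(model_scores, dim=0)    # log P_z(j) on model scores
    return -(p_label * log_p_model).sum()               # formula (3)
```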
本申请实施例随机提取多个视频组并根据随机提取的多个视频组对视频清晰度评估模型的模型参数进行调整,由于在每个视频组中,基于从该视频组提取每两个原始视频得到的训练视频对的标签计算每个原始视频的第二清晰度得分,采用每个原始视频的第二清晰度得分和视频清晰度评估模型预测的第一清晰度得分计算损失率,在损失率未满足预设条件时,采用损失率对视频清晰度评估模型进行调整,实现了人工干预微调模型训练,提高了视频清晰度评估模型评估视频的清晰度得分的准确度,使得视频清晰度评估模型具有更强的鲁棒性。
S410、随机提取一个视频组,并将所述一个视频组输入所述视频清晰度评估模型中,得到所述一个视频组中每个原始视频的第三清晰度得分。
本申请实施例中，在微调视频清晰度评估模型的模型参数后，可以随机提取一个视频组，并将该一个视频组输入视频清晰度评估模型以输出该视频组中每个原始视频的第三清晰度得分，例如可以提取微调模型时未使用过的视频组，并将该一个视频组输入视频清晰度评估模型中。
S411、基于所述一个视频组中每个原始视频的所述第三清晰度得分和第四清晰度得分计算所述视频清晰度评估模型的保序率。
第四清晰度得分为在视频组中,采用从视频组提取每两个原始视频得到的训练视频对的标签计算每个原始视频的清晰度得分,第四清晰度得分的计算过程与S409中计算第二清晰度得分的过程相同,在此不再详述。
在本申请的一个示例中,可以基于一个视频组中每个原始视频的第四清晰度得分对该视频组中的原始视频进行排序得到第一排序,基于该一个视频组中每个原始视频的第三清晰度得分对该视频组中的原始视频进行排序得到第二排序,以第一排序为基准,统计出第二排序中排序错误的原始视频的排序出错数量,计算排序出错数量和排序总数量的比值,计算1与该比值的差值并将差值作为保序率。
例如,可以以清晰度得分对视频组进行降序排序,假设视频组包含原始视频A、原始视频B、原始视频C、原始视频D共4个视频,假设第一排序为ABCD,第二排序为ACBD,则对A来说没有排序出错(bad case),对B来说C是bad case,对D来说没有bad case,以此类推计算所有的bad case和总的case(all case),即保序率为1-(bad case/all case)。
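The order-preserving rate can be computed from the two rankings as sketched below. Here a "bad case" is counted as a pair of videos whose relative order in the model ranking disagrees with the reference ranking, which is one plausible reading of the A/B/C/D example above; the rate is 1 minus the fraction of bad cases.

```python
from itertools import combinations

def order_preserving_rate(reference: list, predicted: list) -> float:
    """reference, predicted: the same videos ranked by label scores and by model scores."""
    pos = {v: i for i, v in enumerate(predicted)}
    pairs = list(combinations(reference, 2))
    bad = sum(1 for a, b in pairs if pos[a] > pos[b])  # a should rank above b
    return 1 - bad / len(pairs)

# Example from the text: reference ABCD vs predicted ACBD
# -> only the pair (B, C) is inverted, so the rate is 1 - 1/6 ≈ 0.83
```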
本申请实施例中，保序率表达了对于多个视频，将该多个视频输入视频清晰度评估模型后得到多个视频的清晰度得分，按照该清晰度得分进行排序的准确率，该保序率反映了视频清晰度评估模型的泛化精度。
S412、判断所述保序率是否大于预设阈值。
在计算得到保序率之后,可以判断该保序率是否大于预设阈值,若该保序率大于预设阈值,则说明训练得到的视频清晰度评估模型预测清晰度得分的准确度高,则执行S413,结束模型训练,若该保序率不大于预设阈值,则返回S409继续对视频清晰度评估模型进行微调,直到保序率大于预设阈值。
本申请实施例获取原始视频的图像质量评价参数后,根据图像质量评价参数将多个原始视频分为多个质量档次,从每个质量档次抽取一个原始视频组成一个视频组,在每个视频组中,任意抽取两个原始视频组成训练视频对,并能够根据质量档次标注出训练视频对中清晰度高的视频得到标签,通过训练视频对以及标签训练视频清晰度评估模型。由于在标注时仅针对视频对,并且采用原始视频所属的质量档次进行比较确定训练视频对中清晰度高的视频,直接标注清晰度高的原始视频,提高了人工标注的效率,节省了人工标注的成本。
随机提取视频组,并将提取的视频组输入至视频清晰度评估模型中得到视频组中每个原始视频的第一清晰度得分,针对输入至视频清晰度评估模型的视频组,基于从该视频组提取每两个原始视频得到的训练视频对的标签计算每个原始视频的第二清晰度得分,采用每个原始视频的第二清晰度得分和第一清晰度得分计算损失率,在损失率未满足预设条件时,采用损失率对视频清晰度评估模型进行调整,实现了人工干预模型训练,提高了视频清晰度评估模型评估视频的清晰度得分的准确度,使得视频清晰度评估模型具有更强的鲁棒性。
随机提取视频组,并将提取的视频组输入所述视频清晰度评估模型中得到视频组中每个原始视频的第三清晰度得分,基于视频组中每个原始视频的第三清晰度得分和第四清晰度得分计算视频清晰度评估模型的保序率,在保序率小于预设阈值时继续随机提取视频组对视频清晰度评估模型进行调整,通过保序率验证训练得到的视频清晰度评估模型,提高了视频清晰度评估模型的鲁棒性和泛化精度。
实施例五
图5为本申请实施例五提供的一种视频推荐方法的步骤流程图,本申请实施例可适用于向用户推荐视频的情况,该方法可以由本申请实施的视频推荐装置来执行,该视频推荐装置可以由硬件或软件来实现,并集成在本申请实施例所提供的设备中。如图5所示,本申请实施例的视频推荐方法可以包括如下步骤:
S501、获取多个待推荐视频。
本申请实施例可以在检测到视频推荐事件时获取多个待推荐视频,其中,视频推荐事件可以为预设事件,例如,预设事件可以为检测到用户登录直播平台或者短视频平台、检测到用户浏览视频列表、检测到用户输入关键词搜索视频、当前时间为预设时间等。当检测到视频推荐事件时,可以获取多个待推荐视频,例如,检测到用户登录事件时,可以基于用户的历史播放视频,获取与历史播放视频相似的多个视频,又或者基于用户输入的搜索关键字召回多个视频,本申请实施例对获取多个待推荐视频的触发事件以及如何获取多个待推荐视频不加以限制。
S502、将多个待推荐视频输入视频清晰度评估模型中获得每个待推荐视频的清晰度得分。
本申请实施例的视频清晰度评估模型可以通过实施例一到实施例四任一实施例所提供的视频清晰度评估模型训练方法所训练,当将多个待推荐视频输入视频清晰度评估模型后,可以获得多个待推荐视频的清晰度得分。
S503、基于每个待推荐视频的清晰度得分从所述多个待推荐视频中确定出目标视频。
在本申请的可选实施例中,可以按照多个待推荐视频的清晰度得分对多个待推荐视频进行降序排序,并基于待推荐用户的网络质量确定出一定排序范围的视频作为目标视频,例如,待推荐用户的网络质量良好,可以将排序在前的N个视频确定为目标视频,在待推荐用户的网络质量差的情况下,确定排序比较靠后的视频作为目标视频。例如,还可以根据清晰度得分将多个待推荐视频划分到不同的档次中,每个档次关联相应的网络质量参数,以根据待推荐用户的网络质量参数选择相应档次中的多个视频作为目标视频。当然,本领域技术人员还可以根据实际业务场景选择确定目标视频的方式,例如,对于标注低质量视频业务,可以将排序在后的或者清晰度得分低于预设阈值的多个视频确定为目标视频以对目标视频标注低标标识,本申请实施例对确定目标视频的方式不加以限制。
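A toy sketch of this selection step: sort candidate videos by predicted clarity score and pick a slice according to the user's network quality. The tier-to-slice mapping and the `top_n` value are purely illustrative assumptions.

```python
def pick_target_videos(scored_videos, network_quality: str, top_n: int = 10):
    """scored_videos: list of (video_id, clarity_score).
    Good networks get the sharpest videos, poor networks a lower-ranked slice."""
    ranked = sorted(scored_videos, key=lambda x: x[1], reverse=True)
    if network_quality == "good":
        return ranked[:top_n]
    return ranked[top_n:2 * top_n] or ranked[-top_n:]
```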
S504、将所述目标视频推送至用户。
可以将目标视频推送至用户所使用的客户端,以在客户端展示目标视频的标题、缩略图等,使得用户可以浏览目标视频。
本申请实施例获取多个待推荐视频后，将多个待推荐视频输入视频清晰度评估模型中获得每个待推荐视频的清晰度得分，并基于清晰度得分从多个待推荐视频中确定出目标视频，将目标视频推送给用户，由于采用视频清晰度评估模型评估待推荐视频的清晰度得分，避免了人工对视频的清晰度打分受主观性影响的问题，为视频清晰度打分建立了统一的打分标准，得到的清晰度得分客观准确，提高了视频推荐的精确度。
本申请实施例的视频清晰度评估模型在训练时,基于原始视频生成用于模型训练的训练视频对后,在每个训练视频对中标注出清晰度较高的原始视频,人工标注时仅需要确定训练视频对中清晰度更高的原始视频,无需对每个原始视频的清晰度打分,提高了人工标注的效率,节省了人工标注训练数据的成本,能够获得大量训练数据来有效地训练视频清晰度评估模型,使得视频清晰度评估模型能够广泛应用于视频清晰度评估中。
实施例六
图6是本申请实施例六提供的一种视频清晰度评估模型训练装置的结构框图。如图6所示,本申请实施例的视频清晰度评估模型训练装置具体可以包括如下模块:原始视频获取模块601,设置为获取多个原始视频;训练视频对获取模块602,设置为基于所述多个原始视频获得包括清晰度不同的视频的训练视频对;标签标注模块603,设置为对所述训练视频对中的视频进行标注,得到所述训练视频对的标签;模型训练模块604,设置为采用所述训练视频对和所述标签对模型进行训练,得到视频清晰度评估模型。
可选地，所述训练视频对获取模块602包括：图像质量评价参数获取子模块，设置为获取所述多个原始视频的图像质量评价参数；分档子模块，设置为基于所述多个原始视频的图像质量评价参数将多个原始视频分为多个质量档次的原始视频；视频组生成子模块，设置为多次执行从每个质量档次的原始视频中提取一个原始视频得到一个视频组的操作；训练视频对提取子模块，设置为从每个视频组中提取每两个原始视频得到多个训练视频对。
可选地,所述标签标注模块603包括:第一标注子模块,设置为基于每个训练视频对中所述原始视频所属的质量档次对清晰度高的原始视频和清晰度低的原始视频进行标注,得到所述每个训练视频对的标签。
在一实施例中,所述训练视频对获取模块602包括:视频处理子模块,设置为对每个原始视频进行图像处理,得到所述每个原始视频对应的处理后的视频;训练视频对生成子模块,设置为采用每个原始视频以及对所述每个原始视频进行图像处理后得到的视频组成一个训练视频对。
可选地,所述视频处理子模块包括:转码单元,设置为对每个原始视频进行转码处理得到转码后的视频,所述转码后的视频的清晰度低于所述每个原始视频的清晰度;或者,模糊处理单元,设置为对每个原始视频进行模糊处理, 得到模糊处理后的视频,所述模糊处理后的视频的清晰度低于所述每个原始视频的清晰度。
可选地,所述标签标注模块603包括:第一标注子模块,设置为将每个训练视频对中的原始视频标注为清晰度高的视频,所述每个训练视频对中的原始视频处理后的视频标注为清晰度低的视频,得到所述每个训练视频对的标签。
可选地,所述模型训练模块604包括:编码信息提取子模块,设置为提取所述训练视频对中每个视频的编码信息;模型训练子模块,设置为采用所述训练视频对、所述编码信息以及所述标签对模型进行训练,得到视频清晰度评估模型。
可选地,训练视频对的数量为多个;所述模型训练子模块包括:初始化模型单元,设置为初始化视频清晰度评估模型的模型参数,所述视频清晰度评估模型包括卷积层和全连接层;训练视频对输入单元,设置为随机提取一个训练视频对,并将所述一个训练视频对输入所述初始视频清晰度评估模型的卷积层中提取视频特征,以及将所述视频特征和所述一个训练视频对的编码信息输入全连接层中得到所述一个训练视频对中每个视频的清晰度得分;损失率计算单元,设置为采用所述清晰度得分和所述一个训练视频对的标签计算损失率;梯度计算单元,设置为如果所述损失率未满足预设条件,则采用所述损失率计算梯度;模型参数调整单元,设置为采用所述梯度调整模型参数,返回训练视频对输入单元,直到所述损失率满足预设条件。
可选地,所述装置还包括:调整模块,设置为随机提取多个视频组,根据随机提取的多个视频组对所述视频清晰度评估模型的模型参数进行调整。
可选地,所述调整模块包括:第一清晰度得分评估子模块,设置为随机提取一个视频组,并将所述一个视频组输入至所述视频清晰度评估模型中,得到所述一个视频组中每个原始视频的第一清晰度得分;第二清晰度得分计算子模块,设置为基于从所述一个视频组提取每两个原始视频得到的训练视频对的标签计算所述一个视频组中的每个原始视频的第二清晰度得分;损失率计算子模块,设置为采用所述一个视频组中的每个原始视频的第二清晰度得分和所述一个视频组中的每个原始视频的第一清晰度得分计算损失率;模型调整子模块,设置为在所述损失率未满足预设条件时,采用所述损失率对所述视频清晰度评估模型进行调整,并返回第一清晰度得分评估子模块直到所述损失率满足预设条件。
可选地，所述训练视频对的标签为所述训练视频对中清晰度高的原始视频的投票，所述第二清晰度得分计算子模块包括：投票数统计单元，设置为基于从所述一个视频组得到的训练视频对的标签统计所述原始视频所获得的投票数；总投票数获取单元，设置为获取所述视频组内训练视频对的总投票数；第二清晰度得分计算单元，设置为计算所述投票数和所述总投票数的比值作为所述每个原始视频的第二清晰度得分。
可选地,上述装置还包括:第三清晰度得分评估模块,设置为随机提取一个视频组,并将所述一个视频组输入所述视频清晰度评估模型中,得到所述一个视频组中每个原始视频的第三清晰度得分;保序率计算模块,设置为基于所述一个视频组中每个原始视频的所述第三清晰度得分和所述第四清晰度得分计算所述视频清晰度评估模型的保序率,所述第四清晰度得分为在所述一个视频组中,采用从所述视频组提取每两个原始视频得到的训练视频对的标签计算所述每个原始视频的清晰度得分;保序率判断模块,设置为判断所述保序率是否大于预设阈值,在所述保序率小于或等于预设阈值时,返回第三清晰度得分评估模块。
可选地,所述保序率计算模块包括:第一排序子模块,设置为基于所述一个视频组中每个原始视频的第四清晰度得分对所述每个原始视频进行排序,得到第一排序;第二排序子模块,设置为基于所述一个视频组中每个原始视频的第三清晰度得分对所述每个原始视频进行排序,得到第二排序;排序出错数量统计子模块,设置为以所述第一排序为基准,统计出所述第二排序中排序错误的原始视频的排序出错数量;比值计算子模块,设置为计算所述排序出错数量和排序总数量的比值;保序率计算子模块,设置为计算1与所述比值的差值,将所述差值作为保序率。
本申请实施例所提供的视频清晰度评估模型训练装置可执行本申请实施例一到实施例四任一所述视频清晰度评估模型训练方法,具备执行方法相应的功能模块。
实施例七
图7是本申请实施例七提供的一种视频推荐装置的结构框图,如图7所示,本申请实施例的视频推荐装置可以包括如下模块:
待推荐视频获取模块701,设置为获取多个待推荐视频;模型预测模块702,设置为将多个待推荐视频输入视频清晰度评估模型中获得每个待推荐视频的清晰度得分;目标视频确定模块703,设置为基于每个待推荐视频的清晰度得分从所述多个待推荐视频中确定出目标视频;视频推送模块704,设置为将所述目标视频推送给用户;其中,所述视频清晰度评估模型通过上述实施例中的视频清晰度评估模型训练方法所训练。
本申请实施例所提供的视频推荐装置可执行本申请实施例五所述视频推荐方法，具备执行方法相应的功能模块。
实施例八
参照图8,示出了本申请一个示例中的一种设备的结构示意图。如图8所示,该设备可以包括:处理器80、存储器81、具有触摸功能的显示屏82、输入装置83、输出装置84以及通信装置85。该设备中处理器80的数量可以是一个或者多个,图8中以一个处理器80为例。该设备的处理器80、存储器81、显示屏82、输入装置83、输出装置84以及通信装置85可以通过总线或者其他方式连接,图8中以通过总线连接为例。
存储器81作为一种计算机可读存储介质，可用于存储软件程序、计算机可执行程序以及模块，如本申请实施例一到实施例四所述的视频清晰度评估模型训练方法对应的程序指令/模块（例如，上述实施例六的视频清晰度评估模型训练装置中的原始视频获取模块601、训练视频对获取模块602、标签标注模块603和模型训练模块604），或如本申请实施例五所述的视频推荐方法对应的程序指令/模块（例如，上述实施例七的视频推荐装置中的待推荐视频获取模块701、模型预测模块702、目标视频确定模块703和视频推送模块704）。存储器81可主要包括存储程序区和存储数据区，其中，存储程序区可存储操作装置、至少一个功能所需的应用程序；存储数据区可存储根据设备的使用所创建的数据等。此外，存储器81可以包括高速随机存取存储器，还可以包括非易失性存储器，例如至少一个磁盘存储器件、闪存器件、或其他非易失性固态存储器件。在一些实例中，存储器81可包括相对于处理器80远程设置的存储器，这些远程存储器可以通过网络连接至设备。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。
显示屏82为具有触摸功能的显示屏82,其可以是电容屏、电磁屏或者红外屏。一般而言,显示屏82设置为根据处理器80的指示显示数据,还设置为接收作用于显示屏82的触摸操作,并将相应的信号发送至处理器80或其他装置。可选的,当显示屏82为红外屏时,其还包括红外触摸框,该红外触摸框设置在显示屏82的四周,其还可以用于接收红外信号,并将该红外信号发送至处理器80或者其他设备。
通信装置85,设置为与其他设备建立通信连接,其可以是有线通信装置和/或无线通信装置。
输入装置83可设置为接收输入的数字或者字符信息,以及产生与设备的用户设置以及功能控制有关的键信号输入,还可以是设置为获取图像的摄像头以及获取音频数据的拾音设备。输出装置84可以包括扬声器等音频设备。需要说明的是,输入装置83和输出装置84的组成可以根据实际情况设定。
处理器80通过运行存储在存储器81中的软件程序、指令以及模块,从而执行设备的各种功能应用以及数据处理,即实现上述所述的视频清晰度评估模型训练方法和/或视频推荐方法。
在一实施例中,处理器80执行存储器81中存储的一个或多个程序时,实现本申请实施例提供的视频清晰度评估模型训练方法和/或视频推荐方法。
本申请实施例还提供一种计算机可读存储介质,所述存储介质中的指令由设备的处理器执行时,使得设备能够执行如上述方法实施例所述的视频清晰度评估模型训练方法和/或视频推荐方法。
对于装置、设备、存储介质实施例而言,由于其与方法实施例基本相似,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
通过以上关于实施方式的描述,所属领域的技术人员可以清楚地了解到,本申请可借助软件及必需的通用硬件来实现,当然也可以通过硬件实现。基于这样的理解,本申请的技术方案可以以软件产品的形式体现出来,该计算机软件产品可以存储在计算机可读存储介质中,如计算机的软盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、闪存(FLASH)、硬盘或光盘等,包括多个指令用以使得一台计算机设备(可以是机器人,个人计算机,服务器,或者网络设备等)执行本申请任意实施例所述的视频清晰度评估模型训练方法和/或视频推荐方法。
上述视频清晰度评估模型训练装置和视频推荐装置中,所包括的多个单元和模块只是按照功能逻辑进行划分的,但并不局限于上述的划分,只要能够实现相应的功能即可;另外,多个功能单元的名称也只是为了便于相互区分,并不用于限制本申请的保护范围。
应当理解,本申请的多个部分可以用硬件、软件、固件或它们的组合来实现。在上述实施方式中,多个步骤或方法可以用存储在存储器中且由合适的指令执行装置执行的软件或固件来实现。例如,如果用硬件来实现,和在另一实施方式中一样,可用本领域公知的下列技术中的任一项或他们的组合来实现:具有用于对数据信号实现逻辑功能的逻辑门电路的离散逻辑电路,具有合适的组合逻辑门电路的专用集成电路,可编程门阵列(Programmable Gate Array,PGA),现场可编程门阵列(Field Programmable Gate Array,FPGA)等。
在本说明书的描述中,参考术语“一个实施例”、“一些实施例”、“示例”、“具体示例”、或“一些示例”等的描述意指结合该实施例或示例描述的特征、结构、材料或者特点包含于本申请的至少一个实施例或示例中。在本说明书中,对上述术语的示意性表述不一定指的是相同的实施例或示例。

Claims (18)

  1. 一种视频清晰度评估模型训练方法,包括:
    获取多个原始视频;
    基于所述多个原始视频获得包括清晰度不同的视频的训练视频对;
    对所述训练视频对中的视频进行标注,得到所述训练视频对的标签;
    采用所述训练视频对和所述标签对模型进行训练,得到视频清晰度评估模型。
  2. 根据权利要求1所述的方法,其中,所述基于所述多个原始视频获得包括清晰度不同的视频的训练视频对,包括:
    获取所述多个原始视频的图像质量评价参数;
    基于所述多个原始视频的图像质量评价参数将多个原始视频分为多个质量档次的原始视频；
    多次执行从每个质量档次的原始视频中提取一个原始视频得到一个视频组的步骤;
    从所述每个视频组中提取每两个原始视频得到多个训练视频对。
  3. 根据权利要求2所述的方法,其中,所述对所述训练视频对中的视频进行标注,得到所述训练视频对的标签,包括:
    基于每个训练视频对中所述原始视频所属的质量档次对清晰度高的原始视频和清晰度低的原始视频进行标注,得到所述每个训练视频对的标签。
  4. 根据权利要求1所述的方法,其中,所述基于所述多个原始视频获得包括清晰度不同的视频的训练视频对,包括:
    对所述多个原始视频进行图像处理,得到所述多个原始视频对应的处理后的视频;
    采用每个原始视频以及对所述每个原始视频进行图像处理后得到的视频组成一个训练视频对。
  5. 根据权利要求4所述的方法,其中,所述对所述多个原始视频进行图像处理,得到所述多个原始视频对应的处理后的视频,包括:
    对每个原始视频进行转码处理得到转码后的视频,所述转码后的视频的清晰度低于每个原始视频的清晰度,或者;
    对每个原始视频进行模糊处理,得到模糊处理后的视频,所述模糊处理后的视频的清晰度低于每个原始视频的清晰度。
  6. 根据权利要求4所述的方法,其中,所述对所述训练视频对中的视频进行标注,得到所述训练视频对的标签包括:
    将每个训练视频对中的原始视频标注为清晰度高的视频,所述每个训练视频对中的原始视频处理后的视频标注为清晰度低的视频,得到所述每个训练视频对的标签。
  7. 根据权利要求1-6任一项所述的方法,其中,所述采用所述训练视频对和所述标签对模型进行训练,得到视频清晰度评估模型,包括:
    提取所述训练视频对中每个视频的编码信息;
    采用所述训练视频对、所述编码信息以及所述标签对模型进行训练,得到视频清晰度评估模型。
  8. 根据权利要求7所述的方法,其中,所述训练视频对的数量为多个;
    所述采用所述训练视频对、所述编码信息以及所述标签对模型进行训练,得到视频清晰度评估模型,包括:
    初始化视频清晰度评估模型的模型参数,所述视频清晰度评估模型包括卷积层和全连接层;
    随机提取一个训练视频对,并将所述一个训练视频对输入所述初始视频清晰度评估模型的卷积层中提取视频特征;
    将所述视频特征和所述一个训练视频对的编码信息输入全连接层中得到所述一个视频对中每个视频的清晰度得分;
    采用所述清晰度得分和所述一个训练视频对的标签计算损失率;
    如果所述损失率未满足预设条件,则采用所述损失率计算梯度;
    采用所述梯度调整模型参数,返回随机提取训练视频对,并将随机提取的训练视频对输入所述初始视频清晰度评估模型的卷积层中提取视频特征,直到所述损失率满足预设条件。
  9. 根据权利要求3所述的方法,在采用所述训练视频对和所述标签对模型进行训练,得到视频清晰度评估模型之后,还包括:
    随机提取多个视频组,并根据随机提取多个视频组对所述视频清晰度评估模型的模型参数进行调整。
  10. 根据权利要求9所述的方法,其中,所述随机提取多个视频组,并根据随机提取多个视频组对所述视频清晰度评估模型的模型参数进行调整,包括:
    随机提取一个视频组，并将所述一个视频组输入至所述视频清晰度评估模型中，得到所述一个视频组中每个原始视频的第一清晰度得分；
    基于从所述一个视频组提取每两个原始视频得到的训练视频对的标签计算所述一个视频组中的每个原始视频的第二清晰度得分;
    采用所述一个视频组中的每个原始视频的第二清晰度得分和所述一个视频组中的每个原始视频的第一清晰度得分计算损失率;
    在所述损失率未满足预设条件的情况下,采用所述损失率对所述视频清晰度评估模型进行调整,并返回随机提取一个视频组,并将所述一个视频组输入至所述视频清晰度评估模型中,得到所述一个视频组中每个原始视频的第一清晰度得分,直到所述损失率满足预设条件。
  11. 根据权利要求10所述的方法,其中,所述训练视频对的标签为所述训练视频对中清晰度高的原始视频的投票,基于从所述一个视频组提取每两个原始视频得到的训练视频对的标签计算所述一个视频组中的每个原始视频的第二清晰度得分,包括:
    基于从所述一个视频组得到的训练视频对的标签统计所述一个视频组内的每个原始视频所获得的投票数;
    获取所述一个视频组内训练视频对的总投票数;
    计算所述投票数和所述总投票数的比值作为所述每个原始视频的第二清晰度得分。
  12. 根据权利要求9-11任一项所述的方法,还包括:
    随机提取一个视频组,并将所述一个视频组输入所述视频清晰度评估模型中,得到所述一个视频组中每个原始视频的第三清晰度得分;
    基于所述一个视频组中每个原始视频的所述第三清晰度得分和第四清晰度得分计算所述视频清晰度评估模型的保序率,所述第四清晰度得分为在所述一个视频组中,采用从所述一个视频组提取每两个原始视频得到的训练视频对的标签计算的所述每个原始视频的清晰度得分;
    判断所述保序率是否大于预设阈值;
    响应于所述保序率不大于预设阈值的判断结果,返回随机提取多个视频组,并根据随机提取的多个视频组对所述视频清晰度评估模型的模型参数进行调整。
  13. 根据权利要求12所述的方法,其中,所述基于所述一个视频组中每个原始视频的所述第三清晰度得分和所述第四清晰度得分计算所述视频清晰度评估模型的保序率,包括:
    基于所述一个视频组中每个原始视频的第四清晰度得分对所述一个视频组中的原始视频进行排序,得到第一排序;
    基于所述一个视频组中每个原始视频的第三清晰度得分对所述一个视频组中的原始视频进行排序,得到第二排序;
    以所述第一排序为基准,统计出所述第二排序中排序错误的原始视频的排序出错数量;
    计算所述排序出错数量和排序总数量的比值;
    计算1与所述比值的差值,将所述差值作为保序率。
  14. 一种视频推荐方法,包括:
    获取多个待推荐视频;
    将所述多个待推荐视频输入视频清晰度评估模型中获得每个待推荐视频的清晰度得分;
    基于每个待推荐视频的清晰度得分从所述多个待推荐视频中确定出目标视频;
    将所述目标视频推送给用户;
    其中,所述视频清晰度评估模型通过权利要求1-13任一项所述的视频清晰度评估模型训练方法所训练。
  15. 一种视频清晰度评估模型训练装置,包括:
    原始视频获取模块,设置为获取多个原始视频;
    训练视频对获取模块,设置为基于所述多个原始视频获得包括清晰度不同的视频的训练视频对;
    标签标注模块,设置为对所述训练视频对中视频进行标注,得到所述训练视频对的标签;
    模型训练模块,设置为采用所述训练视频对和所述标签对模型进行训练,得到视频清晰度评估模型。
  16. 一种视频推荐装置,包括:
    待推荐视频获取模块,设置为获取多个待推荐视频;
    模型预测模块,设置为将所述多个待推荐视频输入视频清晰度评估模型中获得每个待推荐视频的清晰度得分;
    目标视频确定模块，设置为基于每个待推荐视频的清晰度得分从所述多个待推荐视频中确定出目标视频；
    视频推送模块,设置为将所述目标视频推送给用户;
    其中,所述视频清晰度评估模型通过权利要求1-13任一项所述的视频清晰度评估模型训练方法所训练。
  17. 一种设备,包括:
    至少一个处理器;
    存储装置,设置为存储至少一个程序,
    当所述至少一个程序被所述至少一个处理器执行,使得所述至少一个处理器实现如权利要求1-13中任一项所述的视频清晰度评估模型训练方法和如权利要求14所述的视频推荐方法中的至少之一。
  18. 一种计算机可读存储介质,存储有计算机程序,该程序被处理器执行时实现如权利要求1-13中任一项所述的视频清晰度评估模型训练方法和如权利要求14所述的视频推荐方法中的至少之一。
PCT/CN2020/135998 2019-12-27 2020-12-14 视频清晰度评估模型训练方法、视频推荐方法及相关装置 WO2021129435A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911380270.9A CN111163338B (zh) 2019-12-27 2019-12-27 视频清晰度评估模型训练方法、视频推荐方法及相关装置
CN201911380270.9 2019-12-27

Publications (1)

Publication Number Publication Date
WO2021129435A1 true WO2021129435A1 (zh) 2021-07-01

Family

ID=70558712

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/135998 WO2021129435A1 (zh) 2019-12-27 2020-12-14 视频清晰度评估模型训练方法、视频推荐方法及相关装置

Country Status (2)

Country Link
CN (1) CN111163338B (zh)
WO (1) WO2021129435A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023056896A1 (zh) * 2021-10-08 2023-04-13 钉钉(中国)信息技术有限公司 清晰度的确定方法、装置及设备

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111163338B (zh) * 2019-12-27 2022-08-12 广州市百果园网络科技有限公司 视频清晰度评估模型训练方法、视频推荐方法及相关装置
CN111597361B (zh) * 2020-05-19 2021-09-14 腾讯科技(深圳)有限公司 多媒体数据处理方法、装置、存储介质及设备
CN111767428A (zh) * 2020-06-12 2020-10-13 咪咕文化科技有限公司 视频推荐方法、装置、电子设备及存储介质
CN111814759B (zh) * 2020-08-24 2020-12-18 腾讯科技(深圳)有限公司 人脸质量标签值的获取方法、装置、服务器及存储介质
CN112367518B (zh) * 2020-10-30 2021-07-13 福州大学 一种输电线路无人机巡检图像质量评价方法
CN113038165B (zh) * 2021-03-26 2023-07-25 腾讯音乐娱乐科技(深圳)有限公司 确定编码参数组的方法、设备及存储介质
CN113743448B (zh) * 2021-07-15 2024-04-30 上海朋熙半导体有限公司 模型训练数据获取方法、模型训练方法和装置
CN116506622B (zh) * 2023-06-26 2023-09-08 瀚博半导体(上海)有限公司 模型训练方法及视频编码参数优化方法和装置
CN117041625B (zh) * 2023-08-02 2024-04-19 成都梵辰科技有限公司 一种超高清视频图像质量检测网络构建方法及系统

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040056907A1 (en) * 2002-09-19 2004-03-25 The Penn State Research Foundation Prosody based audio/visual co-analysis for co-verbal gesture recognition
CN104318562A (zh) * 2014-10-22 2015-01-28 百度在线网络技术(北京)有限公司 一种用于确定互联网图像的质量的方法和装置
CN107833214A (zh) * 2017-11-03 2018-03-23 北京奇虎科技有限公司 视频清晰度检测方法、装置、计算设备及计算机存储介质
CN109831680A (zh) * 2019-03-18 2019-05-31 北京奇艺世纪科技有限公司 一种视频清晰度的评价方法及装置
CN110413840A (zh) * 2019-07-10 2019-11-05 网易(杭州)网络有限公司 一种对视频确定标签的神经网络、方法、介质和计算设备
US20190370608A1 (en) * 2018-05-31 2019-12-05 Seoul National University R&Db Foundation Apparatus and method for training facial locality super resolution deep neural network
CN111163338A (zh) * 2019-12-27 2020-05-15 广州市百果园网络科技有限公司 视频清晰度评估模型训练方法、视频推荐方法及相关装置

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107659806B (zh) * 2017-08-22 2019-08-16 华为技术有限公司 视频质量的评估方法和装置

Also Published As

Publication number Publication date
CN111163338B (zh) 2022-08-12
CN111163338A (zh) 2020-05-15

Similar Documents

Publication Publication Date Title
WO2021129435A1 (zh) 视频清晰度评估模型训练方法、视频推荐方法及相关装置
CN108537134B (zh) 一种视频语义场景分割及标注方法
WO2022116888A1 (zh) 一种视频数据处理方法、装置、设备以及介质
US10452919B2 (en) Detecting segments of a video program through image comparisons
CN108140032B (zh) 用于自动视频概括的设备和方法
KR101967086B1 (ko) 비디오 스트림들의 엔티티 기반 시간적 세그먼트화
CN109344884B (zh) 媒体信息分类方法、训练图片分类模型的方法及装置
WO2020119350A1 (zh) 视频分类方法、装置、计算机设备和存储介质
WO2023280065A1 (zh) 一种面向跨模态通信系统的图像重建方法及装置
KR20210134528A (ko) 비디오 처리 방법, 장치, 전자 기기, 저장 매체 및 컴퓨터 프로그램
WO2020253127A1 (zh) 脸部特征提取模型训练方法、脸部特征提取方法、装置、设备及存储介质
EP2568429A1 (en) Method and system for pushing individual advertisement based on user interest learning
US20200349385A1 (en) Multimedia resource matching method and apparatus, storage medium, and electronic apparatus
US20090263014A1 (en) Content fingerprinting for video and/or image
CN112312231B (zh) 一种视频图像编码方法、装置、电子设备及介质
CN109964221B (zh) 使用镜头持续时间相关来确定视频之间的相似性
WO2022121485A1 (zh) 图像的多标签分类方法、装置、计算机设备及存储介质
US10769208B2 (en) Topical-based media content summarization system and method
CN110489574B (zh) 一种多媒体信息推荐方法、装置和相关设备
CN110298270B (zh) 一种基于跨模态重要性感知的多视频摘要方法
US9971940B1 (en) Automatic learning of a video matching system
EP3874404A1 (en) Video recognition using multiple modalities
CN110958467A (zh) 视频质量预测方法和装置及电子设备
CN111078944B (zh) 视频内容热度预测方法和装置
CN110769259A (zh) 一种视频目标跟踪轨迹内容的图像数据压缩方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20904902

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20904902

Country of ref document: EP

Kind code of ref document: A1