CN110704674A - Video playing integrity prediction method and device - Google Patents

Video playing integrity prediction method and device

Info

Publication number
CN110704674A
CN110704674A (application CN201910845413.2A)
Authority
CN
China
Prior art keywords
user
video
data
video playing
playing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910845413.2A
Other languages
Chinese (zh)
Other versions
CN110704674B (en)
Inventor
许良武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Biying Technology Co ltd
Jiangsu Suning Cloud Computing Co ltd
Original Assignee
Suning Cloud Computing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suning Cloud Computing Co Ltd filed Critical Suning Cloud Computing Co Ltd
Priority to CN201910845413.2A priority Critical patent/CN110704674B/en
Publication of CN110704674A publication Critical patent/CN110704674A/en
Priority to CA3153598A priority patent/CA3153598A1/en
Priority to PCT/CN2020/097861 priority patent/WO2021042826A1/en
Application granted granted Critical
Publication of CN110704674B publication Critical patent/CN110704674B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/74Browsing; Visualisation therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73Querying
    • G06F16/732Query formulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7847Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content

Abstract

The invention discloses a method and a device for predicting video playing integrity, belonging to the technical fields of big data and deep learning. The method comprises: inputting data to be tested of a user video playing feature vector; computing with a preset video playing integrity prediction model; and outputting the video playing integrity value of the data to be tested. The preset video playing integrity prediction model is trained on user video playing training data, and the user video playing feature vector comprises at least a user feature vector and a video feature vector. By introducing a playing-integrity optimization strategy, the invention predicts each user's video playing integrity, obtains interest data that better reflects the user's true interests with respect to watching duration, a key information-feed metric, and improves the accuracy of interest identification, thereby improving the true relevance of recommendations and substantially increasing user watching duration and satisfaction.

Description

Video playing integrity prediction method and device
Technical Field
The invention relates to the technical field of big data and deep learning, in particular to a method and a device for predicting video playing integrity.
Background
A video recommendation system is built on massive numbers of users and videos. Using big data analysis and artificial intelligence, it studies users' interest preferences and recommends high-quality videos of interest to target users, alleviating information overload, delivering personalized results for each user, and increasing user dwell time and satisfaction. A video recommendation system generally comprises a recall stage and a ranking stage: the recall stage selects a candidate set from the massive video pool, and the ranking stage scores the candidates from the recall stage with a unified, more precise model and screens out a small number of high-quality videos the user is most likely to be interested in.
At present, some video playing platforms have hundreds of millions of registered users, a daily average UV (unique visitors) exceeding ten million, and an even higher daily average play count on mobile terminals. To let users find content of interest among massive videos, a recommendation system is built by collecting data of multiple dimensions (including basic user information, user playing history, video attributes, environment attributes, and the like) to connect users with the videos they are likely to enjoy. Short-video recommendation usually has less information to work with, generally only the title and the video category, and the commonly used ranking model adopts CTR (Click-Through Rate) estimation. A click-based model tends to promote clickbait titles, which does not increase users' dwell time and hurts their watching duration and satisfaction. Since watching duration is a key optimization target for information feeds, playing-integrity optimization urgently needs to be introduced into the short-video ranking model to improve the true relevance of recommendations and thereby increase user watching duration and satisfaction.
Disclosure of Invention
To address the problems in the prior art, embodiments of the present invention provide a method and an apparatus for predicting video playing integrity. By introducing a playing-integrity optimization policy, they predict a user's video playing integrity, obtain interest data that better reflects the user's true interests with respect to watching duration, a key information-feed metric, and improve the accuracy of interest identification, thereby improving the true relevance of recommendations and substantially increasing user watching duration and satisfaction.
The technical scheme is as follows:
in one aspect, a method for predicting video playing integrity is provided, where the method includes:
inputting data to be tested of a user video playing characteristic vector;
calculating through a preset video playing integrity prediction model;
outputting the video playing integrity value of the data to be tested,
the preset video playing integrity prediction model is obtained through training of user video playing training data, and the user video playing characteristic vectors at least comprise user characteristic vectors and video characteristic vectors.
Further, the method further comprises:
collecting video playing information data of a user;
screening the user video playing information data to obtain a screening result;
and extracting the characteristics of the screening result to generate the data to be tested of the user video playing characteristic vector.
Further, collecting the user video playing information data comprises: acquiring user video playing information data comprising user information, user playing history information, video information and user client information; and/or,
screening the user video playing information data to obtain a screening result comprises: screening the user video playing information data using a multi-channel recall mode comprising user collaboration, user search, a topic model, hot recommendation, user portraits and video labels to obtain a screening result; and/or,
extracting features from the screening result to generate the data to be tested of the user video playing feature vector comprises: using word vectors and IDF weights obtained by training a word2vec model on a preset massive corpus, segmenting the video titles and video category labels in the screening result to generate video word vectors, and then computing user word vectors from the user playing history information with time decay.
Further, the preset video playing integrity prediction model comprises a DNN (deep neural network) with three hidden layers.
Further, the preset video playing integrity prediction model is obtained by training on the input user video playing training data, where the user video playing training data are the independent variables, the playing integrity values of videos the user has watched historically are the dependent variable, and the user video playing training data are feature vectors combining historical user vectors and historical video vectors constructed from the user playing history information.
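For illustration only, a minimal Python sketch of how such an integrity label could be computed is given below; the exact formula is an assumption, since the embodiments state only that the historical playing integrity value serves as the dependent variable.

    def playback_integrity(watched_seconds: float, video_seconds: float) -> float:
        # Hypothetical label: the fraction of the video actually played, clipped to [0, 1].
        if video_seconds <= 0:
            return 0.0
        return min(watched_seconds / video_seconds, 1.0)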
Further, the method further comprises:
sorting the video playing integrity values of the data to be tested from high to low to obtain a top-N video ranking result, and recommending the ranking result to the corresponding user according to priority, where N is an integer greater than 1.
In another aspect, an apparatus for predicting video playing integrity is provided, the apparatus includes a model calculation module configured to:
inputting data to be tested of a user video playing characteristic vector, calculating through a preset video playing integrity prediction model, and outputting a video playing integrity value of the data to be tested, wherein the preset video playing integrity prediction model is obtained through training of user video playing training data, and the user video playing characteristic vector at least comprises a user characteristic vector and a video characteristic vector.
Furthermore, the device also comprises a data collection module, a data screening module and a vector generation module, wherein the data collection module collects the user video playing information data; the data screening module screens the user video playing information data to obtain a screening result; and the vector generation module is used for extracting the characteristics of the screening result and generating the data to be tested of the user video playing characteristic vector.
Further, the data collection module acquires user video playing information data comprising user information, user playing history information, video information and user client information; and/or,
the data screening module screens the user video playing information data using a multi-channel recall mode comprising user collaboration, user search, a topic model, hot recommendation, user portraits and video labels to obtain a screening result; and/or,
the vector generation module extracts features from the screening result to generate the data to be tested of the user video playing feature vector, which comprises: using word vectors and IDF weights obtained by training a word2vec model on a preset massive corpus, segmenting the video titles and video category labels in the screening result to generate video word vectors, and then computing user word vectors from the user playing history information with time decay.
Further, the device further comprises a data recommendation module configured to sort the video playing integrity values of the data to be tested from high to low, obtain a top-N video ranking result, and recommend the ranking result to the corresponding user according to priority, where N is an integer greater than 1.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
1. The traditional CTR estimation method is replaced by introducing a video playing integrity index. The video playing integrity of different users is predicted with a trained preset video playing integrity prediction model, and the prediction results yield interest data that better reflects users' true interests with respect to watching duration, a key information-feed metric, improving the accuracy of interest identification, thereby improving the true relevance of recommendations and substantially increasing user watching duration and satisfaction;
2. User portraits are represented as vectors, and the time decay of user behavior is used to reflect shifts in user interest; hot-spot videos and mistaken-click videos are filtered out during user profiling so that they do not distort the user's true interests, making the user portrait more accurate;
3. User behavior data, video quality, video information and other related data are collected to build effective vector representations of user features, video attributes, per-time-period playing ratios, category ratios and other environmental information; deep learning modeling fuses the different features and data sources to estimate the likely playing integrity of videos the user has not yet watched, and applying this to a short-video recommendation ranking model yields good results and increases the user's average watching duration;
4. User features, video features, context features, client classification and other features are constructed and modeled with deep learning. The playing integrity estimation approach was applied to a randomly selected 10% of users through an A/B test, and the final report compared CTR, daily average play count, average per-user playing integrity and other metrics. With only a slight drop in CTR, the average per-user playing integrity and daily average play count improved considerably;
5. The TF-IDF algorithm is adopted in the video recommendation domain, and the IDF value effectively highlights the key information of a video;
6. Predicting the playing integrity of short videos improves the true relevance of recommendations and seeks to increase user dwell time.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a video playing integrity prediction method according to an embodiment of the present invention;
fig. 2 is a flowchart of a video playing integrity prediction method according to another embodiment of the present invention;
fig. 3 is a schematic diagram of a preferred implementation of the feature engineering construction in step 203;
fig. 4 is a diagram illustrating a preferred implementation of a preset video playing integrity prediction model according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an apparatus for predicting video playing integrity according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a video playback integrity prediction apparatus according to another embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. It is to be noted that, in the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
The method and device for predicting video playing integrity provided by the embodiments of the present invention replace the traditional CTR estimation method by introducing a video playing integrity index. The video playing integrity of different users is predicted by a trained preset video playing integrity prediction model, and the prediction results yield interest data that better reflects users' true interests with respect to watching duration, a key information-feed metric, improving the accuracy of interest identification and therefore the true relevance of recommendations, and substantially increasing user watching duration and satisfaction. The method and device can therefore be widely applied in network video scenarios involving user interest mining, user demand matching or user recommendation.
The following describes the video playing integrity prediction method and apparatus provided by the embodiments of the present invention in detail with reference to the specific embodiments and the accompanying drawings.
Fig. 1 is a flowchart of a video playing integrity prediction method according to an embodiment of the present invention. As shown in fig. 1, the method for predicting video playing integrity includes the following steps:
101. inputting data to be tested of a user video playing characteristic vector;
102. calculating through a preset video playing integrity prediction model;
103. and outputting the video playing integrity value of the data to be tested.
Unlike traditional approaches that rely on only a few pieces of collected information such as the title, video category or video click-through rate, the user video playing feature vector here comprises at least a user feature vector and a video feature vector. The user features include the user portrait, the user's historical play records or other user-related information; the video features include the video category, video duration, publication time, playing integrity records or other information about the published video. Besides the user feature vector and the video feature vector, the user video playing feature vector may further include other playback-related information such as user client classification information. In addition, the preset video playing integrity prediction model is trained on the user's video playing training data; the specific model may be a purpose-built deep learning model designed as needed, or any suitable deep learning model from the prior art, and the embodiments of the present invention do not limit this.
Fig. 2 is a flowchart of a video playing integrity prediction method according to another embodiment of the present invention. As shown in fig. 2, the method for predicting the video playing integrity includes the following steps:
the 201 collects user video playing information data.
Specifically, user video playing information data including user information, user playing history information, video information, and user client information is acquired.
This is the data collection stage for user video playing information. The user video playing information mainly includes user information, user playing history information, video information and user client information. The user information mainly refers to user portrait information, including basic user attributes (gender, age, and the like); the user playing history information includes the percentage of the user's historical plays in each hour of the day, the percentage of each category of video the user watched, and so on; the client information includes the user's device type, operator type, and the like. In addition, the collected information may also include, as needed, the time at which the user watched each video, the user's location, and other context information related to each playback.
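For illustration, one possible shape of a collected record is sketched below in Python; all field names are assumptions, since the text names the information groups but not a concrete schema.

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class PlaybackLogRecord:
        # Hypothetical schema covering the four information groups named above.
        user_id: str
        gender: str                                                           # user portrait: basic attributes
        age: int
        hourly_play_ratio: Dict[int, float] = field(default_factory=dict)    # hour of day -> share of plays
        category_play_ratio: Dict[str, float] = field(default_factory=dict)  # category -> share of plays
        play_history: List[str] = field(default_factory=list)                # ids of watched videos
        device_type: str = ""                                                 # client information
        operator_type: str = ""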
It should be noted that, in the process of collecting the video playing information data of the user in step 201, the process may be implemented in other ways besides the ways described in the above steps, and the specific ways are not limited in the embodiment of the present invention.
202. And screening the video playing information data of the user to obtain a screening result.
Specifically, screening the user video playing information data to obtain a screening result includes: screening the user video playing information data using a multi-channel recall mode comprising user collaboration, user search, a topic model, hot recommendation, user portraits and video labels to obtain the screening result.
This is the recall stage, a coarse screening of the user video playing information data; preferably, it mainly screens the video information in that data. Because the video inventory is huge, on the order of millions, feeding it directly into the model would make data preprocessing too expensive and far too slow, so the recall stage roughly selects video information that is of higher quality or more likely to match the user's preferences. Recall typically uses multiple channels, such as user collaboration, user search, topic models, hot recommendations, user portraits and video tags, to select the desired candidate subset from the massive video pool.
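A minimal sketch of merging several recall channels into one candidate set is shown below; each channel is assumed to be a callable returning candidate video ids, which is an illustrative simplification rather than anything specified by the embodiment.

    def multi_channel_recall(user_id, channels):
        # channels: callables, one per recall strategy (user collaboration, user search,
        # topic model, hot recommendation, user portrait, video tags, ...).
        candidates, seen = [], set()
        for channel in channels:
            for video_id in channel(user_id):
                if video_id not in seen:   # de-duplicate across channels
                    seen.add(video_id)
                    candidates.append(video_id)
        return candidates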
It should be noted that, in the step 202, the process of filtering the video playing information data of the user may be implemented in other ways besides the above-mentioned ways, and the specific way is not limited in the embodiment of the present invention.
203. And performing feature extraction on the screening result to generate the data to be tested of the user video playing feature vector.
Specifically, extracting features from the screening result to generate the data to be tested of the user video playing feature vector includes: using word vectors and IDF weights obtained by training a word2vec model on a preset massive corpus, segmenting the video titles and video category labels in the screening result to generate video word vectors, and then computing user word vectors from the user playing history information with time decay. The user word vectors and video word vectors here correspond to the user feature vectors and video feature vectors.
This is the feature engineering stage, as shown in fig. 3. Preferably, on a large corpus, a 200-dimensional word vector is trained for each word through word segmentation and a word2vec model; the vectorized form captures the latent meaning of a word and thus the relationships between words. The video title is segmented and, combined with the trained IDF values and other information, a word-vector representation of the video is computed. The user's word-vector representation is then computed from the vectors of the videos in the user's play history together with time decay. When computing the user vector, only the user's TOP3 video label categories that each account for more than 10% of plays are counted. Analysis of users' play histories shows that videos under labels with a low percentage are usually not genuine points of interest; such plays are often hot-spot videos or mistaken clicks, and they can be discarded during feature extraction.
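A sketch of the IDF-weighted averaging just described, assuming the word vectors and IDF values have already been trained and the title and labels have been segmented into tokens (the function and argument names are illustrative):

    import numpy as np

    def video_word_vector(tokens, word_vectors, idf, dim=200):
        # tokens: words from the segmented video title and category labels.
        # word_vectors: dict word -> 200-dim numpy array (word2vec output).
        # idf: dict word -> IDF weight.
        acc, weight_sum = np.zeros(dim), 0.0
        for token in tokens:
            if token in word_vectors:
                w = idf.get(token, 1.0)
                acc += w * word_vectors[token]
                weight_sum += w
        if weight_sum == 0.0:
            return acc
        vec = acc / weight_sum
        norm = np.linalg.norm(vec)
        return vec / norm if norm > 0 else vec   # normalized, as described above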
It should be noted that, in the process of performing feature extraction on the filtering result in step 203 to generate the user video playing feature vector, the process may be implemented in other ways besides the above-mentioned way, and the specific way is not limited in the embodiment of the present invention.
204. Inputting the data to be tested of the video playing characteristic vector of the user.
The preset video playing integrity prediction model is obtained by training on the input user video playing training data, where the training data are the independent variables and the playing integrity values of videos the user has watched historically are the dependent variable; the training data are feature vectors combining historical user vectors and historical video vectors constructed from the user playing history information, and training on them yields the desired preset video playing integrity prediction model.
Preferably, the preset video playing integrity prediction model comprises a DNN with three hidden layers. The input layer receives the user's word-vector representation (word vectors of each historically played video computed from title segmentation with IDF weighting, then combined into a 200-dimensional vector with time decay), the user's basic portrait (gender, age, and the like), the proportion of plays in each time period (by hour), the proportion of each video category, and so on; the video's word vector (200 dimensions), video quality (average playing integrity, video popularity, and the like), video publication time and video category; the device type and operator type; the region; the current time period, and so on.
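One way such a three-hidden-layer DNN could be realized is sketched below with Keras; the layer widths, activations, optimizer and loss are assumptions, since the embodiment fixes only the number of hidden layers and the regression target.

    import tensorflow as tf

    def build_integrity_model(input_dim: int) -> tf.keras.Model:
        model = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(input_dim,)),
            tf.keras.layers.Dense(512, activation="relu"),   # hidden layer 1
            tf.keras.layers.Dense(256, activation="relu"),   # hidden layer 2
            tf.keras.layers.Dense(128, activation="relu"),   # hidden layer 3
            tf.keras.layers.Dense(1, activation="sigmoid"),  # playing integrity in [0, 1]
        ])
        model.compile(optimizer="adam", loss="mse")
        return model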
It should be noted that the content and form of the data to be tested of the user video playing feature vector input in step 204 may also be implemented in other ways besides those described in the above steps, and the specific way is not limited in the embodiment of the present invention.
205. And calculating through a preset video playing integrity prediction model.
206. And outputting the video playing integrity value of the data to be tested.
Preferably, after step 206, the method further includes the following step:
sorting the video playing integrity values of the data to be tested from high to low to obtain a top-N video ranking result, and recommending the ranking result to the corresponding user according to priority, where N is an integer greater than 1. It should be noted that the sorting step may also be designed into the computation flow of the preset video playing integrity prediction model as needed, as shown in fig. 4, and the embodiment of the present invention does not specifically limit this.
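The ranking step amounts to a descending sort followed by a cut at N; a minimal sketch is shown below (the pair layout of the predictions is assumed).

    def top_n_recommendations(predictions, n):
        # predictions: iterable of (video_id, predicted_integrity) pairs.
        ranked = sorted(predictions, key=lambda p: p[1], reverse=True)
        return ranked[:n]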
Fig. 5 is a schematic structural diagram of a video playing integrity prediction apparatus according to an embodiment of the present invention, as shown in fig. 5, the video playing integrity prediction apparatus includes a model calculation module 1, and the model calculation module 1 is configured to: inputting data to be tested of a user video playing characteristic vector, calculating through a preset video playing integrity prediction model, and outputting a video playing integrity value of the data to be tested, wherein the preset video playing integrity prediction model is obtained through training of user video playing training data, and the user video playing characteristic vector at least comprises a user characteristic vector and a video characteristic vector.
Fig. 6 is a schematic structural diagram of a video playback integrity prediction apparatus according to another embodiment of the present invention. As shown in fig. 6, the video playback integrity prediction apparatus 2 includes a data collection module 21, a data filtering module 22, a vector generation module 23, a model calculation module 24, and a data recommendation module 25.
The data collection module 21 collects user video playing information data. Specifically, the data collection module 21 obtains user video playing information data including user information, user playing history information, video information, and user client information.
The data filtering module 22 filters the video playing information data of the user to obtain a filtering result. Specifically, the data filtering module 22 filters the video playing information data of the user by using a multi-channel recall mode including user cooperation, user search, a topic model, hot recommendation, user portrait and a video tag, and obtains a filtering result.
The vector generation module 23 performs feature extraction on the screening result to generate a user video playing feature vector. Specifically, the vector generation module 23 performs feature extraction on the screening result to generate the data to be tested of the user video playing feature vector, including: word vectors obtained by training a preset massive corpus through a word2vec model and IDF weight training are utilized, word segmentation is carried out on video titles and video classification labels in screening results to generate video word vectors, then word vector calculation is carried out according to user playing history information and time attenuation, and user word vectors are generated. The user word vector and the video word vector herein correspond to a user feature vector and a video feature vector described below.
The model calculation module 24 inputs data to be measured of the user video playing feature vector, calculates the data through a preset video playing integrity prediction model, and outputs a video playing integrity value of the data to be measured, wherein the preset video playing integrity prediction model is obtained through training of user video playing training data, and the user video playing feature vector at least comprises a user feature vector and a video feature vector.
The data recommendation module 25 sorts the video playing integrity values of the data to be tested from high to low to obtain a top-N video ranking result, and recommends the ranking result to the corresponding user according to priority, where N is an integer greater than 1.
A preferred embodiment of the method and apparatus for predicting video playing integrity according to the embodiments of the present invention is described below.
First, the word segmentation tool of this embodiment has its own lexicon; in addition, a massive corpus of internet news, Baidu Baike, Wikipedia and similar sources, collected by a crawler system, is added as a supplementary lexicon. Word segmentation and word-vector training are performed on this corpus, finally yielding a word-vector representation for each word (the word-vector dimension is 200, chosen based on experimental results, and the vectors are then normalized).
On the same corpus, TF-IDF training is performed to obtain IDF values, which are then normalized; words from the supplementary lexicon have their weight raised to 1, which works somewhat like an attention mechanism, putting more focus on those words.
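A sketch of this offline training step under stated assumptions: gensim is used for the 200-dimensional word2vec vectors and scikit-learn for the IDF values, with min-max normalization; neither library is named in the embodiment, and the corpus variable stands in for the segmented news and encyclopedia text.

    from gensim.models import Word2Vec
    from sklearn.feature_extraction.text import TfidfVectorizer

    def train_word_vectors_and_idf(corpus):
        # corpus: list of documents, each already segmented into a list of words.
        w2v = Word2Vec(sentences=corpus, vector_size=200, window=5,
                       min_count=5, workers=4)             # 200-dim word vectors
        tfidf = TfidfVectorizer(analyzer=lambda doc: doc)  # corpus is pre-tokenized
        tfidf.fit(corpus)
        idf_raw = tfidf.idf_
        idf_norm = (idf_raw - idf_raw.min()) / (idf_raw.max() - idf_raw.min() + 1e-9)
        idf = dict(zip(tfidf.get_feature_names_out(), idf_norm))
        return w2v.wv, idf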
The video information table is shown in Table 1 below and carries the video id, video title, category label, video tag information, release time, and so on. The video information is segmented, the word-vector table is looked up for each word, and a weighted calculation with the IDF value table yields the (normalized) word-vector representation of the current video.
Table 1: Video information table (reproduced in the original as an image; it lists the video id, video title, category label, video tag information, release time, and the like)
In the user portrait acquisition stage, i.e. the computation of the user word vector, the targeted user group is active users: users with a certain play volume in the recent period (e.g. more than 10 videos played in the last 30 days) who are also recently active (e.g. have play records in the last 7 days). The user's word vectors are computed per label category. For example, suppose a user played 100 videos in the period: 60 sports, 20 finance, 15 comedy, 4 society and 1 health. During profiling, the user is portrayed only under the TOP3 label categories whose share exceeds 10%; this captures the user's main interest points while excluding the few mistaken clicks and hot-spot videos that do not represent real interests. In this example the shares are sports 60%, finance 20%, comedy 15%, society 4% and health 1%, so the current user is represented in the three dimensions of sports, finance and comedy, and word vectors are computed for those dimensions.
When computing the user's word vector under each label category, the word-vector representation is weighted by a time decay factor (e.g. a 5-day decay period with coefficient 0.95; a video played 12 days before the current date spans two decay periods and is therefore weighted by 0.95^2).
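A sketch combining the two rules above (keep only the TOP3 label categories whose share exceeds 10%, and weight each play by 0.95 per elapsed 5-day period); the data structures are illustrative assumptions.

    import numpy as np
    from collections import Counter

    def user_word_vectors(play_history, video_vecs, video_category, today,
                          decay=0.95, period_days=5, top_k=3, min_share=0.10):
        # play_history: list of (video_id, play_day) pairs, play_day as an ordinal day number.
        counts = Counter(video_category[vid] for vid, _ in play_history)
        total = sum(counts.values())
        kept = [c for c, n in counts.most_common(top_k) if n / total > min_share]

        profiles = {}
        for category in kept:
            acc, weight_sum = np.zeros(200), 0.0
            for vid, play_day in play_history:
                if video_category[vid] != category:
                    continue
                periods = (today - play_day) // period_days
                w = decay ** periods            # e.g. 0.95 ** 2 for a play 12 days ago
                acc += w * video_vecs[vid]
                weight_sum += w
            profiles[category] = acc / weight_sum if weight_sum else acc
        return profiles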
In the feature engineering construction stage, the following features are built: the user word vector (200 dimensions), the video word vector (200 dimensions), the user's watching ratio per category, the user's historical playing ratio per hour, user gender, user age (bucketed as under 20, 20-30, 30-40, 40-50 and over 50, one-hot encoded), the current video's category label, video duration (in seconds), video release time (days from the current date), the video's average playing integrity (average integrity of plays in the last 24 hours), popularity level (divided into 5 levels by play count, one-hot encoded), the time the user watches videos (day of the week and current time period, one-hot encoded), location information (one-hot encoded by province), terminal type (one-hot encoded), and operator type (one-hot encoded).
These features are constructed from the user's play records over the recent period (e.g. the last 30 days), and the deep learning model is trained with the playing integrity of the videos the user played as the label.
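A compressed sketch of assembling one input row from these feature groups; the one-hot helper, the age bucketing and the exact concatenation order are assumptions, since the text lists the feature groups but not a binary layout.

    import numpy as np

    def one_hot(index: int, size: int) -> np.ndarray:
        v = np.zeros(size)
        v[index] = 1.0
        return v

    def age_bucket(age: int) -> int:
        # Buckets named in the feature list: under 20, 20-30, 30-40, 40-50, 50 and above.
        return int(np.digitize(age, [20, 30, 40, 50]))

    def build_sample(user_vec, video_vec, category_ratios, hour_ratios,
                     gender_idx, age, video_stats, context_one_hots):
        # Concatenates the dense and one-hot feature groups into one model input row.
        return np.concatenate([
            user_vec,                      # 200-dim user word vector
            video_vec,                     # 200-dim video word vector
            np.asarray(category_ratios),   # per-category watching ratio
            np.asarray(hour_ratios),       # per-hour historical playing ratio
            one_hot(gender_idx, 2),
            one_hot(age_bucket(age), 5),
            np.asarray(video_stats),       # duration, days since release, average integrity, heat level ...
            np.asarray(context_one_hots),  # weekday/time slot, province, terminal, operator (already one-hot)
        ])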
For the recommendation result set produced for the user in the recall stage, the model predicts the likely playing integrity of each video the target user has not yet played, and the videos are sorted in descending order of predicted playing integrity to generate the final recommendation result set.
It should be noted that the division into the functional modules described above is only an example used when the video playing integrity prediction apparatus triggers a prediction task; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the video playing integrity prediction apparatus and the video playing integrity prediction method provided in the above embodiments belong to the same concept, and their specific implementation processes are detailed in the method embodiments and are not repeated here.
All the above-mentioned optional technical solutions can be combined arbitrarily to form the optional embodiments of the present invention, and are not described herein again.
In summary, the method and apparatus for predicting video playing integrity provided by the embodiments of the present invention have the following advantages, compared with the prior art:
1. The traditional CTR estimation method is replaced by introducing a video playing integrity index. The video playing integrity of different users is predicted with a trained preset video playing integrity prediction model, and the prediction results yield interest data that better reflects users' true interests with respect to watching duration, a key information-feed metric, improving the accuracy of interest identification, thereby improving the true relevance of recommendations and substantially increasing user watching duration and satisfaction;
2. User portraits are represented as vectors, and the time decay of user behavior is used to reflect shifts in user interest; hot-spot videos and mistaken-click videos are filtered out during user profiling so that they do not distort the user's true interests, making the user portrait more accurate;
3. User behavior data, video quality, video information and other related data are collected to build effective vector representations of user features, video attributes, per-time-period playing ratios, category ratios and other environmental information; deep learning modeling fuses the different features and data sources to estimate the likely playing integrity of videos the user has not yet watched, and applying this to a short-video recommendation ranking model yields good results and increases the user's average watching duration;
4. User features, video features, context features, client classification and other features are constructed and modeled with deep learning. The playing integrity estimation approach was applied to a randomly selected 10% of users through an A/B test, and the final report compared CTR, daily average play count, average per-user playing integrity and other metrics. With only a slight drop in CTR, the average per-user playing integrity and daily average play count improved considerably;
5. The TF-IDF algorithm is adopted in the video recommendation domain, and the IDF value effectively highlights the key information of a video;
6. Predicting the playing integrity of short videos improves the true relevance of recommendations and seeks to increase user dwell time.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all changes and modifications that fall within the true scope of the embodiments of the present application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A method for predicting video playing integrity, the method comprising:
inputting data to be tested of a user video playing characteristic vector;
calculating through a preset video playing integrity prediction model;
outputting the video playing integrity value of the data to be tested,
the preset video playing integrity prediction model is obtained through training of user video playing training data, and the user video playing characteristic vectors at least comprise user characteristic vectors and video characteristic vectors.
2. The method of claim 1, further comprising:
collecting video playing information data of a user;
screening the user video playing information data to obtain a screening result;
and extracting the characteristics of the screening result to generate the data to be tested of the user video playing characteristic vector.
3. The method of claim 2,
collecting the user video playing information data, comprising: acquiring user video playing information data comprising user information, user playing history information, video information and user client information; and/or,
screening the user video playing information data to obtain a screening result, comprising: screening the user video playing information data using a multi-channel recall mode comprising user collaboration, user search, a topic model, hot recommendation, user portraits and video labels to obtain a screening result; and/or,
extracting features from the screening result to generate the data to be tested of the user video playing feature vector, comprising: using word vectors and IDF weights obtained by training a word2vec model on a preset massive corpus, segmenting the video titles and video category labels in the screening result to generate video word vectors, and then computing user word vectors from the user playing history information with time decay.
4. The method of claim 1, wherein the preset video playing integrity prediction model comprises a DNN with three hidden layers.
5. The method of claim 4, wherein the preset video playing integrity prediction model is obtained by training on the input user video playing training data, wherein the user video playing training data are the independent variables, the playing integrity values of videos the user has watched historically are the dependent variable, and the user video playing training data are feature vectors combining historical user vectors and historical video vectors constructed from the user playing history information.
6. The method of claim 1, further comprising:
sorting the video playing integrity values of the data to be tested from high to low to obtain a top-N video ranking result, and recommending the ranking result to the corresponding user according to priority, where N is an integer greater than 1.
7. An apparatus for predicting video playback integrity, the apparatus comprising a model computation module configured to:
inputting data to be tested of a user video playing characteristic vector, calculating through a preset video playing integrity prediction model, and outputting a video playing integrity value of the data to be tested, wherein the preset video playing integrity prediction model is obtained through training of user video playing training data, and the user video playing characteristic vector at least comprises a user characteristic vector and a video characteristic vector.
8. The apparatus according to claim 7, further comprising a data collection module, a data filtering module and a vector generation module, wherein the data collection module collects user video playing information data; the data screening module screens the user video playing information data to obtain a screening result; and the vector generation module is used for extracting the characteristics of the screening result and generating the data to be tested of the user video playing characteristic vector.
9. The apparatus of claim 8,
the data collection module acquires user video playing information data comprising user information, user playing history information, video information and user client information; and/or,
the data screening module screens the user video playing information data using a multi-channel recall mode comprising user collaboration, user search, a topic model, hot recommendation, user portraits and video labels to obtain a screening result; and/or,
the vector generation module extracts features from the screening result to generate the data to be tested of the user video playing feature vector, which comprises: using word vectors and IDF weights obtained by training a word2vec model on a preset massive corpus, segmenting the video titles and video category labels in the screening result to generate video word vectors, and then computing user word vectors from the user playing history information with time decay.
10. The device according to claim 7, further comprising a data recommendation module, wherein the data recommendation module is configured to perform a high-to-low sorting operation on the video playing integrity value of the data to be tested, obtain a video sorting result of topN, and recommend the video sorting result to a corresponding user according to a priority level, where N is an integer greater than 1.
CN201910845413.2A 2019-09-05 2019-09-05 Video playing integrity prediction method and device Active CN110704674B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201910845413.2A CN110704674B (en) 2019-09-05 2019-09-05 Video playing integrity prediction method and device
CA3153598A CA3153598A1 (en) 2019-09-05 2020-06-24 Method of and device for predicting video playback integrity
PCT/CN2020/097861 WO2021042826A1 (en) 2019-09-05 2020-06-24 Video playback completeness prediction method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910845413.2A CN110704674B (en) 2019-09-05 2019-09-05 Video playing integrity prediction method and device

Publications (2)

Publication Number Publication Date
CN110704674A true CN110704674A (en) 2020-01-17
CN110704674B CN110704674B (en) 2022-11-25

Family

ID=69195102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910845413.2A Active CN110704674B (en) 2019-09-05 2019-09-05 Video playing integrity prediction method and device

Country Status (3)

Country Link
CN (1) CN110704674B (en)
CA (1) CA3153598A1 (en)
WO (1) WO2021042826A1 (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111538912A (en) * 2020-07-07 2020-08-14 腾讯科技(深圳)有限公司 Content recommendation method, device, equipment and readable storage medium
CN111565316A (en) * 2020-07-15 2020-08-21 腾讯科技(深圳)有限公司 Video processing method, video processing device, computer equipment and storage medium
CN111918136A (en) * 2020-07-04 2020-11-10 中信银行股份有限公司 Interest analysis method and device, storage medium and electronic equipment
CN112035740A (en) * 2020-08-19 2020-12-04 广州市百果园信息技术有限公司 Project use duration prediction method, device, equipment and storage medium
WO2021042826A1 (en) * 2019-09-05 2021-03-11 苏宁云计算有限公司 Video playback completeness prediction method and apparatus
CN112887795A (en) * 2021-01-26 2021-06-01 脸萌有限公司 Video playing method, device, equipment and medium
CN113132803A (en) * 2021-04-23 2021-07-16 Oppo广东移动通信有限公司 Video watching time length prediction method, device, storage medium and terminal
CN113312512A (en) * 2021-06-10 2021-08-27 北京百度网讯科技有限公司 Training method, recommendation device, electronic equipment and storage medium
CN113873330A (en) * 2021-08-31 2021-12-31 武汉卓尔数字传媒科技有限公司 Video recommendation method and device, computer equipment and storage medium
CN114339402A (en) * 2021-12-31 2022-04-12 北京字节跳动网络技术有限公司 Video playing completion rate prediction method, device, medium and electronic equipment
CN114339417A (en) * 2021-12-30 2022-04-12 未来电视有限公司 Video recommendation method, terminal device and readable storage medium
CN115086705A (en) * 2021-03-12 2022-09-20 北京字跳网络技术有限公司 Resource preloading method, device, equipment and storage medium
CN115082301A (en) * 2022-08-22 2022-09-20 中关村科学城城市大脑股份有限公司 Customized video generation method, device, equipment and computer readable medium
CN114339417B (en) * 2021-12-30 2024-05-10 未来电视有限公司 Video recommendation method, terminal equipment and readable storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113220936B (en) * 2021-06-04 2023-08-15 黑龙江广播电视台 Video intelligent recommendation method, device and storage medium based on random matrix coding and simplified convolutional network

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106227883A (en) * 2016-08-05 2016-12-14 北京聚爱聊网络科技有限公司 The temperature of a kind of content of multimedia analyzes method and apparatus
US20170085929A1 (en) * 2015-09-18 2017-03-23 Spotify Ab Systems, methods, and computer products for recommending media suitable for a designated style of use
WO2017219089A1 (en) * 2016-06-24 2017-12-28 Incoming Pty Ltd Selectively playing videos
CN107832437A (en) * 2017-11-16 2018-03-23 北京小米移动软件有限公司 Audio/video method for pushing, device, equipment and storage medium
US20190179852A1 (en) * 2017-12-12 2019-06-13 Shanghai Bilibili Technology Co., Ltd. Recommending and presenting comments relative to video frames
CN110012356A (en) * 2019-04-16 2019-07-12 腾讯科技(深圳)有限公司 Video recommendation method, device and equipment and computer storage medium
CN110059221A (en) * 2019-03-11 2019-07-26 咪咕视讯科技有限公司 Video recommendation method, electronic equipment and computer readable storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105100165B (en) * 2014-05-20 2017-11-14 深圳市腾讯计算机系统有限公司 Network service recommends method and apparatus
CN106028071A (en) * 2016-05-17 2016-10-12 Tcl集团股份有限公司 Video recommendation method and system
CN106446052A (en) * 2016-08-31 2017-02-22 北京魔力互动科技有限公司 Video-on-demand program recommendation method based on user set
CN108460085A (en) * 2018-01-19 2018-08-28 北京奇艺世纪科技有限公司 A kind of video search sequence training set construction method and device based on user journal
CN108260008A (en) * 2018-02-11 2018-07-06 北京未来媒体科技股份有限公司 A kind of video recommendation method, device and electronic equipment
CN110704674B (en) * 2019-09-05 2022-11-25 苏宁云计算有限公司 Video playing integrity prediction method and device

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170085929A1 (en) * 2015-09-18 2017-03-23 Spotify Ab Systems, methods, and computer products for recommending media suitable for a designated style of use
WO2017219089A1 (en) * 2016-06-24 2017-12-28 Incoming Pty Ltd Selectively playing videos
CN106227883A (en) * 2016-08-05 2016-12-14 北京聚爱聊网络科技有限公司 The temperature of a kind of content of multimedia analyzes method and apparatus
CN107832437A (en) * 2017-11-16 2018-03-23 北京小米移动软件有限公司 Audio/video method for pushing, device, equipment and storage medium
US20190179852A1 (en) * 2017-12-12 2019-06-13 Shanghai Bilibili Technology Co., Ltd. Recommending and presenting comments relative to video frames
CN110059221A (en) * 2019-03-11 2019-07-26 咪咕视讯科技有限公司 Video recommendation method, electronic equipment and computer readable storage medium
CN110012356A (en) * 2019-04-16 2019-07-12 腾讯科技(深圳)有限公司 Video recommendation method, device and equipment and computer storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LEI Man: "Collaborative Filtering Recommendation Algorithm Based on Tag Weight", Journal of Computer Applications *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021042826A1 (en) * 2019-09-05 2021-03-11 苏宁云计算有限公司 Video playback completeness prediction method and apparatus
CN111918136A (en) * 2020-07-04 2020-11-10 中信银行股份有限公司 Interest analysis method and device, storage medium and electronic equipment
CN111538912A (en) * 2020-07-07 2020-08-14 腾讯科技(深圳)有限公司 Content recommendation method, device, equipment and readable storage medium
CN111565316A (en) * 2020-07-15 2020-08-21 腾讯科技(深圳)有限公司 Video processing method, video processing device, computer equipment and storage medium
CN112035740A (en) * 2020-08-19 2020-12-04 广州市百果园信息技术有限公司 Project use duration prediction method, device, equipment and storage medium
CN112887795A (en) * 2021-01-26 2021-06-01 脸萌有限公司 Video playing method, device, equipment and medium
CN115086705A (en) * 2021-03-12 2022-09-20 北京字跳网络技术有限公司 Resource preloading method, device, equipment and storage medium
CN113132803B (en) * 2021-04-23 2022-09-16 Oppo广东移动通信有限公司 Video watching time length prediction method, device, storage medium and terminal
CN113132803A (en) * 2021-04-23 2021-07-16 Oppo广东移动通信有限公司 Video watching time length prediction method, device, storage medium and terminal
CN113312512A (en) * 2021-06-10 2021-08-27 北京百度网讯科技有限公司 Training method, recommendation device, electronic equipment and storage medium
CN113312512B (en) * 2021-06-10 2023-10-31 北京百度网讯科技有限公司 Training method, recommending device, electronic equipment and storage medium
CN113873330A (en) * 2021-08-31 2021-12-31 武汉卓尔数字传媒科技有限公司 Video recommendation method and device, computer equipment and storage medium
CN113873330B (en) * 2021-08-31 2023-03-10 武汉卓尔数字传媒科技有限公司 Video recommendation method and device, computer equipment and storage medium
CN114339417A (en) * 2021-12-30 2022-04-12 未来电视有限公司 Video recommendation method, terminal device and readable storage medium
CN114339417B (en) * 2021-12-30 2024-05-10 未来电视有限公司 Video recommendation method, terminal equipment and readable storage medium
CN114339402A (en) * 2021-12-31 2022-04-12 北京字节跳动网络技术有限公司 Video playing completion rate prediction method, device, medium and electronic equipment
CN115082301A (en) * 2022-08-22 2022-09-20 中关村科学城城市大脑股份有限公司 Customized video generation method, device, equipment and computer readable medium
CN115082301B (en) * 2022-08-22 2022-12-02 中关村科学城城市大脑股份有限公司 Customized video generation method, device, equipment and computer readable medium

Also Published As

Publication number Publication date
CN110704674B (en) 2022-11-25
WO2021042826A1 (en) 2021-03-11
CA3153598A1 (en) 2021-03-11

Similar Documents

Publication Publication Date Title
CN110704674B (en) Video playing integrity prediction method and device
CN107944913B (en) High-potential user purchase intention prediction method based on big data user behavior analysis
CN103559206B (en) A kind of information recommendation method and system
WO2017096877A1 (en) Recommendation method and device
CN110019943B (en) Video recommendation method and device, electronic equipment and storage medium
CN108304512B (en) Video search engine coarse sorting method and device and electronic equipment
CN106326391B (en) Multimedia resource recommendation method and device
CN109511015B (en) Multimedia resource recommendation method, device, storage medium and equipment
CN104160712A (en) Computing similarity between media programs
CN111382307B (en) Video recommendation method, system and storage medium based on deep neural network
CN107944986A (en) A kind of O2O Method of Commodity Recommendation, system and equipment
Rupapara et al. Improving video ranking on social video platforms
CN112464100B (en) Information recommendation model training method, information recommendation method, device and equipment
CN106599047A (en) Information pushing method and device
CN112507163A (en) Duration prediction model training method, recommendation method, device, equipment and medium
CN112699310A (en) Cold start cross-domain hybrid recommendation method and system based on deep neural network
CN112100221A (en) Information recommendation method and device, recommendation server and storage medium
CN113239182A (en) Article recommendation method and device, computer equipment and storage medium
CN114371946A (en) Information push method and information push server based on cloud computing and big data
CN110569447B (en) Network resource recommendation method and device and storage medium
CN110516086B (en) Method for automatically acquiring movie label based on deep neural network
CN110188277B (en) Resource recommendation method and device
CN115618024A (en) Multimedia recommendation method and device and electronic equipment
CN112163163B (en) Multi-algorithm fused information recommendation method, device and equipment
Krishnamoorthy et al. TV shows popularity and performance prediction using CNN algorithm

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address
CP03 Change of name, title or address

Address after: No.1-1 Suning Avenue, Xuzhuang Software Park, Xuanwu District, Nanjing, Jiangsu Province, 210000

Patentee after: Jiangsu Suning cloud computing Co.,Ltd.

Country or region after: China

Address before: No.1-1 Suning Avenue, Xuzhuang Software Park, Xuanwu District, Nanjing, Jiangsu Province, 210000

Patentee before: Suning Cloud Computing Co.,Ltd.

Country or region before: China

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20240207

Address after: Room 3104, Building A5, No. 3 Gutan Avenue, Economic Development Zone, Gaochun District, Nanjing City, Jiangsu Province, 210000

Patentee after: Jiangsu Biying Technology Co.,Ltd.

Country or region after: China

Address before: No.1-1 Suning Avenue, Xuzhuang Software Park, Xuanwu District, Nanjing, Jiangsu Province, 210000

Patentee before: Jiangsu Suning cloud computing Co.,Ltd.

Country or region before: China