CN104035994B - Prediction method of television play on-demand amount based on network data - Google Patents

Prediction method of television play on-demand amount based on network data

Info

Publication number
CN104035994B
Authority
CN
China
Prior art keywords
play
microblogging
broadcast
data
collection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410255632.2A
Other languages
Chinese (zh)
Other versions
CN104035994A (en)
Inventor
胡琴敏
徐晓枫
陈国梁
杜泽宇
罗念
钟哲凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Normal University
Original Assignee
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Normal University
Priority to CN201410255632.2A
Publication of CN104035994A
Application granted
Publication of CN104035994B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Abstract

The invention discloses a method for predicting the on-demand volume of TV dramas from network data. Crawled microblog counts, search counts, and drama-related data are analyzed with correlation analysis and univariate linear regression to build an initial feature set. Stepwise regression on the initial feature set then yields a feature set X and a feature set X_b, and multiple linear regression on X and X_b yields two prediction models, one usable before a drama's premiere and one usable after it. Dramas are then ranked by the size of their predicted values. Compared with the prior art, the method predicts in advance a drama's average on-demand volume per episode in an on-demand system over a coming period, and the predictions effectively reflect the drama's popularity. The method is simple and accurate, so it gives video operators a basis for decisions on purchasing broadcast rights and helps online on-demand systems attract users and increase advertisement click-through.

Description

A method for predicting TV drama on-demand volume based on network data
Technical field
The present invention relates to the field of web information retrieval, and specifically to a method for predicting TV drama on-demand volume based on Sina Weibo and Baidu search network data.
Background art
Predicting video-on-demand volume has important applications in web data mining. A TV drama with high on-demand volume raises the number of advertisement plays, so predicting a drama's on-demand volume in advance is widely applicable to expanding advertising business. Using Sina Weibo and the Baidu search index to predict a drama's on-demand volume in a VOD system for a period after it goes online, and the connection between TV dramas and social networks more generally, have become research hotspots. In particular, predicting drama on-demand volume in an online on-demand system from network data supports video operators' decisions on purchasing broadcast rights and reduces blind investment in those purchases. In addition, Sina Weibo and Baidu search index data reflect more comprehensively how much users like a drama.
At present, the on-demand volume of video resources is generally predicted either from historical on-demand data or from network data. Prediction from historical on-demand data is possible only after a drama has been on air for some time. Among methods based on network data, traditional work mainly predicts film box office. Compared with box-office prediction, drama on-demand volume is affected by more factors, and existing methods do not account for how the degree to which social network and search data reflect drama on-demand volume differs at different points in time.
The prior art cannot predict on-demand volume before a drama goes online and does not use social network and search engine data together, so it cannot predict on-demand volume accurately or help video operators decide which broadcast rights to purchase.
Summary of the invention
The purpose of the invention is to provide, in view of the deficiencies of the prior art, a method for predicting TV drama on-demand volume based on network data. Using the SPSS statistical tool, the method builds an initial feature set from crawled counts of microblogs and searches related to the drama's title before and after its premiere, together with drama-related data, and then applies stepwise regression and multiple linear regression to the initial feature set to predict the average on-demand volume per episode and the ranking. The method is simple and accurate, and its predictions reflect a drama's popularity; it gives video operators a basis for decisions on purchasing broadcast rights and helps online on-demand systems attract users and increase advertisement clicks.
The object of the invention is achieved as follows: a method for predicting TV drama on-demand volume based on network data, characterized in that a web crawler collects the counts of microblogs and searches related to the drama's title before and after its premiere, together with drama-related data; correlation analysis and univariate linear regression are applied to the crawled data to build an initial feature set; stepwise regression on the initial feature set yields the X feature set and the X_b feature set; multiple linear regression on the X and X_b feature sets yields the predicted average on-demand volume per episode; and dramas are then ranked by the size of the predicted values. The prediction proceeds in the following steps:
(1) Data crawling
a. Use a web crawler to collect several popular TV dramas that have finished airing, together with the basic data corresponding to each drama;
b. Obtain the top 100 users in the entertainment category of the microblog ranking list, expand the set of users through their follow relationships, add the official microblog accounts of the dramas' actors and of the major satellite TV channels, and crawl the microblog data of this group of users;
(2) Sample statistics
a. Analyze the data of the entertainment-category users and tally the factors that may be related to TV dramas as microblog data sample A;
b. For the several TV dramas, count the total number of microblogs related to each drama's title in each week of the month before the premiere and on each day of the 15 days after the premiere, as microblog data sample B;
c. From the Baidu Index, count the number of searches for each drama's title in each week of the month before the premiere and on each day of the 15 days after the premiere, as the search data sample;
(3) Building the initial feature set
a. Using the SPSS analysis tool, compute the Pearson correlation coefficient and the Spearman correlation coefficient between each candidate factor from step a of the sample statistics and the drama's average on-demand volume per episode; at a significance level of 5%, a factor whose correlation is significant under either coefficient is a significantly correlated factor;
b. Using the SPSS analysis tool, run a univariate linear regression on each of the weekly microblog counts in the month before the premiere, the daily microblog counts in the 15 days after the premiere, the weekly search counts in the month before the premiere, and the daily search counts in the 15 days after the premiere, obtaining each variable's R² value with respect to the drama's average on-demand volume per episode; at each time point, whichever of the microblog and search variables has the larger R² is taken as a feature factor. The dependent variable is the drama's average on-demand volume per episode, and the independent variable is the single variable at each time point;
c. The significantly correlated factors from step a and the larger-R² variables from step b form the initial feature set;
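Step a of the feature-set construction can be sketched outside SPSS as follows. This is a minimal illustration in plain Python under stated assumptions, not the patent's implementation: the factor names and numbers are invented, `significant_factors` is a hypothetical helper, and significance is judged with the common t approximation for a correlation coefficient, which may not match SPSS's exact procedure.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient (inputs must not be constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def p_value(r, n, steps=2000):
    """Two-sided p-value for correlation r over n >= 3 samples (t approximation)."""
    if 1 - r * r < 1e-12:
        return 0.0
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Tail area of Student's t after substituting x = sqrt(df) * tan(theta),
    # integrated over [theta, pi/2] with Simpson's rule.
    theta = math.atan(t / math.sqrt(df))
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    h = (math.pi / 2 - theta) / steps
    f = lambda i: max(0.0, math.cos(theta + i * h)) ** (df - 1)
    area = (f(0) + f(steps) + sum((4 if i % 2 else 2) * f(i) for i in range(1, steps))) * h / 3
    return min(1.0, 2 * coef * area)

def significant_factors(factors, views, alpha=0.05):
    """Names of factors significant under Pearson OR Spearman at level alpha."""
    n = len(views)
    return [name for name, xs in factors.items()
            if min(p_value(pearson(xs, views), n), p_value(spearman(xs, views), n)) < alpha]

# Invented sample: one entry per drama.
views = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
factors = {
    "cast_followers": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 9.0],
    "episode_parity": [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0],
}
selected = significant_factors(factors, views)
```

Taking the smaller of the two p-values implements the "either coefficient" rule: a factor is kept as soon as one of the two tests is significant.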
(4) Building the X and X_b feature sets
Run stepwise regression on the initial feature set with the SPSS analysis tool to obtain the X feature set, then extract from the X feature set the features that are already available before the drama's premiere to obtain the X_b feature set;
(5) TV drama ranking prediction
Using the SPSS analysis tool, fit multiple linear equations to the X feature set and the X_b feature set to obtain two prediction models, and add to each model a bias term and an indicator of whether a dedicated microblog account has been set up for the drama; if such an account has been set up, the difference in average on-demand volume per episode between dramas with and without a dedicated account is added to the result computed by SPSS. The multiple linear equation with the bias term fitted on the X_b feature set gives the prediction model that can predict the average on-demand volume per episode before the premiere; the one fitted on the X feature set gives the prediction model that can predict it after the premiere; dramas are then ranked by the size of the predicted values. The post-premiere prediction model yields predictions that are progressively corrected after the drama's premiere.
Compared with the prior art, the invention predicts in advance a drama's average on-demand volume per episode in a VOD system over a coming period, and the predictions effectively reflect the drama's popularity. The method is simple and accurate; it gives video operators a basis for decisions on purchasing broadcast rights and helps online on-demand systems attract users and increase advertisement clicks.
Description of the drawings
Fig. 1 is a schematic flowchart of the present invention.
Detailed description of the embodiments
Referring to Fig. 1, the invention uses Sina Weibo and the Baidu search index to collect the counts of microblogs and searches related to a drama's title before and after its premiere, together with drama-related data. Correlation analysis and univariate linear regression are applied to the crawled data to build an initial feature set; stepwise regression on the initial feature set yields the X feature set and the X_b feature set; multiple linear regression on the X and X_b feature sets yields the predicted average on-demand volume per episode; and dramas are then ranked by the size of the predicted values. The prediction proceeds in the following steps:
(1) Data crawling
a. Use a web crawler to collect TV dramas that have finished airing, and crawl from Douban basic information such as each drama's cast and episode count, obtaining the basic data of n TV dramas.
b. Using the API provided by Sina Weibo, obtain the top 100 users in the entertainment category of the microblog ranking list, expand the set of users through their follow relationships, add the official microblog accounts of the dramas' actors and of the major satellite TV channels, and crawl the microblog data of this group of users.
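The user-expansion part of step b can be sketched as a breadth-first walk over follow relationships. This is only an illustration under assumptions: the follow graph is taken to be already downloaded into a dictionary (no Sina Weibo API call is shown), the function names are hypothetical, and the `hops` parameter is an assumption because the text does not say how far the expansion goes.

```python
def expand_users(seed_users, following, hops=1):
    """Grow a user set by walking 'follows' edges outward from the seeds."""
    users = set(seed_users)
    frontier = set(seed_users)
    for _ in range(hops):
        # Everyone followed by the current frontier who is not yet in the set.
        frontier = {v for u in frontier for v in following.get(u, ())} - users
        users |= frontier
    return users

def build_crawl_list(top_users, following, official_accounts, hops=1):
    """Expanded top-ranked users plus actor and satellite-channel official accounts."""
    return expand_users(top_users, following, hops) | set(official_accounts)

# Invented follow graph: user -> users they follow.
following = {"u1": ["u2", "u3"], "u2": ["u4"]}
crawl_list = build_crawl_list(["u1"], following, ["actor_official", "tv_official"])
```

The resulting set is the list of accounts whose microblog data would then be crawled.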
(2) Sample statistics
a. Analyze the data of the entertainment-category users and tally the factors that may be related to TV dramas to form microblog data sample A.
b. For the several TV dramas, count the total number of microblogs related to each drama's title in each week of the month before the premiere and on each day of the 15 days after the premiere, as microblog data sample B.
c. From the Baidu Index, count the number of searches for each drama's title in each week of the month before the premiere and on each day of the 15 days after the premiere, as the search data sample.
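The weekly and daily tallies of steps b and c can be sketched as a simple binning of post (or search) dates around the premiere date. The sketch assumes that "the month before the premiere" means four 7-day weeks and that the premiere day counts as the first of the 15 days; neither detail is stated in the text, and `bin_counts` is a hypothetical name.

```python
from datetime import date

def bin_counts(post_dates, premiere, weeks_before=4, days_after=15):
    """Count posts per week before the premiere and per day after it."""
    weekly = [0] * weeks_before  # weekly[0] = the 7 days just before the premiere
    daily = [0] * days_after     # daily[0] = the premiere day itself
    for d in post_dates:
        offset = (d - premiere).days
        if 0 <= offset < days_after:
            daily[offset] += 1
        elif -7 * weeks_before <= offset < 0:
            weekly[(-offset - 1) // 7] += 1
        # Anything outside both windows is ignored.
    return weekly, daily

# Invented dates for one drama.
premiere = date(2014, 1, 29)
posts = [date(2014, 1, 28), date(2014, 1, 20), date(2014, 1, 1), date(2013, 12, 31),
         date(2014, 1, 29), date(2014, 2, 12), date(2014, 2, 13)]
weekly, daily = bin_counts(posts, premiere)
```

Each position in `weekly` and `daily` is one "time point" in the later regression steps.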
(3) Building the initial feature set
a. Using the SPSS analysis tool, compute the Pearson correlation coefficient and the Spearman correlation coefficient between each candidate factor from step a of the sample statistics and the drama's average on-demand volume per episode; at a significance level of 5%, a factor whose correlation is significant under either coefficient is a significantly correlated factor, and the significantly correlated factors are then added to the initial feature set.
b. Take as single variables the weekly microblog counts in the month before the premiere, the daily microblog counts in the 15 days after the premiere, the weekly search counts in the month before the premiere, and the daily search counts in the 15 days after the premiere. Run the linear regression in the SPSS analysis tool with the crawled average on-demand volume per episode as the dependent variable and the single variable at each time point as the independent variable, obtaining each variable's R² value, that is, the degree to which it explains the crawled average on-demand volume per episode. To compare the predictive difference between microblog and search data at each time point, two R² values are computed per time point, and the variable with the larger of the two is added to the initial feature set.
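The per-time-point comparison in step b can be sketched as below. For a regression with a single independent variable, R² equals the squared Pearson correlation, which is what `r_squared` computes. The data and the time-point labels are invented, and `pick_per_time_point` is a hypothetical helper rather than anything SPSS provides.

```python
def r_squared(x, y):
    """R^2 of the univariate least-squares fit y ~ a + b*x (squared Pearson r)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

def pick_per_time_point(weibo, search, views):
    """For each time point keep whichever source explains the views better."""
    chosen = {}
    for t in weibo:
        r2_weibo = r_squared(weibo[t], views)
        r2_search = r_squared(search[t], views)
        chosen[t] = ("weibo", r2_weibo) if r2_weibo >= r2_search else ("search", r2_search)
    return chosen

# Invented sample: five dramas, two time points.
views = [1.0, 2.0, 3.0, 4.0, 5.0]
weibo = {"week-1": [2.0, 4.0, 6.0, 8.0, 10.0], "day+1": [1.0, -1.0, 0.0, -1.0, 1.0]}
search = {"week-1": [1.0, -1.0, 0.0, -1.0, 1.0], "day+1": [5.0, 4.0, 3.0, 2.0, 1.0]}
chosen = pick_per_time_point(weibo, search, views)
```

Each chosen variable, one per time point, would then join the significantly correlated factors in the initial feature set.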
(4) Building the X and X_b feature sets
The initial feature set is further filtered with the stepwise regression method in the SPSS analysis tool to obtain the X feature set, using a probability of F of 0.05 for entry and 0.1 for removal; the features in the X feature set that are already available before the premiere are then extracted as the X_b feature set.
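A forward-backward stepwise selection with the stated thresholds (enter at p < 0.05, remove at p > 0.1) can be sketched in plain Python as follows. This is an illustrative reimplementation under assumptions, not SPSS's exact procedure: all function names are hypothetical, the data are invented, and the partial F test for one added predictor is evaluated through its equivalent two-sided t test (F = t²).

```python
import math

def ols_rss(columns, y):
    """Residual sum of squares of a least-squares fit of y on an intercept plus columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented normal equations [A^T A | A^T y], solved by Gauss-Jordan elimination.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(k):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))

def f_test_p(rss_small, rss_big, df, steps=2000):
    """p-value of the partial F test for one extra predictor (1 and df degrees of freedom)."""
    if rss_big <= 0.0:
        return 0.0  # perfect fit
    t = math.sqrt(max(0.0, rss_small - rss_big) / (rss_big / df))
    theta = math.atan(t / math.sqrt(df))
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    h = (math.pi / 2 - theta) / steps
    f = lambda i: max(0.0, math.cos(theta + i * h)) ** (df - 1)
    area = (f(0) + f(steps) + sum((4 if i % 2 else 2) * f(i) for i in range(1, steps))) * h / 3
    return min(1.0, 2 * coef * area)

def stepwise_select(features, y, p_enter=0.05, p_remove=0.10):
    """Indices of the features kept by forward-backward stepwise regression."""
    n, selected = len(y), []
    cols = lambda idx: [features[j] for j in idx]
    for _ in range(2 * len(features) + 1):  # hard cap so the loop always ends
        changed = False
        # Forward step: add the best candidate if its p-value is below p_enter.
        rss_now, df = ols_rss(cols(selected), y), n - len(selected) - 2
        scored = [(f_test_p(rss_now, ols_rss(cols(selected + [j]), y), df), j)
                  for j in range(len(features)) if j not in selected] if df > 0 else []
        if scored and min(scored)[0] < p_enter:
            selected.append(min(scored)[1])
            changed = True
        # Backward step: drop the weakest selected feature if its p-value exceeds p_remove.
        if selected:
            rss_full, df_full = ols_rss(cols(selected), y), n - len(selected) - 1
            worst_p, worst = max(
                (f_test_p(ols_rss(cols([i for i in selected if i != j]), y), rss_full, df_full), j)
                for j in selected)
            if worst_p > p_remove:
                selected.remove(worst)
                changed = True
        if not changed:
            break
    return sorted(selected)

# Invented data: x1 drives y, x2 is unrelated to it.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
noise = [0.2, -0.1, 0.1, -0.2, 0.2, -0.1, 0.1, -0.2]
y = [3.0 + 2.0 * a + e for a, e in zip(x1, noise)]
kept = stepwise_select([x1, x2], y)
```

The indices returned play the role of the X feature set; X_b would then be the subset of those features whose time point falls before the premiere.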
(5) TV drama ranking prediction
Using the SPSS analysis tool, fit multiple linear equations to the X feature set and the X_b feature set to obtain two prediction models, and add to each model a bias term and an indicator of whether a dedicated microblog account has been set up for the drama; if such an account has been set up, the difference in average on-demand volume per episode between dramas with and without a dedicated account is added to the result computed by SPSS.
With the multiple linear equation that includes the bias term, the X_b feature set gives the prediction model that can predict the average on-demand volume per episode before the premiere, and the X feature set gives the prediction model that can predict it after the premiere; the post-premiere model can be progressively corrected after the premiere. Both prediction models output a predicted average on-demand volume per episode, and dramas are then ranked by the size of this predicted value. Experiments show that on the test data set the best result reaches R² = 0.65. The Spearman correlation coefficient between the true ranking and the predicted ranking of drama on-demand volume is computed with the SPSS analysis tool; the size and significance of the Spearman coefficient indicate the accuracy of the prediction, and for coefficients between 0 and 1 a larger value means a more accurate prediction.
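The prediction and ranking step can be sketched as a linear model with an intercept (the bias term), an additive adjustment for dramas with a dedicated microblog account, a sort by predicted value, and a Spearman check against the true ranking. All coefficients, feature values, and names below are invented for illustration; in the described method the coefficients would come from the fitted regression.

```python
def predict(intercept, coefs, features, has_account, account_offset):
    """Linear prediction plus the dedicated-account adjustment."""
    base = intercept + sum(c * x for c, x in zip(coefs, features))
    return base + (account_offset if has_account else 0.0)

def rank_by_prediction(predictions):
    """Drama names ordered from highest to lowest predicted value."""
    return sorted(predictions, key=predictions.get, reverse=True)

def spearman_no_ties(true_order, predicted_order):
    """Spearman coefficient between two rankings of the same items (no ties)."""
    n = len(true_order)
    position = {name: i for i, name in enumerate(predicted_order)}
    d2 = sum((i - position[name]) ** 2 for i, name in enumerate(true_order))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Invented model and dramas: (feature values, has a dedicated account).
intercept, coefs, account_offset = 100.0, [2.0, 0.5], 50.0
dramas = {
    "A": ([10.0, 20.0], True),
    "B": ([30.0, 60.0], False),
    "C": ([5.0, 10.0], False),
}
predictions = {name: predict(intercept, coefs, xs, flag, account_offset)
               for name, (xs, flag) in dramas.items()}
predicted_order = rank_by_prediction(predictions)
rho = spearman_no_ties(["A", "B", "C"], predicted_order)  # true order is invented too
```

A Spearman coefficient closer to 1 means the predicted ranking agrees more closely with the true one.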
The above merely further illustrates the invention and is not intended to limit this patent; all equivalent implementations of the invention are intended to fall within the scope of the claims of this patent.

Claims (1)

1. A method for predicting TV drama on-demand volume based on network data, characterized in that a web crawler collects the counts of microblogs and searches related to a drama's title before and after its premiere, together with drama-related data; correlation analysis and univariate linear regression are applied to the crawled data to build an initial feature set; stepwise regression on the initial feature set yields the X feature set and the X_b feature set; multiple linear regression on the X and X_b feature sets yields the predicted average on-demand volume per episode; and dramas are then ranked by the size of the predicted values, the prediction proceeding in the following steps:
(1) Data crawling
a. Use a web crawler to collect several popular TV dramas that have finished airing, together with the basic data corresponding to each drama;
b. Obtain the top 100 users in the entertainment category of the microblog ranking list, expand the set of users through their follow relationships, add the official microblog accounts of the dramas' actors and of the major satellite TV channels, and crawl the microblog data of this group of users;
(2) Sample statistics
a. Analyze the data of the entertainment-category users and tally the factors related to TV dramas as microblog data sample A;
b. For the several TV dramas, count the total number of microblogs related to each drama's title in each week of the month before the premiere and on each day of the 15 days after the premiere, as microblog data sample B;
c. From the Baidu Index, count the number of searches for each drama's title in each week of the month before the premiere and on each day of the 15 days after the premiere, as the search data sample;
(3) Building the initial feature set
a. Using the SPSS analysis tool, compute the Pearson correlation coefficient and the Spearman correlation coefficient between each candidate factor from step a of the sample statistics and the drama's average on-demand volume per episode; at a significance level of 5%, a factor whose correlation is significant under either coefficient is a significantly correlated factor;
b. Using the SPSS analysis tool, run a univariate linear regression on each of the weekly microblog counts in the month before the premiere, the daily microblog counts in the 15 days after the premiere, the weekly search counts in the month before the premiere, and the daily search counts in the 15 days after the premiere, obtaining each variable's R² value with respect to the drama's average on-demand volume per episode; at each time point, whichever of the microblog and search variables has the larger R² is taken as a feature factor, the dependent variable being the drama's average on-demand volume per episode and the independent variable being the single variable at each time point;
c. The significantly correlated factors from step a and the larger-R² variables from step b form the initial feature set;
(4) Building the X and X_b feature sets
Run stepwise regression on the initial feature set with the SPSS analysis tool to obtain the X feature set, then extract from the X feature set the features that are already available before the drama's premiere to obtain the X_b feature set;
(5) TV drama ranking prediction
Using the SPSS analysis tool, fit multiple linear equations to the X feature set and the X_b feature set to obtain two prediction models, and add to each model a bias term and an indicator term for whether a dedicated microblog account has been set up; if such an account has been set up, the difference in average on-demand volume per episode associated with the dedicated account is added to the result computed by SPSS; the multiple linear equation with the bias term fitted on the X_b feature set gives prediction model I, which can predict the average on-demand volume per episode before the premiere; the one fitted on the X feature set gives prediction model II, which can predict it after the premiere; dramas are then ranked by the size of the predicted values; prediction model II yields predictions that are progressively corrected after the premiere.
CN201410255632.2A 2014-06-11 2014-06-11 Prediction method of television play on-demand amount based on network data Expired - Fee Related CN104035994B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410255632.2A CN104035994B (en) 2014-06-11 2014-06-11 Prediction method of television play on-demand amount based on network data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410255632.2A CN104035994B (en) 2014-06-11 2014-06-11 Prediction method of television play on-demand amount based on network data

Publications (2)

Publication Number Publication Date
CN104035994A CN104035994A (en) 2014-09-10
CN104035994B true CN104035994B (en) 2017-04-12

Family

ID=51466764

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410255632.2A Expired - Fee Related CN104035994B (en) 2014-06-11 2014-06-11 Prediction method of television play on-demand amount based on network data

Country Status (1)

Country Link
CN (1) CN104035994B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104537073B (en) * 2014-12-31 2018-05-01 合一网络技术(北京)有限公司 The evaluation method of network multimedia object broadcast information
CN104516983A (en) * 2015-01-08 2015-04-15 龙思薇 Data display method
CN105095414A (en) * 2015-07-10 2015-11-25 百度在线网络技术(北京)有限公司 Method and apparatus used for predicting network search volume
CN105005623A (en) * 2015-07-27 2015-10-28 东南大学 Power demand prediction method based on keyword retrieval index correlation analysis
CN106339771A (en) * 2016-08-09 2017-01-18 北京猫眼文化传媒有限公司 Movie box office data prediction method and device
CN108898415A (en) * 2018-05-29 2018-11-27 北京奇艺世纪科技有限公司 A kind of the flow index of correlation prediction technique and device of video collection of drama
CN111050195B (en) * 2018-10-12 2021-11-26 中国电信股份有限公司 Streaming media caching method and device and computer readable storage medium
CN110706015B (en) * 2019-08-21 2023-06-13 北京大学(天津滨海)新一代信息技术研究院 Feature selection method for advertisement click rate prediction
CN111447470B (en) * 2019-10-22 2021-04-20 深圳市野生动物园有限公司 Video application program parameter setting platform
CN113379447A (en) * 2021-05-28 2021-09-10 西安影视数据评估中心有限公司 Method for predicting single-day audience rating of TV play

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103077190A (en) * 2012-12-20 2013-05-01 人民搜索网络股份公司 Hot event ranking method based on order learning technology
CN103345512A (en) * 2013-07-06 2013-10-09 北京品友互动信息技术有限公司 Online advertising click-through rate forecasting method and device based on user attribute

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120084141A1 (en) * 2009-03-30 2012-04-05 Acquisio System and Method to Predict the Performance of Keywords for Advertising Campaigns Managed on the Internet

Also Published As

Publication number Publication date
CN104035994A (en) 2014-09-10

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170412

Termination date: 20200611
