CN104035994A - Prediction method of television play on-demand amount based on network data - Google Patents
- Publication number
- CN104035994A
- Authority
- CN
- China
- Prior art keywords
- play
- feature set
- microblogging
- broadcast
- data
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
Abstract
The invention discloses a method for predicting the on-demand view count of TV series based on network data. The number of microblog (Weibo) posts and the number of searches related to each TV series are crawled together with other data about the series, and correlation analysis and univariate linear regression are applied to them to obtain an initial feature set. Stepwise regression on the initial feature set then yields a feature set X and a feature set X_b, and multiple linear regression on X_b and on X yields two prediction models, one usable before and one after the premiere of a TV series. The series are then ranked by the magnitude of their predicted values. Compared with the prior art, the method predicts in advance the average per-episode on-demand views that a TV series will receive in an on-demand system over a future period; the predictions effectively reflect the popularity of the series, and the method is simple and accurate. It therefore gives video operators a basis for decisions on purchasing broadcast rights for TV series, and gives online on-demand systems strong support for attracting users and increasing advertisement click-through.
Description
Technical field
The present invention relates to the field of web information retrieval, and specifically to a method for predicting the on-demand views of TV series based on Sina Weibo and Baidu search data.
Background technology
Predicting video on-demand (VOD) view counts is an important application in network data mining. TV series with high on-demand view counts increase the number of advertisement plays, so predicting a series' on-demand views in advance has wide application in expanding advertising business. Using Sina Weibo and the Baidu Search Index to predict the on-demand views a TV series will receive in a VOD system over a period after its release, and linking TV series with social networks, have become research focuses. In particular, predicting the on-demand views of TV series in online on-demand systems from network data supports video operators' decisions on purchasing broadcast rights and reduces blind investment in those purchases. In addition, Sina Weibo and Baidu Search Index data reflect users' liking for a TV series fairly comprehensively.
At present, on-demand view prediction for video resources generally uses either methods based on historical on-demand data or methods based on network data. Methods based on historical on-demand data can only make predictions after a series has been broadcast for some time. Among network-data methods, traditional work mainly predicts film box office. Compared with box office, TV series on-demand views are influenced by more factors, and these methods do not account for the fact that how well social network and search data reflect on-demand views differs across time points.
The prior art cannot predict on-demand views before a TV series is released, does not combine social network and search engine data for prediction, and cannot accurately predict future on-demand views; it therefore cannot help video operators decide whether to purchase broadcast rights for a TV series.
Summary of the invention
The object of the invention is to address these deficiencies of the prior art by providing a method for predicting the on-demand views of TV series based on network data. The SPSS statistics tool is used to build an initial feature set from crawled counts of Weibo posts and searches related to the series name before and after the premiere, together with other data about the series. Stepwise regression and multiple linear regression are then applied to the initial feature set to predict the average per-episode on-demand views and the ranking of the series. The method is simple and accurate, and its predictions effectively reflect the popularity of TV series. It provides video operators a basis for broadcast-rights purchase decisions and gives online on-demand systems strong support for attracting users and increasing advertisement clicks.
The object of the present invention is achieved as follows. A method for predicting the on-demand views of TV series based on network data is characterized in that a web crawler collects the counts of Weibo posts and searches related to the TV series name before and after the premiere, together with other data about the series; correlation analysis and univariate linear regression are applied to the crawled data to build an initial feature set; stepwise regression on the initial feature set yields feature set X and feature set X_b; multiple linear regression on X and X_b yields predictions of average per-episode on-demand views; and the series are ranked by predicted value. The prediction proceeds in the following steps:
(1) Data crawling
a. Use a web crawler to collect a number of popular TV series that have finished airing, together with basic data for each series;
b. Obtain the top 100 users in the entertainment category of the Weibo user rankings, expand the user set along follow relationships, add the official Weibo accounts of the series' actors and of the major satellite TV channels, and crawl these users' Weibo posts;
(2) Statistical samples
a. Analyze the entertainment users' data and count the factors that may be related to TV series, forming Weibo data sample A;
b. For each TV series, count the total number of Weibo posts related to the series name in each week of the month before the premiere and the number of such posts on each of the 15 days after the premiere, forming Weibo data sample B;
c. From the Baidu Index, count the number of searches for the series name in each week of the month before the premiere and on each of the 15 days after the premiere, forming the search data sample;
(3) Building the initial feature set
a. Use SPSS to compute both the Pearson and the Spearman correlation coefficients between each related factor from step a of the statistical samples and the average per-episode on-demand views of the series; at the 5% significance level, a factor is a significant factor if either correlation is significant;
b. Use SPSS to run univariate linear regressions of the average per-episode on-demand views on each of the following: the weekly Weibo post counts in the month before the premiere, the daily Weibo post counts in the 15 days after the premiere, the weekly search counts in the month before the premiere, and the daily search counts in the 15 days after the premiere. This gives the R² of each variable with respect to the average per-episode on-demand views, and at each time point whichever of the Weibo and search variables has the larger R² is taken as a feature. Here the dependent variable is the average per-episode on-demand views, and the independent variable is the single variable at each time point;
c. The significant factors from step a and the larger-R² variables from step b form the initial feature set;
(4) Building the X and X_b feature sets
Use SPSS to run stepwise regression on the initial feature set to obtain feature set X; from X, extract the features that are already available before the premiere to obtain feature set X_b;
(5) Ranking prediction for TV series
Use SPSS to fit multiple linear regression equations on feature sets X and X_b to obtain two prediction models, and add to each model a bias term for whether a dedicated Weibo account was set up for the series: if one was set up, the difference in average per-episode on-demand views between series with and without a dedicated Weibo account is added to the result calculated in SPSS. Fitting the biased multiple linear regression equation on X_b yields a model that predicts average per-episode on-demand views before the premiere; fitting it on X yields a model that predicts average per-episode on-demand views after the premiere. The series are then ranked by predicted value. The predictions of the post-premiere model are progressively revised after the premiere.
Compared with the prior art, the present invention predicts in advance the average per-episode on-demand views of a TV series in a VOD system over a future period. The predictions effectively reflect the popularity of the series, and the method is simple and accurate. It provides video operators a basis for broadcast-rights purchase decisions and gives online on-demand systems strong support for attracting users and increasing advertisement clicks.
Brief description of the drawings
Fig. 1 is a flowchart of the present invention.
Embodiment
Referring to Fig. 1, the present invention uses Sina Weibo and the Baidu Search Index to collect the counts of Weibo posts and searches related to the TV series name before and after the premiere, together with other data about the series. Correlation analysis and univariate linear regression are applied to the crawled data to build an initial feature set; stepwise regression on the initial feature set yields feature set X and feature set X_b; multiple linear regression on X and X_b yields predictions of average per-episode on-demand views; and the series are ranked by predicted value. The prediction proceeds in the following steps:
(1) Data crawling
a. Use a web crawler to collect popular TV series that have finished airing, and crawl basic information such as the cast and number of episodes of each series from Douban, obtaining basic data for n TV series.
b. Using the API provided by Sina Weibo, obtain the top 100 users in the entertainment category of the Weibo user rankings, expand the user set along follow relationships, add the official Weibo accounts of the series' actors and of the major satellite TV channels, and crawl these users' Weibo posts.
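The user-expansion part of step b can be pictured as a breadth-first walk over follow relationships. The sketch below is illustrative only: it does not call the Sina Weibo API, the follow graph is a hypothetical in-memory dict, and `expand_users` and its one-hop default are assumptions, since the text does not say how far the expansion goes.

```python
from collections import deque


def expand_users(seeds, follows, extra_accounts=(), max_hops=1):
    """Grow a seed user set along follow edges (breadth-first), then add required accounts.

    seeds          -- e.g. the top-100 entertainment users from the ranking list
    follows        -- hypothetical in-memory map: user -> iterable of users they follow
    extra_accounts -- e.g. official accounts of actors and satellite TV channels
    max_hops       -- number of follow hops to expand (an assumption, not from the text)
    """
    users = set(seeds)
    frontier = deque((user, 0) for user in seeds)
    while frontier:
        user, hops = frontier.popleft()
        if hops >= max_hops:
            continue  # do not expand beyond the hop limit
        for followed in follows.get(user, ()):
            if followed not in users:
                users.add(followed)
                frontier.append((followed, hops + 1))
    users.update(extra_accounts)
    return users


follow_graph = {"star_a": ["fan_1", "star_b"], "star_b": ["fan_2"]}
print(sorted(expand_users(["star_a"], follow_graph, extra_accounts=["tv_channel"])))
# ['fan_1', 'star_a', 'star_b', 'tv_channel']
```

Raising `max_hops` widens the crawl at the cost of pulling in less relevant accounts; `fan_2` only appears with `max_hops=2`.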
(2) Statistical samples
a. Analyze the entertainment users' data and count the factors that may be related to TV series, forming Weibo data sample A.
b. For each TV series, count the total number of Weibo posts related to the series name in each week of the month before the premiere and the number of such posts on each of the 15 days after the premiere, forming Weibo data sample B.
c. From the Baidu Index, count the number of searches for the series name in each week of the month before the premiere and on each of the 15 days after the premiere, forming the search data sample.
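The time points in steps b and c can be built from a per-day count series as below. This is a sketch under assumptions not stated in the text: "the month before the premiere" is taken as the four 7-day weeks immediately preceding it, the premiere day counts as day 1 of the 15 post-premiere days, and missing days count as zero. `time_point_features` and the feature names are hypothetical.

```python
from datetime import date, timedelta


def time_point_features(daily_counts, premiere, weeks_before=4, days_after=15):
    """Weekly totals for the weeks before the premiere and daily values after it.

    daily_counts -- {datetime.date: count}, e.g. Weibo posts or Baidu Index searches
    premiere     -- datetime.date of the first broadcast
    """
    features = {}
    for w in range(weeks_before):
        # pre_week_1 is the 7 days immediately before the premiere, pre_week_2 the 7 before that
        start = premiere - timedelta(days=7 * (w + 1))
        features[f"pre_week_{w + 1}"] = sum(
            daily_counts.get(start + timedelta(days=d), 0) for d in range(7))
    for d in range(days_after):
        # post_day_1 is the premiere day itself (assumption)
        features[f"post_day_{d + 1}"] = daily_counts.get(premiere + timedelta(days=d), 0)
    return features


premiere = date(2014, 3, 1)
counts = {premiere + timedelta(days=k): 10 for k in range(-28, 15)}
features = time_point_features(counts, premiere)
print(features["pre_week_1"], features["post_day_15"], len(features))  # 70 10 19
```

The same function serves both sources: applying it to Weibo counts gives sample B, and applying it to Baidu Index counts gives the search sample.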
(3) Building the initial feature set
a. Use SPSS to compute both the Pearson and the Spearman correlation coefficients between each related factor from step a of the statistical samples and the average per-episode on-demand views of the series. A factor whose correlation is significant at the 5% level under either coefficient is a significant factor and is added to the initial feature set.
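Step a can be sketched as follows. The sketch assumes that both coefficients are tested with the usual two-sided t-test on n - 2 degrees of freedom; this is a standard choice, but the text does not say exactly which test SPSS applied to Spearman's coefficient. To avoid dependencies, the regularized incomplete beta function is implemented directly (the Numerical Recipes continued fraction). All function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even and odd terms of the continued fraction
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_bt) * _betacf(a, b, x) / a
    return 1.0 - math.exp(log_bt) * _betacf(b, a, 1.0 - x) / b


def corr_p_value(r, n):
    """Two-sided p-value of a correlation r over n pairs (t-test, n - 2 df)."""
    if abs(r) >= 1.0:
        return 0.0
    df = n - 2
    t2 = r * r * df / (1.0 - r * r)
    return betainc(df / 2.0, 0.5, df / (df + t2))


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def is_significant_factor(factor, views, alpha=0.05):
    """Keep a factor if either its Pearson or its Spearman correlation is significant."""
    n = len(views)
    rho = pearson(average_ranks(factor), average_ranks(views))
    return corr_p_value(pearson(factor, views), n) < alpha or corr_p_value(rho, n) < alpha
```

Using "either test significant" rather than "both" keeps factors whose relationship with the views is monotonic but not linear, which is what the step describes.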
b. Take as single variables the weekly Weibo post counts in the month before the premiere, the daily Weibo post counts in the 15 days after the premiere, the weekly search counts in the month before the premiere, and the daily search counts in the 15 days after the premiere, and run linear regressions in SPSS, with the crawled average per-episode on-demand views of the series as the dependent variable and the single variable at each time point as the independent variable. This gives each variable's explanatory power R² for the average per-episode on-demand views, so the predictive power of Weibo and search data can be compared at each time point. Since each time point yields two R² values, the variable with the larger R² is added to the initial feature set.
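For a univariate linear regression with an intercept, R² equals the squared Pearson correlation between the variable and the target, so the comparison in step b can be sketched without a regression library. In this sketch each source maps a time point to one value per TV series; the dict layout, the `weibo_`/`search_` prefixes, and breaking ties in favour of Weibo are assumptions.

```python
def r_squared(x, y):
    """R² of the least-squares line y ~ a + b*x (equals the squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable explains nothing
    return sxy * sxy / (sxx * syy)


def pick_time_point_features(weibo, search, views):
    """For each time point keep whichever source explains more variance in the views."""
    chosen = {}
    for point in weibo:
        r2_weibo = r_squared(weibo[point], views)
        r2_search = r_squared(search[point], views)
        if r2_weibo >= r2_search:  # ties go to Weibo (assumption)
            chosen[f"weibo_{point}"] = r2_weibo
        else:
            chosen[f"search_{point}"] = r2_search
    return chosen
```

For example, with `views = [1, 2, 3, 4]`, a Weibo series `[1, 2, 3, 4]` (R² = 1.0) beats a search series `[4, 1, 3, 2]` (R² = 0.16) at the same time point.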
(4) Building the X and X_b feature sets
Use the stepwise regression method in SPSS to further select features from the initial feature set and obtain feature set X, with an F-test probability of 0.05 for entry and 0.10 for removal; then extract from X the features that are already available before the premiere as feature set X_b.
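A minimal sketch of step (4), assuming SPSS-style entry with a removal check: at each step the candidate with the smallest partial-F p-value enters if p < 0.05, and then any selected feature whose p-value exceeds 0.10 is dropped. It stands in for the SPSS procedure rather than reproducing it; `stepwise`, `ols_sse`, `pre_premiere_subset` and the other names are hypothetical, and the F-distribution tail is computed from the regularized incomplete beta function so the block needs only the standard library.

```python
import math


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_bt) * _betacf(a, b, x) / a
    return 1.0 - math.exp(log_bt) * _betacf(b, a, 1.0 - x) / b

def f_sf(f, d1, d2):
    """Upper-tail probability P(F > f) for an F(d1, d2) variable."""
    return betainc(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))

def ols_sse(columns, y):
    """Residual sum of squares of an OLS fit of y on the columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv], c[col], c[piv] = a[piv], a[col], c[piv], c[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            a[r] = [arj - f * acj for arj, acj in zip(a[r], a[col])]
            c[r] -= f * c[col]
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (c[i] - sum(a[i][j] * beta[j] for j in range(i + 1, k))) / a[i][i]
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))

def stepwise(features, y, p_enter=0.05, p_remove=0.10, max_steps=100):
    """Stepwise selection with partial F-tests: enter if p < p_enter, remove if p > p_remove."""
    n, selected = len(y), []

    def partial_p(full, reduced):
        # p-value of the F-test comparing the `full` model against the `reduced` model
        df = n - len(full) - 1
        if df <= 0:
            return 1.0
        sse_full = ols_sse([features[k] for k in full], y)
        if sse_full <= 0.0:
            return 0.0
        sse_red = ols_sse([features[k] for k in reduced], y)
        return f_sf((sse_red - sse_full) / (sse_full / df), len(full) - len(reduced), df)

    for _ in range(max_steps):
        entry = {k: partial_p(selected + [k], selected) for k in features if k not in selected}
        if not entry or min(entry.values()) >= p_enter:
            break
        selected.append(min(entry, key=entry.get))
        removal = {k: partial_p(selected, [s for s in selected if s != k]) for k in selected}
        worst = max(removal, key=removal.get)
        if removal[worst] > p_remove:
            selected.remove(worst)
    return selected

def pre_premiere_subset(selected, available_before_premiere):
    """X_b: the selected features that can already be observed before the premiere."""
    return [k for k in selected if k in available_before_premiere]
```

Passing the names of features that exist before the premiere (for example the weekly pre-premiere counts and static series attributes) to `pre_premiere_subset` gives X_b; the full `stepwise` result plays the role of X.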
(5) Ranking prediction for TV series
Use SPSS to fit multiple linear regression equations on X and X_b to obtain two prediction models, and add to each model a bias term for whether a dedicated Weibo account was set up for the series: if one was set up, the difference in average per-episode on-demand views between series with and without a dedicated Weibo account is added to the result calculated in SPSS.
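Step (5) can be sketched as an ordinary least-squares fit plus the account bias term. The sketch reads the text literally: the bias Δ is the difference between the mean per-episode views of series with and without a dedicated Weibo account, and it is added to the regression output only for series that have such an account. Fitting on X gives the post-premiere model and fitting on X_b the pre-premiere model; `fit_ols`, `fit_model`, `predict`, and `rank_series` are hypothetical names.

```python
def fit_ols(columns, y):
    """OLS coefficients [intercept, b1, ..., bk] from the normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv], c[col], c[piv] = a[piv], a[col], c[piv], c[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            a[r] = [arj - f * acj for arj, acj in zip(a[r], a[col])]
            c[r] -= f * c[col]
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (c[i] - sum(a[i][j] * beta[j] for j in range(i + 1, k))) / a[i][i]
    return beta


def fit_model(columns, y, has_account):
    """Fit the regression, then compute the account bias as a difference of group means."""
    beta = fit_ols(columns, y)
    with_acc = [v for v, h in zip(y, has_account) if h]
    without = [v for v, h in zip(y, has_account) if not h]
    # Assumption: no bias term if one of the two groups is empty.
    delta = (sum(with_acc) / len(with_acc) - sum(without) / len(without)
             if with_acc and without else 0.0)
    return beta, delta


def predict(beta, delta, row, has_account):
    """Predicted average per-episode views for one series with feature values `row`."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row)) + (delta if has_account else 0.0)


def rank_series(names, predictions):
    """Series names ordered from highest to lowest predicted views."""
    return [name for name, _ in sorted(zip(names, predictions), key=lambda p: p[1], reverse=True)]
```

With one feature `[1, 2, 3, 4]`, views `[3, 5, 7, 9]` and only the last series having an account, the fit is 1 + 2x with Δ = 4, so a new series with x = 5 and an account is predicted at 15.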
In the biased multiple linear regression equation, X_b yields a model that predicts average per-episode on-demand views before the premiere, and X yields a model that predicts average per-episode on-demand views after the premiere; the post-premiere model can be progressively revised after the premiere. Both models predict average per-episode on-demand views, and the series are ranked by these predicted values. Experiments show that the best result on the test data reaches R² = 0.65. SPSS is used to compute the Spearman correlation coefficient between the true and predicted rankings of on-demand views; the magnitude and significance of the Spearman coefficient indicate the accuracy of the ranking prediction, and the larger the coefficient (at most 1), the more accurate the prediction.
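The two evaluation measures mentioned above can be sketched as follows. `r_squared` uses the common definition 1 - SSE/SST on the test set; the text does not say exactly how the reported R² = 0.65 was computed. `rank_agreement` is Spearman's coefficient between true and predicted views, with tied values given average ranks. Both names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def r_squared(actual, predicted):
    """Coefficient of determination 1 - SSE/SST of predictions on held-out series."""
    mean = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean) ** 2 for a in actual)
    return 1.0 - sse / sst


def rank_agreement(actual, predicted):
    """Spearman coefficient between the true and the predicted rankings."""
    return pearson(average_ranks(actual), average_ranks(predicted))
```

A coefficient of 1 means the predicted ranking matches the true one exactly, and -1 means it is fully reversed; R² can stay below 1 even when the ranking is perfect, because it also penalizes errors in the predicted magnitudes.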
The above merely illustrates the present invention further and is not intended to limit this patent; all equivalent implementations of the present invention shall fall within the scope of the claims of this patent.
Claims (1)
1. A method for predicting the on-demand views of TV series based on network data, characterized in that a web crawler collects the counts of Weibo posts and searches related to the TV series name before and after the premiere, together with other data about the series; correlation analysis and univariate linear regression are applied to the crawled data to build an initial feature set; stepwise regression on the initial feature set yields feature set X and feature set X_b; multiple linear regression on X and X_b yields predictions of average per-episode on-demand views; and the series are ranked by predicted value, the prediction proceeding in the following steps:
(1) Data crawling
a. Use a web crawler to collect a number of popular TV series that have finished airing, together with basic data for each series;
b. Obtain the top 100 users in the entertainment category of the Weibo user rankings, expand the user set along follow relationships, add the official Weibo accounts of the series' actors and of the major satellite TV channels, and crawl these users' Weibo posts;
(2) Statistical samples
a. Analyze the entertainment users' data and count the factors that may be related to TV series, forming Weibo data sample A;
b. For each TV series, count the total number of Weibo posts related to the series name in each week of the month before the premiere and the number of such posts on each of the 15 days after the premiere, forming Weibo data sample B;
c. From the Baidu Index, count the number of searches for the series name in each week of the month before the premiere and on each of the 15 days after the premiere, forming the search data sample;
(3) Building the initial feature set
a. Use SPSS to compute both the Pearson and the Spearman correlation coefficients between each related factor from step a of the statistical samples and the average per-episode on-demand views of the series; at the 5% significance level, a factor is a significant factor if either correlation is significant;
b. Use SPSS to run univariate linear regressions of the average per-episode on-demand views on each of the following: the weekly Weibo post counts in the month before the premiere, the daily Weibo post counts in the 15 days after the premiere, the weekly search counts in the month before the premiere, and the daily search counts in the 15 days after the premiere. This gives the R² of each variable with respect to the average per-episode on-demand views, and at each time point whichever of the Weibo and search variables has the larger R² is taken as a feature. Here the dependent variable is the average per-episode on-demand views, and the independent variable is the single variable at each time point;
c. The significant factors from step a and the larger-R² variables from step b form the initial feature set;
(4) Building the X and X_b feature sets
Use SPSS to run stepwise regression on the initial feature set to obtain feature set X; from X, extract the features that are already available before the premiere to obtain feature set X_b;
(5) Ranking prediction for TV series
Use SPSS to fit multiple linear regression equations on feature sets X and X_b to obtain two prediction models, and add to each model a bias term for whether a dedicated Weibo account was set up for the series: if one was set up, the difference in average per-episode on-demand views between series with and without a dedicated Weibo account is added to the result calculated in SPSS. Fitting the biased multiple linear regression equation on X_b yields a model that predicts average per-episode on-demand views before the premiere; fitting it on X yields a model that predicts average per-episode on-demand views after the premiere. The series are then ranked by predicted value. The predictions of the post-premiere model are progressively revised after the premiere.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410255632.2A CN104035994B (en) | 2014-06-11 | 2014-06-11 | Prediction method of television play on-demand amount based on network data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104035994A | 2014-09-10 |
CN104035994B | 2017-04-12 |
Family ID: 51466764
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410255632.2A Expired - Fee Related CN104035994B (en) | 2014-06-11 | 2014-06-11 | Prediction method of television play on-demand amount based on network data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104035994B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120084141A1 (en) * | 2009-03-30 | 2012-04-05 | Acquisio | System and Method to Predict the Performance of Keywords for Advertising Campaigns Managed on the Internet |
CN103077190A (en) * | 2012-12-20 | 2013-05-01 | 人民搜索网络股份公司 | Hot event ranking method based on order learning technology |
CN103345512A (en) * | 2013-07-06 | 2013-10-09 | 北京品友互动信息技术有限公司 | Online advertising click-through rate forecasting method and device based on user attribute |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104537073A (en) * | 2014-12-31 | 2015-04-22 | 合一网络技术(北京)有限公司 | Estimation method for network multimedia object broadcast information |
CN104537073B (en) * | 2014-12-31 | 2018-05-01 | 合一网络技术(北京)有限公司 | The evaluation method of network multimedia object broadcast information |
CN104516983A (en) * | 2015-01-08 | 2015-04-15 | 龙思薇 | Data display method |
CN105095414A (en) * | 2015-07-10 | 2015-11-25 | 百度在线网络技术(北京)有限公司 | Method and apparatus used for predicting network search volume |
CN105005623A (en) * | 2015-07-27 | 2015-10-28 | 东南大学 | Power demand prediction method based on keyword retrieval index correlation analysis |
CN106339771A (en) * | 2016-08-09 | 2017-01-18 | 北京猫眼文化传媒有限公司 | Movie box office data prediction method and device |
CN108898415A (en) * | 2018-05-29 | 2018-11-27 | 北京奇艺世纪科技有限公司 | A kind of the flow index of correlation prediction technique and device of video collection of drama |
CN111050195A (en) * | 2018-10-12 | 2020-04-21 | 中国电信股份有限公司 | Streaming media caching method and device and computer readable storage medium |
CN111050195B (en) * | 2018-10-12 | 2021-11-26 | 中国电信股份有限公司 | Streaming media caching method and device and computer readable storage medium |
CN110706015A (en) * | 2019-08-21 | 2020-01-17 | 北京大学(天津滨海)新一代信息技术研究院 | Advertisement click rate prediction oriented feature selection method |
CN110706015B (en) * | 2019-08-21 | 2023-06-13 | 北京大学(天津滨海)新一代信息技术研究院 | Feature selection method for advertisement click rate prediction |
CN111447470A (en) * | 2019-10-22 | 2020-07-24 | 奥菲(泰州)光电传感技术有限公司 | Video application program parameter setting platform |
CN111447470B (en) * | 2019-10-22 | 2021-04-20 | 深圳市野生动物园有限公司 | Video application program parameter setting platform |
CN113379447A (en) * | 2021-05-28 | 2021-09-10 | 西安影视数据评估中心有限公司 | Method for predicting single-day audience rating of TV play |
Also Published As
Publication number | Publication date |
---|---|
CN104035994B (en) | 2017-04-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104035994A (en) | Prediction method of television play on-demand amount based on network data | |
CN100581227C (en) | Collaborative filtered recommendation method introducing hotness degree weight of program | |
EP2817970B1 (en) | Automatically recommending content | |
JP6104456B2 (en) | Push method, system and server based on location information | |
CN104317835B (en) | The new user of video terminal recommends method | |
CN102780920A (en) | Television program recommending method and system | |
CN107656938B (en) | Recommendation method and device and recommendation device | |
CN103546773A (en) | Television program recommendation method and system | |
CN111866528A (en) | Live program pushing method and readable storage medium | |
CN103678668A (en) | Prompting method of relevant search result, server and system | |
US20200026740A1 (en) | Annotation of videos using aggregated user session data | |
CN102999588A (en) | Method and system for recommending multimedia applications | |
CN101271559A (en) | Cooperation recommending system based on user partial interest digging | |
CN110087119A (en) | Homepage display methods, device and computer readable storage medium is broadcast live | |
CN105184616A (en) | Method and device for targeted delivery of business object | |
CN103870454A (en) | Method and method for recommending data | |
US20120124089A1 (en) | User interest pattern modeling server and method for modeling user interest pattern | |
CN107249145A (en) | A kind of method of pushing video | |
CN102521321A (en) | Video search method based on search term ambiguity and user preferences | |
CN101739427A (en) | Crawler capturing method and device thereof | |
CN105260905A (en) | Method and device for evaluating and predicting influence of media program | |
CN104699696A (en) | File recommendation method and device | |
JP2012525654A (en) | Technology that targets video that is expected to develop many viewers | |
CN104219577A (en) | Hybrid real-time television program and network video recommending method based on intelligent televisions | |
CN104853251A (en) | Online collection method and device for multimedia data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20170412 Termination date: 20200611 |
|
CF01 | Termination of patent right due to non-payment of annual fee |