CN104035994A - Prediction method of television play on-demand amount based on network data

Info

Publication number
CN104035994A
Authority
CN
China
Prior art keywords
play
feature set
microblogging
broadcast
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410255632.2A
Other languages
Chinese (zh)
Other versions
CN104035994B (en)
Inventor
胡琴敏
徐晓枫
陈国梁
杜泽宇
罗念
钟哲凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Normal University
Original Assignee
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Normal University filed Critical East China Normal University
Priority to CN201410255632.2A priority Critical patent/CN104035994B/en
Publication of CN104035994A publication Critical patent/CN104035994A/en
Application granted granted Critical
Publication of CN104035994B publication Critical patent/CN104035994B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Abstract

The invention discloses a method for predicting the on-demand amount of television plays based on network data. Crawled microblog post counts, search counts, and data related to the television plays are processed with correlation analysis and univariate linear regression to obtain an initial feature set. Stepwise regression on the initial feature set yields a feature set X and a feature set X_b. Multiple linear regression on X and X_b yields two prediction models, one usable before and one usable after a television play's premiere, and the television plays are then ranked by their predicted values. Compared with the prior art, the method predicts in advance the average per-episode on-demand amount of a television play in an on-demand system over a coming period. The predictions effectively reflect the popularity of the television plays, and the method is simple and accurate. It therefore gives video operators a basis for decisions on purchasing broadcast rights and supports online on-demand systems in attracting users and increasing the advertisement click rate.

Description

A method for predicting TV drama on-demand volume based on network data
Technical field
The present invention relates to the field of web information retrieval, and specifically to a method for predicting TV drama on-demand volume based on network data from Sina Weibo and Baidu search.
Background
Predicting video on-demand volume is an important application of web data mining. TV dramas with high on-demand volume increase the number of advertisement plays, so predicting a drama's on-demand volume in advance is widely applicable to expanding advertising business. Using Sina Weibo and the Baidu search index to predict a drama's on-demand volume in a video-on-demand (VOD) system over a period after it goes online, and more generally the link between TV dramas and social networks, has become a research focus. In particular, predicting on-demand volume in an online VOD system from network data supports video operators' decisions on purchasing broadcast rights and reduces blind investment in those rights. In addition, Sina Weibo and Baidu search index data reflect fairly comprehensively how much users like a drama.
At present, on-demand volume for video resources is generally predicted either from historical on-demand data or from network data. Prediction from historical on-demand data is possible only after a drama has been available for some time. Among methods based on network data, traditional work mainly predicts film box office. Compared with box office, TV drama on-demand volume is affected by more factors, and existing methods do not consider that social network data and search data reflect on-demand volume to different degrees at different points in time.
The prior art cannot predict on-demand volume before a drama goes online, does not combine social network data with search engine data, cannot predict play counts accurately, and therefore cannot support video operators' decisions on purchasing broadcast rights.
Summary of the invention
The object of the invention is to provide, in view of the deficiencies of the prior art, a method for predicting TV drama on-demand volume based on network data. Using the SPSS statistical tool, an initial feature set is built from crawled counts of microblog posts and searches related to a drama's title before and after its premiere, together with data related to the drama. Stepwise regression and multiple linear regression on the initial feature set then yield a prediction of the average on-demand volume per episode and a predicted ranking. The method is simple and accurate, its predictions effectively reflect a drama's popularity, and it gives video operators a basis for decisions on purchasing broadcast rights while supporting online VOD systems in attracting users and increasing advertisement clicks.
The object of the invention is achieved as follows. In a method for predicting TV drama on-demand volume based on network data, a web crawler collects the counts of microblog posts and searches related to a drama's title before and after its premiere, together with data related to the drama. Correlation analysis and univariate linear regression are applied to the crawled data to build an initial feature set. Stepwise regression on the initial feature set yields feature set X and feature set X_b. Multiple linear regression on X and X_b predicts the average on-demand volume per episode, and dramas are ranked by predicted value. The prediction proceeds in the following steps:
(1) Data crawling
a. Use a web crawler to collect a number of popular TV dramas that have finished airing, together with the basic data for each drama;
b. Obtain the top 100 users in the entertainment category of the Weibo ranking list, expand the user set through follow relationships, add the official Weibo accounts of the dramas' actors and of the major satellite TV channels, and crawl the Weibo data of this group of users;
(2) Statistical samples
a. Analyze the entertainment users' data and collect the factors that may be related to TV dramas as microblog data sample A;
b. For each drama, count the microblog posts mentioning the drama's title in each week of the month before its premiere and on each day of the 15 days after its premiere, as microblog data sample B;
c. From the Baidu Index, count the searches for the drama's title in each week of the month before its premiere and on each day of the 15 days after its premiere, as the search data sample;
(3) Building the initial feature set
a. Using SPSS, compute the Pearson and Spearman correlation coefficients between each candidate factor from step (2)a and the average on-demand volume per episode; a factor is treated as significantly correlated if either coefficient is significant at the 5% level;
b. Using SPSS, run a univariate linear regression for each time-point variable (the weekly microblog counts in the month before the premiere, the daily microblog counts in the 15 days after the premiere, and the corresponding weekly and daily search counts), with the average on-demand volume per episode as the dependent variable and the single time-point variable as the independent variable, to obtain each variable's R² value; at each time point, whichever of the microblog and search variables has the larger R² is taken as a feature;
c. The significantly correlated factors from step a and the larger-R² variables from step b together form the initial feature set;
(4) Building feature sets X and X_b
Using SPSS, apply stepwise regression to the initial feature set to obtain feature set X; the features in X that are already available before a drama's premiere form feature set X_b;
(5) Ranking prediction for TV dramas
Using SPSS, fit multiple linear regression equations to feature sets X and X_b to obtain two prediction models. A bias term and an indicator of whether a dedicated Weibo account was set up for the drama are added to each model: if such an account exists, the difference in average on-demand volume per episode between dramas with and without a dedicated account is added to the result computed in SPSS. The multiple linear regression on X_b with the bias term gives a model that predicts the average on-demand volume per episode before the premiere; the multiple linear regression on X with the bias term gives a model that predicts it after the premiere. Dramas are then ranked by predicted value. The post-premiere model's predictions are revised progressively after the premiere.
Compared with the prior art, the invention predicts in advance a drama's average on-demand volume per episode in a VOD system over a coming period. The predictions effectively reflect the drama's popularity, and the method is simple and accurate. It gives video operators a basis for decisions on purchasing broadcast rights and supports online VOD systems in attracting users and increasing advertisement clicks.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the invention.
Embodiment
Referring to Fig. 1, the invention uses Sina Weibo and the Baidu search index to collect the counts of microblog posts and searches related to a drama's title before and after its premiere, together with data related to the drama. Correlation analysis and univariate linear regression are applied to the crawled data to build an initial feature set. Stepwise regression on the initial feature set yields feature set X and feature set X_b. Multiple linear regression on X and X_b predicts the average on-demand volume per episode, and dramas are ranked by predicted value. The prediction proceeds in the following steps:
(1) Data crawling
a. Use a web crawler to collect popular TV dramas that have finished airing, and crawl basic information such as each drama's cast and episode count from Douban, obtaining the basic data of n TV dramas.
b. Using the API provided by Sina Weibo, obtain the top 100 users in the entertainment category of the Weibo ranking list, expand the user set through follow relationships, add the official Weibo accounts of the dramas' actors and of the major satellite TV channels, and crawl the Weibo data of this group of users.
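The expansion of the user set through follow relationships can be sketched as a breadth-first traversal. The patent does not say how the expansion is bounded or in which direction follow edges are traversed, so the in-memory `following` mapping, the `max_users` cap, and the function name below are illustrative assumptions and not part of any Sina Weibo API.

```python
from collections import deque

def expand_users(seeds, following, max_users):
    """Breadth-first expansion of a seed user list along follow edges.

    seeds      -- initial user ids (e.g. the top-100 entertainment users)
    following  -- hypothetical dict: user id -> list of user ids they follow
    max_users  -- assumed cap on the size of the expanded set
    """
    ordered = list(dict.fromkeys(seeds))  # de-duplicate, keep seed order
    seen = set(ordered)
    queue = deque(ordered)
    while queue and len(ordered) < max_users:
        user = queue.popleft()
        for followee in following.get(user, []):
            if followee not in seen:
                seen.add(followee)
                ordered.append(followee)
                queue.append(followee)
                if len(ordered) >= max_users:
                    break
    return ordered
```

Official accounts of actors and satellite TV channels would then be appended to the returned list before crawling posts.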
(2) Statistical samples
a. Analyze the entertainment users' data and collect the factors that may be related to TV dramas as microblog data sample A.
b. For each drama, count the microblog posts mentioning the drama's title in each week of the month before its premiere and on each day of the 15 days after its premiere, as microblog data sample B.
c. From the Baidu Index, count the searches for the drama's title in each week of the month before its premiere and on each day of the 15 days after its premiere, as the search data sample.
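The weekly and daily time points in step (2) can be illustrated with a small bucketing routine. This is a minimal sketch under two assumptions that the patent leaves open: the "month before the premiere" is treated as the 28 days preceding it (four 7-day weeks), and the premiere day itself counts as the first of the 15 post-premiere days. The function name is hypothetical.

```python
from datetime import date, timedelta

def time_point_counts(event_dates, premiere):
    """Bucket post (or search) dates into 4 weekly and 15 daily counts.

    weekly[0] covers days -28..-22 relative to the premiere and
    weekly[3] covers days -7..-1; daily[0] is the premiere day
    (assumption) and daily[14] is 14 days later.
    """
    weekly = [0] * 4
    daily = [0] * 15
    for d in event_dates:
        offset = (d - premiere).days
        if -28 <= offset <= -1:
            weekly[(offset + 28) // 7] += 1
        elif 0 <= offset <= 14:
            daily[offset] += 1
        # dates outside both windows are ignored
    return weekly, daily
```

Each of the 19 resulting counts is one candidate time-point variable; the same routine would be applied once to microblog posts and once to search records.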
(3) Building the initial feature set
a. Using SPSS, compute the Pearson and Spearman correlation coefficients between each candidate factor from step (2)a and the average on-demand volume per episode. A factor is treated as significantly correlated if either coefficient is significant at the 5% level, and each significantly correlated factor is added to the initial feature set.
b. Treat each time-point variable separately as a single variable: the weekly microblog counts in the month before the premiere, the daily microblog counts in the 15 days after the premiere, and the corresponding weekly and daily search counts. Using linear regression in SPSS, with the crawled average on-demand volume per episode as the dependent variable and the single time-point variable as the independent variable, obtain the R² value that measures how well each variable explains the average on-demand volume per episode. To compare the predictive value of microblog and search data at each time point, note that each time point yields two R² values; the variable with the larger R² of the two is added to the initial feature set.
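The two selection rules in step (3) can be sketched in plain Python. The patent performs these calculations in SPSS, which uses distribution-based significance tests; the sketch below substitutes a seeded permutation test for the p-value, which is a stand-in technique and not the patent's procedure. All function names are hypothetical. The R² of a univariate linear regression with an intercept equals the squared Pearson coefficient, which is what `pick_stronger_source` relies on.

```python
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def average_ranks(v):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

def perm_p_value(stat, x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value (stand-in for the SPSS test)."""
    rng = random.Random(seed)
    observed = abs(stat(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(stat(x, y_perm)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def is_significant(x, y, alpha=0.05):
    """Step (3)a rule: keep the factor if either coefficient is significant."""
    return (perm_p_value(pearson, x, y) < alpha
            or perm_p_value(spearman, x, y) < alpha)

def pick_stronger_source(weibo, search, y):
    """Step (3)b rule: per time point, keep the source with the larger R²."""
    chosen = {}
    for t in weibo:
        r2_weibo = pearson(weibo[t], y) ** 2
        r2_search = pearson(search[t], y) ** 2
        chosen[t] = ("weibo", r2_weibo) if r2_weibo >= r2_search else ("search", r2_search)
    return chosen
```

Here `weibo` and `search` are assumed to map a time-point label to the list of counts across dramas, and `y` holds the dramas' average on-demand volume per episode in the same order.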
(4) Building feature sets X and X_b
Using the stepwise regression method in SPSS, further select features from the initial feature set to obtain feature set X, with probability-of-F thresholds of 0.05 for entry and 0.10 for removal. Then extract from X the features that are already available before a drama's premiere to form feature set X_b.
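The entry and removal thresholds refer to the significance of a partial F-test. As a hedged illustration of the usual textbook form (the patent itself only names the SPSS settings), suppose the current model has k predictors and an intercept, fitted on n dramas with residual sum of squares RSS_k, and adding a candidate feature x_j gives RSS_{k+1}. The symbols n, k, RSS and β_j are notation introduced here, not taken from the patent.

```latex
F_j \;=\; \frac{\mathrm{RSS}_k - \mathrm{RSS}_{k+1}}
               {\mathrm{RSS}_{k+1} / (n - k - 2)}
\;\sim\; F(1,\; n - k - 2)
\quad \text{under } H_0:\ \beta_j = 0
```

Under this reading, x_j enters the model when the p-value of F_j is at most 0.05, and a feature already in the model is dropped when the p-value of its own partial F statistic reaches 0.10 or more. The gap between the two thresholds keeps a feature from being added and removed in alternation.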
(5) Ranking prediction for TV dramas
Using SPSS, fit multiple linear regression equations to feature sets X and X_b to obtain two prediction models. A bias term and an indicator of whether a dedicated Weibo account was set up for the drama are added to each model: if such an account exists, the difference in average on-demand volume per episode between dramas with and without a dedicated account is added to the result computed in SPSS.
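One way to read this step is an ordinary least-squares fit with an intercept, followed by a fixed offset for dramas that have a dedicated Weibo account. The sketch below follows that reading with a pure-Python normal-equations solver in place of SPSS. The function names, the `account_offset` parameter, and the choice to compute the offset as a simple difference of group means are assumptions made for illustration.

```python
def fit_ols(X, y):
    """Return [intercept, b1, ..., bk] minimising squared error."""
    rows = [[1.0] + list(r) for r in X]          # prepend the bias column
    k = len(rows[0])
    # Augmented normal equations [A^T A | A^T y]
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][k] for i in range(k)]

def mean_account_gap(y, has_account):
    """Mean target of dramas with an account minus mean of those without."""
    with_acc = [v for v, h in zip(y, has_account) if h]
    without = [v for v, h in zip(y, has_account) if not h]
    return sum(with_acc) / len(with_acc) - sum(without) / len(without)

def predict(coef, x, has_account=False, account_offset=0.0):
    """Regression output, plus the offset when the drama has an account."""
    base = coef[0] + sum(b * v for b, v in zip(coef[1:], x))
    return base + (account_offset if has_account else 0.0)
```

Fitting `fit_ols` once on the columns of X_b and once on the columns of X gives the pre-premiere and post-premiere models respectively. The solver assumes the feature columns are not collinear.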
In the multiple linear regression equations with the bias term, feature set X_b yields a model that predicts the average on-demand volume per episode before a drama's premiere, and feature set X yields a model that predicts it after the premiere; the post-premiere model can be corrected progressively after the premiere. The two models output the predicted average on-demand volume per episode, and dramas are then ranked by that predicted value. Experiments show that the best result on the test data reaches R² = 0.65. SPSS is used to compute the Spearman correlation coefficient between the true ranking by on-demand volume and the predicted ranking; the size and significance of the coefficient indicate the accuracy of the prediction. The coefficient lies between 0 and 1 for a positively correlated ranking (in general it ranges from -1 to 1), and a larger value means a more accurate prediction.
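The ranking and its evaluation can be sketched as follows. The patent computes the Spearman coefficient in SPSS; this minimal example uses the closed-form rank-difference formula, which is exact only when there are no tied ranks. The function names and sample numbers are made up for illustration.

```python
def rank_descending(values):
    """Rank 1 goes to the largest value; ties keep their input order."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks

def spearman_from_ranks(r1, r2):
    """Spearman's rho from two tie-free rank lists of equal length."""
    n = len(r1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Illustrative values: predicted vs. observed average volume per episode
predicted = [120, 300, 80, 210]
actual = [100, 280, 150, 260]
rho = spearman_from_ranks(rank_descending(predicted), rank_descending(actual))
```

With these sample values the predicted ranks are [3, 1, 4, 2] and the true ranks are [4, 1, 3, 2], so two dramas are swapped by one place and rho is 0.8.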
The above merely further illustrates the invention and is not intended to limit this patent; all equivalent implementations of the invention fall within the scope of the claims of this patent.

Claims (1)

1. A method for predicting TV drama on-demand volume based on network data, characterized in that a web crawler collects the counts of microblog posts and searches related to a drama's title before and after its premiere, together with data related to the drama; correlation analysis and univariate linear regression are applied to the crawled data to build an initial feature set; stepwise regression on the initial feature set yields feature set X and feature set X_b; multiple linear regression on X and X_b predicts the average on-demand volume per episode; and dramas are ranked by predicted value, the prediction proceeding in the following steps:
(1) Data crawling
a. Use a web crawler to collect a number of popular TV dramas that have finished airing, together with the basic data for each drama;
b. Obtain the top 100 users in the entertainment category of the Weibo ranking list, expand the user set through follow relationships, add the official Weibo accounts of the dramas' actors and of the major satellite TV channels, and crawl the Weibo data of this group of users;
(2) Statistical samples
a. Analyze the entertainment users' data and collect the factors that may be related to TV dramas as microblog data sample A;
b. For each drama, count the microblog posts mentioning the drama's title in each week of the month before its premiere and on each day of the 15 days after its premiere, as microblog data sample B;
c. From the Baidu Index, count the searches for the drama's title in each week of the month before its premiere and on each day of the 15 days after its premiere, as the search data sample;
(3) Building the initial feature set
a. Using SPSS, compute the Pearson and Spearman correlation coefficients between each candidate factor from step (2)a and the average on-demand volume per episode; a factor is treated as significantly correlated if either coefficient is significant at the 5% level;
b. Using SPSS, run a univariate linear regression for each time-point variable (the weekly microblog counts in the month before the premiere, the daily microblog counts in the 15 days after the premiere, and the corresponding weekly and daily search counts), with the average on-demand volume per episode as the dependent variable and the single time-point variable as the independent variable, to obtain each variable's R² value; at each time point, whichever of the microblog and search variables has the larger R² is taken as a feature;
c. The significantly correlated factors from step a and the larger-R² variables from step b together form the initial feature set;
(4) Building feature sets X and X_b
Using SPSS, apply stepwise regression to the initial feature set to obtain feature set X; the features in X that are already available before a drama's premiere form feature set X_b;
(5) Ranking prediction for TV dramas
Using SPSS, fit multiple linear regression equations to feature sets X and X_b to obtain two prediction models; a bias term and an indicator of whether a dedicated Weibo account was set up for the drama are added to each model, and if such an account exists, the difference in average on-demand volume per episode between dramas with and without a dedicated account is added to the result computed in SPSS; the multiple linear regression on X_b with the bias term gives a model that predicts the average on-demand volume per episode before the premiere; the multiple linear regression on X with the bias term gives a model that predicts it after the premiere; dramas are then ranked by predicted value; and the post-premiere model's predictions are revised progressively after the premiere.
CN201410255632.2A 2014-06-11 2014-06-11 Prediction method of television play on-demand amount based on network data Expired - Fee Related CN104035994B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410255632.2A CN104035994B (en) 2014-06-11 2014-06-11 Prediction method of television play on-demand amount based on network data

Publications (2)

Publication Number Publication Date
CN104035994A true CN104035994A (en) 2014-09-10
CN104035994B CN104035994B (en) 2017-04-12

Family

ID=51466764

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410255632.2A Expired - Fee Related CN104035994B (en) 2014-06-11 2014-06-11 Prediction method of television play on-demand amount based on network data

Country Status (1)

Country Link
CN (1) CN104035994B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120084141A1 (en) * 2009-03-30 2012-04-05 Acquisio System and Method to Predict the Performance of Keywords for Advertising Campaigns Managed on the Internet
CN103077190A (en) * 2012-12-20 2013-05-01 人民搜索网络股份公司 Hot event ranking method based on order learning technology
CN103345512A (en) * 2013-07-06 2013-10-09 北京品友互动信息技术有限公司 Online advertising click-through rate forecasting method and device based on user attribute

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104537073A (en) * 2014-12-31 2015-04-22 合一网络技术(北京)有限公司 Estimation method for network multimedia object broadcast information
CN104537073B (en) * 2014-12-31 2018-05-01 合一网络技术(北京)有限公司 The evaluation method of network multimedia object broadcast information
CN104516983A (en) * 2015-01-08 2015-04-15 龙思薇 Data display method
CN105095414A (en) * 2015-07-10 2015-11-25 百度在线网络技术(北京)有限公司 Method and apparatus used for predicting network search volume
CN105005623A (en) * 2015-07-27 2015-10-28 东南大学 Power demand prediction method based on keyword retrieval index correlation analysis
CN106339771A (en) * 2016-08-09 2017-01-18 北京猫眼文化传媒有限公司 Movie box office data prediction method and device
CN108898415A (en) * 2018-05-29 2018-11-27 北京奇艺世纪科技有限公司 A kind of the flow index of correlation prediction technique and device of video collection of drama
CN111050195A (en) * 2018-10-12 2020-04-21 中国电信股份有限公司 Streaming media caching method and device and computer readable storage medium
CN111050195B (en) * 2018-10-12 2021-11-26 中国电信股份有限公司 Streaming media caching method and device and computer readable storage medium
CN110706015A (en) * 2019-08-21 2020-01-17 北京大学(天津滨海)新一代信息技术研究院 Advertisement click rate prediction oriented feature selection method
CN110706015B (en) * 2019-08-21 2023-06-13 北京大学(天津滨海)新一代信息技术研究院 Feature selection method for advertisement click rate prediction
CN111447470A (en) * 2019-10-22 2020-07-24 奥菲(泰州)光电传感技术有限公司 Video application program parameter setting platform
CN111447470B (en) * 2019-10-22 2021-04-20 深圳市野生动物园有限公司 Video application program parameter setting platform
CN113379447A (en) * 2021-05-28 2021-09-10 西安影视数据评估中心有限公司 Method for predicting single-day audience rating of TV play

Also Published As

Publication number Publication date
CN104035994B (en) 2017-04-12

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170412

Termination date: 20200611
