CN104035994B - Prediction method of television play on-demand amount based on network data - Google Patents
- Publication number
- CN104035994B
- Authority
- CN
- China
- Prior art keywords
- play
- microblogging
- broadcast
- data
- collection
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
Abstract
The invention discloses a method for predicting the on-demand volume of TV dramas based on network data. Crawled microblog counts, search counts, and drama-related data are processed with correlation analysis and univariate linear regression to build an initial feature set. Stepwise regression on the initial feature set then yields a feature set X and a feature set X_b, and multiple linear regression on X and X_b yields two prediction models, one usable before a drama's premiere and one usable after it. Dramas are then ranked by the size of their predicted values. Compared with the prior art, the method predicts in advance a drama's per-episode average on-demand volume in an on-demand system over a coming period. The predictions reflect the drama's popularity, and the method is simple and accurate. It therefore gives video operators a basis for deciding which broadcast rights to purchase and supports online on-demand systems in attracting users and increasing the advertisement click rate.
Description
Technical field
The present invention relates to the field of web information retrieval, and specifically to a method for predicting the on-demand volume of TV dramas based on network data from Sina Weibo and Baidu search.
Background
Predicting video on-demand volume is an important application in web data mining. TV dramas with high on-demand volume increase the number of times advertisements are played, so predicting a drama's on-demand volume in advance is widely useful for expanding advertising business. Using Sina Weibo and the Baidu search index to predict on-demand volume in a video-on-demand (VOD) system over a period after a drama goes online, and the link between TV dramas and social networks more generally, has become a research focus. In particular, predicting the on-demand volume of TV dramas in an online on-demand system from network data informs video operators' decisions on purchasing broadcast rights and reduces blind investment in such purchases. In addition, Sina Weibo and Baidu search index data together reflect users' preference for a TV drama more comprehensively.
At present, the on-demand volume of video resources is generally predicted either from historical on-demand data or from network data. Prediction based on historical on-demand data can only be made after a drama has been on air for some time. Among methods based on network data, traditional work mainly predicts film box office. Compared with box-office prediction, TV drama on-demand volume is affected by more factors, and existing methods do not consider that social network data and search data reflect on-demand volume to different degrees at different points in time.
The prior art cannot predict on-demand volume before a TV drama goes online, does not use social network data and search engine data together, cannot predict on-demand volume accurately, and therefore cannot support video operators' decisions on purchasing broadcast rights.
Summary of the invention
The purpose of the present invention is to provide, in view of the deficiencies of the prior art, a method for predicting TV drama on-demand volume based on network data. Using the SPSS statistical tool, an initial feature set is built from the crawled counts of microblog posts and searches related to a drama's title before and after its premiere, together with drama-related data. Stepwise regression and multiple linear regression are then applied to the initial feature set to obtain a prediction of per-episode average on-demand volume and a ranking prediction. The method is simple and accurate. Its predictions reflect how popular a drama is, give video operators a basis for decisions on purchasing broadcast rights, and support online on-demand systems in attracting users and increasing advertisement clicks.
The object of the present invention is achieved as follows. A method for predicting TV drama on-demand volume based on network data, characterized in that a web crawler collects the counts of microblog posts and searches related to a drama's title before and after its premiere, together with drama-related data; an initial feature set is built by applying correlation analysis and univariate linear regression to the crawled data; stepwise regression on the initial feature set yields feature sets X and X_b; multiple linear regression on X and X_b yields the predicted per-episode average on-demand volume; and dramas are then ranked by the size of the predicted values. The prediction proceeds in the following steps:
(1) Data crawling
a. Use a web crawler to collect a number of popular TV dramas that have finished airing, together with the basic data for each drama;
b. Obtain the top 100 users in the entertainment category of the microblog ranking list, expand the set of users by following relationships, add the official microblog accounts of the dramas' actors and of the major satellite TV channels, and crawl the microblog data of this group of users;
(2) Sample statistics
a. Analyze the data of the entertainment-category users and tabulate the factors that may be related to TV dramas as microblog data sample A;
b. For the collected dramas, count the total number of microblog posts related to each drama's title per week in the month before the premiere and per day in the 15 days after the premiere, as microblog data sample B;
c. From the Baidu index, count the number of searches for each drama's title per week in the month before the premiere and per day in the 15 days after the premiere, as the search data sample;
(3) Building the initial feature set
a. Using SPSS, compute the Pearson and Spearman correlation coefficients between each candidate factor from step (2)a and the dramas' per-episode average on-demand volume; a factor is treated as significantly correlated if either coefficient is significant at the 5% significance level;
b. Using SPSS, run a univariate linear regression for each of the following variables: the weekly microblog counts in the month before the premiere, the daily microblog counts in the 15 days after the premiere, the weekly search counts in the month before the premiere, and the daily search counts in the 15 days after the premiere. This gives each variable's R² against the per-episode average on-demand volume, where the dependent variable is the per-episode average on-demand volume and the independent variable is the single variable at each time point. At each time point, whichever of the microblog and search variables has the larger R² is taken as a feature;
c. The significantly correlated factors from step a and the larger-R² variables from step b form the initial feature set;
(4) Building feature sets X and X_b
Run stepwise regression on the initial feature set in SPSS to obtain feature set X, then extract from X the features that are already available before the premiere to obtain feature set X_b;
(5) TV drama ranking prediction
Using SPSS, fit a multiple linear regression on feature set X and on feature set X_b to obtain two prediction models. Each model includes a bias term and an indicator of whether a dedicated microblog account has been set up for the drama; if such an account exists, the difference in per-episode average on-demand volume between dramas with and without a dedicated account is added to the result calculated by SPSS. The regression on X_b, with the bias term, gives prediction model I, which can predict per-episode average on-demand volume before the premiere; the regression on X, with the bias term, gives prediction model II, which can predict it after the premiere. Dramas are then ranked by the size of the predicted values. The predictions of model II are progressively corrected after the premiere.
Compared with the prior art, the present invention predicts in advance a TV drama's per-episode average on-demand volume in a VOD system over a coming period. The predictions reflect the drama's popularity, and the method is simple and accurate. It gives video operators a basis for decisions on purchasing broadcast rights and supports online on-demand systems in attracting users and increasing advertisement clicks.
Description of the drawings
Fig. 1 is a flow chart of the present invention.
Detailed description of the embodiment
Referring to Fig. 1, the present invention uses Sina Weibo and the Baidu search index to collect the counts of microblog posts and searches related to a drama's title before and after its premiere, together with drama-related data. An initial feature set is built by applying correlation analysis and univariate linear regression to the crawled data. Stepwise regression on the initial feature set yields feature sets X and X_b, multiple linear regression on X and X_b yields the predicted per-episode average on-demand volume, and dramas are then ranked by the size of the predicted values. The prediction proceeds in the following steps:
(1) Data crawling
a. Use a web crawler to collect popular TV dramas that have finished airing, and crawl basic information for each drama, such as its cast and number of episodes, from Douban, giving the basic data for n dramas.
b. Using the API provided by Sina Weibo, obtain the top 100 users in the entertainment category of the microblog ranking list, expand the set of users by following relationships, add the official microblog accounts of the dramas' actors and of the major satellite TV channels, and crawl the microblog data of this group of users.
(2) Sample statistics
a. Analyze the data of the entertainment-category users and tabulate the factors that may be related to TV dramas to form microblog data sample A.
b. For the collected dramas, count the total number of microblog posts related to each drama's title per week in the month before the premiere and per day in the 15 days after the premiere, as microblog data sample B.
c. From the Baidu index, count the number of searches for each drama's title per week in the month before the premiere and per day in the 15 days after the premiere, as the search data sample.
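The windowing in steps (2)b and (2)c can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the raw data is a list of dates (one per post or search), treats "the month before the premiere" as four 7-day weeks, and counts the premiere day as day 0 of the 15-day window. The function name `window_counts` and its parameters are hypothetical.

```python
from datetime import date, timedelta

def window_counts(event_dates, premiere, weeks_before=4, days_after=15):
    """Bucket events into weekly counts before the premiere and daily counts after.

    event_dates: iterable of datetime.date, one entry per post or search.
    Returns (weekly, daily). weekly[0] is the week furthest from the premiere;
    daily[0] is the premiere day itself (an assumed convention).
    """
    weekly = [0] * weeks_before
    daily = [0] * days_after
    for d in event_dates:
        offset = (d - premiere).days  # negative before the premiere
        if -7 * weeks_before <= offset < 0:
            weekly[(offset + 7 * weeks_before) // 7] += 1
        elif 0 <= offset < days_after:
            daily[offset] += 1
        # events outside both windows are ignored
    return weekly, daily
```

Each of the resulting 4 weekly and 15 daily counts is one candidate variable per drama for the univariate regressions in step (3)b.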
(3) Building the initial feature set
a. Using SPSS, compute the Pearson and Spearman correlation coefficients between each candidate factor from step (2)a and the dramas' per-episode average on-demand volume. A factor is treated as significantly correlated if either coefficient is significant at the 5% significance level, and the significantly correlated factors are then added to the initial feature set.
b. Treat each of the following as a single variable: the weekly microblog counts in the month before the premiere, the daily microblog counts in the 15 days after the premiere, the weekly search counts in the month before the premiere, and the daily search counts in the 15 days after the premiere. Fit a linear regression for each in SPSS, with the crawled per-episode average on-demand volume as the dependent variable and the single variable at each time point as the independent variable, to obtain R², the degree to which each variable explains the crawled per-episode average on-demand volume. To compare the predictive power of microblog data and search data at each time point, two R² values are computed per time point, and the variable with the larger R² is added to the initial feature set.
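Step (3) as a whole can be sketched in plain Python. The patent performs these calculations in SPSS; the code below is only an illustrative stand-in. Significance is checked with the usual t statistic for a correlation coefficient, t = r·sqrt((n − 2)/(1 − r²)), against a two-sided critical value `t_crit` that the caller looks up for n − 2 degrees of freedom (for example 2.306 for n = 10 at the 5% level). For a one-variable least-squares line, R² equals the squared Pearson coefficient, which the second function relies on. All function names and the factor labels in the usage are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def t_stat(r, n):
    """t statistic for a correlation coefficient with n - 2 degrees of freedom."""
    denom = 1 - r * r
    if denom <= 0:  # perfect correlation, possibly with rounding error
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / denom)

def significant_factors(factors, target, t_crit):
    """Step a: keep factors whose Pearson OR Spearman coefficient is significant."""
    n = len(target)
    return [name for name, xs in factors.items()
            if max(abs(t_stat(pearson(xs, target), n)),
                   abs(t_stat(spearman(xs, target), n))) > t_crit]

def better_source_per_time_point(weibo, search, target):
    """Step b: per time point, keep the source with the larger univariate R^2."""
    chosen = []
    for label in weibo:
        r2_weibo = pearson(weibo[label], target) ** 2
        r2_search = pearson(search[label], target) ** 2
        chosen.append(("weibo", label) if r2_weibo >= r2_search else ("search", label))
    return chosen
```

The initial feature set of step (3) is then the union of the names returned by `significant_factors` and the (source, time point) pairs returned by `better_source_per_time_point`.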
(4) Building feature sets X and X_b
The stepwise regression method in SPSS is used to further select from the initial feature set, giving feature set X; the probability-of-F thresholds used are 0.05 for entry and 0.1 for removal. The features in X that are available before the premiere are then extracted as feature set X_b.
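The relationship between X and X_b can be sketched as a simple filter. The stepwise regression itself is left to the statistics package and is not reproduced here. The `Feature` record and its `available_day` field are hypothetical conventions introduced only for this illustration: a negative value means the feature is observed before the premiere, and `None` marks static drama attributes that are known in advance.

```python
from typing import NamedTuple, Optional

class Feature(NamedTuple):
    name: str
    # Day, relative to the premiere, on which the value becomes known.
    # None marks static attributes known in advance (assumed convention).
    available_day: Optional[int]

def pre_premiere_subset(x_features):
    """Return X_b: the members of X that can be obtained before the premiere."""
    return [f for f in x_features
            if f.available_day is None or f.available_day < 0]
```

For example, a weekly count from the month before the premiere stays in X_b, while a daily count from day 3 after the premiere belongs to X only.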
(5) TV drama ranking prediction
Using SPSS, fit a multiple linear regression on feature set X and on feature set X_b to obtain two prediction models. Each model includes a bias term and an indicator of whether a dedicated microblog account has been set up for the drama; if such an account exists, the difference in per-episode average on-demand volume between dramas with and without a dedicated account is added to the result calculated by SPSS.
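A multiple linear regression with a bias term and an account adjustment can be sketched as below. This is an assumption-laden illustration rather than the SPSS procedure: coefficients are obtained by solving the normal equations with Gauss-Jordan elimination, and the account adjustment is modelled as a fixed offset `account_delta` supplied by the caller, since the source does not spell out how that difference is estimated. The names `fit_ols` and `predict` are hypothetical.

```python
def fit_ols(rows, y):
    """Least-squares fit of y = b0 + b1*x1 + ... via the normal equations.

    rows: one list of feature values per drama. Returns [b0, b1, ...].
    Raises ZeroDivisionError if the features are exactly collinear.
    """
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 is the bias column
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * t for row, t in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(coefs, features, has_account=False, account_delta=0.0):
    """Regression output plus a fixed offset when a dedicated account exists."""
    base = coefs[0] + sum(b * x for b, x in zip(coefs[1:], features))
    return base + (account_delta if has_account else 0.0)
```

Fitting once on the columns of X_b and once on the columns of X gives the two models; the same `predict` call serves both.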
With the bias term included in the multiple linear regression, feature set X_b gives prediction model I, which can predict per-episode average on-demand volume before the premiere, and feature set X gives prediction model II, which can predict it after the premiere; model II can be progressively corrected after the premiere. Both models output a predicted per-episode average on-demand volume, and dramas are then ranked by the size of that value. Experiments show that the best result on the test data set reaches R² = 0.65. SPSS is also used to compute the Spearman correlation coefficient between the true ranking and the predicted ranking of TV drama on-demand volume; the magnitude and significance of the Spearman coefficient indicate the accuracy of the prediction, and for coefficients between 0 and 1, a larger value means a more accurate prediction.
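The ranking step and its evaluation can be sketched as follows. The sketch assumes no two dramas share the same value, which allows the shortcut formula ρ = 1 − 6·Σd² / (n(n² − 1)), where d is the difference between a drama's true and predicted rank. The drama labels and numbers in the usage are made up for illustration.

```python
def rank_by_value(scores):
    """scores: dict drama -> value. Returns dict drama -> rank (1 = highest)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}

def spearman_rho(true_scores, predicted_scores):
    """Spearman coefficient between two rankings, assuming no tied values."""
    true_rank = rank_by_value(true_scores)
    pred_rank = rank_by_value(predicted_scores)
    n = len(true_rank)
    d2 = sum((true_rank[k] - pred_rank[k]) ** 2 for k in true_rank)
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical per-episode average on-demand volumes for four dramas.
actual = {"A": 900, "B": 700, "C": 500, "D": 300}
predicted = {"A": 850, "B": 400, "C": 600, "D": 200}
rho = spearman_rho(actual, predicted)  # B and C swap places in the predicted ranking
```

A coefficient close to 1 means the predicted ranking nearly matches the true one.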
The above merely further illustrates the present invention and does not limit this patent; all equivalent implementations of the present invention are intended to fall within the scope of the claims of this patent.
Claims (1)
1. A method for predicting TV drama on-demand volume based on network data, characterized in that a web crawler collects the counts of microblog posts and searches related to a drama's title before and after its premiere, together with drama-related data; an initial feature set is built by applying correlation analysis and univariate linear regression to the crawled data; stepwise regression on the initial feature set yields feature sets X and X_b; multiple linear regression on X and X_b yields the predicted per-episode average on-demand volume; and dramas are then ranked by the size of the predicted values, the prediction proceeding in the following steps:
(1) Data crawling
a. Use a web crawler to collect a number of popular TV dramas that have finished airing, together with the basic data for each drama;
b. Obtain the top 100 users in the entertainment category of the microblog ranking list, expand the set of users by following relationships, add the official microblog accounts of the dramas' actors and of the major satellite TV channels, and crawl the microblog data of this group of users;
(2) Sample statistics
a. Analyze the data of the entertainment-category users and tabulate the factors related to TV dramas as microblog data sample A;
b. For the collected dramas, count the total number of microblog posts related to each drama's title per week in the month before the premiere and per day in the 15 days after the premiere, as microblog data sample B;
c. From the Baidu index, count the number of searches for each drama's title per week in the month before the premiere and per day in the 15 days after the premiere, as the search data sample;
(3) Building the initial feature set
a. Using SPSS, compute the Pearson and Spearman correlation coefficients between each candidate factor from step (2)a and the dramas' per-episode average on-demand volume; a factor is treated as significantly correlated if either coefficient is significant at the 5% significance level;
b. Using SPSS, run a univariate linear regression for each of the following variables: the weekly microblog counts in the month before the premiere, the daily microblog counts in the 15 days after the premiere, the weekly search counts in the month before the premiere, and the daily search counts in the 15 days after the premiere, to obtain each variable's R² against the per-episode average on-demand volume, where the dependent variable is the per-episode average on-demand volume and the independent variable is the single variable at each time point; at each time point, whichever of the microblog and search variables has the larger R² is taken as a feature;
c. The significantly correlated factors from step a and the larger-R² variables from step b form the initial feature set;
(4) Building feature sets X and X_b
Run stepwise regression on the initial feature set in SPSS to obtain feature set X, then extract from X the features that are already available before the premiere to obtain feature set X_b;
(5) TV drama ranking prediction
Using SPSS, fit a multiple linear regression on feature set X and on feature set X_b to obtain two prediction models, and add to each model a bias term and an indicator of whether a dedicated microblog account has been set up for the drama; if such an account exists, the difference in per-episode average on-demand volume associated with the dedicated account is added to the result calculated by SPSS; the regression on X_b, with the bias term, gives prediction model I, which can predict per-episode average on-demand volume before the premiere; the regression on X, with the bias term, gives prediction model II, which can predict it after the premiere; dramas are then ranked by the size of the predicted values; the predictions of model II are progressively corrected after the premiere.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410255632.2A CN104035994B (en) | 2014-06-11 | 2014-06-11 | Prediction method of television play on-demand amount based on network data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410255632.2A CN104035994B (en) | 2014-06-11 | 2014-06-11 | Prediction method of television play on-demand amount based on network data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104035994A CN104035994A (en) | 2014-09-10 |
CN104035994B true CN104035994B (en) | 2017-04-12 |
Family
ID=51466764
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410255632.2A Expired - Fee Related CN104035994B (en) | 2014-06-11 | 2014-06-11 | Prediction method of television play on-demand amount based on network data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104035994B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104537073B (en) * | 2014-12-31 | 2018-05-01 | 合一网络技术(北京)有限公司 | The evaluation method of network multimedia object broadcast information |
CN104516983A (en) * | 2015-01-08 | 2015-04-15 | 龙思薇 | Data display method |
CN105095414A (en) * | 2015-07-10 | 2015-11-25 | 百度在线网络技术(北京)有限公司 | Method and apparatus used for predicting network search volume |
CN105005623A (en) * | 2015-07-27 | 2015-10-28 | 东南大学 | Power demand prediction method based on keyword retrieval index correlation analysis |
CN106339771A (en) * | 2016-08-09 | 2017-01-18 | 北京猫眼文化传媒有限公司 | Movie box office data prediction method and device |
CN108898415A (en) * | 2018-05-29 | 2018-11-27 | 北京奇艺世纪科技有限公司 | A kind of the flow index of correlation prediction technique and device of video collection of drama |
CN111050195B (en) * | 2018-10-12 | 2021-11-26 | 中国电信股份有限公司 | Streaming media caching method and device and computer readable storage medium |
CN110706015B (en) * | 2019-08-21 | 2023-06-13 | 北京大学(天津滨海)新一代信息技术研究院 | Feature selection method for advertisement click rate prediction |
CN111447470B (en) * | 2019-10-22 | 2021-04-20 | 深圳市野生动物园有限公司 | Video application program parameter setting platform |
CN113379447A (en) * | 2021-05-28 | 2021-09-10 | 西安影视数据评估中心有限公司 | Method for predicting single-day audience rating of TV play |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103077190A (en) * | 2012-12-20 | 2013-05-01 | 人民搜索网络股份公司 | Hot event ranking method based on order learning technology |
CN103345512A (en) * | 2013-07-06 | 2013-10-09 | 北京品友互动信息技术有限公司 | Online advertising click-through rate forecasting method and device based on user attribute |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120084141A1 (en) * | 2009-03-30 | 2012-04-05 | Acquisio | System and Method to Predict the Performance of Keywords for Advertising Campaigns Managed on the Internet |
Also Published As
Publication number | Publication date |
---|---|
CN104035994A (en) | 2014-09-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20170412; Termination date: 20200611 |