CN107920260A - Digital cable customers behavior prediction method and device - Google Patents

Digital cable customers behavior prediction method and device

Info

Publication number
CN107920260A
Authority
CN
China
Prior art keywords
user
shutdown
model
contextual information
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610883971.4A
Other languages
Chinese (zh)
Inventor
万倩
赵明
朱佩江
李培琳
牛妍华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National News Publishes Broadcast Research Institute Of General Bureau Of Radio Film And Television
Academy of Broadcasting Science of SAPPRFT
Original Assignee
National News Publishes Broadcast Research Institute Of General Bureau Of Radio Film And Television
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National News Publishes Broadcast Research Institute Of General Bureau Of Radio Film And Television
Priority to CN201610883971.4A
Publication of CN107920260A
Legal status: Pending


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/258 Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
    • H04N21/25866 Management of end-user data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442 Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44213 Monitoring of end-user related data
    • H04N21/44222 Analytics of user selections, e.g. selection of programs or purchase activity
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45 Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/466 Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/4662 Learning process for intelligent management, e.g. learning user preferences for recommending movies characterized by learning algorithms
    • H04N21/4665 Learning process for intelligent management, e.g. learning user preferences for recommending movies characterized by learning algorithms involving classification methods, e.g. Decision trees

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Social Psychology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Embodiments of the present invention provide a method and device for predicting the behavior of digital cable television users. The method includes: obtaining contextual information about a user's viewing, the contextual information including a fundamental type, program attributes, and a viewing period; determining a user shutdown model according to the contextual information; and, according to the user shutdown model, predicting the shutdown behavior of users for whom television shutdown data cannot be collected. By obtaining the viewing contextual information, determining the user shutdown model from it, and applying the model to users whose television shutdown data cannot be collected, the embodiments predict the moment at which a user turns off the television. That moment is then used to identify the invalid data in the viewing data returned by the set-top box, which improves the accuracy of the audience rating surveys and viewing behavior analysis carried out by cable broadcasting operators.

Description

Digital cable customers behavior prediction method and device
Technical field
The embodiments of the present invention relate to the field of communication technology, and in particular to a method and device for predicting the behavior of digital cable television users.
Background
With the accelerating two-way upgrade of cable television networks and the spread of two-way digital television set-top boxes, the behavior data generated when a very large number of households operate their set-top boxes can be collected and returned by the acquisition system to a back-end data storage server, which makes it possible to collect viewing behavior data from a very large user base. Meanwhile, thanks to the development of big data technology, the sample space for audience rating surveys and analysis can be expanded to the entire user base, yielding comprehensive and accurate analysis results. In addition, viewing characteristics can be analyzed for specific groups, helping operators adjust operating decisions in real time and provide personalized viewing services, thereby improving user experience and increasing operating revenue.
However, as long as a two-way digital television set-top box is powered on, it monitors and returns in real time user behavior such as channel switching, use of interactive services, and page dwell. In real life, most users habitually turn off only the television while the set-top box stays on. The set-top box then continues to return viewing data, and this portion of the data is clearly invalid. Such invalid data can significantly reduce the accuracy of the audience rating surveys and viewing behavior analysis carried out by cable broadcasting operators.
Summary of the invention
The embodiments of the present invention provide a method and device for predicting the behavior of digital cable television users, so as to improve the accuracy of audience rating surveys and viewing behavior analysis.
One aspect of the embodiments of the present invention provides a method for predicting the behavior of digital cable television users, including:
obtaining contextual information about a user's viewing, the contextual information including: a fundamental type, program attributes, and a viewing period;
determining a user shutdown model according to the contextual information;
according to the user shutdown model, predicting the shutdown behavior of users for whom television shutdown data cannot be collected.
Another aspect of the embodiments of the present invention provides a device for predicting the behavior of digital cable television users, including:
an acquisition module, configured to obtain contextual information about a user's viewing, the contextual information including: a fundamental type, program attributes, and a viewing period;
a determining module, configured to determine a user shutdown model according to the contextual information;
a prediction module, configured to predict, according to the user shutdown model, the shutdown behavior of users for whom television shutdown data cannot be collected.
The method and device provided by the embodiments of the present invention obtain the contextual information about a user's viewing, determine the user shutdown model from that information, and use the model to predict the shutdown behavior of users for whom television shutdown data cannot be collected. This yields the predicted moment at which the user turns off the television. That moment is then used to identify the invalid data in the viewing data returned by the set-top box, which improves the accuracy of the audience rating surveys and viewing behavior analysis carried out by cable broadcasting operators.
Brief description of the drawings
Fig. 1 is a flow chart of the digital cable customers behavior prediction method provided by an embodiment of the present invention;
Fig. 1A is a flow chart of shutdown behavior prediction provided by an embodiment of the present invention;
Fig. 2 is a distribution plot of shutdown duration provided by an embodiment of the present invention;
Fig. 3 is a log-log distribution plot of shutdown duration provided by an embodiment of the present invention;
Fig. 4A is a cumulative distribution plot of shutdown duration for the CCTV1 channel provided by an embodiment of the present invention;
Fig. 4B is a log-log distribution plot of shutdown duration for the CCTV1 channel provided by an embodiment of the present invention;
Fig. 5A is a cumulative distribution plot of shutdown duration for Hunan Satellite TV provided by an embodiment of the present invention;
Fig. 5B is a log-log distribution plot of shutdown duration for Hunan Satellite TV provided by an embodiment of the present invention;
Fig. 6A is a cumulative distribution plot of shutdown duration for Beijing TV provided by an embodiment of the present invention;
Fig. 6B is a log-log distribution plot of shutdown duration for Beijing TV provided by an embodiment of the present invention;
Fig. 7A is a cumulative distribution plot of shutdown duration for 《The Hunger Games: Mockingjay, Part 1》 provided by an embodiment of the present invention;
Fig. 7B is a log-log distribution plot of shutdown duration for 《The Hunger Games: Mockingjay, Part 1》 provided by an embodiment of the present invention;
Fig. 8A is a cumulative distribution plot of shutdown duration for 《The Legend of Mi Yue》 provided by an embodiment of the present invention;
Fig. 8B is a log-log distribution plot of shutdown duration for 《The Legend of Mi Yue》 provided by an embodiment of the present invention;
Fig. 9A is a cumulative distribution plot of shutdown duration for 《Happy pleasure is overturned the heavens》 provided by an embodiment of the present invention;
Fig. 9B is a log-log distribution plot of shutdown duration for 《Happy pleasure is overturned the heavens》 provided by an embodiment of the present invention;
Fig. 10 is a schematic diagram of the result of training with a traditional regression decision tree in the prior art;
Fig. 11 is a schematic diagram of the regression decision tree model provided by an embodiment of the present invention;
Fig. 12 is a schematic diagram of the experimental results of training the shutdown model with the iterative decision tree method provided by an embodiment of the present invention;
Fig. 13 is a structural diagram of the digital cable customers behavior prediction device provided by an embodiment of the present invention;
Fig. 14 is a structural diagram of the digital cable customers behavior prediction device provided by another embodiment of the present invention.
Detailed description of the embodiments
The iterative decision tree (Gradient Boosting Decision Tree, GBDT), also called the Multiple Additive Regression Tree (MART), is an iterative decision tree algorithm. It consists of multiple decision trees, and the conclusions of all the trees are summed to give the final result. When first proposed, it was regarded, together with SVM, as an algorithm with strong generalization ability. In recent years it has drawn further attention because of its use in machine-learned ranking models for search.
Gradient Boost is in fact a framework into which many different algorithms can be plugged. "Boost" means "to lift"; Boosting algorithms are generally iterative, and each new round of training is intended to improve on the previous result.
The original Boost algorithm assigns a weight to each sample at the start, and initially all samples are equally important. The model obtained at each training step estimates some data points correctly and some incorrectly. After each step, the weights of the misclassified points are increased and the weights of the correctly classified points are decreased, so that points which are repeatedly misclassified receive "serious attention", that is, a very high weight. After N iterations (N is specified by the user), N simple classifiers (base learners) are obtained, and they are combined (for example, by weighting them or letting them vote) into a final model.
Gradient Boost differs from traditional Boost in that each round of computation aims to reduce the residual of the previous round. To eliminate the residual, a new model is built in the gradient direction along which the residual decreases. Thus, in Gradient Boost, each new model is built so that the residual of the preceding models decreases along the gradient direction, which is very different from the reweighting of correctly and incorrectly classified samples in traditional Boost.
In classification there is an important topic called Multi-Class Logistic, that is, the multi-class logistic problem, which applies to problems with more than 2 classes. In the classification result, sample x does not necessarily belong to only one class; instead, the probabilities that x belongs to each of several classes are obtained (in other words, the estimate y of sample x follows a certain geometric distribution). One conclusion is used here: if a classification problem follows a geometric distribution, the logistic transform can be used for the subsequent computation.
Suppose a sample x may belong to K classes, with estimates F_1(x), ..., F_K(x). The logistic transform is a smooth process that normalizes the data (so that the components of the vector sum to 1); its result is the probability p_k(x) that x belongs to class k, as shown in formula (1):
$$p_k(x) = \frac{\exp(F_k(x))}{\sum_{l=1}^{K} \exp(F_l(x))} \tag{1}$$
For the result of the logistic transform, the loss function is given by formula (2):
$$L\bigl(\{y_k, F_k(x)\}_{k=1}^{K}\bigr) = -\sum_{k=1}^{K} y_k \log p_k(x) \tag{2}$$
where y_k is the label of the input sample: y_k = 1 when sample x belongs to class k, and y_k = 0 otherwise.
Substituting the logistic transform into the loss function and differentiating with respect to F_k(x) gives the gradient used below (the negative of the derivative of the loss), as shown in formula (3):
$$g_k = -\frac{\partial L}{\partial F_k(x)} = y_k - p_k(x) \tag{3}$$
Suppose the input x may belong to 5 classes (1, 2, 3, 4, 5) and, in the training data, x belongs to class 3, so y = (0, 0, 1, 0, 0). Suppose the model's estimate is F(x) = (0, 0.3, 0.6, 0, 0). After the logistic transform, p(x) = (0.16, 0.21, 0.29, 0.16, 0.16) (values truncated to two decimals), and y - p gives the gradient g = (-0.16, -0.21, 0.71, -0.16, -0.16). An interesting observation follows.
Let g_k be the gradient of the sample in one dimension (one class):
When g_k > 0, the larger it is, the more the probability p_k(x) in that dimension should be raised. For example, the third-dimension probability of 0.29 above should be raised, which is moving in "the correct direction". The smaller g_k is, the more "accurate" the estimate.
When g_k < 0, the more negative it is, the more the probability in that dimension should be lowered. For example, the second-dimension probability of 0.21 should be lowered, which is moving "away from the wrong direction". The closer g_k is to 0 (the less negative), the less "wrong" the estimate.
In general, for a sample, the ideal gradient is one close to 0. The function's estimates should therefore be adjusted so that the gradient moves the opposite way (toward the negative direction in dimensions where it is > 0, toward the positive direction in dimensions where it is < 0) until the gradient is as close to 0 as possible. The algorithm also pays particular attention to samples with larger gradients, which is similar in spirit to Boost.
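The five-class example above can be checked numerically. The following is a minimal sketch in Python, assuming the logistic transform is the usual softmax normalization and the gradient is y - p as in the example; the function names are illustrative and are not taken from the source.

```python
import math

def softmax(scores):
    """Logistic (softmax) transform: per-class scores F_k(x) -> probabilities p_k(x)."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gradient(labels, scores):
    """Per-class gradient g_k = y_k - p_k(x), as in the worked example."""
    probs = softmax(scores)
    return [y - p for y, p in zip(labels, probs)]

y = [0, 0, 1, 0, 0]               # x belongs to class 3
F = [0.0, 0.3, 0.6, 0.0, 0.0]     # model estimate from the text
p = softmax(F)
g = gradient(y, F)
print([round(v, 3) for v in p])
print([round(v, 3) for v in g])
```

Only the third component of g is positive, so only the probability of class 3 should be raised; the other four should be lowered.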
Once the gradient is obtained, the question is how to reduce it. An iterative method combined with decision trees is used here. At initialization, an arbitrary estimation function F(x) is given (F(x) may be a random value, or F(x) = 0). Then, at each iteration, a decision tree is built according to the current gradient of each sample, and the function is moved in the direction that drives the gradient toward 0, so that after N iterations the gradient is small.
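The iteration can be illustrated on a single sample. The sketch below is a deliberate simplification: instead of fitting a decision tree to the gradients at each step, it adds a fraction of the gradient g = y - p directly to the scores, which is enough to show the gradient shrinking over N steps. The step size and the number of steps are hypothetical values chosen for illustration.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def boost_scores(labels, n_steps=200, step_size=0.5):
    """Start from F(x) = 0 and repeatedly step each F_k along g_k = y_k - p_k.

    Returns the final scores and the largest |g_k| seen at each step.
    """
    scores = [0.0] * len(labels)
    history = []
    for _ in range(n_steps):
        probs = softmax(scores)
        grad = [y - p for y, p in zip(labels, probs)]
        history.append(max(abs(v) for v in grad))
        # A real GBDT would fit a small regression tree to grad here.
        scores = [f + step_size * v for f, v in zip(scores, grad)]
    return scores, history

scores, history = boost_scores([0, 0, 1, 0, 0])
print(round(history[0], 3), round(history[-1], 3))
```

The first entry of history is the gradient magnitude at F(x) = 0 and the last entry is much smaller, which mirrors the statement that the gradient is small after N iterations.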
The decision tree built here differs from an ordinary decision tree. First, the number of leaf nodes J is fixed: once J nodes have been generated, no new nodes are generated.
Fig. 1 is a flow chart of the digital cable customers behavior prediction method provided by an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S101: obtain contextual information about a user's viewing, the contextual information including: a fundamental type, program attributes, and a viewing period.
The user shutdown behavior in this embodiment is related to the contextual information, which includes a fundamental type, program attributes, and a viewing period. The fundamental type includes the region and the service type; the service types are, for example, page browsing, video on demand (VOD), live broadcast, time shift, and catch-up viewing. The program attributes include the live channel, the program type (such as film, TV series, variety show, or animation), the program popularity, the program duration, and so on. The viewing period includes the broadcast time (such as the day of the week) and the live viewing period. Specifically, these eight attributes (region, service type, live channel, program type, program popularity, program duration, broadcast time, and live viewing period) can serve as behavioral features.
Step S102: determine the user shutdown model according to the contextual information.
In this embodiment, the user shutdown model is built on the following basis: for some televisions, the set-top box can detect the level change on a High-Definition Multimedia Interface (HDMI) pin when the television is switched on or off, and it returns the television power-on and power-off data.
The purpose of predicting the shutdown behavior of digital cable users is, when a user has left or turned off the television while the set-top box is still on, to separate the resulting invalid data from the returned user behavior data and to estimate the user's most likely shutdown moment. User shutdown behavior prediction is specifically defined as follows: when the time interval between behavior records of a user operating the set-top box is too large, estimate the likelihood that the user left or turned off the television during that interval, together with the most likely shutdown moment, so as to ensure the validity of the viewing behavior statistics.
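The definition above can be sketched as two small helpers: one that finds overly long intervals between consecutive behavior records, and one that turns a predicted shutdown duration into a shutdown moment. This is an illustrative sketch only. The 2-hour threshold, the 40-minute prediction, and the function names are hypothetical, and the sketch assumes that the predicted duration is measured from the last behavior record before the television is turned off.

```python
from datetime import datetime, timedelta

def find_long_gaps(timestamps, threshold):
    """Return (start, end) pairs of consecutive records more than `threshold` apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > threshold]

def estimated_shutdown_time(last_record, predicted_duration):
    """Shutdown moment = last record before the gap + predicted shutdown duration."""
    return last_record + predicted_duration

records = [
    datetime(2016, 3, 1, 20, 0),
    datetime(2016, 3, 1, 20, 5),
    datetime(2016, 3, 1, 23, 30),
    datetime(2016, 3, 1, 23, 32),
]
gaps = find_long_gaps(records, threshold=timedelta(hours=2))
# The predicted duration would come from the trained model; 40 minutes is made up.
shutdown_at = estimated_shutdown_time(gaps[0][0], timedelta(minutes=40))
print(gaps, shutdown_at)
```

Viewing data attributed to the interval between shutdown_at and the next record would then be treated as invalid.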
User shutdown behavior is analyzed below. For example, a provincial cable network operator collected, through two-way digital television set-top boxes, all behavior data of about one million users during March 2016, covering service types such as live broadcast, VOD, time shift, catch-up viewing, and information services. The data include more than 3 million television shutdown records from more than 200,000 users. If the shutdown duration is defined as the time interval between a shutdown event and the user's last behavior record before that shutdown, more than 3 million shutdown duration values are obtained. Fig. 2 is the distribution plot of shutdown duration provided by an embodiment of the present invention: the abscissa is the shutdown duration, and the ordinate is the number of shutdowns whose duration falls within the corresponding time interval. Most shutdown durations are less than 100 minutes. Taking the logarithm of both axes of Fig. 2 gives the log-log distribution plot shown in Fig. 3, which is approximately a straight line, indicating that the shutdown duration follows a Zipf distribution.
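The straight-line check on the log-log plot can be expressed as a least-squares fit of log(count) against log(duration). The sketch below uses synthetic counts that follow an exact power law with exponent -1.2; these numbers are invented for illustration and are not the data behind Fig. 2 or Fig. 3.

```python
import math

def loglog_fit(durations, counts):
    """Least-squares slope and intercept of log(count) against log(duration)."""
    xs = [math.log(d) for d in durations]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic histogram: count = 1e6 * duration^-1.2 for durations of 1..100 minutes.
durations = list(range(1, 101))
counts = [1e6 * d ** -1.2 for d in durations]
slope, intercept = loglog_fit(durations, counts)
print(round(slope, 3), round(math.exp(intercept)))
```

On real data the points would only approximately follow a line; a roughly constant negative slope is what suggests a Zipf-like distribution.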
In addition, experiments also yield the distribution by live channel and the distribution by live program, which are described below:
(1) Distribution by live channel
More than 90% of user behavior before shutdown is live viewing; other behaviors, such as VOD, time shift, catch-up viewing, and page browsing, account for less than 10%. To better understand the pattern of user shutdown durations, the present invention computed the distribution of shutdown duration on each live channel. This embodiment gives the shutdown duration distribution plots for CCTV1, Hunan Satellite TV, and Beijing TV; on each live channel the shutdown duration likewise exhibits the characteristics of a Zipf distribution. Fig. 4A is the cumulative distribution plot of shutdown duration for the CCTV1 channel, and Fig. 4B is the corresponding log-log distribution plot. Fig. 5A is the cumulative distribution plot of shutdown duration for Hunan Satellite TV, and Fig. 5B is the corresponding log-log distribution plot. Fig. 6A is the cumulative distribution plot of shutdown duration for Beijing TV, and Fig. 6B is the corresponding log-log distribution plot.
(2) Distribution by live program
User shutdown duration exhibits the characteristics of a Zipf distribution not only on live channels but also on live programs. Optionally, this embodiment takes the film 《The Hunger Games: Mockingjay, Part 1》, the TV series 《The Legend of Mi Yue》, and the variety show 《Happy pleasure is overturned the heavens》 as examples to describe the distribution of shutdown durations after users watch these programs. Fig. 7A is the cumulative distribution plot of shutdown duration for 《The Hunger Games: Mockingjay, Part 1》, and Fig. 7B is the corresponding log-log distribution plot. Fig. 8A is the cumulative distribution plot of shutdown duration for 《The Legend of Mi Yue》, and Fig. 8B is the corresponding log-log distribution plot. Fig. 9A is the cumulative distribution plot of shutdown duration for 《Happy pleasure is overturned the heavens》, and Fig. 9B is the corresponding log-log distribution plot. Comparing the channel shutdown durations with the program shutdown durations shows clearly that the program shutdown duration is much shorter than the channel shutdown duration. The reason is that a program has a limited length, and the time a user stays on a channel is obviously longer than the time the user stays on a single program.
In addition, user shutdown behavior depends heavily on the contextual information of the viewing state. Take the service type as an example: a user is less likely to shut down after browsing pages or watching an on-demand program than when watching a live channel, especially when staying on one channel for a long time. The viewing period also matters: the probability of shutdown during peak viewing periods is clearly lower than during off-peak periods. Therefore, the present invention divides the viewing contextual information into three classes: fundamental type, program attributes, and viewing period. The fundamental type includes the region and the service type (page browsing, VOD, live broadcast, time shift, and catch-up viewing); the program attributes include the live channel, the program type (film, TV series, variety show, animation, etc.), the program popularity, the program duration, and so on; the viewing period includes the day of the week, the six live viewing periods, and so on. The specific classification is shown in Table 1:
Table 1
For ease of description, the region is denoted by R, a discrete integer (R >= 1), where each value uniquely corresponds to one city or prefecture. The service type is denoted by T, a discrete integer (T = {1-5}): 1 is page browsing, 2 is VOD, 3 is live broadcast, 4 is time shift, and 5 is catch-up viewing. The live channel is denoted by C, a discrete integer (C >= 1), where each value uniquely corresponds to one channel; for example, 1 is CCTV1, 2 is CCTV2, and so on. The program type is denoted by P, a discrete integer (P = {1-4}): 1 is film, 2 is TV series, 3 is variety show, and 4 is animation. Only a first-level classification of programs is used here; in practice it can be subdivided further, for example films into comedy, action, romance, and so on. The program popularity is denoted by H, a discrete integer (H >= 1): the larger the value, the more viewers there are and the more popular the program. Users are normally interested in popular programs, so they are less likely to shut down while such a program is playing. The program duration is denoted by L, a positive integer (L > 0). The day of the week is denoted by W, a discrete integer (W = {1-7}): 1 is Monday, 2 is Tuesday, ..., 7 is Sunday. The six live viewing periods are denoted by I, a discrete integer (I = {1-6}): 1 is 0:00 to 6:00, 2 is 6:00 to 9:00, 3 is 9:00 to 12:00, 4 is 12:00 to 15:00, 5 is 15:00 to 19:00, and 6 is 19:00 to 24:00. The resulting user shutdown model is shown in formula (4):
T = f(R, T, C, P, H, L, W, I)    (4)
where f is the shutdown model function. The T on the left-hand side is the model output (the quantity predicted by regression in this embodiment is the shutdown duration), not the service type T among the arguments. The present invention trains the model with the iterative decision tree algorithm (Gradient Boosting Decision Tree, GBDT) and uses the model to predict the shutdown behavior of users for whom television shutdown data cannot be collected.
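The eight model inputs can be held in a small record and flattened into a feature vector. The following sketch is illustrative only: the class and field names are hypothetical, the unit of the program duration is assumed to be minutes, and the six viewing periods are assumed to be half-open intervals (for example, 6:00 itself falls in period 2), which the text does not state.

```python
from dataclasses import dataclass, astuple

# Exclusive upper bounds (hours) of the six live viewing periods I = 1..6.
PERIOD_UPPER_BOUNDS = [6, 9, 12, 15, 19, 24]

def viewing_period(hour):
    """Map an hour of day (0-23) to the viewing period code I in 1..6."""
    if not 0 <= hour < 24:
        raise ValueError("hour must be in 0..23")
    for code, upper in enumerate(PERIOD_UPPER_BOUNDS, start=1):
        if hour < upper:
            return code

@dataclass
class ViewingContext:
    region: int        # R, >= 1
    service_type: int  # T, 1..5
    channel: int       # C, >= 1
    program_type: int  # P, 1..4
    popularity: int    # H, >= 1
    duration_min: int  # L, > 0 (unit assumed to be minutes)
    weekday: int       # W, 1 = Monday .. 7 = Sunday
    period: int        # I, 1..6

    def as_features(self):
        """Feature vector in the argument order of formula (4)."""
        return list(astuple(self))

ctx = ViewingContext(region=3, service_type=3, channel=1, program_type=2,
                     popularity=5, duration_min=45, weekday=6,
                     period=viewing_period(20))
features = ctx.as_features()
print(features)
```

All fields except the duration are categorical codes, so a tree-based model such as GBDT can consume them without one-hot encoding, although treating them as ordered integers is itself a modeling choice.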
Step S103: according to the user shutdown model, predict the shutdown behavior of users for whom television shutdown data cannot be collected.
As shown in Fig. 1A, the shutdown behavior prediction flow has three stages: cleaning and organizing the data to generate the input data, model training, and result prediction. User behavior data, user attributes, and media asset data are fed into the Spark distributed computing platform to form the input data, which specifically include the region, service type, live channel, program type, program popularity, program duration, broadcast time, and live viewing period. The input data are divided according to a test/training ratio parameter, and the resulting training data are fed into the training model based on the iterative decision tree. Training is governed by algorithm parameters, including the impurity calculation and the tree depth selection, and produces the final trained model. The trained model then performs regression prediction of the shutdown duration and yields the feedback result.
Decision trees include classification decision trees and regression decision trees. The representative classification tree algorithm is C4.5, which is mainly used to predict categorical label values, such as a user's gender, whether an email is spam, or whether the stock market rises or falls. Regression decision trees are used to predict real values, such as a user's age or height. The representative algorithm is GBDT, which, when first proposed, was regarded together with SVM as one of the algorithms with the strongest generalization ability.
For comparison, consider classification decision trees first. At each branch, C4.5 exhaustively tries every threshold of every feature and finds the feature and threshold for which the split into two branches (feature value <= threshold and feature value > threshold) is best under the entropy criterion. Two new nodes are obtained by branching on this criterion, and branching continues in the same way until all samples fall into leaf nodes containing a single class or a preset stopping condition is reached. If the classes in a final leaf node are not unique, the majority class is taken as the class of that leaf node.
A regression tree works in a similar way, but every node (not necessarily a leaf node) yields a predicted value. Taking age as an example, the predicted value equals the average age of all the people belonging to that node. When branching, every threshold of every feature is tried exhaustively to find the best split point, but the criterion for "best" is no longer the entropy criterion. Instead, the mean squared error is minimized, that is, the sum of (each person's age - predicted age)^2 divided by N, the number of people. This is easy to understand: the more people are predicted wrongly, and the more badly wrong the predictions are, the larger the mean squared error, so minimizing it finds the most reliable basis for branching. Branching continues until the ages of the people on each leaf node are all the same or a preset termination condition is reached (such as an upper limit on the number of leaves). If the ages on a final leaf node are not all the same, the average age of all the people on that node is used as the predicted age of the leaf node.
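The mean-squared-error criterion can be sketched for a single feature as follows. The helper names and the tiny data set (a hypothetical 0/1 feature and four ages) are assumptions made for illustration only.

```python
def mean(xs):
    return sum(xs) / len(xs)

def sse(xs):
    """Sum of squared deviations of xs from their own mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

def best_mse_split(values, targets):
    """Return (threshold, mse) for the split with the lowest mean squared error.

    Each branch predicts the mean target of the samples it receives.
    """
    best_t, best_mse = None, float("inf")
    # Skip the largest value: it would leave the right branch empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, targets) if x <= t]
        right = [y for x, y in zip(values, targets) if x > t]
        mse = (sse(left) + sse(right)) / len(targets)
        if mse < best_mse:
            best_t, best_mse = t, mse
    return best_t, best_mse

# Hypothetical feature (0 or 1) and ages used as regression targets.
split_at, split_mse = best_mse_split([0, 0, 1, 1], [8, 22, 26, 40])
print(split_at, split_mse)
```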
Iterative decision trees, GBDT (Gradient Boosting Decision Tree), are the representative algorithm for regression trees. GBDT differs from a traditional regression tree in that it uses gradient iteration to let multiple trees reach a decision jointly: the input to each tree is the residual left by the sum of the conclusions of all preceding trees, that is, the actual result minus the sum of the predictions of all preceding trees. Take age prediction as an example. Suppose the training set contains only four people, A, B, C, and D, whose ages are 8, 22, 26, and 40 respectively. A and B are students; C and D are company employees. Training a single traditional regression tree gives the result shown in Figure 10.
The regression tree model that GBDT trains on the same sample space is shown in Figure 11. As Figure 11 shows, the first tree of GBDT is the same as the first-level split in Figure 10: because A and B are close in age, and C and D are close in age, the two pairs are assigned to the left and right nodes of the tree respectively, and the average age in each node serves as the first tree's predicted value (15 for A and B, 33 for C and D). The residuals at this point are A = -7, B = 7, C = -7, D = 7. The targets of the input samples are then replaced by these residuals to train the second tree, which splits on a new feature. The new residuals are A = 0, B = 0, C = 0, D = 0. After the second tree's iteration the residuals have fallen to 0 (which is hard to achieve in real situations), and the trained model can now be used for prediction:
A: an 8-year-old student who likes playing computer games; predicted age 15 + (-7) = 8
B: a 22-year-old student who likes playing mobile games; predicted age 15 + 7 = 22
C: a 26-year-old employee who likes playing computer games; predicted age 33 + (-7) = 26
D: a 40-year-old employee who likes playing mobile games; predicted age 33 + 7 = 40
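The two-tree arithmetic of this example can be reproduced with a short sketch. The four people are labelled A to D here. To mirror the text, the feature used by each tree is fixed by hand (occupation for the first tree, game preference for the second) rather than found by a split search, and no learning rate is applied; the data layout and the helper `fit_stump` are assumptions made for illustration.

```python
people = {
    "A": {"age": 8, "occupation": "student", "game": "computer"},
    "B": {"age": 22, "occupation": "student", "game": "mobile"},
    "C": {"age": 26, "occupation": "employee", "game": "computer"},
    "D": {"age": 40, "occupation": "employee", "game": "mobile"},
}

def fit_stump(rows, feature, targets):
    """Fit a one-split tree: predict the mean target for each value of `feature`."""
    groups = {}
    for name, row in rows.items():
        groups.setdefault(row[feature], []).append(targets[name])
    return {value: sum(ts) / len(ts) for value, ts in groups.items()}

ages = {name: row["age"] for name, row in people.items()}

# First tree: split on occupation, predict the mean age of each branch.
tree1 = fit_stump(people, "occupation", ages)

# Residual = actual age minus the first tree's prediction.
residuals = {n: ages[n] - tree1[people[n]["occupation"]] for n in people}

# Second tree: fit the residuals, splitting on the game preference.
tree2 = fit_stump(people, "game", residuals)

# Final prediction = sum of the outputs of both trees.
predictions = {
    n: tree1[people[n]["occupation"]] + tree2[people[n]["game"]] for n in people
}
print(tree1, residuals, tree2, predictions)
```

Each later tree only has to explain what the earlier trees got wrong, which is why the second tree is fitted to the residuals rather than to the ages themselves.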
The embodiment of the present invention obtains contextual information about the user's viewing, determines a user shutdown model according to the contextual information, and, according to the user shutdown model, predicts the shutdown behavior of users for whom television shutdown data cannot be collected. This predicts the moment at which the user turns off the television. Based on that moment, the invalid data in the viewing behavior data returned by the set-top box is identified, which improves the accuracy of the audience rating surveys and viewing behavior analysis carried out by cable broadcasting operators.
In addition, to illustrate the method of the above embodiment, the embodiment of the present invention obtained, from a provincial network, the behavior data of all two-way digital television set-top box users for March 2016, covering services such as page browsing, live broadcast, video on demand, time shifting, and catch-up viewing. The total data volume exceeds 300 GB, the number of monthly active users exceeds one million, and nearly 300,000 set-top boxes can report television shutdown behavior, for a total of more than 3 million shutdown records. First, the massive behavior data is preprocessed using Spark distributed processing, and the features shown in Table 1 are extracted for each shutdown record. The present invention uses only the following features: region, service type, the rating period in which the service was entered, the rating period at the moment of shutdown, and the day of the week. The shutdown durations of records whose behavior features are identical are averaged to give the shutdown duration for shutdown data with those features, and the number of shutdown records sharing the same features is added as a new feature. This yields nearly 5,000 samples. Finally, the sample space is divided into a training set and a test set, with the training set containing 80% of the samples. The shutdown model is trained using the iterative decision tree method (GBDT) described in the previous section, and the trained model is then used to predict the shutdown duration of the samples in the test set. The experimental results are shown in Figure 12.
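The aggregation step, in which records with identical features are averaged and counted, can be sketched as follows. This is an illustrative sketch in plain Python rather than Spark; the field names (`enter_period`, `shutdown_period`, `shutdown_minutes`, and so on) and the sample records are hypothetical.

```python
from collections import defaultdict

# Hypothetical field names for the five features named in the text.
FEATURES = ("region", "service_type", "enter_period", "shutdown_period", "weekday")

def aggregate(records):
    """Group shutdown records by feature tuple; average durations and count records."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[f] for f in FEATURES)
        groups[key].append(rec["shutdown_minutes"])
    samples = []
    for key, durations in groups.items():
        sample = dict(zip(FEATURES, key))
        sample["record_count"] = len(durations)  # the newly added feature
        sample["mean_shutdown_minutes"] = sum(durations) / len(durations)
        samples.append(sample)
    return samples

base = {"region": "north", "service_type": "live", "enter_period": "evening",
        "shutdown_period": "night", "weekday": 5}
records = [
    dict(base, shutdown_minutes=10.0),
    dict(base, shutdown_minutes=30.0),
    dict(base, weekday=6, shutdown_minutes=50.0),
]
samples = aggregate(records)
print(samples)
```

Collapsing millions of raw records into a few thousand feature combinations is what reduces the data set to the sample count reported above; the record count preserves how much evidence stands behind each averaged duration.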
The gray solid line in the figure shows the actual shutdown durations of the test samples; for ease of observation, the samples were sorted by shutdown duration from smallest to largest when the figure was drawn. The black solid line shows the corresponding predictions of the shutdown model. The predicted values fluctuate around the actual values, but the overall error is small: the absolute error is within 20 minutes, which indicates that the prediction is fairly accurate.
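The comparison behind such a plot can be sketched as below. The numbers are invented purely to show the computation and are not the experimental values; the function name `absolute_errors` is likewise an assumption.

```python
def absolute_errors(actual, predicted):
    """Element-wise absolute difference between actual and predicted values."""
    return [abs(a - p) for a, p in zip(actual, predicted)]

# Hypothetical (actual, predicted) shutdown durations in minutes.
pairs = sorted(zip([35.0, 12.0, 80.0], [30.0, 20.0, 71.0]))  # sort by actual duration
actual = [a for a, _ in pairs]
predicted = [p for _, p in pairs]

errors = absolute_errors(actual, predicted)
print(errors, max(errors))
```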
This embodiment starts from the common phenomenon that cable digital television users, when they finish watching, habitually turn off only the television and neglect to turn off the set-top box. The present invention points out that the user behavior data returned by the set-top box during this period can substantially affect the accuracy of radio and television operators' statistical analysis of viewing behavior and of audience rating indicators for programs and channels. The present invention therefore proposes a television shutdown model trained with the iterative decision tree algorithm, which realizes digital cable customer shutdown behavior prediction based on iterative decision trees and fills the gap left by the current difficulty of collecting user shutdown data. The model achieved good prediction results on real data provided by a provincial network, thereby helping to ensure the accuracy of audience rating surveys and viewing behavior analysis.
Figure 13 is a structural diagram of the digital cable customer behavior prediction device provided by an embodiment of the present invention. The device can carry out the processing flow provided by the digital cable customer behavior prediction method embodiment. As shown in Figure 13, the device includes an acquisition module 131, a determining module 132, and a prediction module 133. The acquisition module 131 is used to obtain contextual information about the user's viewing, the contextual information including: fundamental type, program attribute, and rating period. The determining module 132 is used to determine a user shutdown model according to the contextual information. The prediction module 133 is used to predict, according to the user shutdown model, the shutdown behavior of users for whom television shutdown data cannot be collected.
Figure 14 is a structural diagram of the digital cable customer behavior prediction device provided by another embodiment of the present invention. On the basis of Figure 13, the device further includes a training module 134, which is used to train the user shutdown model using the iterative decision tree algorithm.
The acquisition module 131 is also used to obtain the time interval of the behavior data generated by the user operating the set-top box. The determining module 132 is also used to determine, according to the user shutdown model, the moment at which the user turned off the television if the time interval is greater than a threshold.
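The threshold check can be sketched as follows. How the model's output is turned into a shutdown moment is not spelled out here, so the sketch makes a loud assumption: a hypothetical callable `predict_offset` returns the number of minutes after the last operation at which the television was turned off. The function name, the use of minutes as plain numbers, and the sample values are all illustrative.

```python
def find_shutdown_moments(event_times, threshold, predict_offset):
    """Estimate a shutdown moment for every long gap between set-top box events.

    event_times    -- sorted times (in minutes) of user operations
    threshold      -- gaps longer than this are treated as possible shutdowns
    predict_offset -- hypothetical stand-in for the shutdown model: maps the
                      time of the last operation to minutes until shutdown
    """
    moments = []
    for prev, nxt in zip(event_times, event_times[1:]):
        if nxt - prev > threshold:
            estimate = prev + predict_offset(prev)
            # Never place the estimate after the next observed operation.
            moments.append(min(estimate, nxt))
    return moments

events = [0, 5, 12, 200, 205]
moments = find_shutdown_moments(events, threshold=60, predict_offset=lambda t: 30)
print(moments)
```

Behavior data reported between an estimated shutdown moment and the next operation would then be the candidate invalid data.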
In addition, the digital cable customer behavior prediction device further includes a sort module 135, which is used to divide the contextual information into sample data and test data. The determining module 132 is specifically used to determine the user shutdown model according to the sample data. The prediction module 133 is specifically used to predict, according to the user shutdown model, the shutdown behavior of the users in the test data.
In addition, the fundamental type includes region and service type; the program attribute includes live channel, program category, program popularity, and program duration; the rating period includes broadcast time and live rating period.
The digital cable customer behavior prediction device provided by the embodiment of the present invention can be specifically used to carry out the method embodiment provided in Fig. 1 above; its specific functions are not described again here.
In conclusion, the embodiment of the present invention obtains contextual information about the user's viewing, determines a user shutdown model according to the contextual information, and, according to the user shutdown model, predicts the shutdown behavior of users for whom television shutdown data cannot be collected. This predicts the moment at which the user turns off the television. Based on that moment, the invalid data in the viewing behavior data returned by the set-top box is identified, which improves the accuracy of the audience rating surveys and viewing behavior analysis carried out by cable broadcasting operators. The invention starts from the common phenomenon that cable digital television users, when they finish watching, habitually turn off only the television and neglect to turn off the set-top box, and points out that the user behavior data returned by the set-top box during this period can substantially affect the accuracy of radio and television operators' statistical analysis of viewing behavior and of audience rating indicators for programs and channels. The present invention therefore proposes a television shutdown model trained with the iterative decision tree algorithm, which realizes digital cable customer shutdown behavior prediction based on iterative decision trees and fills the gap left by the current difficulty of collecting user shutdown data. The model achieved good prediction results on real data provided by a provincial network, thereby helping to ensure the accuracy of audience rating surveys and viewing behavior analysis.
In the several embodiments provided by the present invention, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of hardware plus software functional units.
An integrated unit implemented in the form of software functional units may be stored in a computer-readable storage medium. The software functional units are stored in a storage medium and include several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
Those skilled in the art will clearly understand that, for convenience and brevity of description, only the division into the functional modules above is used as an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above. For the specific working process of the device described above, refer to the corresponding process in the foregoing method embodiments; it is not repeated here.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some or all of the technical features, and that such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

  1. A digital cable customer behavior prediction method, characterized by comprising:
    obtaining contextual information about the user's viewing, the contextual information including: fundamental type, program attribute, and rating period;
    determining a user shutdown model according to the contextual information;
    predicting, according to the user shutdown model, the shutdown behavior of users for whom television shutdown data cannot be collected.
  2. The method according to claim 1, characterized in that, after determining the user shutdown model according to the contextual information, the method further comprises:
    training the user shutdown model using an iterative decision tree algorithm.
  3. The method according to claim 2, characterized in that predicting, according to the user shutdown model, the shutdown behavior of users for whom television shutdown data cannot be collected comprises:
    obtaining the time interval of the behavior data generated by the user operating the set-top box;
    if the time interval is greater than a threshold, determining, according to the user shutdown model, the moment at which the user turned off the television.
  4. The method according to claim 1, characterized in that, after obtaining the contextual information about the user's viewing, the method further comprises:
    dividing the contextual information into sample data and test data;
    wherein determining the user shutdown model according to the contextual information comprises:
    determining the user shutdown model according to the sample data;
    and wherein predicting, according to the user shutdown model, the shutdown behavior of users for whom television shutdown data cannot be collected comprises:
    predicting, according to the user shutdown model, the shutdown behavior of the users in the test data.
  5. The method according to any one of claims 1-4, characterized in that the fundamental type includes region and service type;
    the program attribute includes live channel, program category, program popularity, and program duration;
    the rating period includes broadcast time and live rating period.
  6. A digital cable customer behavior prediction device, characterized by comprising:
    an acquisition module, used to obtain contextual information about the user's viewing, the contextual information including: fundamental type, program attribute, and rating period;
    a determining module, used to determine a user shutdown model according to the contextual information;
    a prediction module, used to predict, according to the user shutdown model, the shutdown behavior of users for whom television shutdown data cannot be collected.
  7. The digital cable customer behavior prediction device according to claim 6, characterized by further comprising:
    a training module, used to train the user shutdown model using an iterative decision tree algorithm.
  8. The digital cable customer behavior prediction device according to claim 7, characterized in that the acquisition module is also used to obtain the time interval of the behavior data generated by the user operating the set-top box;
    the determining module is also used to determine, according to the user shutdown model, the moment at which the user turned off the television if the time interval is greater than a threshold.
  9. The digital cable customer behavior prediction device according to claim 6, characterized by further comprising:
    a sort module, used to divide the contextual information into sample data and test data;
    the determining module is specifically used to determine the user shutdown model according to the sample data;
    the prediction module is specifically used to predict, according to the user shutdown model, the shutdown behavior of the users in the test data.
  10. The digital cable customer behavior prediction device according to any one of claims 6-9, characterized in that the fundamental type includes region and service type;
    the program attribute includes live channel, program category, program popularity, and program duration;
    the rating period includes broadcast time and live rating period.
CN201610883971.4A 2016-10-10 2016-10-10 Digital cable customers behavior prediction method and device Pending CN107920260A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610883971.4A CN107920260A (en) 2016-10-10 2016-10-10 Digital cable customers behavior prediction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610883971.4A CN107920260A (en) 2016-10-10 2016-10-10 Digital cable customers behavior prediction method and device

Publications (1)

Publication Number Publication Date
CN107920260A true CN107920260A (en) 2018-04-17

Family

ID=61892670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610883971.4A Pending CN107920260A (en) 2016-10-10 2016-10-10 Digital cable customers behavior prediction method and device

Country Status (1)

Country Link
CN (1) CN107920260A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130055329A1 (en) * 2006-03-29 2013-02-28 At&T Intellectual Property I, L.P. Close-Captioning Uniform Resource Locator Capture System and Method
CN102137293A (en) * 2010-12-31 2011-07-27 华为技术有限公司 Resource allocation method, user business terminal and head end system of streaming media service
CN103329559A (en) * 2011-04-06 2013-09-25 兰屈克有限公司 Method and system for detecting non-powered video playback devices
CN103260061A (en) * 2013-05-24 2013-08-21 华东师范大学 Context-perceptive IPTV program recommending method
CN103297814A (en) * 2013-06-28 2013-09-11 百视通新媒体股份有限公司 Television viewing rate assessment method and system based on internet protocol television (IPTV)
US20150215566A1 (en) * 2014-01-30 2015-07-30 Vizio Inc Predictive time to turn on a television based on previously used program schedules
CN104980800A (en) * 2014-04-04 2015-10-14 北京秒针信息咨询有限公司 Method and system for monitoring startup/shutdown state of television
CN104038822A (en) * 2014-07-02 2014-09-10 程振国 Automatic standby achieving method of digital television set top box
CN104349193A (en) * 2014-11-11 2015-02-11 无锡科思电子科技有限公司 Automatic power-off set-top box based on sleep recognition

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109769146A (en) * 2018-12-25 2019-05-17 国家新闻出版广电总局广播电视规划院 The determination method and device of broadcast TV program audience ratings
CN110110191A (en) * 2019-03-28 2019-08-09 北京奇艺世纪科技有限公司 Search processing method and device and computer readable storage medium
CN110110191B (en) * 2019-03-28 2021-05-25 北京奇艺世纪科技有限公司 Search processing method and apparatus, and computer-readable storage medium
CN111246294A (en) * 2020-01-06 2020-06-05 国家广播电视总局广播电视规划院 Method, device, equipment and storage medium for processing audience rating index data
CN113727192A (en) * 2020-06-19 2021-11-30 天翼智慧家庭科技有限公司 Method and system for collecting viewing behaviors
CN113727192B (en) * 2020-06-19 2023-09-12 天翼数字生活科技有限公司 Method and system for collecting viewing behaviors
US20220070535A1 (en) * 2020-09-01 2022-03-03 Comscore, Inc. Predictive detection of real-time and future viewability
CN112507420A (en) * 2020-11-19 2021-03-16 同济大学 System for constructing personal personalized environment control behavior prediction model training set in office building

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20180417