CN107920260A - Digital cable customers behavior prediction method and device - Google Patents
Digital cable customers behavior prediction method and device
- Publication number
- CN107920260A (application CN201610883971.4A)
- Authority
- CN
- China
- Prior art keywords
- user
- shutdown
- model
- contextual information
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/258—Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
- H04N21/25866—Management of end-user data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/442—Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
- H04N21/44213—Monitoring of end-user related data
- H04N21/44222—Analytics of user selections, e.g. selection of programs or purchase activity
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/45—Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
- H04N21/466—Learning process for intelligent management, e.g. learning user preferences for recommending movies
- H04N21/4662—Learning process for intelligent management, e.g. learning user preferences for recommending movies characterized by learning algorithms
- H04N21/4665—Learning process for intelligent management, e.g. learning user preferences for recommending movies characterized by learning algorithms involving classification methods, e.g. Decision trees
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Social Psychology (AREA)
- Computer Networks & Wireless Communication (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computer Graphics (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
Embodiments of the present invention provide a method and a device for predicting the behavior of digital cable television users. The method includes: obtaining context information about a user's viewing, the context information including basic type, program attributes, and viewing period; determining a user shutdown model according to the context information; and, according to the user shutdown model, predicting the shutdown behavior of users whose television shutdown data cannot be collected. By predicting the moment at which a user turns off the television, the embodiments make it possible to identify the invalid data within the viewing data returned by the set-top box, which improves the accuracy of the audience rating surveys and viewing behavior analysis carried out by cable broadcasting operators.
Description
Technical field
Embodiments of the present invention relate to the field of communication technology, and in particular to a method and a device for predicting the behavior of digital cable television users.
Background technology
With the accelerating two-way upgrade of cable television networks and the spread of two-way digital television set-top boxes, the behavioral data generated when large numbers of household users operate their set-top boxes can be collected and returned by an acquisition system to a back-end data storage server, which makes it possible to collect viewing behavior data at massive scale. At the same time, thanks to the development of big data technology, the sample space for audience rating surveys and analysis can be expanded to the entire user base, giving comprehensive and accurate results. Viewing characteristics can also be analyzed for specific groups, helping operators adjust operating decisions in real time and provide personalized viewing services, thereby improving user experience and increasing operating revenue.
However, as long as a two-way digital television set-top box is powered on, it monitors and returns in real time user behaviors such as channel switching, use of interactive services, and page dwell. In practice, most users habitually turn off only the television and leave the set-top box on. The set-top box then continues to return viewing data, and this portion of the data is clearly invalid. Such invalid data can substantially reduce the accuracy of the audience rating surveys and viewing behavior analysis carried out by cable broadcasting operators.
Summary of the invention
Embodiments of the present invention provide a method and a device for predicting the behavior of digital cable television users, in order to improve the accuracy of audience rating surveys and viewing behavior analysis.
One aspect of the embodiments of the present invention provides a method for predicting the behavior of digital cable television users, including:
obtaining context information about a user's viewing, the context information including basic type, program attributes, and viewing period;
determining a user shutdown model according to the context information;
predicting, according to the user shutdown model, the shutdown behavior of users whose television shutdown data cannot be collected.
Another aspect of the embodiments of the present invention provides a device for predicting the behavior of digital cable television users, including:
an acquisition module, configured to obtain context information about a user's viewing, the context information including basic type, program attributes, and viewing period;
a determining module, configured to determine a user shutdown model according to the context information;
a prediction module, configured to predict, according to the user shutdown model, the shutdown behavior of users whose television shutdown data cannot be collected.
The method and device provided by the embodiments of the present invention obtain context information about a user's viewing, determine a user shutdown model according to that context information, and use the model to predict the shutdown behavior of users whose television shutdown data cannot be collected, that is, to predict the moment at which the user turns off the television. From that moment, the invalid data within the viewing data returned by the set-top box can be identified, which improves the accuracy of the audience rating surveys and viewing behavior analysis carried out by cable broadcasting operators.
Brief description of the drawings
Fig. 1 is a flowchart of the digital cable television user behavior prediction method provided by an embodiment of the present invention;
Fig. 1A is a flowchart of shutdown behavior prediction provided by an embodiment of the present invention;
Fig. 2 is the shutdown duration distribution provided by an embodiment of the present invention;
Fig. 3 is the log-log plot of the shutdown duration distribution provided by an embodiment of the present invention;
Fig. 4A is the cumulative distribution of shutdown durations on the CCTV1 channel provided by an embodiment of the present invention;
Fig. 4B is the log-log plot of the shutdown duration distribution on the CCTV1 channel provided by an embodiment of the present invention;
Fig. 5A is the cumulative distribution of shutdown durations on Hunan Satellite TV provided by an embodiment of the present invention;
Fig. 5B is the log-log plot of the shutdown duration distribution on Hunan Satellite TV provided by an embodiment of the present invention;
Fig. 6A is the cumulative distribution of shutdown durations on Beijing TV provided by an embodiment of the present invention;
Fig. 6B is the log-log plot of the shutdown duration distribution on Beijing TV provided by an embodiment of the present invention;
Fig. 7A is the cumulative distribution of shutdown durations for 《The Hunger Games: Mockingjay (Part 1)》 provided by an embodiment of the present invention;
Fig. 7B is the log-log plot of the shutdown duration distribution for 《The Hunger Games: Mockingjay (Part 1)》 provided by an embodiment of the present invention;
Fig. 8A is the cumulative distribution of shutdown durations for 《The Legend of Mi Yue》 provided by an embodiment of the present invention;
Fig. 8B is the log-log plot of the shutdown duration distribution for 《The Legend of Mi Yue》 provided by an embodiment of the present invention;
Fig. 9A is the cumulative distribution of shutdown durations for 《Happy pleasure is overturned the heavens》 provided by an embodiment of the present invention;
Fig. 9B is the log-log plot of the shutdown duration distribution for 《Happy pleasure is overturned the heavens》 provided by an embodiment of the present invention;
Fig. 10 is a schematic diagram of the result of training with a single traditional regression decision tree in the prior art;
Fig. 11 is a schematic diagram of the regression decision tree model provided by an embodiment of the present invention;
Fig. 12 is a schematic diagram of the experimental results of shutdown model training with the iterative decision tree method provided by an embodiment of the present invention;
Fig. 13 is a structural diagram of the digital cable television user behavior prediction device provided by an embodiment of the present invention;
Fig. 14 is a structural diagram of the digital cable television user behavior prediction device provided by another embodiment of the present invention.
Detailed description of the embodiments
The iterative decision tree (Gradient Boosting Decision Tree, GBDT), also called Multiple Additive Regression Tree (MART), is an iterative decision tree algorithm. It consists of multiple decision trees, and the conclusions of all the trees are summed to give the final result. When it was first proposed it was regarded, together with SVM, as an algorithm with strong generalization ability. In recent years it has attracted further attention because of its use in machine-learned ranking models for search.
Gradient Boost is in fact a framework into which many different algorithms can be plugged. "Boost" means to lift or improve: Boosting algorithms are generally iterative, and each new round of training aims to improve on the result of the previous round.
The original Boost algorithm assigns a weight to every sample at the start, and initially all samples are equally important. The model obtained at each training step estimates some data points correctly and others incorrectly. After each step, the weights of the misclassified points are increased and the weights of the correctly classified points are decreased, so that points that are repeatedly misclassified receive "serious attention" in the form of a very high weight. After N iterations (N is specified by the user), N simple classifiers (base learners) have been obtained; they are then combined (for example by weighting them or letting them vote) into a final model.
Gradient Boost differs from traditional Boost in that each round of computation aims to reduce the residual of the previous round; to eliminate the residual, a new model is built in the gradient direction along which the residual decreases. In Gradient Boost, therefore, each new model is built so that the residual of the preceding models decreases along the gradient direction, which is very different from the traditional Boost approach of reweighting correctly and incorrectly classified samples.
In classification there is an important topic called Multi-Class Logistic, that is, the multi-class logistic problem. It applies to problems with more than 2 classes, in which a sample x does not necessarily belong to exactly one class; instead, the probabilities of x belonging to each of several classes are obtained (in other words, the estimate y of sample x follows a certain geometric distribution). One conclusion is used here: if a classification problem follows a geometric distribution, the Logistic transform can be used for the subsequent computation.
Suppose a sample x may belong to any of K classes, with estimates F_1(x), ..., F_K(x). The Logistic transform is a process that smooths and normalizes the data (so that the components of the vector sum to 1); its result is the probability p_k(x) of belonging to class k, as shown in formula (1):
$$p_k(x) = \frac{\exp(F_k(x))}{\sum_{l=1}^{K} \exp(F_l(x))} \qquad (1)$$
For the result of the Logistic transform, the loss function is given by formula (2):
$$L = -\sum_{k=1}^{K} y_k \log p_k(x) \qquad (2)$$
where y_k is the label value of the input sample: y_k = 1 when sample x belongs to class k, and y_k = 0 otherwise.
Substituting the Logistic transform into the loss function and differentiating gives the gradient of the loss function, as shown in formula (3):
$$g_k = -\frac{\partial L}{\partial F_k(x)} = y_k - p_k(x) \qquad (3)$$
Suppose the input x may belong to 5 classes (1, 2, 3, 4, 5) and, in the training data, x belongs to class 3, so that y = (0, 0, 1, 0, 0). Suppose the model's estimate is F(x) = (0, 0.3, 0.6, 0, 0). After the Logistic transform, p(x) ≈ (0.16, 0.21, 0.29, 0.16, 0.16), and y - p gives the gradient g ≈ (-0.16, -0.21, 0.71, -0.16, -0.16). An interesting observation follows.
Let g_k be the gradient of the sample in one dimension (one class):
When g_k > 0, the larger it is, the more the probability p(x) in that dimension should be raised. The third dimension above, with probability 0.29, should be raised, that is, it should move in the "right direction". The smaller g_k is, the more "accurate" the estimate already is.
When g_k < 0, the smaller it is (the more negative), the more the probability in that dimension should be lowered. The second dimension above, with probability 0.21, should be lowered, that is, it should move "away from the error". The larger g_k is (the less negative), the "less wrong" the estimate already is.
In general, the ideal gradient for a sample is one close to 0. The function estimates should therefore be adjusted so that the gradient moves toward 0 (in dimensions where it is > 0 it moves in the negative direction, and in dimensions where it is < 0 it moves in the positive direction) until the gradient is as close to 0 as possible. The algorithm pays most attention to the samples with the larger gradients, which is similar in spirit to Boost.
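The worked example above can be checked numerically. The following is a minimal sketch, assuming the Logistic transform is the standard softmax and the per-class gradient is y_k - p_k(x); the function names are illustrative and not part of the patent. The computed values agree with the figures quoted in the text up to rounding in the second decimal place.

```python
import math


def softmax(scores):
    """Logistic (softmax) transform: turn class scores F_k(x) into probabilities p_k(x)."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gradient(labels, scores):
    """Per-class gradient g_k = y_k - p_k(x) as used in the text."""
    return [y - p for y, p in zip(labels, softmax(scores))]


# Values from the worked example: x belongs to class 3 of 5.
y = [0, 0, 1, 0, 0]
F = [0.0, 0.3, 0.6, 0.0, 0.0]

p = softmax(F)
g = gradient(y, F)
print([round(v, 4) for v in p])
print([round(v, 4) for v in g])
```

Only the true class has a positive gradient, so its probability is the one to be raised; every other class has a negative gradient and should be lowered.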
Once the gradient has been obtained, the question is how to reduce it. An iterative method combined with decision trees is used. At initialization, an arbitrary estimate function F(x) is given (F(x) may be a random value, or F(x) = 0). Then, at each iteration, a decision tree is built according to the current gradient of each sample, so that the function advances in the negative gradient direction of the loss, and after N iterations the gradient is small.
The decision tree built here differs from an ordinary decision tree. First, its number of leaf nodes J is fixed: once J nodes have been generated, no new nodes are created.
Fig. 1 is a flowchart of the digital cable television user behavior prediction method provided by an embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps:
Step S101: obtain context information about a user's viewing, the context information including basic type, program attributes, and viewing period.
The user shutdown behavior considered in this embodiment is related to context information, which includes basic type, program attributes, and viewing period. The basic type includes region and service type, where the service type is, for example, page browsing, video on demand, live broadcast, time shift, or catch-up (look-back) viewing. The program attributes include the live channel, the program type (such as film, TV series, variety show, or animation), program popularity, program duration, and so on. The viewing period includes the broadcast time (such as the day of the week) and the live viewing period. Specifically, these eight attributes (region, service type, live channel, program type, program popularity, program duration, broadcast time, and live viewing period) can be used as behavioral features.
Step S102: determine a user shutdown model according to the context information.
In this embodiment, the user shutdown model is built on the following basis: for some televisions, the set-top box can capture the level change on the High Definition Multimedia Interface (HDMI) pin when the television is switched on or off, and return the television power on/off data.
The purpose of predicting the shutdown behavior of digital cable television users is to separate out the invalid data from the returned user behavior data in the case where the user has left or turned off the television while the set-top box is still on, and to estimate the most likely shutdown moment. User shutdown behavior is specifically defined as follows: when the time interval between records of the user operating the set-top box is too large, estimate the likelihood that the user left or turned off the television during that interval, together with the most likely shutdown moment, so as to ensure the validity of the viewing behavior statistics.
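The first half of this definition, finding intervals between behavior records that are "too large", can be sketched as below. This is only an illustration under stated assumptions: the patent does not give a threshold, so the 60-minute `max_gap` default is hypothetical, as are the function name and the sample timestamps. Estimating the shutdown moment inside each flagged interval is the job of the shutdown model and is not shown here.

```python
from datetime import datetime, timedelta


def find_suspect_gaps(timestamps, max_gap=timedelta(minutes=60)):
    """Return (previous_record, next_record) pairs whose spacing exceeds max_gap.

    max_gap is a hypothetical threshold; the source does not specify a value.
    """
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


# Hypothetical set-top box behavior records for one user on one evening.
events = [
    datetime(2016, 3, 1, 19, 0),
    datetime(2016, 3, 1, 19, 20),
    datetime(2016, 3, 1, 23, 50),
    datetime(2016, 3, 1, 23, 55),
]
gaps = find_suspect_gaps(events)
for start, end in gaps:
    print(f"possible unattended interval: {start} -> {end}")
```

Each flagged interval is a candidate for containing a shutdown moment; viewing data reported after the predicted moment would be treated as invalid.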
User shutdown behavior is analyzed below. For example, a provincial network company used two-way digital television set-top boxes to collect all the behavioral data of users numbering on the order of a million during March 2016, covering service types such as live broadcast, video on demand, time shift, catch-up, and information services. The data contain more than 3 million television shutdown records from more than 200,000 users. If the shutdown duration is defined as the time interval between a shutdown and the last behavior record before that shutdown, more than 3 million shutdown duration values are obtained. Fig. 2 is the shutdown duration distribution provided by an embodiment of the present invention: the abscissa is the shutdown duration, and the ordinate is the number of shutdowns whose duration falls in the corresponding time interval. Most shutdown durations are less than 100 minutes. Taking the logarithm of both axes of Fig. 2 gives the log-log plot shown in Fig. 3, which is approximately a straight line, indicating that the user shutdown duration follows a Zipf distribution.
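The straight-line check on log-log axes can be illustrated with a least-squares fit. The sketch below uses synthetic counts generated from an exact power law, not the patent's data; the exponent 1.2 and the scale 100000 are arbitrary assumptions chosen only to show that a power-law histogram yields a constant negative slope.

```python
import math


def loglog_fit(xs, ys):
    """Least-squares slope and intercept of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic histogram: shutdown duration in minutes vs. number of shutdowns.
durations = list(range(1, 101))
counts = [100000.0 / d ** 1.2 for d in durations]  # assumed power law, for illustration

slope, intercept = loglog_fit(durations, counts)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

On real data the points would scatter around the fitted line, and the quality of the fit (for example the residuals) would indicate how well the Zipf description holds.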
In addition, experiments also give the distribution over live channels and the distribution over live programs, which are described in turn below:
(1) Distribution over live channels
More than 90% of the behavior preceding a user shutdown is live viewing; other behaviors such as video on demand, time shift, catch-up, and page browsing account for less than 10%. To better understand the pattern of user shutdown durations, the present invention computed the distribution of shutdown durations on each live channel. This embodiment gives the shutdown duration distributions for CCTV1, Hunan Satellite TV, and Beijing TV, and on each of these live channels the shutdown duration likewise exhibits the characteristics of a Zipf distribution. Fig. 4A is the cumulative distribution of shutdown durations on the CCTV1 channel, and Fig. 4B is the corresponding log-log plot. Fig. 5A is the cumulative distribution of shutdown durations on Hunan Satellite TV, and Fig. 5B is the corresponding log-log plot. Fig. 6A is the cumulative distribution of shutdown durations on Beijing TV, and Fig. 6B is the corresponding log-log plot.
(2) Distribution over live programs
User shutdown duration exhibits the characteristics of a Zipf distribution not only on live channels but also within individual live programs. As examples, this embodiment takes the film 《The Hunger Games: Mockingjay (Part 1)》, the TV series 《The Legend of Mi Yue》, and the variety show 《Happy pleasure is overturned the heavens》, and presents the distribution of shutdown durations for users who shut down after watching these programs. Fig. 7A is the cumulative distribution of shutdown durations for 《The Hunger Games: Mockingjay (Part 1)》, and Fig. 7B is the corresponding log-log plot. Fig. 8A is the cumulative distribution of shutdown durations for 《The Legend of Mi Yue》, and Fig. 8B is the corresponding log-log plot. Fig. 9A is the cumulative distribution of shutdown durations for 《Happy pleasure is overturned the heavens》, and Fig. 9B is the corresponding log-log plot. Comparing channel shutdown durations with program shutdown durations shows clearly that program shutdown durations are much shorter. The reason is that program length is limited: the time a user stays on a channel is obviously longer than the time the user stays on a single program.
In addition, user shutdown behavior depends heavily on the context of the user's viewing state. One example is the service type: the likelihood of a shutdown after browsing pages or watching an on-demand program is smaller than when watching a live channel, particularly when the user has stayed on one channel for a long time. Another is the viewing period: the probability of a shutdown during peak viewing periods is clearly smaller than during off-peak periods. The present invention therefore groups viewing context information into three classes: basic type, program attributes, and viewing period. The basic type includes region and service type (page browsing, video on demand, live broadcast, time shift, and catch-up). The program attributes include live channel, program type (film, TV series, variety show, animation, and so on), program popularity, program duration, and so on. The viewing period includes the day of the week, the six live viewing periods, and so on. The specific classification is shown in Table 1:
Table 1
For ease of description, the attributes are denoted as follows:
Region is denoted R and takes discrete integer values (R >= 1); each value corresponds uniquely to one city or district.
Service type is denoted T and takes discrete integer values (T = {1-5}): 1 is page browsing, 2 is video on demand, 3 is live broadcast, 4 is time shift, and 5 is catch-up.
Live channel is denoted C and takes discrete integer values (C >= 1); each value corresponds uniquely to one channel, for example 1 is CCTV1, 2 is CCTV2, and so on.
Program type is denoted P and takes discrete integer values (P = {1-4}): 1 is film, 2 is TV series, 3 is variety show, and 4 is animation. Only a first-level classification of programs is used here; in practice it can be refined further, for example film can be subdivided into comedy, action, romance, and so on.
Program popularity is denoted H and takes discrete integer values (H >= 1). A larger value means more viewers and hence a more popular program. Users are normally interested in popular programs, so a shutdown while such a program is playing is less likely.
Program duration is denoted L and takes integer values (L > 0).
Day of the week is denoted W and takes discrete integer values (W = {1-7}): 1 is Monday, 2 is Tuesday, ..., 7 is Sunday.
The six live viewing periods are denoted I and take discrete integer values (I = {1-6}): 1 is 0:00 to 6:00, 2 is 6:00 to 9:00, 3 is 9:00 to 12:00, 4 is 12:00 to 15:00, 5 is 15:00 to 19:00, and 6 is 19:00 to 24:00.
The user shutdown model is therefore given by formula (4):
t = f(R, T, C, P, H, L, W, I) (4)
where t is the model output, that is, the predicted shutdown duration (distinct from the service type T), and f is the shutdown model function. The present invention trains the model with the iterative decision tree algorithm (Gradient Boosting Decision Tree, GBDT) and uses the model to predict the shutdown behavior of users whose television shutdown data cannot be collected.
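The eight-attribute encoding can be sketched as a small data structure. The class name, field names, and sample values below are illustrative assumptions, not the patent's implementation. The text does not say to which period a boundary hour such as 6:00 belongs, so this sketch assumes each period includes its starting hour; the unit of program duration is likewise not stated.

```python
from dataclasses import dataclass, astuple

# Upper bounds (hour of day) of the six live viewing periods listed in the text:
# 0-6, 6-9, 9-12, 12-15, 15-19, 19-24.
PERIOD_UPPER_BOUNDS = (6, 9, 12, 15, 19, 24)


def viewing_period(hour):
    """Map an hour of day (0..23) to the viewing period code I (1..6).

    Assumption: each period includes its starting hour, e.g. hour 6 maps to period 2.
    """
    if not 0 <= hour < 24:
        raise ValueError("hour must be in 0..23")
    for code, upper in enumerate(PERIOD_UPPER_BOUNDS, start=1):
        if hour < upper:
            return code


@dataclass(frozen=True)
class ViewingContext:
    region: int        # R >= 1
    service_type: int  # T in 1..5 (3 = live broadcast)
    channel: int       # C >= 1 (1 = CCTV1)
    program_type: int  # P in 1..4 (2 = TV series)
    popularity: int    # H >= 1
    duration: int      # L > 0 (unit not stated in the source)
    weekday: int       # W in 1..7 (1 = Monday)
    period: int        # I in 1..6

    def as_features(self):
        """Feature vector in the argument order of formula (4)."""
        return list(astuple(self))


ctx = ViewingContext(region=1, service_type=3, channel=1, program_type=2,
                     popularity=5000, duration=45, weekday=3,
                     period=viewing_period(20))
print(ctx.as_features())
```

A vector of this form would be the input to the shutdown model function f, with the observed shutdown duration as the regression target during training.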
Step S103: according to the user shutdown model, predict the shutdown behavior of users whose television shutdown data cannot be collected.
As shown in Fig. 1A, the shutdown behavior prediction flow has three stages: cleaning and organizing the data to generate the input data, model training, and result prediction. User behavior data, user attributes, and media asset data are fed into the Spark distributed computing platform to form the input data, which specifically includes region, service type, live channel, program type, program popularity, program duration, broadcast time, and live viewing period. The input data is divided according to the test/training ratio parameter, and the training portion is fed into the training model based on the iterative decision tree. Training is governed by algorithm parameters, including the impurity calculation and the choice of tree depth, and yields the final trained model. The trained model then performs regression prediction of the shutdown duration, and the result is fed back.
Decision trees comprise classification trees and regression trees. The representative classification tree algorithm is C4.5, which is used mainly to predict categorical labels, such as a user's gender, whether a message is spam, or whether the stock market rises or falls. Regression trees are used to predict real values, such as a user's age or height. The representative algorithm is GBDT, which when first proposed was regarded, together with SVM, as one of the algorithms with the strongest generalization ability.
For comparison, consider the classification tree first. At each branching step, C4.5 exhaustively tries every threshold of every feature and finds the feature and threshold for which the split into two branches (feature value <= threshold and feature value > threshold) scores best under an entropy-based criterion (C4.5 uses the information gain ratio). Two new nodes are obtained by branching on that criterion, and branching continues in the same way until every sample has been assigned to a leaf node with a single class, or a preset termination condition is reached. If the classes in a final leaf node are not unique, the majority class is taken as the class of that leaf node.
The workflow of a regression tree is similar, except that every node (not only leaf nodes) yields a predicted value. Taking age as an example, the predicted value is the mean age of all the people belonging to the node. When branching, every threshold of every feature is tried exhaustively to find the best split point, but the criterion for "best" is no longer entropy-based; instead the mean squared error is minimized, that is,
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left(\text{age}_i - \text{predicted age}_i\right)^2 .$$
This is easy to understand: the more people are mispredicted, and the more wildly wrong the predictions are, the larger the mean squared error, so minimizing it finds the most reliable basis for branching. Branching continues until the ages of the people on each leaf node are all identical or a preset termination condition is reached (such as an upper limit on the number of leaves). If the ages on a final leaf node are not all identical, the mean age of everyone on that node is used as the predicted age of the leaf node.
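The exhaustive search for a split that minimizes the mean squared error can be sketched for a single feature as follows. This is a simplified illustration rather than the patent's implementation: the function names are hypothetical, only one feature is searched, and the score is the size-weighted mean squared error of the two branches. The sample ages and the student/employee feature are those of the four-person example discussed in the text.

```python
def mse(values):
    """Mean squared error of predicting every value by the group mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(xs, ys):
    """Try every threshold on one feature; return (threshold, weighted_mse) or None.

    A sample goes left when x <= threshold and right otherwise.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:  # the largest value cannot split the data
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        score = (len(left) * mse(left) + len(right) * mse(right)) / len(ys)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


ages = [8, 22, 26, 40]
is_employee = [0, 0, 1, 1]  # 0 = student, 1 = company employee

split = best_split(is_employee, ages)
print(split)
print(mse(ages))  # error before splitting, for comparison
```

Splitting students from employees lowers the error from 130.0 to 49.0; a full regression tree would repeat this search over every feature at every node.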
The iterative decision tree GBDT (Gradient Boosting Decision Tree), as the representative regression tree algorithm, differs from a traditional regression tree in that GBDT uses gradient iteration to let multiple trees decide jointly. The input of each tree is the residual left by all previous trees, that is, the difference between the actual result and the sum of the predictions of all previous trees. Take age prediction as an example. Suppose the training set contains only four people, A, B, C, and D, whose ages are 8, 22, 26, and 40 respectively. A and B are students; C and D are company employees. Training a single traditional regression tree gives the result shown in Figure 10.
The regression tree model that GBDT trains on the same sample space is shown in Figure 11. As Figure 11 shows, the first tree of GBDT is the same as the first-level branch of Figure 10: because A and B are close in age, and C and D are close in age, they are assigned to the left and right nodes of the tree respectively, and the average age of each node is used as the predicted value of the first tree. The residuals at this point are A = -7, B = 7, C = -7, D = 7. Replacing the target values of the first tree's input samples with these residuals and training with a new feature yields the second tree, after which the new residuals are A = 0, B = 0, C = 0, D = 0. After the iterative learning of the second tree, the residuals have dropped to 0 (which is hard to achieve in real situations), and the trained model can now be used for prediction. That is:
A: an 8-year-old student who likes playing computer games; the predicted age is 15 + (-7) = 8
B: a 22-year-old student who likes playing mobile games; the predicted age is 15 + 7 = 22
C: a 26-year-old employee who likes playing computer games; the predicted age is 33 + (-7) = 26
D: a 40-year-old employee who likes playing mobile games; the predicted age is 33 + 7 = 40
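The four-person example can be reproduced with two one-split regression trees, the second fitted to the residuals of the first. This is a simplified sketch: it assumes a learning rate of 1 and binary features, and the feature encoding (is_employee, plays_mobile_games) and the function name are hypothetical.

```python
def fit_stump(rows, targets):
    """Fit a one-split regression tree: choose the binary feature whose split
    gives the lowest squared error and predict the mean of each branch."""
    best = None
    for f in range(len(rows[0])):
        left = [t for row, t in zip(rows, targets) if row[f] == 0]
        right = [t for row, t in zip(rows, targets) if row[f] == 1]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, f, left_mean, right_mean)
    _, feature, left_mean, right_mean = best
    return lambda row: right_mean if row[feature] == 1 else left_mean

# Hypothetical encoding of the four people: (is_employee, plays_mobile_games)
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
ages = [8, 22, 26, 40]

tree1 = fit_stump(rows, ages)                        # splits students vs. employees
residuals = [age - tree1(row) for row, age in zip(rows, ages)]
tree2 = fit_stump(rows, residuals)                   # splits on the game preference
predictions = [tree1(row) + tree2(row) for row in rows]
```

The first tree predicts 15 and 33 for the two groups, the residuals are -7, 7, -7, 7, and adding the second tree's output recovers the four ages exactly. Practical GBDT implementations usually shrink each tree's contribution with a learning rate below 1 and use many more trees.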
In the embodiment of the present invention, the contextual information of user viewing is obtained, a user shutdown model is determined according to the contextual information, and, according to the user shutdown model, the shutdown behavior of users for whom television shutdown data cannot be collected is predicted, giving the moment at which the user turns off the television. Based on that moment, the invalid data in the user viewing behavior data returned by the set-top box is identified, which improves the accuracy of the audience rating surveys and user viewing behavior analysis carried out by cable broadcasting operators.
In addition, to illustrate the method of the above embodiment, the embodiment of the present invention obtained from a provincial network the behavior data of all two-way digital television set-top box users for March 2016, covering services such as page browsing, live broadcast, video on demand, time shifting, and catch-up viewing. The total data volume exceeds 300 GB, monthly active users number more than one million, and nearly 300,000 set-top boxes can upload television shutdown behavior, amounting to more than 3 million shutdown records. First, the massive behavior data is preprocessed with Spark distributed processing, and the features shown in Table 1 are extracted for each shutdown record. The present invention uses only the following features: region, type of service, rating period in which the service was entered, rating period of the shutdown moment, and the day of the week on which the service was entered. The shutdown durations of records whose features are identical are averaged to give the shutdown duration for that feature combination, and the number of shutdown records sharing those features is added as a new feature. This yields nearly 5,000 samples. Finally, the sample space is divided into a training set and a test set, with the training set containing 80% of the samples. The shutdown model is trained with the iterative decision tree method (GBDT) described in the previous section, and the trained model is then used to predict the shutdown duration of the samples in the test set. The experimental results are shown in Figure 12.
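The aggregation and 80/20 split described above can be sketched in plain Python. This is only an illustration under stated assumptions: the record layout, the feature values, and the function names are hypothetical, and the source performs this step with Spark on the full data set.

```python
import random
from collections import defaultdict

def aggregate(records):
    """Group shutdown records by their feature tuple. Each group becomes one
    sample: the features plus the group size, and the mean shutdown duration."""
    groups = defaultdict(list)
    for features, duration in records:
        groups[features].append(duration)
    return [
        (features + (len(durations),), sum(durations) / len(durations))
        for features, durations in sorted(groups.items())
    ]

def train_test_split(samples, train_fraction=0.8, seed=0):
    """Shuffle reproducibly and cut the samples into training and test sets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical records:
# ((region, service, entry period, shutdown period, weekday), duration in minutes)
records = [
    (("north", "live", "evening", "night", 5), 30.0),
    (("north", "live", "evening", "night", 5), 50.0),
    (("south", "vod", "noon", "afternoon", 2), 20.0),
]
samples = aggregate(records)
train, test = train_test_split(samples)
```

The two identical "north" records collapse into one sample with a mean duration of 40 minutes and a record count of 2. The resulting samples could then be passed to any gradient-boosting regressor for training.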
In the figure, the solid gray line represents the actual shutdown durations of the test samples, which were sorted in ascending order when plotting for ease of observation; the solid black line represents the corresponding predictions of the shutdown model. The predicted values fluctuate around the actual values, but the overall error is small, with the absolute error within 20 minutes, which indicates that the prediction is fairly accurate.
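A check of the kind reported above (absolute error within 20 minutes) can be computed as follows. The numbers are invented for illustration and are not the experimental data; the function names are hypothetical.

```python
def absolute_errors(actual, predicted):
    """Per-sample absolute difference between actual and predicted values."""
    return [abs(a - p) for a, p in zip(actual, predicted)]

def share_within(errors, tolerance):
    """Fraction of errors that do not exceed the tolerance."""
    return sum(e <= tolerance for e in errors) / len(errors)

# Hypothetical shutdown durations in minutes, sorted by actual value as in the plot
actual = [10.0, 25.0, 40.0, 90.0]
predicted = [18.0, 20.0, 55.0, 75.0]

errors = absolute_errors(actual, predicted)
mean_abs_error = sum(errors) / len(errors)
within_20 = share_within(errors, 20.0)
```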
This embodiment starts from the common phenomenon that cable digital television users, when they finish watching, habitually turn off only the television and neglect to turn off the set-top box. The present invention points out that the user behavior data returned by the set-top box during this period largely affects the accuracy of radio and television operators' statistical analysis of user viewing behavior and of audience rating indicators for programs and channels. The present invention therefore proposes a television shutdown model, trains it with the iterative decision tree algorithm, and realizes prediction of digital cable customers' shutdown behavior based on the iterative decision tree, filling the current gap caused by the difficulty of collecting user shutdown data. The model achieved good prediction results on real data provided by a provincial network, thereby helping to ensure the accuracy of audience rating surveys and user viewing behavior analysis.
Figure 13 is a structural diagram of the digital cable customer behavior prediction device provided by an embodiment of the present invention. The device can execute the processing flow provided by the embodiment of the digital cable customer behavior prediction method. As shown in Figure 13, the device includes an acquisition module 131, a determining module 132, and a prediction module 133. The acquisition module 131 is used to obtain the contextual information of user viewing, the contextual information including: fundamental type, programme attribute, and rating period. The determining module 132 is used to determine the user shutdown model according to the contextual information. The prediction module 133 is used to predict, according to the user shutdown model, the shutdown behavior of users for whom television shutdown data cannot be collected.
In the embodiment of the present invention, the contextual information of user viewing is obtained, a user shutdown model is determined according to the contextual information, and, according to the user shutdown model, the shutdown behavior of users for whom television shutdown data cannot be collected is predicted, giving the moment at which the user turns off the television. Based on that moment, the invalid data in the user viewing behavior data returned by the set-top box is identified, which improves the accuracy of the audience rating surveys and user viewing behavior analysis carried out by cable broadcasting operators.
Figure 14 is a structural diagram of the digital cable customer behavior prediction device provided by another embodiment of the present invention. On the basis of Figure 13, the device further includes a training module 134, used to train the user shutdown model with the iterative decision tree algorithm.
The acquisition module 131 is further used to obtain the time interval between the behavior data of the user operating the set-top box. The determining module 132 is further used to determine, if the time interval is greater than a threshold, the moment at which the user turns off the television according to the user shutdown model.
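The threshold rule can be sketched as below. The source does not spell out how the shutdown moment is derived from the model output, so this sketch assumes the model returns a delay (the shutdown duration) that is added to the time of the last operation before the gap; that assumption, the function names, and the constant stand-in model are all hypothetical.

```python
def find_invalid_intervals(timestamps, threshold, predict_shutdown_delay):
    """Scan consecutive set-top box operation times (in minutes). When the gap
    between two operations exceeds the threshold, estimate the shutdown moment
    and return (shutdown_moment, next_operation) as an interval of invalid data."""
    intervals = []
    for previous, current in zip(timestamps, timestamps[1:]):
        if current - previous > threshold:
            # Assumption: shutdown = last operation + predicted delay,
            # capped so that it never falls after the next operation.
            shutdown = min(previous + predict_shutdown_delay(previous), current)
            intervals.append((shutdown, current))
    return intervals

# Hypothetical stand-in for the trained shutdown model: a constant 45-minute delay
constant_model = lambda last_operation_time: 45.0

operations = [0, 10, 25, 400, 410]  # minutes since the start of the log
invalid = find_invalid_intervals(operations, threshold=120, predict_shutdown_delay=constant_model)
```

In this toy log only the gap between minute 25 and minute 400 exceeds the threshold, so the data returned between minute 70 and minute 400 would be treated as invalid.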
In addition, the digital cable customer behavior prediction device further includes a sort module 135, used to divide the contextual information into sample data and test data. The determining module 132 is specifically used to determine the user shutdown model according to the sample data; the prediction module 133 is specifically used to predict, according to the user shutdown model, the shutdown behavior of the users in the test data.
In addition, the fundamental type includes region and type of service; the programme attribute includes live channel, program category, program popularity, and program duration; the rating period includes broadcast time and live rating period.
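One way to hold the contextual information listed above is a small record type. The field names, types, and example values below are assumptions made for illustration; the source names only the categories.

```python
from dataclasses import dataclass, astuple

@dataclass(frozen=True)
class ViewingContext:
    # Fundamental type
    region: str
    service_type: str
    # Programme attribute
    live_channel: str
    program_category: str
    program_popularity: float
    program_duration_min: float
    # Rating period
    broadcast_time: str
    live_rating_period: str

# Hypothetical example record
ctx = ViewingContext(
    region="north",
    service_type="live",
    live_channel="channel-1",
    program_category="news",
    program_popularity=0.8,
    program_duration_min=30.0,
    broadcast_time="19:00",
    live_rating_period="evening",
)
features = astuple(ctx)  # flat tuple that can be fed to a model as one sample
```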
The digital cable customer behavior prediction device provided by the embodiment of the present invention can be specifically used to execute the method embodiment provided in Fig. 1 above; its specific functions are not described again here.
The embodiment of the present invention starts from the common phenomenon that cable digital television users, when they finish watching, habitually turn off only the television and neglect to turn off the set-top box. The present invention points out that the user behavior data returned by the set-top box during this period largely affects the accuracy of radio and television operators' statistical analysis of user viewing behavior and of audience rating indicators for programs and channels. The present invention therefore proposes a television shutdown model, trains it with the iterative decision tree algorithm, and realizes prediction of digital cable customers' shutdown behavior based on the iterative decision tree, filling the current gap caused by the difficulty of collecting user shutdown data. The model achieved good prediction results on real data provided by a provincial network, thereby helping to ensure the accuracy of audience rating surveys and user viewing behavior analysis.
In conclusion, in the embodiment of the present invention, the contextual information of user viewing is obtained, a user shutdown model is determined according to the contextual information, and, according to the user shutdown model, the shutdown behavior of users for whom television shutdown data cannot be collected is predicted, giving the moment at which the user turns off the television. Based on that moment, the invalid data in the user viewing behavior data returned by the set-top box is identified, which improves the accuracy of the audience rating surveys and user viewing behavior analysis carried out by cable broadcasting operators. Starting from the common phenomenon that cable digital television users, when they finish watching, habitually turn off only the television and neglect to turn off the set-top box, the present invention points out that the user behavior data returned by the set-top box during this period largely affects the accuracy of radio and television operators' statistical analysis of user viewing behavior and of audience rating indicators for programs and channels. The present invention therefore proposes a television shutdown model, trains it with the iterative decision tree algorithm, and realizes prediction of digital cable customers' shutdown behavior based on the iterative decision tree, filling the current gap caused by the difficulty of collecting user shutdown data. The model achieved good prediction results on real data provided by a provincial network, thereby helping to ensure the accuracy of audience rating surveys and user viewing behavior analysis.
In the several embodiments provided by the present invention, it should be understood that the disclosed device and method can be implemented in other ways. For example, the device embodiments described above are only schematic. The division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in each embodiment of the present invention may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute part of the steps of the methods of the embodiments of the present invention. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the division into the functional modules above is used only as an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. For the specific working process of the device described above, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (10)
- 1. A digital cable customer behavior prediction method, characterized by comprising: obtaining contextual information of user viewing, the contextual information including: fundamental type, programme attribute, and rating period; determining a user shutdown model according to the contextual information; and predicting, according to the user shutdown model, the shutdown behavior of users for whom television shutdown data cannot be collected.
- 2. The method according to claim 1, characterized in that, after the determining of the user shutdown model according to the contextual information, the method further comprises: training the user shutdown model using an iterative decision tree algorithm.
- 3. The method according to claim 2, characterized in that the predicting, according to the user shutdown model, of the shutdown behavior of users for whom television shutdown data cannot be collected comprises: obtaining the time interval between the behavior data of the user operating the set-top box; and, if the time interval is greater than a threshold, determining the moment at which the user turns off the television according to the user shutdown model.
- 4. The method according to claim 1, characterized in that, after the obtaining of the contextual information of user viewing, the method further comprises: dividing the contextual information into sample data and test data; the determining of the user shutdown model according to the contextual information comprises: determining the user shutdown model according to the sample data; and the predicting, according to the user shutdown model, of the shutdown behavior of users for whom television shutdown data cannot be collected comprises: predicting, according to the user shutdown model, the shutdown behavior of the users in the test data.
- 5. The method according to any one of claims 1-4, characterized in that the fundamental type includes region and type of service; the programme attribute includes live channel, program category, program popularity, and program duration; and the rating period includes broadcast time and live rating period.
- 6. A digital cable customer behavior prediction device, characterized by comprising: an acquisition module, used to obtain contextual information of user viewing, the contextual information including: fundamental type, programme attribute, and rating period; a determining module, used to determine a user shutdown model according to the contextual information; and a prediction module, used to predict, according to the user shutdown model, the shutdown behavior of users for whom television shutdown data cannot be collected.
- 7. The digital cable customer behavior prediction device according to claim 6, characterized by further comprising: a training module, used to train the user shutdown model using an iterative decision tree algorithm.
- 8. The digital cable customer behavior prediction device according to claim 7, characterized in that the acquisition module is further used to obtain the time interval between the behavior data of the user operating the set-top box; and the determining module is further used to determine, if the time interval is greater than a threshold, the moment at which the user turns off the television according to the user shutdown model.
- 9. The digital cable customer behavior prediction device according to claim 6, characterized by further comprising: a sort module, used to divide the contextual information into sample data and test data; wherein the determining module is specifically used to determine the user shutdown model according to the sample data, and the prediction module is specifically used to predict, according to the user shutdown model, the shutdown behavior of the users in the test data.
- 10. The digital cable customer behavior prediction device according to any one of claims 6-9, characterized in that the fundamental type includes region and type of service; the programme attribute includes live channel, program category, program popularity, and program duration; and the rating period includes broadcast time and live rating period.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610883971.4A CN107920260A (en) | 2016-10-10 | 2016-10-10 | Digital cable customers behavior prediction method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610883971.4A CN107920260A (en) | 2016-10-10 | 2016-10-10 | Digital cable customers behavior prediction method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107920260A true CN107920260A (en) | 2018-04-17 |
Family
ID=61892670
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610883971.4A Pending CN107920260A (en) | 2016-10-10 | 2016-10-10 | Digital cable customers behavior prediction method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107920260A (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130055329A1 (en) * | 2006-03-29 | 2013-02-28 | At&T Intellectual Property I, L.P. | Close-Captioning Uniform Resource Locator Capture System and Method |
CN102137293A (en) * | 2010-12-31 | 2011-07-27 | 华为技术有限公司 | Resource allocation method, user business terminal and head end system of streaming media service |
CN103329559A (en) * | 2011-04-06 | 2013-09-25 | 兰屈克有限公司 | Method and system for detecting non-powered video playback devices |
CN103260061A (en) * | 2013-05-24 | 2013-08-21 | 华东师范大学 | Context-perceptive IPTV program recommending method |
CN103297814A (en) * | 2013-06-28 | 2013-09-11 | 百视通新媒体股份有限公司 | Television viewing rate assessment method and system based on internet protocol television (IPTV) |
US20150215566A1 (en) * | 2014-01-30 | 2015-07-30 | Vizio Inc | Predictive time to turn on a television based on previously used program schedules |
CN104980800A (en) * | 2014-04-04 | 2015-10-14 | 北京秒针信息咨询有限公司 | Method and system for monitoring startup/shutdown state of television |
CN104038822A (en) * | 2014-07-02 | 2014-09-10 | 程振国 | Automatic standby achieving method of digital television set top box |
CN104349193A (en) * | 2014-11-11 | 2015-02-11 | 无锡科思电子科技有限公司 | Automatic power-off set-top box based on sleep recognition |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109769146A (en) * | 2018-12-25 | 2019-05-17 | 国家新闻出版广电总局广播电视规划院 | The determination method and device of broadcast TV program audience ratings |
CN110110191A (en) * | 2019-03-28 | 2019-08-09 | 北京奇艺世纪科技有限公司 | Search processing method and device and computer readable storage medium |
CN110110191B (en) * | 2019-03-28 | 2021-05-25 | 北京奇艺世纪科技有限公司 | Search processing method and apparatus, and computer-readable storage medium |
CN111246294A (en) * | 2020-01-06 | 2020-06-05 | 国家广播电视总局广播电视规划院 | Method, device, equipment and storage medium for processing audience rating index data |
CN113727192A (en) * | 2020-06-19 | 2021-11-30 | 天翼智慧家庭科技有限公司 | Method and system for collecting viewing behaviors |
CN113727192B (en) * | 2020-06-19 | 2023-09-12 | 天翼数字生活科技有限公司 | Method and system for collecting viewing behaviors |
US20220070535A1 (en) * | 2020-09-01 | 2022-03-03 | Comscore, Inc. | Predictive detection of real-time and future viewability |
CN112507420A (en) * | 2020-11-19 | 2021-03-16 | 同济大学 | System for constructing personal personalized environment control behavior prediction model training set in office building |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180417 |