CN107180246A - An IPTV user fault-report data synthesis method based on a mixture model - Google Patents
- Publication number: CN107180246A
- Application number: CN201710247904.8A
- Authority: CN (China)
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/24155—Bayesian classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0631—Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/80—Responding to QoS
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- Computer Networks & Wireless Communication (AREA)
- Strategic Management (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Signal Processing (AREA)
- Bioinformatics & Computational Biology (AREA)
- Human Resources & Organizations (AREA)
- Economics (AREA)
- Development Economics (AREA)
- Game Theory and Decision Science (AREA)
- Probability & Statistics with Applications (AREA)
- Multimedia (AREA)
- Entrepreneurship & Innovation (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
Abstract
The invention discloses an IPTV user fault-report data synthesis method based on a mixture model. The method addresses a defect of existing minority-class data synthesis methods, which generate new samples directly without analysing the minority-class samples and thereby degrade the performance of the subsequent classification and prediction model. The invention first extracts the user fault-report data set from the data collected by IPTV set-top boxes and represents the distribution of this data set with a mixture model; it then initializes and estimates the model parameters; finally, it uses the fitted mixture model to generate new IPTV user fault-report data. The method better captures the characteristics of imbalanced user fault-report data, the generated data are more representative and more discriminative between classes, and the performance of subsequent classification and prediction of imbalanced user fault reports is improved.
Description
Technical field
The present invention relates to an IPTV user fault-report data synthesis method based on a mixture model, and belongs to the technical field of imbalanced data processing.
Background
With the development of Internet technology, more and more users are using IPTV services, and IPTV operators are striving to provide streaming video services of higher quality and more stable transmission. When the quality of experience of a video service declines or is poor, the user may report a fault to the operator. In other words, users' fault reports are closely related to their quality of experience. If the operator can predict users' fault-reporting behavior accurately and efficiently in advance, and take measures to resolve problems that may arise in the IPTV network, the number of fault reports can be reduced effectively. The analysis of fault-report data and the prediction of fault-reporting behavior are therefore very important for operators.
In real systems, users who report faults make up a relatively small proportion of all users; that is, the probability of a fault report is far lower than the probability of a normal user experience. The data set for the fault-report prediction task is therefore imbalanced. An imbalanced data set is one in which one class has far fewer samples than the other classes. Here, the amount of fault-report data (minority-class samples) is far smaller than the amount of non-fault-report data (majority-class samples). In this situation, a traditional binary classifier trained on the imbalanced data tends to be biased: prediction accuracy is very high for the majority class but very low for the minority class. Imbalanced data sets are commonly handled with sampling-based methods, which change the distribution of the data set so that it becomes balanced.
Most existing methods handle imbalanced data by generating new minority-class samples directly from the available samples, for example the Synthetic Minority Oversampling Technique (SMOTE). These approaches are intuitive, but because they do not deeply mine the distribution characteristics of the minority-class samples, the samples they produce do not necessarily help classification and often harm it. The root cause is that the newly generated minority-class samples are not representative, so these methods cannot be applied well to IPTV user fault-report prediction.
Summary of the invention
The purpose of the present invention is to remedy the shortcomings of existing IPTV user fault-report data processing by proposing an IPTV user fault-report data synthesis method based on a mixture model. The method uses a mixture model to describe the distribution of the IPTV user fault-report data (minority-class samples): first, a graphical model structure is built from the existing user fault-report data; then the parameters are estimated; finally, the model with the estimated parameters is used to generate new user fault-report data, so that the imbalanced data become relatively balanced.
The technical solution adopted by the present invention to solve the technical problem is an IPTV user fault-report data synthesis method based on a mixture model, comprising the following steps:
Step 1: Let the user experience data set collected from IPTV set-top boxes be $X = \{x_i\}$, where each $x_i$ consists of eight attributes: packet loss rate (LOSSRATE), set-top box download rate (DOWN_BANDWIDH), video download rate (MEDIARATE), transmission delay (MDI_DF), video transmission packet loss rate (MDIMLR), network transmission quality (VSTQ), video MOS score (MOS_VALUE), and CPU utilization (CPU_USAGE). Select the data whose user fault-report flag $y_i$ equals 1 ($y_i = 0$ means the user did not report a fault, $y_i = 1$ means the user reported a fault) as the IPTV user fault-report data set $X^{alm} = \{x_i^{alm}\}_{i=1}^{N}$, where $X^{alm} \subseteq X$ and $N$ is the total number of fault-report records.
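Step 2: Use a mixture model to represent the distribution of $X^{alm}$. Its probability distribution function is:
$$p(x_i^{alm}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i^{alm} \mid \mu_k, \Sigma_k) = \sum_{k=1}^{K} p(x_i^{alm}, z_i = k), \qquad (1)$$
where $\Theta = \{\pi_k, \mu_k, \Sigma_k\}_{k=1}^{K}$ is the parameter set of the mixture model. The model consists of $K$ Gaussian distributions; $\mu_k$ and $\Sigma_k$ are the mean vector and covariance matrix of the $k$-th Gaussian distribution; $\pi_k$ is the probability that a sample of $X^{alm}$ comes from the $k$-th Gaussian distribution, and satisfies $0 \le \pi_k \le 1$ and $\sum_{k=1}^{K} \pi_k = 1$.
To simplify the notation and the subsequent parameter estimation, a corresponding random variable $z_i$ is introduced for each data point $x_i^{alm}$. The variable uses "1-of-K" coding (it takes an integer value from 1 to $K$). From formula (1), the joint probability $p(x_i^{alm}, z_i = k)$ of $x_i^{alm}$ and $z_i$ is:
$$p(x_i^{alm}, z_i = k) = p(z_i = k) \cdot p(x_i^{alm} \mid z_i = k),$$
where $p(z_i = k) = \pi_k$ and $p(x_i^{alm} \mid z_i = k) = \mathcal{N}(x_i^{alm} \mid \mu_k, \Sigma_k)$.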
Step 3: Initialize the parameters of the mixture model. Randomly select $K$ data points from the data set $X^{alm}$. For the $k$-th selected point, find the $C$ points in $X^{alm}$ nearest to it in Euclidean distance; these form the local data set associated with that point. Compute the mean and covariance of this local data set and use them as the initial values of the mean $\mu_k$ and covariance matrix $\Sigma_k$ of the $k$-th component of the mixture model. The initial value of $\pi_k$ is $1/K$. This yields the initial parameter set $\Theta^{(0)} = \{\pi_k^{(0)}, \mu_k^{(0)}, \Sigma_k^{(0)}\}_{k=1}^{K}$. Set the iteration counter $t = 1$ and the maximum number of iterations to $T$.
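Step 4: Use $X^{alm}$ to estimate the parameters of the mixture model. The iteration proceeds as follows:
(4-1) Using the current parameter set $\Theta^{(t-1)} = \{\pi_k^{(t-1)}, \mu_k^{(t-1)}, \Sigma_k^{(t-1)}\}_{k=1}^{K}$ and $X^{alm}$, compute the probability $\gamma(i, k)$ that $z_i = k$ given $x_i^{alm}$:
$$\gamma(i, k) = \frac{p(z_i = k)\, p(x_i^{alm} \mid z_i = k)}{\sum_{k'=1}^{K} p(z_i = k')\, p(x_i^{alm} \mid z_i = k')} = \frac{\pi_k^{(t-1)} \mathcal{N}(x_i^{alm} \mid \mu_k^{(t-1)}, \Sigma_k^{(t-1)})}{\sum_{k'=1}^{K} \pi_{k'}^{(t-1)} \mathcal{N}(x_i^{alm} \mid \mu_{k'}^{(t-1)}, \Sigma_{k'}^{(t-1)})}.$$
(4-2) Using $\gamma(i, k)$, update the parameter set of the mixture model to $\Theta^{(t)} = \{\pi_k^{(t)}, \mu_k^{(t)}, \Sigma_k^{(t)}\}_{k=1}^{K}$:
$$\mu_k^{(t)} = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(i, k) \cdot x_i^{alm},$$
$$\Sigma_k^{(t)} = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(i, k)\, (x_i^{alm} - \mu_k^{(t)})(x_i^{alm} - \mu_k^{(t)})^{\mathsf{T}},$$
$$\pi_k^{(t)} = \frac{N_k}{N},$$
where $N_k = \sum_{i=1}^{N} \gamma(i, k)$.
(4-3) Compute the log-likelihood $LIK^{(t)}$ of the current iteration:
$$LIK^{(t)} = \sum_{i=1}^{N} \log\left( \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i^{alm} \mid \mu_k, \Sigma_k) \right).$$
If $LIK^{(t)} - LIK^{(t-1)} \le \delta$ for a preset threshold $\delta$, stop the iteration and output the current parameter set $\Theta^{(t)}$; otherwise, set $t = t + 1$ and return to (4-1) for the next iteration. The iteration also terminates once the number of iterations reaches $T$.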
Step 5: Use the estimated mixture model to generate new IPTV user fault-report data, denoted $(X^{alm})'$. Let $N'$ be the number of samples to generate. The generation process is as follows:
(5-1) Draw a random number $\varepsilon$ uniformly distributed between 0 and 1.
(5-2) If $\varepsilon \in [0, \pi_1]$, draw a sample from the Gaussian distribution $\mathcal{N}(\mu_1, \Sigma_1)$; if $\varepsilon \in \left(\sum_{j=1}^{k-1} \pi_j, \sum_{j=1}^{k} \pi_j\right]$ for $k = 2, \ldots, K-1$, draw a sample from $\mathcal{N}(\mu_k, \Sigma_k)$; if $\varepsilon \in \left(\sum_{j=1}^{K-1} \pi_j, 1\right]$, draw a sample from $\mathcal{N}(\mu_K, \Sigma_K)$.
(5-3) Repeat steps (5-1) and (5-2) $N'$ times to obtain $(X^{alm})'$. The final IPTV user fault-report data set is $X^{alm} \cup (X^{alm})'$, which is used by the subsequent IPTV user fault-report prediction method.
Beneficial effects:
1. By generating IPTV user fault-report data, the present invention effectively addresses the inaccurate classification and prediction caused by imbalanced data in the IPTV user fault-report prediction task.
2. The present invention models the distribution of the IPTV user fault-report data with a mixture model and thereby captures the characteristics of the data well. Compared with conventional methods, the new IPTV user fault-report data it generates are more representative and more discriminative for classification.
3. The present invention avoids the data-overlap problem that the conventional SMOTE method introduces when generating minority-class samples.
Brief description of the drawings
Fig. 1 is a flow chart of the imbalanced IPTV data processing, analysis, and prediction of the present invention.
Fig. 2 is a performance comparison between the present invention and conventional algorithms.
Detailed description
The invention is described in further detail below with reference to the accompanying drawings.
As shown in Fig. 1, the present invention provides an IPTV user fault-report data synthesis method based on a mixture model, comprising the following steps:
Step 1: Let the user experience data set collected from IPTV set-top boxes be $X = \{x_i\}$, where each $x_i$ consists of eight attributes: packet loss rate (LOSSRATE), set-top box download rate (DOWN_BANDWIDH), video download rate (MEDIARATE), transmission delay (MDI_DF), video transmission packet loss rate (MDIMLR), network transmission quality (VSTQ), video MOS score (MOS_VALUE), and CPU utilization (CPU_USAGE). Select the data whose user fault-report flag $y_i$ (ALARM) equals 1 as the IPTV user fault-report data set $X^{alm} = \{x_i^{alm}\}_{i=1}^{N}$, where $X^{alm} \subseteq X$ and $N$ is the total number of fault-report records.
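The selection in Step 1 can be sketched as follows. This is only an illustration under stated assumptions: the records are assumed to sit in a NumPy array with one row per record, the ALARM flags in a separate array, and the function name `select_alarm_records`, the column order, and the toy values are hypothetical rather than taken from the patent.

```python
import numpy as np

# Attribute names as listed in Step 1; the column order is an assumption.
FEATURES = ["LOSSRATE", "DOWN_BANDWIDH", "MEDIARATE", "MDI_DF",
            "MDIMLR", "VSTQ", "MOS_VALUE", "CPU_USAGE"]

def select_alarm_records(x, y):
    """Return X_alm: the rows of x whose fault-report flag y equals 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of rows")
    return x[y == 1]

# Toy data (made up for illustration): 6 records, 8 attributes each.
rng = np.random.default_rng(0)
x_toy = rng.random((6, len(FEATURES)))
y_toy = np.array([0, 1, 0, 0, 1, 0])
x_alm = select_alarm_records(x_toy, y_toy)   # N = 2 fault-report records
```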
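Step 2: Use a Gaussian mixture model (GMM) to represent the distribution of $X^{alm}$. Its probability distribution function is:
$$p(x_i^{alm}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i^{alm} \mid \mu_k, \Sigma_k) = \sum_{k=1}^{K} p(x_i^{alm}, z_i = k), \qquad (1)$$
where $\Theta = \{\pi_k, \mu_k, \Sigma_k\}_{k=1}^{K}$ is the parameter set of the mixture model. The model consists of $K$ Gaussian distributions; $\mu_k$ and $\Sigma_k$ are the mean vector and covariance matrix of the $k$-th Gaussian distribution; $\pi_k$ is the probability that a sample of $X^{alm}$ comes from the $k$-th Gaussian distribution, and satisfies $0 \le \pi_k \le 1$ and $\sum_{k=1}^{K} \pi_k = 1$.
To simplify the notation and the subsequent parameter estimation, a corresponding random variable $z_i$ is introduced for each data point $x_i^{alm}$. The variable uses "1-of-K" coding (it takes an integer value from 1 to $K$). From formula (1), the joint probability $p(x_i^{alm}, z_i = k)$ of $x_i^{alm}$ and $z_i$ is:
$$p(x_i^{alm}, z_i = k) = p(z_i = k) \cdot p(x_i^{alm} \mid z_i = k),$$
where $p(z_i = k) = \pi_k$ and $p(x_i^{alm} \mid z_i = k) = \mathcal{N}(x_i^{alm} \mid \mu_k, \Sigma_k)$.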
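To make the mixture density of Step 2 concrete, the following minimal sketch evaluates the joint probability $p(x, z = k)$ and the mixture density $p(x)$ for one point. The function names, the 0-based component index, and the two-component toy parameters are assumptions made for illustration only.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of N(x | mu, sigma) for a single d-dimensional point x."""
    d = mu.shape[0]
    diff = x - mu
    # Solve sigma * v = diff instead of forming the explicit inverse.
    maha = diff @ np.linalg.solve(sigma, diff)
    _, logdet = np.linalg.slogdet(sigma)
    return float(np.exp(-0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)))

def joint_prob(x, k, pi, mu, sigma):
    """p(x, z = k) = pi_k * N(x | mu_k, Sigma_k); k is 0-based here."""
    return pi[k] * gaussian_pdf(x, mu[k], sigma[k])

def mixture_pdf(x, pi, mu, sigma):
    """p(x) = sum over k of p(x, z = k), as in formula (1)."""
    return sum(joint_prob(x, k, pi, mu, sigma) for k in range(len(pi)))

# Two-component toy model in 2 dimensions (values made up).
pi = np.array([0.3, 0.7])
mu = np.array([[0.0, 0.0], [3.0, 3.0]])
sigma = np.array([np.eye(2), np.eye(2)])
p0 = mixture_pdf(np.array([0.0, 0.0]), pi, mu, sigma)
```

With identity covariances the value can be checked by hand: the first component contributes $0.3 / (2\pi)$ and the second $0.7\, e^{-9} / (2\pi)$.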
Step 3: Initialize the parameters of the mixture model. Randomly select $K$ data points from the data set $X^{alm}$. For the $k$-th selected point, find the $C$ points in $X^{alm}$ nearest to it in Euclidean distance; these form the local data set associated with that point. Compute the mean and covariance of this local data set and use them as the initial values of the mean $\mu_k$ and covariance matrix $\Sigma_k$ of the $k$-th component of the mixture model. The initial value of $\pi_k$ is $1/K$. This yields the initial parameter set $\Theta^{(0)} = \{\pi_k^{(0)}, \mu_k^{(0)}, \Sigma_k^{(0)}\}_{k=1}^{K}$. Set the iteration counter $t = 1$ and the maximum number of iterations to $T$.
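The neighbour-based initialization of Step 3 might look like the sketch below. Several details are assumptions, since the text does not fix them: the seed point is excluded from its own neighbour set, the sample covariance uses NumPy's default normalization, and a small diagonal term `reg` is added so the initial covariance stays invertible. The function name `init_params` is hypothetical.

```python
import numpy as np

def init_params(x_alm, n_components, n_neighbors, rng, reg=1e-6):
    """Initial (pi, mu, sigma) from K random seeds and their C nearest points."""
    n, d = x_alm.shape
    seeds = rng.choice(n, size=n_components, replace=False)
    mu = np.empty((n_components, d))
    sigma = np.empty((n_components, d, d))
    for k, s in enumerate(seeds):
        dist = np.linalg.norm(x_alm - x_alm[s], axis=1)   # Euclidean distances
        order = np.argsort(dist)
        nearest = order[order != s][:n_neighbors]         # seed excluded (assumption)
        local = x_alm[nearest]                            # local data set of seed k
        mu[k] = local.mean(axis=0)
        # reg * I is an added safeguard, not part of the described method.
        sigma[k] = np.cov(local, rowvar=False) + reg * np.eye(d)
    pi = np.full(n_components, 1.0 / n_components)        # pi_k = 1 / K
    return pi, mu, sigma

# Toy stand-in for X_alm: 60 points in 3 dimensions.
rng = np.random.default_rng(1)
x_alm = rng.normal(size=(60, 3))
pi0, mu0, sigma0 = init_params(x_alm, n_components=2, n_neighbors=10, rng=rng)
```

Note that `n_neighbors` should exceed the data dimension (8 for the attributes in Step 1) if the local covariance is to be full rank without relying on `reg`.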
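Step 4: Use $X^{alm}$ to estimate the parameters of the mixture model. The iteration proceeds as follows:
(4-1) Using the current parameter set $\Theta^{(t-1)} = \{\pi_k^{(t-1)}, \mu_k^{(t-1)}, \Sigma_k^{(t-1)}\}_{k=1}^{K}$ and $X^{alm}$, compute the probability $\gamma(i, k)$ that $z_i = k$ given $x_i^{alm}$:
$$\gamma(i, k) = \frac{p(z_i = k)\, p(x_i^{alm} \mid z_i = k)}{\sum_{k'=1}^{K} p(z_i = k')\, p(x_i^{alm} \mid z_i = k')} = \frac{\pi_k^{(t-1)} \mathcal{N}(x_i^{alm} \mid \mu_k^{(t-1)}, \Sigma_k^{(t-1)})}{\sum_{k'=1}^{K} \pi_{k'}^{(t-1)} \mathcal{N}(x_i^{alm} \mid \mu_{k'}^{(t-1)}, \Sigma_{k'}^{(t-1)})}.$$
(4-2) Using $\gamma(i, k)$, update the parameter set of the mixture model to $\Theta^{(t)} = \{\pi_k^{(t)}, \mu_k^{(t)}, \Sigma_k^{(t)}\}_{k=1}^{K}$:
$$\mu_k^{(t)} = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(i, k) \cdot x_i^{alm},$$
$$\Sigma_k^{(t)} = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(i, k)\, (x_i^{alm} - \mu_k^{(t)})(x_i^{alm} - \mu_k^{(t)})^{\mathsf{T}},$$
$$\pi_k^{(t)} = \frac{N_k}{N},$$
where $N_k = \sum_{i=1}^{N} \gamma(i, k)$.
(4-3) Compute the log-likelihood $LIK^{(t)}$ of the current iteration:
$$LIK^{(t)} = \sum_{i=1}^{N} \log\left( \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i^{alm} \mid \mu_k, \Sigma_k) \right).$$
If $LIK^{(t)} - LIK^{(t-1)} \le \delta$ for a preset threshold $\delta$, stop the iteration and output the current parameter set $\Theta^{(t)}$; otherwise, set $t = t + 1$ and return to (4-1) for the next iteration. The iteration also terminates once the number of iterations reaches $T$.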
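The iteration of Step 4 is the standard expectation-maximization (EM) procedure for a Gaussian mixture, and can be sketched as below. The sketch works in the log domain for numerical stability, adds a small diagonal term `reg` to each covariance, and applies the stopping test only from the second iteration onward because no $LIK^{(0)}$ is defined; these three choices, along with the function names and the toy data, are assumptions rather than details from the patent.

```python
import numpy as np

def log_gauss(x, mu, sigma):
    """Row-wise log N(x_i | mu, sigma) for an (N, d) array x."""
    d = x.shape[1]
    diff = x - mu
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(sigma, diff.T).T)
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def log_joint(x, pi, mu, sigma):
    """(N, K) array of log(pi_k * N(x_i | mu_k, Sigma_k))."""
    return np.stack([np.log(pi[k]) + log_gauss(x, mu[k], sigma[k])
                     for k in range(len(pi))], axis=1)

def em_fit(x, pi, mu, sigma, max_iter=100, delta=1e-6, reg=1e-6):
    """Run steps (4-1) to (4-3) for at most max_iter (= T) iterations."""
    n, d = x.shape
    history = []
    for _ in range(max_iter):
        # (4-1) responsibilities gamma(i, k) from the previous parameters
        lj = log_joint(x, pi, mu, sigma)
        gamma = np.exp(lj - np.logaddexp.reduce(lj, axis=1, keepdims=True))
        # (4-2) parameter update
        n_k = gamma.sum(axis=0)
        mu = (gamma.T @ x) / n_k[:, None]
        sigma = np.stack([
            ((gamma[:, k, None] * (x - mu[k])).T @ (x - mu[k])) / n_k[k]
            + reg * np.eye(d)                 # added safeguard (assumption)
            for k in range(len(n_k))])
        pi = n_k / n
        # (4-3) log-likelihood evaluated with the updated parameters
        lik = float(np.logaddexp.reduce(log_joint(x, pi, mu, sigma), axis=1).sum())
        history.append(lik)
        if len(history) > 1 and lik - history[-2] <= delta:
            break
    return pi, mu, sigma, history

# Toy data: two well-separated clusters standing in for X_alm.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(5.0, 1.0, size=(100, 2))])
pi_hat, mu_hat, sigma_hat, history = em_fit(
    x, np.array([0.5, 0.5]),
    np.array([[1.0, 1.0], [4.0, 4.0]]),
    np.array([np.eye(2), np.eye(2)]))
```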
Step 5: Use the estimated mixture model to generate a new IPTV user fault-report data set $(X^{alm})'$. Let $N'$ be the number of samples to generate. The generation process is as follows:
(5-1) Draw a random number $\varepsilon$ uniformly distributed between 0 and 1.
(5-2) If $\varepsilon \in [0, \pi_1]$, draw a sample from the Gaussian distribution $\mathcal{N}(\mu_1, \Sigma_1)$; if $\varepsilon \in \left(\sum_{j=1}^{k-1} \pi_j, \sum_{j=1}^{k} \pi_j\right]$ for $k = 2, \ldots, K-1$, draw a sample from $\mathcal{N}(\mu_k, \Sigma_k)$; if $\varepsilon \in \left(\sum_{j=1}^{K-1} \pi_j, 1\right]$, draw a sample from $\mathcal{N}(\mu_K, \Sigma_K)$.
(5-3) Repeat steps (5-1) and (5-2) $N'$ times to obtain $(X^{alm})'$. The final IPTV user fault-report data set is $X^{alm} \cup (X^{alm})'$, which is used by the subsequent IPTV user fault-report prediction method.
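The generation procedure of Step 5 amounts to picking a component by comparing a uniform random number against the cumulative mixing weights and then sampling from that component's Gaussian. A minimal sketch, assuming 0-based component indices and toy parameters; the function name `sample_gmm` and the final stacking of original and synthetic data are illustrative assumptions:

```python
import numpy as np

def sample_gmm(pi, mu, sigma, n_new, rng):
    """Draw n_new points from the mixture by steps (5-1) to (5-3)."""
    cum = np.cumsum(pi)                  # boundaries pi_1, pi_1 + pi_2, ...
    samples = np.empty((n_new, mu.shape[1]))
    labels = np.empty(n_new, dtype=int)
    for i in range(n_new):
        eps = rng.random()               # (5-1) uniform random number
        # (5-2) first interval whose upper boundary is >= eps; the clamp
        # guards against the cumulative sum ending slightly below 1.
        k = min(int(np.searchsorted(cum, eps)), len(pi) - 1)
        samples[i] = rng.multivariate_normal(mu[k], sigma[k])
        labels[i] = k
    return samples, labels

rng = np.random.default_rng(42)
pi = np.array([0.2, 0.8])
mu = np.array([[0.0, 0.0], [10.0, 10.0]])
sigma = np.array([np.eye(2), np.eye(2)])
x_alm = rng.normal(size=(5, 2))          # stand-in for real fault-report data
x_new, comp = sample_gmm(pi, mu, sigma, 2000, rng)
x_final = np.vstack([x_alm, x_new])      # original plus synthetic records
```

Over many draws, the share of samples taken from each component approaches its weight $\pi_k$, which is what the interval rule in (5-2) is designed to achieve.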
Fig. 2 illustrates the advantage of the IPTV user fault-report data synthesis method based on a mixture model. The IPTV user fault-report data generated by the method were applied to user fault-report prediction in an IPTV system, with naive Bayes as the classifier. The prediction results obtained with data generated by the method of the present invention (denoted GMM) were compared with those of no processing (no-SMOTE), Borderline-SMOTE, and Kmeans-SMOTE, in order to evaluate the effectiveness and accuracy of the method. When the ratio of fault-report users (minority-class samples) to non-fault-report users (majority-class samples) reaches 1:89, the G value of the proposed method reaches 0.5982, higher than that of the conventional SMOTE methods; detailed results are shown in Fig. 2. A higher G value indicates more accurate prediction of the minority-class user fault reports. The experimental results show that the IPTV user fault-report data synthesis method designed by the present invention significantly improves classification and prediction performance on existing imbalanced IPTV data sets.
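The text does not define the G value. In the imbalanced-learning literature it usually denotes the G-mean, the geometric mean of the recall on the minority class and the recall on the majority class; the sketch below assumes that definition. It also shows why plain accuracy is misleading at a 1:89 class ratio. The labels and predictions are made up for illustration and are not the patent's experimental data.

```python
import math

def g_mean(y_true, y_pred):
    """Geometric mean of minority-class recall and majority-class recall."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)

# One fault-report user per 89 normal users, matching the 1:89 class ratio.
y_true = [1] * 1 + [0] * 89
always_normal = [0] * 90                 # classifier that never predicts a fault
accuracy = sum(t == p for t, p in zip(y_true, always_normal)) / len(y_true)
g_trivial = g_mean(y_true, always_normal)

# A made-up prediction that catches half of the fault reports.
g_example = g_mean([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
```

The trivial classifier reaches an accuracy of 89/90 but a G-mean of 0, because it never detects a fault report; this is the bias toward the majority class that the oversampling is meant to counter.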
Claims (3)
1. An IPTV user fault-report data synthesis method based on a mixture model, characterized in that the method comprises the following steps:
Step 1: Let the user experience data set collected from IPTV set-top boxes be $X = \{x_i\}$, where each $x_i$ consists of eight attributes: packet loss rate (LOSSRATE), set-top box download rate (DOWN_BANDWIDH), video download rate (MEDIARATE), transmission delay (MDI_DF), video transmission packet loss rate (MDIMLR), network transmission quality (VSTQ), video MOS score (MOS_VALUE), and CPU utilization (CPU_USAGE); select the data whose user fault-report flag $y_i$ (ALARM) equals 1 as the IPTV user fault-report data set $X^{alm} = \{x_i^{alm}\}_{i=1}^{N}$, where $X^{alm} \subseteq X$ and there are $N$ data points in total;
Step 2: Use a mixture model to represent the distribution of $X^{alm}$; its probability distribution function comprises:
$$p(x_i^{alm}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i^{alm} \mid \mu_k, \Sigma_k) = \sum_{k=1}^{K} p(x_i^{alm}, z_i = k),$$
where $\Theta = \{\pi_k, \mu_k, \Sigma_k\}_{k=1}^{K}$ is the parameter set of the mixture model; the model consists of $K$ Gaussian distributions; $\mu_k$ and $\Sigma_k$ are the mean vector and covariance matrix of the $k$-th Gaussian distribution; $\pi_k$ is the probability that a sample of $X^{alm}$ comes from the $k$-th Gaussian distribution, and satisfies $0 \le \pi_k \le 1$ and $\sum_{k=1}^{K} \pi_k = 1$;
Step 3: Randomly select $K$ data points from the data set $X^{alm}$; for the $k$-th selected point, find the $C$ points in $X^{alm}$ nearest to it in Euclidean distance, which form the local set associated with that point; compute the mean and covariance of the local set and use them as the initial values of the mean $\mu_k$ and covariance matrix $\Sigma_k$ of the $k$-th component of the mixture model; the initial value of $\pi_k$ is $1/K$; this yields the initial parameter set $\Theta^{(0)} = \{\pi_k^{(0)}, \mu_k^{(0)}, \Sigma_k^{(0)}\}_{k=1}^{K}$; set the iteration counter $t = 1$ and the maximum number of iterations to $T$;
Step 4: Use $X^{alm}$ to estimate the parameters of the mixture model, including:
(4-1) Using the current parameter set $\Theta^{(t-1)} = \{\pi_k^{(t-1)}, \mu_k^{(t-1)}, \Sigma_k^{(t-1)}\}_{k=1}^{K}$ and $X^{alm}$, compute the probability $\gamma(i, k)$ that $z_i = k$ given $x_i^{alm}$:
$$\gamma(i, k) = \frac{p(z_i = k)\, p(x_i^{alm} \mid z_i = k)}{\sum_{k'=1}^{K} p(z_i = k')\, p(x_i^{alm} \mid z_i = k')} = \frac{\pi_k^{(t-1)} \mathcal{N}(x_i^{alm} \mid \mu_k^{(t-1)}, \Sigma_k^{(t-1)})}{\sum_{k'=1}^{K} \pi_{k'}^{(t-1)} \mathcal{N}(x_i^{alm} \mid \mu_{k'}^{(t-1)}, \Sigma_{k'}^{(t-1)})};$$
(4-2) Using $\gamma(i, k)$, update the parameter set of the mixture model to $\Theta^{(t)} = \{\pi_k^{(t)}, \mu_k^{(t)}, \Sigma_k^{(t)}\}_{k=1}^{K}$; the computation is as follows:
$$\mu_k^{(t)} = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(i, k) \cdot x_i^{alm},$$
$$\Sigma_k^{(t)} = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(i, k)\, (x_i^{alm} - \mu_k^{(t)})(x_i^{alm} - \mu_k^{(t)})^{\mathsf{T}},$$
$$\pi_k^{(t)} = \frac{N_k}{N},$$
where $N_k = \sum_{i=1}^{N} \gamma(i, k)$;
(4-3) Compute the log-likelihood $LIK^{(t)}$ of the current iteration:
$$LIK^{(t)} = \sum_{i=1}^{N} \log\left( \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i^{alm} \mid \mu_k, \Sigma_k) \right),$$
If $LIK^{(t)} - LIK^{(t-1)} \le \delta$, stop the iteration and output the current parameter set $\Theta^{(t)}$; otherwise, set $t = t + 1$ and return to step (4-1) for the next round of parameter estimation; if the number of iterations has reached $T$, the process also ends;
Step 5: Use the estimated mixture model to generate a new IPTV user fault-report data set $(X^{alm})'$; with $N'$ the number of samples to generate, this includes:
(5-1) Draw a random number $\varepsilon$ uniformly distributed between 0 and 1;
(5-2) If $\varepsilon \in [0, \pi_1]$, draw a sample from the Gaussian distribution $\mathcal{N}(\mu_1, \Sigma_1)$; if $\varepsilon \in \left(\sum_{j=1}^{k-1} \pi_j, \sum_{j=1}^{k} \pi_j\right]$ for $k = 2, \ldots, K-1$, draw a sample from $\mathcal{N}(\mu_k, \Sigma_k)$; if $\varepsilon \in \left(\sum_{j=1}^{K-1} \pi_j, 1\right]$, draw a sample from $\mathcal{N}(\mu_K, \Sigma_K)$;
(5-3) Repeat steps (5-1) and (5-2) $N'$ times to obtain $(X^{alm})'$; the final IPTV user fault-report data set is $X^{alm} \cup (X^{alm})'$.
2. The IPTV user fault-report data synthesis method based on a mixture model according to claim 1, characterized in that step 2 includes: for each data point $x_i^{alm}$, introducing a corresponding random variable $z_i$ that uses "1-of-K" coding and takes an integer value from 1 to $K$; according to the expression for $p(x_i^{alm})$, the joint probability $p(x_i^{alm}, z_i = k)$ of $x_i^{alm}$ and $z_i$ is expressed as:
$$p(x_i^{alm}, z_i = k) = p(z_i = k) \cdot p(x_i^{alm} \mid z_i = k),$$
where $p(z_i = k) = \pi_k$ and $p(x_i^{alm} \mid z_i = k) = \mathcal{N}(x_i^{alm} \mid \mu_k, \Sigma_k)$.
3. The IPTV user fault-report data synthesis method based on a mixture model according to claim 1, characterized in that the method uses a mixture model to describe the distribution of the IPTV user fault-report data: first, a mixture model structure is built from the existing user fault-report data; then the parameters are estimated; finally, the mixture model with the estimated parameters is used to generate new user fault-report data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710247904.8A CN107180246A (en) | 2017-04-17 | 2017-04-17 | An IPTV user fault-report data synthesis method based on a mixture model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710247904.8A CN107180246A (en) | 2017-04-17 | 2017-04-17 | An IPTV user fault-report data synthesis method based on a mixture model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107180246A true CN107180246A (en) | 2017-09-19 |
Family
ID=59830937
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710247904.8A Pending CN107180246A (en) | An IPTV user fault-report data synthesis method based on a mixture model | 2017-04-17 | 2017-04-17 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107180246A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109951327A (en) * | 2019-03-05 | 2019-06-28 | 南京信息职业技术学院 | A kind of network failure data synthesis method based on Bayesian mixture models |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102201236A (en) * | 2011-04-06 | 2011-09-28 | 中国人民解放军理工大学 | Speaker recognition method combining Gaussian mixture model and quantum neural network |
CN105208343A (en) * | 2015-09-25 | 2015-12-30 | 珠海安联锐视科技股份有限公司 | Intelligent monitoring system and method capable of being used for video monitoring device |
CN106056160A (en) * | 2016-06-06 | 2016-10-26 | 南京邮电大学 | User fault-reporting prediction method in unbalanced IPTV data set |
-
2017
- 2017-04-17 CN CN201710247904.8A patent/CN107180246A/en active Pending
Non-Patent Citations (5)
Title |
---|
DONGYUN AN等: "Gaussian Mixture Model Based Interest Prediction In Social Networks", 《2015 IEEE 7TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING TECHNOLOGY AND SCIENCE (CLOUDCOM)》 * |
RUOCHEN HUANG等: "Prediction Model for User’s QoE in Imbalanced Dataset", 《2015 FIRST INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE THEORY, SYSTEMS AND APPLICATIONS》 * |
张爱华等: "非均衡文本分类中基于特征分布的抽样技术研究", 《第六届全国信息检索学术会议论文集》 * |
曹鹏: "不均衡数据分类方法的研究", 《中国博士学位论文全文数据库信息科技辑》 * |
连军艳: "EM算法及其改进在混合模型参数估计中的应用研究", 《中国优秀博硕士学位论文全文数据库 (硕士)工程科技Ⅱ辑》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104657744B (en) | A kind of multi-categorizer training method and sorting technique based on non-determined Active Learning | |
CN107590565A (en) | A kind of method and device for building building energy consumption forecast model | |
CN107622162B (en) | Copula function-based water level flow relation curve calculation method | |
CN107038167A (en) | Big data excavating analysis system and its analysis method based on model evaluation | |
CN107886160B (en) | BP neural network interval water demand prediction method | |
CN111898831B (en) | Real-time flood probability forecasting practical method | |
CN110910004A (en) | Reservoir dispatching rule extraction method and system with multiple uncertainties | |
CN104468728B (en) | A kind of method for service selection based on comentropy and variance | |
CN104182474A (en) | Method for recognizing pre-churn users | |
CN101826090A (en) | WEB public opinion trend forecasting method based on optimal model | |
CN104539601B (en) | Dynamic network attack process analysis method for reliability and system | |
CN106897774A (en) | Multiple soft measurement algorithm cluster modeling methods based on Monte Carlo cross validation | |
CN110457369A (en) | A kind of training method and relevant device of model | |
CN104615866A (en) | Service life prediction method based on physical statistic model | |
CN109033513A (en) | Method for diagnosing fault of power transformer and diagnosing fault of power transformer device | |
CN109787821B (en) | Intelligent prediction method for large-scale mobile client traffic consumption | |
CN106919564A (en) | A kind of influence power measure based on mobile subscriber's behavior | |
CN110059938B (en) | Power distribution network planning method based on association rule driving | |
CN109299853B (en) | Reservoir dispatching function extraction method based on joint probability distribution | |
CN107180246A (en) | An IPTV user fault-report data synthesis method based on a mixture model | |
CN104579850A (en) | Quality of service (QoS) prediction method for Web service under mobile Internet environment | |
CN112508254A (en) | Method for determining investment prediction data of transformer substation engineering project | |
CN115022195B (en) | Flow dynamic measurement method for IPv6 network | |
CN115796341A (en) | Carbon effect code-based collaborative measure method for enterprise low-carbon economic performance | |
CN115330085A (en) | Wind speed prediction method based on deep neural network and without future information leakage |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170919 |