CN107180246A - Method for synthesizing IPTV user fault-report data based on a mixture model - Google Patents

Publication number
CN107180246A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710247904.8A
Other languages
Chinese (zh)
Inventor
魏昕
李智林
刘榕华
周亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Post and Telecommunication University
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing Post and Telecommunication University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Post and Telecommunication University
Priority to CN201710247904.8A
Publication of CN107180246A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/29 Graphical models, e.g. Bayesian networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/80 Responding to QoS

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Strategic Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Signal Processing (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

The invention discloses a method for synthesizing IPTV user fault-report data based on a mixture model. The method addresses a defect of existing minority-class data synthesis methods, which generate new samples directly without analyzing the minority samples and thereby degrade the performance of the subsequent classification and prediction model. The invention first extracts the user fault-report data set from the data collected by IPTV set-top boxes and represents the distribution of this data set with a mixture model. It then initializes and estimates the model parameters, and finally uses the fitted mixture model to generate new IPTV user fault-report data. The method captures the characteristics of imbalanced user fault-report data well, so the new IPTV user fault-report data it produces are more representative and easier to discriminate by class, and they improve the performance of subsequent classification and prediction of user fault reports on imbalanced data.

Description

Method for synthesizing IPTV user fault-report data based on a mixture model
Technical Field
The present invention relates to a method for synthesizing IPTV user fault-report data based on a mixture model, and belongs to the technical field of imbalanced data processing.
Background
With the development of Internet technology, more and more users are using IPTV services, and IPTV operators are striving to provide streaming video services with higher quality and more stable transmission. When the quality of experience of a video service declines or is poor, the user may report a fault to the operator. In other words, user fault reports are closely related to the users' quality of experience. If an operator can predict users' fault-report behavior accurately and efficiently in advance, and take measures to resolve problems that may arise in the IPTV network, it can effectively reduce the number of fault reports. The analysis of fault-report data and the prediction of fault-report behavior are therefore very important to operators.
In practical systems, users who report faults make up a relatively small share of all users; in other words, the probability of a fault report is far lower than the probability of a normal user experience. The data set for the fault-report prediction task is therefore imbalanced. An imbalanced data set is one in which one class has markedly fewer samples than the other classes. Here, the amount of fault-report data (minority-class samples) is far smaller than the amount of non-fault-report data (majority-class samples). In this situation, a conventional binary classifier trained on the imbalanced data tends to be biased: it predicts the majority class with very high accuracy but the minority class with very low accuracy. Imbalanced data sets are commonly handled with sampling-based methods, which change the distribution of the data set so that the imbalanced data set becomes balanced.
Most existing methods handle imbalanced data by generating new minority-class samples directly from the available samples, as in the Synthetic Minority Oversampling Technique (SMOTE). These approaches are intuitive, but because they do not explore the distribution of the minority-class samples in depth, the samples they produce do not necessarily help classification and often harm it. The root cause is that the generated minority-class samples are not representative, so these methods cannot be applied well to IPTV user fault-report prediction.
Summary of the Invention
The object of the present invention is to remedy the defects in the processing of IPTV user fault-report data. It proposes a method for synthesizing IPTV user fault-report data based on a mixture model. The method uses a mixture model to describe the distribution of IPTV user fault-report data (minority-class samples). First, a graphical model structure is built from the existing user fault-report data; then the parameters are estimated; finally, the model with the estimated parameters is used to produce new user fault-report data, so that the imbalanced data become relatively balanced.
The technical solution adopted by the invention to solve the technical problem is a method for synthesizing IPTV user fault-report data based on a mixture model, comprising the following steps:
Step 1: Let the user experience data set obtained from IPTV set-top boxes be $X=\{x_i\}$, where each $x_i$ consists of eight attributes: packet loss rate (LOSSRATE), set-top box download rate (DOWN_BANDWIDH), video download rate (MEDIARATE), transmission delay (MDI_DF), video transmission packet loss rate (MDIMLR), network transmission quality (VSTQ), video MOS score (MOS_VALUE), and CPU utilization (CPU_USAGE). Select the data whose user fault-report flag $y_i$ equals 1 ($y_i=0$ means the user did not report a fault, $y_i=1$ means the user reported a fault) as the IPTV user fault-report data set $X^{alm}=\{x_i^{alm}\}_{i=1}^{N}$, $X^{alm}\subset X$, which contains $N$ data points in total.
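The selection in Step 1 can be sketched as follows. This is a minimal illustration only: the attribute names and the ALARM flag come from the text, while the dictionary layout, the helper name `select_fault_reports`, and the sample values are assumptions made up for the example.

```python
# Attribute names as listed in Step 1 (spelling kept as in the source).
ATTRIBUTES = [
    "LOSSRATE", "DOWN_BANDWIDH", "MEDIARATE", "MDI_DF",
    "MDIMLR", "VSTQ", "MOS_VALUE", "CPU_USAGE",
]

def select_fault_reports(records):
    """Return the attribute vectors of records whose ALARM flag equals 1."""
    return [
        [float(r[name]) for name in ATTRIBUTES]
        for r in records
        if r["ALARM"] == 1
    ]

# Hypothetical records; the values are invented for illustration.
records = [
    {"LOSSRATE": 0.02, "DOWN_BANDWIDH": 3.1, "MEDIARATE": 2.0, "MDI_DF": 40.0,
     "MDIMLR": 0.01, "VSTQ": 3.5, "MOS_VALUE": 3.2, "CPU_USAGE": 0.71, "ALARM": 1},
    {"LOSSRATE": 0.00, "DOWN_BANDWIDH": 8.0, "MEDIARATE": 4.0, "MDI_DF": 5.0,
     "MDIMLR": 0.00, "VSTQ": 4.8, "MOS_VALUE": 4.6, "CPU_USAGE": 0.30, "ALARM": 0},
]
x_alm = select_fault_reports(records)  # the fault-report subset X^alm
```

Here `len(x_alm)` plays the role of N, and each element is one eight-dimensional data point.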
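Step 2: Represent the distribution of $X^{alm}$ with a mixture model, whose probability distribution function is:
$$p(x_i^{alm}) = \sum_{k=1}^{K} \pi_k N(x_i^{alm} \mid \mu_k, \Sigma_k) = \sum_{k=1}^{K} p(x_i^{alm}, z_i = k), \quad (1)$$
where $\Theta=\{\pi_k,\mu_k,\Sigma_k\}_{k=1}^{K}$ is the parameter set of the mixture model. The model consists of $K$ Gaussian distributions; $\mu_k$ and $\Sigma_k$ are the mean vector and covariance matrix of the $k$-th Gaussian distribution; $\pi_k$ is the probability that a data point of $X^{alm}$ comes from the $k$-th Gaussian distribution, and it satisfies $0\le\pi_k\le 1$ and $\sum_{k=1}^{K}\pi_k=1$.
To simplify the notation and the subsequent parameter estimation, a corresponding random variable $z_i$ is introduced for each data point $x_i^{alm}$. The variable uses a "1-of-K" encoding (it takes an integer value from 1 to $K$). By formula (1), the joint probability $p(x_i^{alm}, z_i=k)$ of $x_i^{alm}$ and $z_i$ is:
$$p(x_i^{alm}, z_i=k) = p(z_i=k)\cdot p(x_i^{alm}\mid z_i=k),$$
where $p(z_i=k)=\pi_k$ and $p(x_i^{alm}\mid z_i=k)=N(x_i^{alm}\mid\mu_k,\Sigma_k)$.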
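For reference, each Gaussian component in the mixture of Step 2 has the standard multivariate normal density written out below. The symbol $d$ for the dimension is introduced here for illustration and does not appear in the original text; with the eight attributes of Step 1, $d=8$. Note that the $\pi$ inside $(2\pi)^{d/2}$ is the circle constant, not a mixing weight $\pi_k$.

```latex
N(x \mid \mu_k, \Sigma_k)
  = \frac{1}{(2\pi)^{d/2}\,\lvert \Sigma_k \rvert^{1/2}}
    \exp\!\left( -\tfrac{1}{2}\,(x-\mu_k)^{\mathsf T}\,\Sigma_k^{-1}\,(x-\mu_k) \right)
```

The density is only defined when $\Sigma_k$ is positive definite, which is worth checking in practice when a component is fitted to very few fault-report points.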
Step 3: Initialize the parameters of the mixture model. Randomly select $K$ data points from the data set $X^{alm}$. For the $k$-th selected point, find its $C$ nearest neighbors in $X^{alm}$ by Euclidean distance; these form the local data set associated with that point. Compute the mean and covariance of this local data set and use them as the initial values of the mean $\mu_k$ and covariance matrix $\Sigma_k$ of the $k$-th component of the mixture model. The initial value of $\pi_k$ is $1/K$. This yields the initial parameter set $\Theta^{(0)}=\{\pi_k^{(0)},\mu_k^{(0)},\Sigma_k^{(0)}\}_{k=1}^{K}$. Set the iteration counter to $t=1$ and the maximum number of iterations to $T$.
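The initialization of Step 3 might look like the sketch below, written with the Python standard library only. Several details are assumptions rather than statements from the text: the selected point is counted among its own C nearest neighbors, the covariance is divided by C rather than C-1, and the function names are hypothetical.

```python
import math
import random

def init_component(data, seed_index, c):
    """Mean and covariance of the c points nearest to data[seed_index]."""
    seed = data[seed_index]
    # Sort by Euclidean distance to the seed. The seed itself (distance 0)
    # ends up in the neighbourhood, which is an assumption of this sketch.
    nearest = sorted(data, key=lambda p: math.dist(p, seed))[:c]
    d = len(seed)
    mean = [sum(p[j] for p in nearest) / c for j in range(d)]
    # Covariance with divisor c (an assumption; c - 1 is also common).
    cov = [[sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in nearest) / c
            for b in range(d)] for a in range(d)]
    return mean, cov

def init_parameters(data, k, c, rng):
    """Initial weights 1/K plus a local mean and covariance per component."""
    seeds = rng.sample(range(len(data)), k)   # K randomly selected points
    comps = [init_component(data, s, c) for s in seeds]
    return {"pi": [1.0 / k] * k,
            "mu": [m for m, _ in comps],
            "sigma": [s for _, s in comps]}
```

A call such as `init_parameters(x_alm, k=3, c=10, rng=random.Random(0))` would return the starting point for the iteration; a small C can produce a singular covariance matrix, so C should comfortably exceed the number of attributes.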
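Step 4: Estimate the parameters of the mixture model from $X^{alm}$. Each iteration proceeds as follows:
(4-1) Using the current parameter set $\Theta^{(t-1)}=\{\pi_k^{(t-1)},\mu_k^{(t-1)},\Sigma_k^{(t-1)}\}_{k=1}^{K}$ and $X^{alm}$, compute the probability $\gamma(i,k)$ that $z_i=k$ given $x_i^{alm}$:
$$\gamma(i,k) = \frac{p(z_i=k)\,p(x_i^{alm}\mid z_i=k)}{\sum_{k'=1}^{K} p(z_i=k')\,p(x_i^{alm}\mid z_i=k')} = \frac{\pi_k^{(t-1)} N(x_i^{alm}\mid\mu_k^{(t-1)},\Sigma_k^{(t-1)})}{\sum_{k'=1}^{K}\pi_{k'}^{(t-1)} N(x_i^{alm}\mid\mu_{k'}^{(t-1)},\Sigma_{k'}^{(t-1)})}$$
(4-2) Using $\gamma(i,k)$, update the parameter set of the mixture model to $\Theta^{(t)}=\{\pi_k^{(t)},\mu_k^{(t)},\Sigma_k^{(t)}\}_{k=1}^{K}$ as follows:
$$\mu_k^{(t)} = \frac{1}{N_k}\sum_{i=1}^{N}\gamma(i,k)\cdot x_i^{alm},$$
$$\Sigma_k^{(t)} = \frac{1}{N_k}\sum_{i=1}^{N}\gamma(i,k)\,(x_i^{alm}-\mu_k^{(t)})(x_i^{alm}-\mu_k^{(t)})^{\mathsf T},$$
$$\pi_k^{(t)} = \frac{N_k}{N},$$
where $N_k=\sum_{i=1}^{N}\gamma(i,k)$.
(4-3) Compute the log-likelihood $LIK^{(t)}$ at the current iteration:
$$LIK^{(t)} = \sum_{i=1}^{N}\log\left(\sum_{k=1}^{K}\pi_k N(x_i^{alm}\mid\mu_k,\Sigma_k)\right)$$
If $LIK^{(t)}-LIK^{(t-1)}\le\delta$, stop iterating and output the current parameter set $\Theta^{(t)}$; otherwise set $t=t+1$ and return to step (4-1) for the next iteration. The iteration also ends once the number of iterations reaches $T$.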
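The iteration in steps (4-1) to (4-3) can be illustrated with a one-dimensional simplification, in which each covariance matrix reduces to a scalar variance. This is a sketch under that assumption, not the eight-dimensional procedure of the text; the function names, the default values of `max_iter` and `delta`, and the small variance floor are all hypothetical additions.

```python
import math

def normal_pdf(x, mu, var):
    """Density of a univariate Gaussian with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(xs, pi, mu, var):
    """Sum over the data of the log of the mixture density."""
    return sum(math.log(sum(p * normal_pdf(x, m, v)
                            for p, m, v in zip(pi, mu, var))) for x in xs)

def em_fit(xs, pi, mu, var, max_iter=100, delta=1e-6):
    """Univariate EM; pi, mu, var are initial lists of length K."""
    n, k = len(xs), len(pi)
    prev = log_likelihood(xs, pi, mu, var)
    for _ in range(max_iter):                      # at most T iterations
        # (4-1) responsibilities gamma[i][j] under the current parameters
        gamma = []
        for x in xs:
            w = [pi[j] * normal_pdf(x, mu[j], var[j]) for j in range(k)]
            s = sum(w)
            gamma.append([wj / s for wj in w])
        # (4-2) parameter updates
        nk = [sum(g[j] for g in gamma) for j in range(k)]
        mu = [sum(g[j] * x for g, x in zip(gamma, xs)) / nk[j] for j in range(k)]
        # The 1e-9 floor is a numerical guard added here, not part of the text.
        var = [max(sum(g[j] * (x - mu[j]) ** 2 for g, x in zip(gamma, xs)) / nk[j],
                   1e-9) for j in range(k)]
        pi = [nk[j] / n for j in range(k)]
        # (4-3) stop once the log-likelihood gain is at most delta
        cur = log_likelihood(xs, pi, mu, var)
        if cur - prev <= delta:
            break
        prev = cur
    return pi, mu, var
```

For example, `em_fit([0.0, 0.2, -0.1, 0.1, 5.0, 5.2, 4.9, 5.1], [0.5, 0.5], [0.0, 5.0], [1.0, 1.0])` separates the two groups of points into two components. The multivariate case replaces the scalar variance with the covariance update of step (4-2) and needs a matrix inverse and determinant for the density.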
Step 5: Use the estimated mixture model to generate new IPTV user fault-report data, whose set is denoted $(X^{alm})'$. Let the number of data points to generate be $N'$. The generation procedure is as follows:
(5-1) Draw a random number $\varepsilon$ uniformly distributed between 0 and 1.
(5-2) If $\varepsilon\in[0,\pi_1]$, draw one sample from the Gaussian distribution $N(\mu_1,\Sigma_1)$. If $\varepsilon$ lies in the interval between $\sum_{j=1}^{k-1}\pi_j$ and $\sum_{j=1}^{k}\pi_j$ for some $k=2,\dots,K-1$, draw one sample from the Gaussian distribution $N(\mu_k,\Sigma_k)$. If $\varepsilon$ lies in the interval between $\sum_{j=1}^{K-1}\pi_j$ and 1, draw one sample from the Gaussian distribution $N(\mu_K,\Sigma_K)$.
(5-3) Repeat steps (5-1) and (5-2) $N'$ times to obtain $(X^{alm})'$. The final IPTV user fault-report data set, formed from $X^{alm}$ together with $(X^{alm})'$, is used by the subsequent IPTV user fault-report prediction method.
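The generation procedure of Step 5 can be sketched with the standard library as follows. The component is chosen by comparing the uniform number against cumulative weights, and a multivariate Gaussian sample is drawn through a Cholesky factor of the covariance matrix. The Cholesky approach, the treatment of interval endpoints, and all function names are choices made for this illustration; the text does not prescribe how the Gaussian samples are produced.

```python
import math
import random

def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def pick_component(eps, pi):
    """Index k whose cumulative-weight interval contains eps (step 5-2)."""
    cumulative = 0.0
    for k, weight in enumerate(pi):
        cumulative += weight
        if eps <= cumulative:
            return k
    return len(pi) - 1  # guards against rounding when eps is close to 1

def sample_gmm(pi, mu, sigma, n_new, rng):
    """Draw n_new points from the mixture (steps 5-1 to 5-3)."""
    factors = [cholesky(s) for s in sigma]
    samples = []
    for _ in range(n_new):
        k = pick_component(rng.random(), pi)       # steps (5-1) and (5-2)
        z = [rng.gauss(0.0, 1.0) for _ in mu[k]]   # independent standard normals
        low = factors[k]
        # x = mu_k + L z has mean mu_k and covariance L L^T = Sigma_k
        samples.append([mu[k][i] + sum(low[i][j] * z[j] for j in range(i + 1))
                        for i in range(len(z))])
    return samples
```

Passing a seeded generator such as `random.Random(1)` as `rng` makes the synthetic set reproducible, which helps when comparing classifiers trained on it.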
Beneficial effects:
1. By producing IPTV user fault-report data, the invention effectively addresses the inaccurate classification and prediction caused by imbalanced data in the IPTV user fault-report prediction task.
2. The invention models the distribution of IPTV user fault-report data with a mixture model and captures the characteristics of the data well. Compared with traditional methods, the new IPTV user fault-report data it produces are more representative and easier to discriminate in classification.
3. The invention avoids the data overlap problem that arises when traditional SMOTE methods generate minority-class samples.
Brief Description of the Drawings
Fig. 1 is a flow chart of the imbalanced IPTV data processing, analysis, and prediction of the invention.
Fig. 2 compares the performance of the invention with that of traditional algorithms.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
As shown in Fig. 1, the invention provides a method for synthesizing IPTV user fault-report data based on a mixture model, comprising the following steps:
Step 1: Let the user experience data set obtained from IPTV set-top boxes be $X=\{x_i\}$, where each $x_i$ consists of eight attributes: packet loss rate (LOSSRATE), set-top box download rate (DOWN_BANDWIDH), video download rate (MEDIARATE), transmission delay (MDI_DF), video transmission packet loss rate (MDIMLR), network transmission quality (VSTQ), video MOS score (MOS_VALUE), and CPU utilization (CPU_USAGE). Select the data whose user fault-report flag $y_i$ (ALARM) equals 1 as the IPTV user fault-report data set $X^{alm}=\{x_i^{alm}\}_{i=1}^{N}$, $X^{alm}\subset X$, which contains $N$ data points in total.
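Step 2: Represent the distribution of $X^{alm}$ with a Gaussian mixture model (GMM), whose probability distribution function is:
$$p(x_i^{alm}) = \sum_{k=1}^{K} \pi_k N(x_i^{alm} \mid \mu_k, \Sigma_k) = \sum_{k=1}^{K} p(x_i^{alm}, z_i = k), \quad (1)$$
where $\Theta=\{\pi_k,\mu_k,\Sigma_k\}_{k=1}^{K}$ is the parameter set of the mixture model. The model consists of $K$ Gaussian distributions; $\mu_k$ and $\Sigma_k$ are the mean vector and covariance matrix of the $k$-th Gaussian distribution; $\pi_k$ is the probability that a data point of $X^{alm}$ comes from the $k$-th Gaussian distribution, and it satisfies $0\le\pi_k\le 1$ and $\sum_{k=1}^{K}\pi_k=1$.
To simplify the notation and the subsequent parameter estimation, a corresponding random variable $z_i$ is introduced for each data point $x_i^{alm}$. The variable uses a "1-of-K" encoding (it takes an integer value from 1 to $K$). By formula (1), the joint probability $p(x_i^{alm}, z_i=k)$ of $x_i^{alm}$ and $z_i$ is:
$$p(x_i^{alm}, z_i=k) = p(z_i=k)\cdot p(x_i^{alm}\mid z_i=k),$$
where $p(z_i=k)=\pi_k$ and $p(x_i^{alm}\mid z_i=k)=N(x_i^{alm}\mid\mu_k,\Sigma_k)$.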
Step 3: Initialize the parameters of the mixture model. Randomly select $K$ data points from the data set $X^{alm}$. For the $k$-th selected point, find its $C$ nearest neighbors in $X^{alm}$ by Euclidean distance; these form the local data set associated with that point. Compute the mean and covariance of this local data set and use them as the initial values of the mean $\mu_k$ and covariance matrix $\Sigma_k$ of the $k$-th component of the mixture model. The initial value of $\pi_k$ is $1/K$. This yields the initial parameter set $\Theta^{(0)}=\{\pi_k^{(0)},\mu_k^{(0)},\Sigma_k^{(0)}\}_{k=1}^{K}$. Set the iteration counter to $t=1$ and the maximum number of iterations to $T$.
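Step 4: Estimate the parameters of the mixture model from $X^{alm}$. Each iteration proceeds as follows:
(4-1) Using the current parameter set $\Theta^{(t-1)}=\{\pi_k^{(t-1)},\mu_k^{(t-1)},\Sigma_k^{(t-1)}\}_{k=1}^{K}$ and $X^{alm}$, compute the probability $\gamma(i,k)$ that $z_i=k$ given $x_i^{alm}$:
$$\gamma(i,k) = \frac{p(z_i=k)\,p(x_i^{alm}\mid z_i=k)}{\sum_{k'=1}^{K} p(z_i=k')\,p(x_i^{alm}\mid z_i=k')} = \frac{\pi_k^{(t-1)} N(x_i^{alm}\mid\mu_k^{(t-1)},\Sigma_k^{(t-1)})}{\sum_{k'=1}^{K}\pi_{k'}^{(t-1)} N(x_i^{alm}\mid\mu_{k'}^{(t-1)},\Sigma_{k'}^{(t-1)})}$$
(4-2) Using $\gamma(i,k)$, update the parameter set of the mixture model to $\Theta^{(t)}=\{\pi_k^{(t)},\mu_k^{(t)},\Sigma_k^{(t)}\}_{k=1}^{K}$ as follows:
$$\mu_k^{(t)} = \frac{1}{N_k}\sum_{i=1}^{N}\gamma(i,k)\cdot x_i^{alm},$$
$$\Sigma_k^{(t)} = \frac{1}{N_k}\sum_{i=1}^{N}\gamma(i,k)\,(x_i^{alm}-\mu_k^{(t)})(x_i^{alm}-\mu_k^{(t)})^{\mathsf T},$$
$$\pi_k^{(t)} = \frac{N_k}{N},$$
where $N_k=\sum_{i=1}^{N}\gamma(i,k)$.
(4-3) Compute the log-likelihood $LIK^{(t)}$ at the current iteration:
$$LIK^{(t)} = \sum_{i=1}^{N}\log\left(\sum_{k=1}^{K}\pi_k N(x_i^{alm}\mid\mu_k,\Sigma_k)\right)$$
If $LIK^{(t)}-LIK^{(t-1)}\le\delta$, stop iterating and output the current parameter set $\Theta^{(t)}$; otherwise set $t=t+1$ and return to step (4-1) for the next iteration. The iteration also ends once the number of iterations reaches $T$.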
Step 5: Use the estimated mixture model to generate the new IPTV user fault-report data set $(X^{alm})'$. Let the number of data points to generate be $N'$. The generation procedure is as follows:
(5-1) Draw a random number $\varepsilon$ uniformly distributed between 0 and 1.
(5-2) If $\varepsilon\in[0,\pi_1]$, draw one sample from the Gaussian distribution $N(\mu_1,\Sigma_1)$. If $\varepsilon$ lies in the interval between $\sum_{j=1}^{k-1}\pi_j$ and $\sum_{j=1}^{k}\pi_j$ for some $k=2,\dots,K-1$, draw one sample from the Gaussian distribution $N(\mu_k,\Sigma_k)$. If $\varepsilon$ lies in the interval between $\sum_{j=1}^{K-1}\pi_j$ and 1, draw one sample from the Gaussian distribution $N(\mu_K,\Sigma_K)$.
(5-3) Repeat steps (5-1) and (5-2) $N'$ times to obtain $(X^{alm})'$. The final IPTV user fault-report data set, formed from $X^{alm}$ together with $(X^{alm})'$, is used by the subsequent IPTV user fault-report prediction method.
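The text leaves the choice of N' open. One plausible way to pick it, shown here purely as an assumption, is to generate enough synthetic fault-report samples for the minority-to-majority ratio to reach a chosen target. The function name and the `target_ratio` parameter are hypothetical.

```python
import math

def samples_needed(n_minority, n_majority, target_ratio=1.0):
    """Synthetic samples N' so that minority/majority reaches target_ratio."""
    needed = math.ceil(target_ratio * n_majority) - n_minority
    return max(needed, 0)  # nothing to generate if the target is already met

# With a 1:89 imbalance, e.g. 100 fault reports against 8900 normal records,
# full balance would require 8800 synthetic fault-report samples.
n_prime = samples_needed(100, 8900)
```

A target ratio below 1.0 yields "relatively balanced" data with fewer synthetic points, which may be preferable when the fitted mixture is based on a small number of real fault reports.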
To better illustrate the advantages of the mixture-model-based IPTV user fault-report data synthesis method designed in the invention, the fault-report data it produces were applied to user fault-report prediction in an IPTV system, with naive Bayes as the classifier. As shown in Fig. 2, the prediction results obtained with the proposed method (denoted GMM) were compared with those obtained with no processing (no-SMOTE), Borderline-SMOTE, and Kmeans-SMOTE, in order to evaluate the effectiveness and accuracy of the proposed method. When the ratio of fault-report users (minority-class samples) to non-fault-report users (majority-class samples) reaches 1:89, the G value of the proposed method reaches 0.5982, which is higher than that of the traditional SMOTE methods; the detailed results are shown in Fig. 2. A higher G value indicates more accurate prediction of the minority-class user fault reports. The experimental results show that the IPTV user fault-report data synthesis method designed in the invention significantly improves classification and prediction performance on existing imbalanced IPTV data sets.
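The text does not define the G value. A common metric with that name in imbalanced classification is the geometric mean of the true-positive rate and the true-negative rate, and the sketch below assumes that definition; whether the patent uses exactly this formula is not confirmed by the source. The function name and the toy labels are hypothetical.

```python
import math

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)   # accuracy on fault-report users
    specificity = tn / (tn + fp)   # accuracy on non-fault-report users
    return math.sqrt(sensitivity * specificity)

# Toy labels: 1 = fault report, 0 = no fault report.
score = g_mean([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
```

Under this definition a classifier that ignores the minority class scores 0 regardless of its overall accuracy, which is why such a metric suits imbalanced fault-report data. The function assumes both classes occur in `y_true`; otherwise a division by zero results.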

Claims (3)

1. A method for synthesizing IPTV user fault-report data based on a mixture model, characterized in that the method comprises the following steps:
Step 1: Let the user experience data set obtained from IPTV set-top boxes be $X=\{x_i\}$, where each $x_i$ consists of eight attributes: packet loss rate (LOSSRATE), set-top box download rate (DOWN_BANDWIDH), video download rate (MEDIARATE), transmission delay (MDI_DF), video transmission packet loss rate (MDIMLR), network transmission quality (VSTQ), video MOS score (MOS_VALUE), and CPU utilization (CPU_USAGE); select the data whose user fault-report flag $y_i$ (ALARM) equals 1 as the IPTV user fault-report data set $X^{alm}=\{x_i^{alm}\}_{i=1}^{N}$, $X^{alm}\subset X$, which contains $N$ data points in total;
Step 2: Represent the distribution of $X^{alm}$ with a mixture model, whose probability distribution function includes:
$$p(x_i^{alm}) = \sum_{k=1}^{K} \pi_k N(x_i^{alm} \mid \mu_k, \Sigma_k) = \sum_{k=1}^{K} p(x_i^{alm}, z_i = k),$$
where $\Theta=\{\pi_k,\mu_k,\Sigma_k\}_{k=1}^{K}$ is the parameter set of the mixture model; the model consists of $K$ Gaussian distributions; $\mu_k$ and $\Sigma_k$ are the mean vector and covariance matrix of the $k$-th Gaussian distribution; $\pi_k$ is the probability that a data point of $X^{alm}$ comes from the $k$-th Gaussian distribution, and it satisfies $0\le\pi_k\le 1$ and $\sum_{k=1}^{K}\pi_k=1$;
Step 3: Randomly select $K$ data points from the data set $X^{alm}$; for the $k$-th selected point, find its $C$ nearest neighbors in $X^{alm}$ by Euclidean distance, which form the local data set associated with that point; compute the mean and covariance of this local data set and use them as the initial values of the mean $\mu_k$ and covariance matrix $\Sigma_k$ of the $k$-th component of the mixture model; the initial value of $\pi_k$ is $1/K$; this yields the initial parameter set $\Theta^{(0)}=\{\pi_k^{(0)},\mu_k^{(0)},\Sigma_k^{(0)}\}_{k=1}^{K}$; set the iteration counter to $t=1$ and the maximum number of iterations to $T$;
Step 4: Estimate the parameters of the mixture model from $X^{alm}$, including:
(4-1) using the current parameter set $\Theta^{(t-1)}=\{\pi_k^{(t-1)},\mu_k^{(t-1)},\Sigma_k^{(t-1)}\}_{k=1}^{K}$ and $X^{alm}$, compute the probability $\gamma(i,k)$ that $z_i=k$ given $x_i^{alm}$:
$$\gamma(i,k) = \frac{p(z_i=k)\,p(x_i^{alm}\mid z_i=k)}{\sum_{k'=1}^{K} p(z_i=k')\,p(x_i^{alm}\mid z_i=k')} = \frac{\pi_k^{(t-1)} N(x_i^{alm}\mid\mu_k^{(t-1)},\Sigma_k^{(t-1)})}{\sum_{k'=1}^{K}\pi_{k'}^{(t-1)} N(x_i^{alm}\mid\mu_{k'}^{(t-1)},\Sigma_{k'}^{(t-1)})};$$
(4-2) using $\gamma(i,k)$, update the parameter set of the mixture model to $\Theta^{(t)}=\{\pi_k^{(t)},\mu_k^{(t)},\Sigma_k^{(t)}\}_{k=1}^{K}$ as follows:
$$\mu_k^{(t)} = \frac{1}{N_k}\sum_{i=1}^{N}\gamma(i,k)\cdot x_i^{alm},$$
$$\Sigma_k^{(t)} = \frac{1}{N_k}\sum_{i=1}^{N}\gamma(i,k)\,(x_i^{alm}-\mu_k^{(t)})(x_i^{alm}-\mu_k^{(t)})^{\mathsf T},$$
$$\pi_k^{(t)} = \frac{N_k}{N},$$
where $N_k=\sum_{i=1}^{N}\gamma(i,k)$;
(4-3) compute the log-likelihood $LIK^{(t)}$ at the current iteration:
$$LIK^{(t)} = \sum_{i=1}^{N}\log\left(\sum_{k=1}^{K}\pi_k N(x_i^{alm}\mid\mu_k,\Sigma_k)\right),$$
if $LIK^{(t)}-LIK^{(t-1)}\le\delta$, stop iterating and output the current parameter set $\Theta^{(t)}$; otherwise set $t=t+1$ and return to step (4-1) for the next round of parameter estimation; the process also ends once the number of iterations reaches $T$;
Step 5: Use the estimated mixture model to generate the new IPTV user fault-report data set $(X^{alm})'$, where the number of data points to generate is $N'$, including:
(5-1) draw a random number $\varepsilon$ uniformly distributed between 0 and 1;
(5-2) if $\varepsilon\in[0,\pi_1]$, draw one sample from the Gaussian distribution $N(\mu_1,\Sigma_1)$; if $\varepsilon$ lies in the interval between $\sum_{j=1}^{k-1}\pi_j$ and $\sum_{j=1}^{k}\pi_j$ for some $k=2,\dots,K-1$, draw one sample from the Gaussian distribution $N(\mu_k,\Sigma_k)$; if $\varepsilon$ lies in the interval between $\sum_{j=1}^{K-1}\pi_j$ and 1, draw one sample from the Gaussian distribution $N(\mu_K,\Sigma_K)$;
(5-3) repeat steps (5-1) and (5-2) $N'$ times to obtain $(X^{alm})'$; the final IPTV user fault-report data set is formed from $X^{alm}$ together with $(X^{alm})'$.
2. The method for synthesizing IPTV user fault-report data based on a mixture model according to claim 1, characterized in that step 2 comprises: for each data point $x_i^{alm}$, introducing a corresponding random variable $z_i$, which uses a "1-of-K" encoding and takes an integer value from 1 to $K$; according to the expression for $p(x_i^{alm})$, the joint probability $p(x_i^{alm}, z_i=k)$ of $x_i^{alm}$ and $z_i$ is expressed as:
$$p(x_i^{alm}, z_i=k) = p(z_i=k)\cdot p(x_i^{alm}\mid z_i=k),$$
where $p(z_i=k)=\pi_k$ and $p(x_i^{alm}\mid z_i=k)=N(x_i^{alm}\mid\mu_k,\Sigma_k)$.
3. The method for synthesizing IPTV user fault-report data based on a mixture model according to claim 1, characterized in that the method uses a mixture model to describe the distribution of IPTV user fault-report data: first, a mixture model structure is built from the existing user fault-report data; then the parameters are estimated; finally, the mixture model with the estimated parameters is used to produce new user fault-report data.
CN201710247904.8A 2017-04-17 2017-04-17 Method for synthesizing IPTV user fault-report data based on a mixture model Pending CN107180246A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710247904.8A CN107180246A (en) 2017-04-17 2017-04-17 Method for synthesizing IPTV user fault-report data based on a mixture model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710247904.8A CN107180246A (en) 2017-04-17 2017-04-17 Method for synthesizing IPTV user fault-report data based on a mixture model

Publications (1)

Publication Number Publication Date
CN107180246A true CN107180246A (en) 2017-09-19

Family

ID=59830937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710247904.8A Pending CN107180246A (en) 2017-04-17 2017-04-17 Method for synthesizing IPTV user fault-report data based on a mixture model

Country Status (1)

Country Link
CN (1) CN107180246A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109951327A (en) * 2019-03-05 2019-06-28 南京信息职业技术学院 A kind of network failure data synthesis method based on Bayesian mixture models

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102201236A (en) * 2011-04-06 2011-09-28 中国人民解放军理工大学 Speaker recognition method combining Gaussian mixture model and quantum neural network
CN105208343A (en) * 2015-09-25 2015-12-30 珠海安联锐视科技股份有限公司 Intelligent monitoring system and method capable of being used for video monitoring device
CN106056160A (en) * 2016-06-06 2016-10-26 南京邮电大学 User fault-reporting prediction method in unbalanced IPTV data set

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
DONGYUN AN et al.: "Gaussian Mixture Model Based Interest Prediction In Social Networks", 《2015 IEEE 7TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING TECHNOLOGY AND SCIENCE (CLOUDCOM)》 *
RUOCHEN HUANG et al.: "Prediction Model for User’s QoE in Imbalanced Dataset", 《2015 FIRST INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE THEORY, SYSTEMS AND APPLICATIONS》 *
张爱华 et al.: "非均衡文本分类中基于特征分布的抽样技术研究", 《第六届全国信息检索学术会议论文集》 *
曹鹏: "不均衡数据分类方法的研究", 《中国博士学位论文全文数据库信息科技辑》 *
连军艳: "EM算法及其改进在混合模型参数估计中的应用研究", 《中国优秀博硕士学位论文全文数据库 (硕士)工程科技Ⅱ辑》 *

Similar Documents

Publication Publication Date Title
CN104657744B (en) A kind of multi-categorizer training method and sorting technique based on non-determined Active Learning
CN107590565A (en) A kind of method and device for building building energy consumption forecast model
CN107622162B (en) Copula function-based water level flow relation curve calculation method
CN107038167A (en) Big data excavating analysis system and its analysis method based on model evaluation
CN107886160B (en) BP neural network interval water demand prediction method
CN111898831B (en) Real-time flood probability forecasting practical method
CN110910004A (en) Reservoir dispatching rule extraction method and system with multiple uncertainties
CN104468728B (en) A kind of method for service selection based on comentropy and variance
CN104182474A (en) Method for recognizing pre-churn users
CN101826090A (en) WEB public opinion trend forecasting method based on optimal model
CN104539601B (en) Dynamic network attack process analysis method for reliability and system
CN106897774A (en) Multiple soft measurement algorithm cluster modeling methods based on Monte Carlo cross validation
CN110457369A (en) A kind of training method and relevant device of model
CN104615866A (en) Service life prediction method based on physical statistic model
CN109033513A (en) Method for diagnosing fault of power transformer and diagnosing fault of power transformer device
CN109787821B (en) Intelligent prediction method for large-scale mobile client traffic consumption
CN106919564A (en) A kind of influence power measure based on mobile subscriber's behavior
CN110059938B (en) Power distribution network planning method based on association rule driving
CN109299853B (en) Reservoir dispatching function extraction method based on joint probability distribution
CN107180246A (en) Method for synthesizing IPTV user fault-report data based on a mixture model
CN104579850A (en) Quality of service (QoS) prediction method for Web service under mobile Internet environment
CN112508254A (en) Method for determining investment prediction data of transformer substation engineering project
CN115022195B (en) Flow dynamic measurement method for IPv6 network
CN115796341A (en) Carbon effect code-based collaborative measure method for enterprise low-carbon economic performance
CN115330085A (en) Wind speed prediction method based on deep neural network and without future information leakage

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170919