CN107222787A - Video resource popularity prediction method

Info

Publication number: CN107222787A
Application number: CN201710409198.2A
Authority: CN
Inventors: 王子磊; 朱策
Original assignee: University of Science and Technology of China USTC
Current assignee: University of Science and Technology of China USTC
Priority date: 2017-06-02
Filing date: 2017-06-02
Publication date: 2017-09-29

Abstract

The invention discloses a video resource popularity prediction method, comprising: collecting the viewing data of group users in a given region to obtain the group's viewing-type data and interaction-behavior data, and computing resource popularity statistics from the viewing-type data; using an LDA model coupled with user behavior to traverse the viewing-type data and the interaction-behavior data and generate the corresponding Dirichlet distributions, deriving the full probability of each behavior pattern by the chain rule, and taking the expectation of its Dirichlet distribution to obtain a behavior pattern matrix; and, in combination with a neural network model, taking the resource popularity statistics and the behavior pattern matrix as the neural network input, generating a prediction model by training, and using it to predict future video resource popularity. The method jointly considers the influence of users' viewing-content data and interaction-behavior data on resource popularity prediction, studies the relationship between these two kinds of data and popularity, and improves the accuracy of resource popularity prediction.

Description

Video resource popularity prediction method

Technical field

The present invention relates to the technical field of data mining and machine learning, and in particular to a video resource popularity prediction method.

Background art

Video on demand is a video playback technology that transmits and plays service content according to users' dynamic viewing needs. Users take the initiative in choosing what to watch and how to watch it, and whether their viewing demands can be answered in time depends closely on the efficiency of resource scheduling. During resource scheduling, the popularity of a resource is an important reference for the scheduling algorithm.

Related work is introduced below from two angles: user behavior analysis and resource popularity prediction.

In existing research on user behavior analysis, user behavior is defined in two main ways. The first refers to the interactive operations a user performs while watching on-demand content; which operations are considered depends closely on the specific problem under study. For example, the application "User behavior analysis device and implementation in an IPTV system" by Saitesi Network Technology Company mainly studies how to maintain and operate an equipment management model in light of user behavior, and feeds the collated statistics of different users' data back to the service provider. User behavior there generally means the user's usage of live, on-demand, and catch-up viewing, including viewing duration and viewing counts. It is defined and counted within the company's own equipment management framework, and its definition of viewing behavior is limited to differences between viewing modes (live, on-demand), without going deeper into the different interaction behaviors within on-demand viewing. The second refers to the content the user requests: the requested content represents user behavior and is processed and mined. For example, Wang Pan's patent "IPTV user behavior analysis method based on full-service view analysis" takes the service as the center and matches, according to service attributes, a matrix of user groups with suitable preferences. User behavior there generally means the user's demand for IPTV value-added services, including information browsing, games, video telephony, and image space. That study is limited in treating the use of different services as different user behaviors; it still does not reach the essence of user interaction behavior and cannot accurately reflect users' demand for viewing content.

For resource popularity prediction, China Unicom's "A resource service system and its resource allocation method" clusters the queued tasks of a business system by weight attribute and allocates resources to tasks according to priority labels. "Streaming media proxy cache replacement method and device" by StarTimes Communication Network Technology Co., Ltd. determines the predicted popularity of a streaming media object mainly by principal component analysis and multiple linear regression, and combines it with the video's peak signal-to-noise ratio to determine the object's comprehensive value for cache replacement. Zhang Tiankui et al., in "An information-centric network caching method based on content popularity prediction", propose clustering node records by similarity analysis, computing the popularity of the content in each node with periodic statistics, predicting popularity with a prediction algorithm, and comparing the result with local data to decide whether to update the node cache. Chang Biao et al., in "An online serialized content popularity prediction method based on an autoregressive model", address popularity prediction for online serialized content: they crawl the overall play-count trend of such content, parse the HTML source of the trend page, and predict the popularity of new serialized content with an autoregressive model. As these schemes show, existing research has not jointly analyzed popularity prediction and user interaction behavior, so popularity prediction lacks data support from the user-behavior perspective and its accuracy cannot be reliably guaranteed.

Summary of the invention

The object of the invention is to provide a video resource popularity prediction method that jointly considers the influence of users' viewing-content data and interaction-behavior data on resource popularity prediction, studies the relationship between these two kinds of data and popularity, and improves the accuracy of resource popularity prediction.

The object of the invention is achieved through the following technical solution:

A video resource popularity prediction method, comprising:

collecting the viewing data of group users in a given region to obtain the group's viewing-type data and interaction-behavior data, and computing resource popularity statistics from the viewing-type data;

using the document topic generation model LDA coupled with user behavior to traverse the viewing-type data and the interaction-behavior data and generate the corresponding Dirichlet distributions, deriving the full probability of each behavior pattern by the chain rule, and taking the expectation of its Dirichlet distribution to obtain a behavior pattern matrix;

in combination with a neural network model, taking the resource popularity statistics and the behavior pattern matrix as the neural network input, generating a prediction model by training, and predicting future video resource popularity with the trained prediction model.

Collecting the viewing data of group users in a given region to obtain the group's viewing-type data and interaction-behavior data comprises:

collecting the viewing logs of all group users in the region and, by dividing them by time period and program type, obtaining the group's viewing-type data and interaction-behavior data for different time periods over a certain number of days;

wherein dividing by time period means dividing the time of one day; if a day is divided into 24 periods, each period is one hour;

interaction behavior refers to the user's playback states in the on-demand service, of which there are 10: adding a program to favorites, first start of playback, restart of playback, fast-forward, rewind, pause, seek (positioned playback), playback failure, exit from playback, and resume (memory) playback.

Computing resource popularity statistics from the viewing-type data comprises:

The viewing-type data are represented by the set C = {c_i | i = 1, 2, ..., K}, whose elements c_i are numbered by viewing type, c_i denoting the i-th viewing type. If the share of the total viewing time that users spend on c_i is p_i, then p_i is the popularity of c_i in C, and P = {p_i | i = 1, 2, ..., K} is the resource popularity set of C, where K is the total number of viewing types.
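The popularity definition above reduces to a share-of-viewing-time calculation. The following is a minimal sketch under stated assumptions: the function name `resource_popularity`, the record layout `(type_index, watch_seconds)`, and the sample durations are all hypothetical and not taken from the patent.

```python
def resource_popularity(records, num_types):
    """Return [p_1, ..., p_K]: each viewing type's share of total viewing time.

    records: iterable of (type_index, watch_seconds) pairs, with
    type_index in 0..num_types-1 (a hypothetical layout for illustration).
    """
    totals = [0.0] * num_types
    for type_index, watch_seconds in records:
        totals[type_index] += watch_seconds
    grand_total = sum(totals)
    if grand_total == 0:
        # No viewing time at all: every type has zero popularity.
        return [0.0] * num_types
    return [t / grand_total for t in totals]


# Made-up example: three viewing types, four viewing records.
records = [(0, 1800), (1, 600), (0, 600), (2, 3000)]
popularity = resource_popularity(records, num_types=3)
```

By construction the shares sum to 1 whenever any viewing time is recorded, which matches the reading of p_i as a proportion of total viewing duration.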

Using the document topic generation model LDA coupled with user behavior to traverse the viewing-type data and the interaction-behavior data and generate the corresponding Dirichlet distributions, deriving the full probability of each behavior pattern by the chain rule, and taking the expectation of its Dirichlet distribution to obtain the behavior pattern matrix comprises:

Suppose the viewing-type data contain K viewing types and V_w programs, and the interaction-behavior data contain L behavior types and V_l interaction behaviors;

An interaction behavior of a certain type occurring on a program of a certain viewing type is called a behavior pattern. The behavior-pattern multinomial distribution of the users in the m-th document is a K × L matrix, and the behavior pattern represented by the z-th position of this distribution is that the user applies the z_mn2-th interaction behavior to a program of viewing type z_mn1, where

z_mn2 = (z_mn mod K)

Using the LDA model coupled with user behavior, the viewing-type data and the interaction-behavior data are traversed to generate the corresponding Dirichlet distributions, as follows:

Let:

Then, for each viewing record n, n ∈ {1, ..., N_m}, where N_m is the number of viewing records in the m-th document; a program multinomial distribution is defined for the k-th viewing type and an interaction-behavior multinomial distribution for the l-th behavior type; α, β, and γ are the Dirichlet prior parameters of the behavior-pattern, program, and interaction-behavior multinomial distributions, respectively;

Calculate the Dirichlet distributions Θ, Φ, Ψ:

where the count terms denote, respectively, the numbers of behavior patterns, program types, and behavior types involved in the calculation; the three sets are, respectively, the program set, the interaction-behavior set, and the behavior-pattern set, and a further symbol denotes the set of behavior patterns in the m-th document; Φ is the Dirichlet distribution over the occurrence probabilities of all programs in the program multinomial distribution, Ψ is the Dirichlet distribution over the occurrence probabilities of all interaction behaviors in the interaction-behavior multinomial distribution, and Θ is the Dirichlet distribution over the occurrence probabilities of all behavior patterns in the behavior-pattern multinomial distribution;

Based on the Dirichlet distributions Θ, Φ, Ψ, the full probability of each behavior pattern is derived by the chain rule:

where the three sets are, respectively, the program set, the interaction-behavior set, and the behavior-pattern set, whose elements w_mn, t_mn, and z_mn denote the program, the interaction behavior, and the behavior pattern of the n-th viewing record in the m-th document; the behavior-pattern set with z_mn removed also appears in the formula;

Finally, the expectations of the Dirichlet distributions are solved from Θ, Φ, Ψ and the full probability of each behavior pattern, giving the following formulas:

where the three count terms denote, respectively, the number of times users watch the V_w programs of the k-th viewing type, the number of occurrences of the V_l interaction behaviors of the l-th behavior type, and the frequency of the i-th behavior pattern in the m-th document; θ_mi and the corresponding program and interaction-behavior terms are the occurrence probabilities of a single behavior pattern, program, and interaction behavior, respectively.
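The expectation formulas themselves are not legible in this text. Purely as a hedged sketch, and assuming the model follows the usual collapsed treatment of LDA, the expectation of a Dirichlet posterior would take the familiar smoothed-count form below. Only θ_mi is named in the text; the symbols φ_{k,w} and ψ_{l,t}, the count symbols, and the index ranges are assumptions chosen to match the verbal definitions above, not the patent's own equations.

```latex
% Assumed standard form: posterior mean of a Dirichlet given counts n and prior parameters.
\theta_{m,i} = \frac{n_m^{(i)} + \alpha_i}{\sum_{j=1}^{K L} \bigl(n_m^{(j)} + \alpha_j\bigr)}, \qquad
\varphi_{k,w} = \frac{n_k^{(w)} + \beta_w}{\sum_{v=1}^{V_w} \bigl(n_k^{(v)} + \beta_v\bigr)}, \qquad
\psi_{l,t} = \frac{n_l^{(t)} + \gamma_t}{\sum_{u=1}^{V_l} \bigl(n_l^{(u)} + \gamma_u\bigr)}
```

Here n_m^{(i)} would be the frequency of the i-th behavior pattern in the m-th document, n_k^{(w)} the viewing count of program w within viewing type k, and n_l^{(t)} the count of interaction behavior t within behavior type l.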

In combination with a neural network model, taking the resource popularity statistics and the behavior pattern matrix as the neural network input, generating a prediction model by training, and predicting future video resource popularity with the trained prediction model comprises:

A BP neural network is used to realize the nonlinear mapping between the behavior pattern matrix and the resource popularity statistics. The numbers of input-layer and output-layer neurons depend on the input and output parameters: the programs in the input behavior pattern matrix have K viewing types and the output is the future video resource popularity, so the input and output layers have K and 1 nodes respectively, and the number of hidden-layer nodes is set to S;

The resource popularity statistics and the behavior vectors of programs of the different viewing types in the behavior pattern matrix are read in, and the data read in are then divided into training data and test data;

The BP neural network is initialized. Training uses the steepest descent method in batch mode on the behavior pattern data: the samples are fed to the BP neural network in batches and the error of each sample is computed. Finally, convergence is checked; if the network has not converged, the weights are adjusted by steepest descent until it converges, yielding the prediction model;

The held-out test data are input to the BP neural network, and the prediction model is used to predict future video resource popularity.
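The training steps above can be sketched as a small back-propagation network with K inputs, S hidden nodes, and one output, trained by batch steepest descent. This is an illustrative sketch only: the class name `BPNetwork`, the sigmoid activations, the learning rate, the mean-squared-error stopping rule, and the toy data are assumptions, since the text does not specify them.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class BPNetwork:
    """K inputs -> S sigmoid hidden nodes -> 1 sigmoid output (assumed activations)."""

    def __init__(self, k, s, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(s)]
        self.b1 = [0.0] * s
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(s)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def predict(self, x):
        return self.forward(x)[1]

    def mse(self, xs, ys):
        return sum((self.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    def train_batch(self, xs, ys, lr=0.5, tol=1e-4, max_epochs=2000):
        """Batch steepest descent on half squared error; returns epochs used."""
        n = len(xs)
        for epoch in range(max_epochs):
            if self.mse(xs, ys) < tol:  # convergence check
                return epoch
            g_w1 = [[0.0] * len(xs[0]) for _ in self.w1]
            g_b1 = [0.0] * len(self.b1)
            g_w2 = [0.0] * len(self.w2)
            g_b2 = 0.0
            # Accumulate gradients over the whole batch before any update.
            for x, y in zip(xs, ys):
                hidden, out = self.forward(x)
                d_out = (out - y) * out * (1.0 - out)
                g_b2 += d_out
                for j, h in enumerate(hidden):
                    g_w2[j] += d_out * h
                    d_hid = d_out * self.w2[j] * h * (1.0 - h)
                    g_b1[j] += d_hid
                    for i, xi in enumerate(x):
                        g_w1[j][i] += d_hid * xi
            # Move against the averaged gradient.
            self.b2 -= lr * g_b2 / n
            for j in range(len(self.w2)):
                self.w2[j] -= lr * g_w2[j] / n
                self.b1[j] -= lr * g_b1[j] / n
                for i in range(len(self.w1[j])):
                    self.w1[j][i] -= lr * g_w1[j][i] / n
        return max_epochs


# Toy data (made up): K = 3 inputs per sample, one popularity target each.
xs = [[0.1, 0.2, 0.7], [0.6, 0.3, 0.1], [0.3, 0.3, 0.4]]
ys = [0.2, 0.6, 0.35]
net = BPNetwork(k=3, s=4)
error_before = net.mse(xs, ys)
net.train_batch(xs, ys)
error_after = net.mse(xs, ys)
```

A sigmoid output keeps predictions in (0, 1), which suits a popularity expressed as a share of viewing time; a linear output layer would be an equally plausible choice.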

As can be seen from the technical solution provided above, analyzing the region's user data at group level by program type effectively weakens the influence of a single film's short-term burst on popularity prediction accuracy. Jointly analyzing viewing content and interaction behavior with the LDA model coupled with user behavior avoids the information loss caused by studying a single kind of data and discovers group user behavior patterns more accurately. Using a neural network model to predict video resource popularity from the behavior pattern matrix that fuses behavior data improves the nonlinear processing capability of the prediction and thus reduces prediction error. In addition, when the method provided by the invention is applied to resource scheduling on cloud servers, it can effectively improve the request acceptance rate and the user experience.

Brief description of the drawings

To explain the technical solutions of the embodiments of the invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are evidently only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flow chart of a video resource popularity prediction method provided by an embodiment of the invention;

Fig. 2 is a flow chart of behavior pattern discovery provided by an embodiment of the invention;

Fig. 3 is a schematic diagram of the behavior pattern matrix provided by an embodiment of the invention;

Fig. 4 is a flow chart of video resource popularity prediction fusing user behavior provided by an embodiment of the invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.

An embodiment of the invention provides a video resource popularity prediction method. The method processes user group data along the resource-type dimension to generate group viewing data. When analyzing the viewing data, it does not treat viewing content or interaction behavior in isolation; it fuses the two kinds of data and derives their joint probability distribution, so that the internal relationship between them is described accurately. For prediction, the joint probability matrix is input to a neural network model to predict video resource popularity, establishing an accurate mapping between the two.

Fig. 1 shows a flow chart of the video resource popularity prediction method. First, the viewing data of group users in a given region are collected to obtain the group's viewing-type data and interaction-behavior data, and resource popularity statistics are computed from the viewing-type data. Then, the document topic generation model LDA coupled with user behavior traverses the viewing-type data and the interaction-behavior data to generate the corresponding Dirichlet distributions; the full probability of each behavior pattern is derived by the chain rule and the expectation of its Dirichlet distribution is taken, giving the behavior pattern matrix. Finally, in combination with a neural network model, the resource popularity statistics and the behavior pattern matrix are taken as the neural network input, a prediction model is generated by training, and future video resource popularity is predicted with the trained model. The method jointly considers the influence of users' viewing-content data and interaction-behavior data on resource popularity prediction, studies the relationship between these two kinds of data and popularity, and improves the accuracy of resource popularity prediction, which in turn improves the resource deployment efficiency of streaming media cloud servers and the quality of service of request admission and response.

Each step is described in detail below.

1. Group user data processing.

In the embodiment of the invention, group user data processing collects the viewing logs of all group users in the region and then divides them by time period and program type, yielding the group's viewing-type data and interaction-behavior data for different time periods over a certain number of days. Processing the data at group level also effectively suppresses behavior pattern variation among individual users, that is, the special behavior patterns of certain users, such as fast-forwarding all the time or watching only one class of program. Processing from the group perspective minimizes the influence of such unpredictable variation on popularity prediction.

In the embodiment of the invention, dividing by time period means dividing the time of one day. If a day is divided into 24 periods, each period is one hour, so a data set sample covering 25 days can be divided into 600 period sets.
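The 25 days × 24 periods = 600 arithmetic can be made concrete with a small slot-indexing helper. This is only a sketch: the function name `slot_index`, the choice of counting from midnight of the first day, and the sample dates are assumptions for illustration.

```python
from datetime import datetime


def slot_index(timestamp, start_date, slots_per_day=24):
    """Map a timestamp to a global period index counted from start_date 00:00."""
    seconds_per_slot = 24 * 3600 // slots_per_day
    elapsed = (timestamp - start_date).total_seconds()
    if elapsed < 0:
        raise ValueError("timestamp precedes start_date")
    return int(elapsed // seconds_per_slot)


# Hypothetical 25-day window starting 1 January 2017.
start = datetime(2017, 1, 1)
first = slot_index(datetime(2017, 1, 1, 0, 30), start)
last = slot_index(datetime(2017, 1, 25, 23, 59), start)
total_slots = 25 * 24
```

With hourly periods, the first record of the window falls in slot 0 and the last minute of day 25 falls in slot 599, giving the 600 period sets mentioned above.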

Viewing types can be divided into 15 types: news, finance, variety, sports, film, animation, military, TV drama, science and education, lifestyle, fashion and travel, parent-child education, music, programs for the elderly, and programs for children.

In addition, the embodiment also computes resource popularity statistics from the viewing-type data as one input of the neural network. Specifically, the viewing-type data are represented by the set C = {c_i | i = 1, 2, ..., K}, whose elements c_i are numbered by viewing type, c_i denoting the i-th viewing type. If the share of the total viewing time that users spend on c_i is p_i, then p_i is the popularity of c_i in C, and P = {p_i | i = 1, 2, ..., K} is the resource popularity set of C, where K is the total number of viewing types.

2. Behavior pattern discovery.

As a classical text generation model, the LDA model can extract the complex set of topics in a data set. Against the background of broadcast television data, the invention studies group user data with an LDA model coupled with user behavior. As shown in Fig. 2, viewing data from a past period are selected and each program is labeled with a type. Program type serves as the topic in document generation, and the LDA model can accurately generate the sets of different program types. Users' viewing behavior patterns differ across program types, so the behavior pattern distribution is introduced into the LDA model as a coupling, and the program types that users watch are subdivided by behavior pattern. This allows the group's viewing content to be analyzed in depth at a finer granularity and yields the joint behavior-program distribution, from which the behavior pattern matrix is obtained by subsequent processing.

The LDA model coupled with user behavior is an extension of the LDA model that supports joint analysis of behavioral subjects. With reference to Fig. 3, suppose the viewing-type data contain K viewing types and V_w programs; the joint type-program distribution Φ' is then a K × V_w matrix whose entries give the probability that a viewing type includes a given one of the V_w programs. Suppose the interaction-behavior data contain L behavior types and V_l interaction behaviors; the joint type-behavior distribution Ψ' is then an L × V_l matrix whose entries give the probability of a given one of the V_l interaction behaviors within the l-th behavior type;

In the embodiment of the invention, an interaction behavior of a certain type occurring on a program of a certain viewing type is called a behavior pattern. The behavior-pattern multinomial distribution of the users in the m-th document is a K × L matrix, and the behavior pattern represented by the z-th position of this distribution is that the user applies the z_mn2-th interaction behavior to a program of viewing type z_mn1, where

z_mn2 = (z_mn mod K)

Let:

Then, for each viewing record n (a viewing record comprises a timestamp, a program name, and a behavior type), n ∈ {1, ..., N_m}, where N_m is the number of viewing records in the m-th document; a program multinomial distribution is defined for the k-th viewing type and an interaction-behavior multinomial distribution for the l-th behavior type; α, β, and γ are the Dirichlet prior parameters of the behavior-pattern, program, and interaction-behavior multinomial distributions, respectively. Each of these multinomial distributions consists of multiple elements; for example, the program multinomial distribution has a k-th element, and the parameters of the other two multinomial distributions are interpreted similarly.

Calculate the Dirichlet distribution Θ:

In the above formula, the intermediate term denotes the probability that takes the prior parameters and the behavior counts into account while computing the Dirichlet distribution.

The Dirichlet distributions Φ and Ψ are obtained similarly:

where the count terms denote, respectively, the numbers of behavior patterns, viewing types, and behavior types involved in the calculation; the three sets are, respectively, the program set, the interaction-behavior set, and the behavior-pattern set, and a further symbol denotes the set of behavior patterns in the m-th document; Φ is the Dirichlet distribution over the occurrence probabilities of all programs in the program multinomial distribution, Ψ is the Dirichlet distribution over the occurrence probabilities of all interaction behaviors in the interaction-behavior multinomial distribution, and Θ is the Dirichlet distribution over the occurrence probabilities of all behavior patterns in the behavior-pattern multinomial distribution.

where the elements w_mn, t_mn, and z_mn of the three sets denote, respectively, the program, the interaction behavior, and the behavior pattern of the n-th viewing record in the m-th document; the behavior-pattern set with z_mn removed also appears in the formula;

The above formula computes the full probability of each behavior pattern from the prior parameters of the three distributions (α, β, γ) and the existing sets of programs, behaviors, and behavior patterns. Computing the numerator and the denominator on the right-hand side of the formula requires Φ, Ψ, and Θ, that is,

In addition, Table 1 gives a schematic of the behavior pattern matrix.

Table 1. Schematic of the behavior pattern matrix

In the table above, the occurrence probabilities of all behavior patterns, from θ_11 in the first row and first column to θ_MI in the last row and last column, together form the behavior pattern matrix. The occurrence probability of any single behavior pattern is denoted θ_mi, which is the result ultimately sought here.

where the three count terms denote, respectively, the number of times users watch the V_w programs of the k-th viewing type, the number of occurrences of the V_l interaction behaviors of the l-th behavior type, and the frequency of the i-th behavior pattern in the m-th document; θ_mi and the corresponding program and interaction-behavior terms are the occurrence probabilities of a single behavior pattern, program, and interaction behavior, respectively. In the embodiment of the invention, these probabilities are obtained by solving for the expectations of the Dirichlet distributions; among them, θ_mi is the occurrence probability of each element of the behavior pattern matrix and is the output of this step.
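To illustrate how a behavior pattern matrix could be filled in from counts, the sketch below uses the standard posterior mean of a Dirichlet distribution, (n_i + α_i) / Σ_j (n_j + α_j). That this is the exact form used in the patent is an assumption, since the formulas are not legible in this text; the function names, the symmetric prior of 0.5, and the counts are made up for illustration.

```python
def dirichlet_expectation(counts, prior):
    """Posterior mean of a Dirichlet: (n_i + a_i) / sum_j (n_j + a_j)."""
    if len(counts) != len(prior):
        raise ValueError("counts and prior must have the same length")
    smoothed = [n + a for n, a in zip(counts, prior)]
    total = sum(smoothed)
    return [s / total for s in smoothed]


def behavior_pattern_matrix(pattern_counts, alpha):
    """One row per document m; entry i of row m plays the role of theta_mi."""
    return [dirichlet_expectation(row, alpha) for row in pattern_counts]


# Made-up counts: 2 documents (time periods), 4 behavior patterns each.
counts = [[3, 1, 0, 0],
          [0, 2, 2, 4]]
alpha = [0.5] * 4
theta = behavior_pattern_matrix(counts, alpha)
```

The prior keeps every probability strictly positive, so a behavior pattern that never occurs in a given period still receives a small nonzero weight.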

In the embodiment of the invention, the meaning of each parameter is explained in Table 2.

Table 2. Parameters of the cLDA model fusing user behavior

3. Video resource popularity prediction fusing user behavior

Video resource popularity prediction fusing user behavior means predicting with a neural network model: the behavior pattern matrix, which fuses users' viewing data and interaction behavior, is input to the model so that each behavior pattern matrix maps to a popularity set, and future video resource popularity is predicted from the model.

In the embodiment of the invention, a BP neural network is used to realize the nonlinear mapping between the behavior pattern matrix and the resource popularity statistics. The numbers of input-layer and output-layer neurons depend on the input and output parameters: the programs in the input behavior pattern matrix have K viewing types and the output is the future video resource popularity, so the input and output layers have K and 1 nodes respectively, and the number of hidden-layer nodes is set to S.

As shown in Fig. 4, the resource popularity statistics and the behavior vectors of programs of the different viewing types in the behavior pattern matrix are read in, and the data read in are divided into training data and test data. The BP neural network is then initialized. Training uses the steepest descent method in batch mode on the behavior pattern data: the samples are fed to the BP neural network in batches and the error of each sample is computed; by way of example, the error tolerance on the popularity set may be set to 0.1. Finally, convergence is checked; if the network has not converged, the weights are adjusted by steepest descent until it converges, yielding the prediction model. The held-out test data are input to the BP neural network, and the prediction model is used to predict future video resource popularity. Lastly, the predicted video resource popularity and the statistically obtained video popularity were each applied on a cloud server, from which it can be seen that the invention has a significant positive effect on performance.
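The data split and the 0.1 error tolerance mentioned above can be sketched as two small helpers. The names `split_data` and `within_tolerance`, the 80/20 split, and the random shuffle are assumptions; the text does not say how the data are divided, and for time-ordered viewing data a chronological split may be more appropriate.

```python
import random


def split_data(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of samples and return (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


def within_tolerance(predicted, observed, tolerance=0.1):
    """Fraction of predictions whose absolute error is at most tolerance."""
    hits = sum(1 for p, o in zip(predicted, observed) if abs(p - o) <= tolerance)
    return hits / len(observed)


# Made-up data: ten sample identifiers and three predicted/observed pairs.
train, test = split_data(list(range(10)), test_fraction=0.2)
rate = within_tolerance([0.30, 0.52, 0.10], [0.25, 0.40, 0.15])
```

In the example, two of the three predictions fall within 0.1 of the observed popularity, so `rate` is two thirds.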

In the above solution of the embodiment of the invention, analyzing the region's user data at group level by program type effectively weakens the influence of a single film's short-term burst on popularity prediction accuracy. Jointly analyzing viewing content and interaction behavior with the LDA model coupled with user behavior avoids the information loss caused by studying a single kind of data and discovers group user behavior patterns more accurately. Using a neural network model to predict video resource popularity from the behavior pattern matrix that fuses behavior data improves the nonlinear processing capability of the prediction and thus reduces prediction error. In addition, when the method provided by the invention is applied to resource scheduling on cloud servers, it can effectively improve the request acceptance rate and the user experience.

From the description of the above embodiments, those skilled in the art will clearly understand that the embodiments can be implemented in software, or in software plus a necessary general-purpose hardware platform. On this understanding, the technical solution of the embodiments can be embodied as a software product, which may be stored on a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes instructions that cause a computer device (such as a personal computer, a server, or a network device) to perform the methods described in the embodiments of the invention.

The above is only a preferred embodiment of the invention, but the scope of protection of the invention is not limited to it. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be determined by the scope of the claims.

Claims

1. A video resource popularity prediction method, characterized by comprising:

collecting the viewing data of group users in a given region to obtain the group's viewing-type data and interaction-behavior data, and computing resource popularity statistics from the viewing-type data;

using the document topic generation model LDA coupled with user behavior to traverse the viewing-type data and the interaction-behavior data and generate the corresponding Dirichlet distributions, deriving the full probability of each behavior pattern by the chain rule, and taking the expectation of its Dirichlet distribution to obtain a behavior pattern matrix;

in combination with a neural network model, taking the resource popularity statistics and the behavior pattern matrix as the neural network input, generating a prediction model by training, and predicting future video resource popularity with the trained prediction model.

2. The video resource popularity prediction method according to claim 1, characterized in that collecting statistics on the viewing data of group users within a certain area to obtain the viewing-type data and interaction behavior data of the group users comprises:

collecting the viewing logs of all group users in the area and, by dividing them by time period and program category, obtaining the viewing-type data and interaction behavior data of the group users for the different time periods over a certain number of days;

wherein dividing by time period means partitioning the time of one day; if one day is divided into 24 periods, each period is one hour;

an interaction behavior is a playback state of the user in the video-on-demand service, of which there are 10 kinds: adding a program to favorites, first start of playback, restart of playback, fast-forward, rewind, pause, seek playback, playback failure, exit from playback, and resume (memory) playback.
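The partitioning by time period and program category described in this claim can be sketched as follows. This is a minimal illustration and not the claimed implementation: the record layout `(timestamp, viewing_type, behavior)`, the function name `bucket_viewing_log`, and the sample values are all assumptions introduced here.

```python
from collections import defaultdict
from datetime import datetime


def bucket_viewing_log(records, periods_per_day=24):
    """Group log records by (day, period index, viewing type).

    `records` is an iterable of (timestamp, viewing_type, behavior) tuples;
    this layout is hypothetical and only serves the illustration.
    """
    seconds_per_period = 86400 // periods_per_day
    buckets = defaultdict(list)
    for timestamp, viewing_type, behavior in records:
        seconds_into_day = (
            timestamp.hour * 3600 + timestamp.minute * 60 + timestamp.second
        )
        period = seconds_into_day // seconds_per_period
        buckets[(timestamp.date(), period, viewing_type)].append(behavior)
    return dict(buckets)


# Made-up sample records; with 24 periods each period is one hour.
records = [
    (datetime(2017, 6, 2, 20, 15, 0), "drama", "first_start"),
    (datetime(2017, 6, 2, 20, 40, 0), "drama", "pause"),
    (datetime(2017, 6, 2, 21, 5, 0), "news", "fast_forward"),
]
buckets = bucket_viewing_log(records)
```

Each bucket then holds the interaction behaviors observed for one viewing type in one period of one day, which is the granularity the later steps operate on.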

3. The video resource popularity prediction method according to claim 1, characterized in that calculating the statistical resource popularity from the viewing-type data comprises:

representing the viewing-type data by the set C = {c_i | i = 1, 2, ..., K}, whose elements c_i are numbered by viewing type, c_i denoting the i-th viewing type; the proportion of the total viewing duration during which it is accessed by users is p_i, so that p_i is the popularity of c_i in set C, and P = {p_i | i = 1, 2, ..., K} is the resource popularity set of C; where K is the total number of viewing types.
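As a small worked example of the definition above, p_i is the accessed duration of viewing type c_i divided by the total viewing duration. The function name `resource_popularity` and the durations are assumptions made for illustration only.

```python
def resource_popularity(durations):
    """Return p_i = duration_i / total duration for each viewing type."""
    total = sum(durations.values())
    if total <= 0:
        raise ValueError("total viewing duration must be positive")
    return {viewing_type: d / total for viewing_type, d in durations.items()}


# Made-up viewing durations (in minutes) for K = 3 viewing types.
durations = {"drama": 300.0, "news": 100.0, "sports": 100.0}
popularity = resource_popularity(durations)
```

By construction the values in P sum to 1, so they can be compared directly across time periods.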

4. The video resource popularity prediction method according to claim 1, characterized in that using the document topic generation model LDA coupled with user behavior, traversing the viewing-type data and the interaction behavior data to generate the corresponding Dirichlet distributions respectively, deriving the total probability of each behavior pattern by the chain rule and taking the expectation of its Dirichlet distribution, to obtain the behavior pattern matrix, comprises:

assuming that the viewing-type data contains K viewing types and $V_w$ programs, and that the interaction behavior data contains L behavior types and $V_l$ interaction behaviors;

the occurrence of an interaction behavior of a certain type within a program of a certain viewing type is called a behavior pattern; the behavior pattern multinomial distribution $\vec{\theta}_m$ of users in the m-th document is a K × L dimensional matrix, and the z-th position in the behavior pattern distribution $\vec{\theta}_m$ represents the behavior pattern in which a user performs the $z_{mn2}$-th interaction behavior on a program of viewing type $z_{mn1}$; where

$$z_{mn2} = (z_{mn} \bmod K)$$

using the LDA model coupled with user behavior, the viewing-type data and the interaction behavior data are traversed to generate the corresponding Dirichlet distributions respectively, by the following process:

Let […]; then, for each viewing record n, n ∈ {1, ..., $N_m$}, […]; where $N_m$ is the number of viewing records in the m-th document, $\vec{\phi}_k$ is the program multinomial distribution of the k-th viewing type, $\vec{\psi}_l$ is the interaction behavior multinomial distribution of the l-th behavior type, α is the Dirichlet prior parameter of the behavior pattern multinomial distribution $\vec{\theta}_m$, β is the Dirichlet prior parameter of the program multinomial distribution $\vec{\phi}_k$, and γ is the Dirichlet prior parameter of the interaction behavior multinomial distribution $\vec{\psi}_l$;

Calculate the Dirichlet distributions Θ, Φ, Ψ:

$$P(\vec{\theta}_m \mid \vec{z}_m, \alpha) = \mathrm{Dirichlet}(\vec{\theta}_m \mid \vec{n}_m + \alpha);$$

$$P(\vec{\phi}_k \mid \vec{z}, \vec{w}, \beta) = \mathrm{Dirichlet}(\vec{\phi}_k \mid \vec{n}_k + \beta);$$

$$P(\vec{\psi}_l \mid \vec{z}, \vec{t}, \gamma) = \mathrm{Dirichlet}(\vec{\psi}_l \mid \vec{n}_l + \gamma);$$

where $\vec{w}$, $\vec{t}$ and $\vec{z}$ are the program, interaction behavior and behavior pattern sets respectively, and their elements $w_{mn}$, $t_{mn}$ and $z_{mn}$ denote, respectively, the program, the interaction behavior and the behavior pattern of the n-th viewing record in the m-th document; […] denotes the behavior pattern set $\vec{z}$ with $z_{mn}$ removed;

Finally, based on the Dirichlet distributions Θ, Φ, Ψ and the total probability of each behavior pattern, the expectations of the Dirichlet distributions are solved, yielding the following formulas:

$$\theta_{mi} = \frac{n_m^{i} + \alpha}{\sum_{i=1}^{K \times L} \left(n_m^{i} + \alpha\right)};$$

$$\phi_{kV_w} = \frac{n_k^{V_w} + \beta}{\sum_{V_w=1}^{V_w} \left(n_k^{V_w} + \beta\right)};$$

$$\psi_{lV_l} = \frac{n_l^{V_l} + \gamma}{\sum_{V_l=1}^{V_l} \left(n_l^{V_l} + \gamma\right)};$$

where $n_k^{V_w}$ denotes the number of times users view the $V_w$-th program in the k-th viewing type, $n_l^{V_l}$ denotes the number of occurrences of the $V_l$-th interaction behavior in the l-th behavior type, and $n_m^{i}$ denotes the frequency of the i-th behavior pattern in the m-th document; $\theta_{mi}$, $\phi_{kV_w}$ and $\psi_{lV_l}$ are the occurrence probabilities of a single behavior pattern, program and interaction behavior, respectively.
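All three expectation formulas have the same form: a count plus its prior parameter, divided by the sum of such terms over the whole distribution. The following sketch evaluates that form row by row with NumPy. The function name `dirichlet_expectation`, the symmetric scalar prior, and the count values are assumptions for illustration; the counts would in practice come from the sampling pass over the viewing data, which is not shown here.

```python
import numpy as np


def dirichlet_expectation(counts, prior):
    """Row-wise mean of Dirichlet(counts + prior).

    Each row is one distribution (for example one document m), and each
    column is one outcome (for example one of the K x L behavior patterns).
    """
    smoothed = np.asarray(counts, dtype=float) + prior
    return smoothed / smoothed.sum(axis=-1, keepdims=True)


# Made-up counts: M = 2 documents, K x L = 2 x 3 = 6 behavior patterns.
pattern_counts = np.array([[4, 0, 1, 2, 0, 3],
                           [0, 5, 0, 0, 5, 0]])
theta = dirichlet_expectation(pattern_counts, prior=0.5)
```

The same helper applies to the program counts with β and to the interaction behavior counts with γ. The prior keeps every probability strictly positive even for patterns that were never observed.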

5. The video resource popularity prediction method according to claim 1, characterized in that, in combination with the neural network model, taking the statistically obtained resource popularity and the behavior pattern matrix as neural network inputs, generating the prediction model by training, and predicting future video resource popularity according to the trained prediction model comprises:

using a BP neural network to realize the nonlinear mapping between the behavior pattern matrix and the statistical resource popularity, the numbers of neurons in the input layer and output layer depending on the input and output parameters: the programs in the input behavior pattern matrix have K viewing types and the output is the future video resource popularity, so the input layer and output layer have K and 1 nodes respectively, and the number of hidden-layer nodes is set to S;

reading in the statistical resource popularity and the behavior vectors of the programs of the different viewing types in the behavior pattern matrix, and then dividing the read-in data into training data and test data;

initializing the BP neural network, with steepest descent as the training method and the behavior pattern data trained in batch mode; then inputting the samples into the BP neural network by batch training and calculating the error of each sample; finally judging whether the network has converged and, if not, adjusting the weights by steepest descent until convergence, thereby obtaining the prediction model;

inputting the set-aside test data into the BP neural network and predicting future video resource popularity using the prediction model.
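The training procedure of this claim can be sketched as a K-S-1 network trained by full-batch steepest descent. The claim does not fix the activation function, the error measure or the convergence threshold, so the sigmoid units, the mean squared error, the tolerance, the learning rate and the synthetic data below are all assumptions, as is the function name `train_bp`.

```python
import numpy as np


def train_bp(X, y, hidden=8, lr=0.5, epochs=2000, tol=1e-4, seed=0):
    """Train a K-S-1 sigmoid network by batch steepest descent on the MSE."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    history = []
    for _ in range(epochs):
        # Forward pass over the whole batch.
        H = sigmoid(X @ W1 + b1)
        out = sigmoid(H @ W2 + b2)
        err = out - y                      # per-sample error
        history.append(float(np.mean(err ** 2)))
        if history[-1] < tol:              # assumed convergence test
            break
        # Backward pass: gradients of the MSE with respect to pre-activations.
        d_out = err * out * (1.0 - out) * (2.0 / len(X))
        d_hid = (d_out @ W2.T) * H * (1.0 - H)
        # Steepest-descent weight update.
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)

    def predict(X_new):
        return sigmoid(sigmoid(X_new @ W1 + b1) @ W2 + b2)

    return predict, history


# Synthetic stand-in data: K = 4 input features per sample, target in (0, 1).
rng = np.random.default_rng(1)
X = rng.random((40, 4))
y = X.mean(axis=1, keepdims=True)
X_train, y_train, X_test = X[:30], y[:30], X[30:]   # training / test split
predict, history = train_bp(X_train, y_train)
predictions = predict(X_test)
```

The sigmoid output keeps predictions inside (0, 1), which matches popularity defined as a proportion of total viewing duration. A different output unit would be needed if the target were scaled otherwise.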