CN105913159A

CN105913159A - Method for predicting user influence based on social network events

Info

Publication number: CN105913159A
Application number: CN201610279983.6A
Authority: CN
Inventors: 程祥; 苏森; 李晓; 杨健宇; 双锴
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2016-04-29
Filing date: 2016-04-29
Publication date: 2016-08-31

Abstract

The invention provides a method for predicting user influence based on social network events. The method comprises: building a user influence matrix S from the influence of M users in a social network on N events; building a user correlation matrix U from the feature information of the users; building an event correlation matrix E from the feature information of the events; and predicting user influence according to the user influence matrix S, the user correlation matrix U, and the event correlation matrix E. The invention integrates event correlation and user correlation into a matrix factorization prediction model, yielding a new prediction model, MF-EUN, for predicting user influence in social network events, which improves the accuracy of the prediction results. The method also allows user influence in social network events to be predicted more comprehensively.

Description

User influence prediction method based on social network events

Technical field

The present invention relates to data mining technology, and in particular to a user influence prediction method based on social network events. It belongs to the field of information science and technology.

Background

With the rapid development of Internet technology, a large number of social networks have emerged at home and abroad, such as Facebook, Twitter, WeChat, and Weibo. More and more users use such social networks to post journal entries, upload photos, and take part in online activities. Through interaction on social networks, users can not only keep in touch with friends but also meet new people and expand their social circles. Today, purely online interaction no longer meets users' needs, and event-based social networks have emerged in response, such as Meetup, Plancast, Google+ Events, and Douban Events (Douban Tongcheng). Besides supporting online interaction, these applications and services give users an online platform for publishing, organizing, managing, and participating in social events.

Social influence is the phenomenon in which a user's behavior and thinking change under the influence of others. Social influence analysis is widely applied in many fields, and research on user influence in social networks has produced a large body of results. However, event-based social networks have unique characteristics (for example, events have location information and organizers), so influence analysis or prediction methods designed for traditional social networks may not suit event-based social networks well, and their predictions can be inaccurate. A prediction method for user influence in event-based social networks is therefore needed, one that makes full use of the characteristics of social network events to improve the accuracy of user influence prediction.

Summary of the invention

Embodiments of the present invention provide a user influence prediction method based on social network events, which can improve the accuracy of user influence prediction in event-based social networks.

The user influence prediction method based on social network events provided by an embodiment of the present invention includes:

building a user influence matrix S according to the influence of M users in N events, where the element s_ue of S represents the proportion of friends that user u influences in event e, 1≤u≤M and u is an integer, 1≤e≤N and e is an integer, M is an integer greater than 1, and N is an integer greater than 1;

building a user correlation matrix U according to the feature information of the M users, where the element u_uu′ of U represents the correlation between user u and user u′, 1≤u′≤M and u′ is an integer;

building an event correlation matrix E according to the feature information of the N events, where the element e_ee′ of E represents the correlation between event e and event e′, 1≤e′≤N and e′ is an integer;

determining, according to the user influence matrix S, the user correlation matrix U, and the event correlation matrix E, a user feature vector matrix P, an event feature vector matrix Q, a user correlation influence factor matrix W, and an event correlation influence factor matrix Z, where P and Q are respectively the user feature vector matrix and the event feature vector matrix obtained by matrix factorization of S, and W and Z are respectively the influence values of user correlation and event correlation on user influence in social network events;

determining the user influence in social network events according to the user feature vector matrix P, the event feature vector matrix Q, the user correlation influence factor matrix W, and the event correlation influence factor matrix Z.

In summary, the method provided by the embodiments builds the user influence matrix S from the influence of M users in N events in a social network, builds the user correlation matrix U from the feature information of users, and builds the event correlation matrix E from the feature information of events. It then uses S, U, and E to integrate event correlation and user correlation into a matrix factorization prediction model, obtaining a more accurate user feature vector matrix P and event feature vector matrix Q together with the user correlation influence factor matrix W and the event correlation influence factor matrix Z. From P, Q, W, and Z, an accurate user influence prediction can be derived, and the method can predict user influence in social network events more comprehensively.

Brief description of the drawings

To illustrate the technical solutions of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of the user influence prediction method provided by an embodiment of the present invention;

Fig. 2 is a schematic diagram of the MF-EUN prediction model framework provided by an embodiment of the present invention;

Fig. 3 is a schematic diagram of the influence distribution of a randomly selected user across different regions;

Fig. 4 is a schematic diagram of the probability distribution of the distances between all events attended by a randomly selected user;

Fig. 5 is a schematic comparison of the performance of the AI-UN method provided by an embodiment of the present invention with other neighbor discovery methods;

Fig. 6 is a schematic comparison of the performance of the AI-EN method provided by an embodiment of the present invention with other neighbor discovery methods.

Detailed description of the invention

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.

User influence can be applied to scenarios such as information dissemination, information recommendation, product or service promotion, and advertising in social networks. By choosing highly influential users as the preferred promoters and relying on their celebrity appeal, information, products, or services can be promoted to more people. Identifying and using highly influential users is therefore of great significance for network security and the network economy.

Likewise, the prediction of user influence based on social network events provided by the embodiments of the present invention plays an important role in promoting and publicizing social network activities or events. The embodiments may be executed by an application or a network server that provides users with an online platform for publishing, organizing, managing, and participating in social events.

On the server, the set of users is {u₁,u₂,...,u_M} and the set of events is {e₁,e₂,...,e_N}. M is the total number of users and is an integer greater than 1. N is the total number of events and is an integer greater than 1. First, the event-level influence of users can be obtained from the records on the server and used to build the user influence matrix S.

Specifically, the influence s_ue of user u in event e can be obtained as the proportion of the friends of user u that u influences in event e, that is,

s_{u e} = \frac{w_{u e}}{| F (u) |}

where w_ue is the number of friends that user u influences in event e, F(u) is the set of friends of user u, and |F(u)| is the number of friends of user u.

Since there are M users and N events, the user influence matrix S built from each user's influence in each event is an M × N matrix. The element s_ue of S represents the proportion of friends that user u influences in event e, where 1≤u≤M and u is an integer, and 1≤e≤N and e is an integer.
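The construction of S can be sketched as follows. This is a minimal illustration, assuming the server records have already been reduced to a friend set per user and a count of influenced friends per (user, event) pair. The names `friends`, `influenced`, and `build_influence_matrix` are hypothetical, indices are 0-based for convenience, and unknown entries are stored as `None`.

```python
def build_influence_matrix(friends, influenced, num_users, num_events):
    """Return an M x N nested list with s_ue = w_ue / |F(u)|, or None if unknown."""
    S = [[None] * num_events for _ in range(num_users)]
    for (u, e), w_ue in influenced.items():
        n_friends = len(friends.get(u, ()))
        if n_friends == 0:
            # Assumption: the ratio is undefined without friends, so leave it unknown.
            continue
        S[u][e] = w_ue / n_friends
    return S


# Toy data: user 0 has four friends and influenced two of them in event 0.
friends = {0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1}}
influenced = {(0, 0): 2, (0, 2): 1, (1, 1): 1}
S = build_influence_matrix(friends, influenced, num_users=3, num_events=3)
```

Most entries of `S` stay `None`, which mirrors the sparsity discussed in this description.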

Note that the server holds many users and many events, and a given user does not influence friends in every event, so a user's influence is undefined for many events. Accordingly, many elements of S are unknown; that is, S is a sparse matrix in which the vast majority of element values are missing. The following embodiments describe how to use the known influence data in S to predict the unknown influence data.

Event-based social networks have unique characteristics: events have location information, organizers, and so on, and users have topic influence, regional influence, and so on. Accordingly, each user u and each event e can be associated with feature vectors P_u and Q_e respectively. The elements of P_u reflect how strongly the user relates to each feature, and the elements of Q_e reflect how strongly the event relates to each feature. The influence of user u in event e can then be predicted by the inner product of the two feature vectors. The feature vectors P_u of all users and Q_e of all events form the user feature vector matrix P and the event feature vector matrix Q respectively. The dimension of P and Q can be specified; a higher dimension captures more features of users and events, and the precision of the calculation can improve accordingly. The predictions obtained from the inner product of P and Q can fill in the missing elements of S, giving the supplemented user influence matrix S′. That is, S′=P^TQ, and each element of S′ is s′_ue=P_u^TQ_e. User influence can then be predicted more comprehensively from S′.

The matrix factorization (MF) algorithm is widely used in prediction models because of its high accuracy, good scalability, and low computational complexity. The basic idea of MF is to approximate the known user influence matrix S by the product of two lower-dimensional matrices P and Q.

When training the MF prediction model, the elements of P and Q are first initialized randomly and then updated iteratively along the negative gradient direction until P and Q converge.

The feature vector matrices P and Q of users and events are obtained by training the MF prediction model, and the supplemented user influence matrix S′ is obtained as S′=P^TQ. However, because the prediction error of the MF model is relatively large, it may still be unable to predict the influence of all users in all events, and the accuracy of its predictions is not high.
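The plain MF training loop described above can be sketched as follows. This is a minimal illustration only: the squared-error loss with an L2 penalty, the hyperparameters (`dim`, `lr`, `lam`, `epochs`), and the function names are assumptions chosen for the example, not values stated in this description.

```python
import random


def predict(P, Q, u, e):
    """Inner product of the user and event feature vectors."""
    return sum(p * q for p, q in zip(P[u], Q[e]))


def squared_error(S, P, Q):
    """Sum of squared errors over the known entries of S."""
    return sum((S[u][e] - predict(P, Q, u, e)) ** 2
               for u in range(len(S)) for e in range(len(S[0]))
               if S[u][e] is not None)


def train_mf(S, dim=2, lr=0.05, lam=0.01, epochs=500, seed=0):
    """Fit P and Q to the known entries of S by stochastic gradient descent."""
    rng = random.Random(seed)
    M, N = len(S), len(S[0])
    # Random initialization of both factor matrices.
    P = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(M)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(N)]
    known = [(u, e) for u in range(M) for e in range(N) if S[u][e] is not None]
    for _ in range(epochs):
        rng.shuffle(known)
        for u, e in known:
            err = S[u][e] - predict(P, Q, u, e)
            for f in range(dim):
                p_uf, q_ef = P[u][f], Q[e][f]
                # Step against the gradient of (err^2 + lam * squared norms).
                P[u][f] -= lr * (-2 * err * q_ef + 2 * lam * p_uf)
                Q[e][f] -= lr * (-2 * err * p_uf + 2 * lam * q_ef)
    return P, Q


S_demo = [[0.5, None, 0.25], [None, 0.5, None], [0.2, None, None]]
P_demo, Q_demo = train_mf(S_demo)
filled = predict(P_demo, Q_demo, 1, 0)  # estimate for an entry missing from S_demo
```

Only the known entries drive the updates; the missing entries are then filled in from the learned inner products.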

In the embodiments of the present invention, the correlation between events can be found from the feature information of events (such as event content, event location, and event organizer), and the correlation between users can be found from the feature information of users (such as social information, a user's influence on topics, on regions, and on organizers). Event correlation and user correlation are then integrated into the MF prediction model. The resulting model, Matrix Factorization with Event and User Neighborhood (MF-EUN), is used to predict user influence in social network events and improves the accuracy of the prediction results.

Fig. 1 is a flowchart of a user influence prediction method provided by an embodiment of the present invention. As shown in Fig. 1, the user influence prediction method based on social network events provided by this embodiment includes:

S11: build the user influence matrix S according to the influence of M users in N events;

For example, the event-level influence of users can be obtained from the records on the network server and used to build S. Specifically, the influence s_ue of a user in an event can be obtained as the proportion of friends that user u influences in event e.

S12: build the user correlation matrix U according to the feature information of the M users;

For example, similar interests and hobbies among users can be discovered from the social network relationships between users, and these relationships can be used as the users' feature information to build U.

On the one hand, U can optionally be built from social information that exists between users (such as friend relationships or alumni relationships).

On the other hand, the social network relationships between users can optionally be constructed from the user influence matrix S. For example, following related techniques, the correlation between users can be calculated with measures such as cosine similarity and the Pearson correlation coefficient.

S13: build the event correlation matrix E according to the feature information of the N events;

For example, the correlation between events can be discovered from feature information such as event content, venue, and organizer. The correlation between events can also be calculated from S, for example with cosine similarity or the Pearson correlation coefficient.

Note that when social information exists between users, there is no required order between S11 and S12. When no social information exists between users, S11 must be performed first to build the user influence matrix S, and S12 is then performed to construct the social network relationships between users from S and build the user correlation matrix U; for example, the Pearson correlation coefficient between users is calculated from the existing influence data in S to obtain the correlation between users. Likewise, when relevant feature information exists between events (related content, location, or organizer), there is no required order between S11 and S13. When no such information exists, S11 must be performed first, and S13 then constructs the correlation between events from S and builds the event correlation matrix E; for example, by calculating the Pearson correlation coefficient between events from the existing influence data in S. There is also no required order between S12 and S13.
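The fallback of deriving correlations from S alone can be sketched as follows. This is an illustrative example, assuming rows of S are lists with `None` for unknown entries and that the coefficient is computed only over events where both users have a known value. Returning 0.0 when the overlap is too small or a row is constant is an assumption of this sketch, and `pearson_on_overlap` is a hypothetical name.

```python
import math


def pearson_on_overlap(row_a, row_b):
    """Pearson correlation over positions where both rows are known."""
    pairs = [(a, b) for a, b in zip(row_a, row_b)
             if a is not None and b is not None]
    if len(pairs) < 2:
        return 0.0  # assumption: too little overlap counts as no correlation
    mean_a = sum(a for a, _ in pairs) / len(pairs)
    mean_b = sum(b for _, b in pairs) / len(pairs)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a constant row carries no correlation signal
    return cov / math.sqrt(var_a * var_b)


r_same = pearson_on_overlap([0.1, 0.2, 0.3, None], [0.2, 0.4, 0.6, 0.9])
r_opposite = pearson_on_overlap([0.1, 0.2, 0.3], [0.3, 0.2, 0.1])
r_sparse = pearson_on_overlap([0.1, None, None], [None, 0.2, 0.3])
```

Applying the same function to columns of S instead of rows gives event-to-event correlations. The third call shows why very sparse rows yield unreliable neighbors.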

S14: determine, according to the user influence matrix S, the user correlation matrix U, and the event correlation matrix E, the user feature vector matrix P, the event feature vector matrix Q, the user correlation influence factor matrix W, and the event correlation influence factor matrix Z;

Here P and Q are respectively the user feature vector matrix and the event feature vector matrix obtained by matrix factorization of the user influence matrix S, and W and Z are respectively the influence values of user correlation and event correlation on user influence in social network events.

S15: determine the user influence in social network events according to the user feature vector matrix P, the event feature vector matrix Q, the user correlation influence factor matrix W, and the event correlation influence factor matrix Z.

As described above, the embodiments of the present invention integrate event correlation and user correlation into the MF prediction model and propose the new prediction model MF-EUN for predicting user influence based on social network events.

Specifically, the MF-EUN prediction model adds the user correlation influence factor matrix W and the event correlation influence factor matrix Z to the MF prediction model. The prediction is given by:

S_{u e}^{''} = P_{u}^{T} Q_{e} + W_{u} + Z_{e}

where P_u^TQ_e is the prediction based on matrix factorization, W_u is the fused user correlation influence factor, and Z_e is the fused event correlation influence factor. Consistent with the partial derivative with respect to μ_{ee_j} given later in this description, Z_e takes the form

Z_{e} = {| N^{k} (u, e) |}^{- \frac{1}{2}} \sum_{e_{j} \in N^{k} (u, e)} μ_{{ee}_{j}} (s_{{ue}_{j}} - \bar{s}_{u})

and W_u is computed from the neighbor set of user u and the corresponding user influence weights.

N^t(e,u) denotes the t users whose correlation with user u in the user correlation matrix U exceeds a preset value; N^t(e,u) is referred to below as the neighbor set of user u. N^t(e,u) can be determined from the user correlation matrix U.

N^k(u,e) denotes the k events whose correlation with event e in the event correlation matrix E exceeds a preset value; N^k(u,e) is referred to below as the neighbor set of event e. N^k(u,e) can be determined from the event correlation matrix E.
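Selecting a neighbor set from a correlation matrix can be sketched as follows. This is a minimal illustration, assuming one row of U (or E) is available as a list of correlation values. The reading "keep entries above the preset value, then take the `size` largest" and the name `neighbor_set` are assumptions of this sketch.

```python
def neighbor_set(corr_row, self_index, size, threshold):
    """Indices of up to `size` entries whose correlation exceeds `threshold`."""
    candidates = [(c, j) for j, c in enumerate(corr_row)
                  if j != self_index and c > threshold]
    # Highest correlation first; ties broken by the lower index.
    candidates.sort(key=lambda cj: (-cj[0], cj[1]))
    return [j for _, j in candidates[:size]]


# Row of U for user 0: correlation of user 0 with users 0..4.
row_u0 = [1.0, 0.8, 0.1, 0.6, 0.9]
neighbors_u0 = neighbor_set(row_u0, self_index=0, size=2, threshold=0.5)
```

The same helper works for event neighbors when given a row of E.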

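The user influence weight is the influence weight of a neighboring user u_i on user u, and μ_{ee_j} is the influence weight of a neighboring event e_j on event e.

That is, the MF-EUN prediction model predicts S''_ue as the sum of the matrix factorization term P_u^TQ_e and the two neighborhood terms W_u and Z_e.

The parameters of the MF-EUN prediction model therefore include P_u, Q_e, the user influence weights, and the event influence weights μ_{ee_j}.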

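First, the objective function L of the MF-EUN prediction model is defined as the sum of the squared prediction errors (s_ue - s''_ue)^2 over the known elements of S plus a regularization term, weighted by λ, on the squared magnitudes of the parameters. The regularization term prevents overfitting.

Taking the partial derivatives of the objective function L gives: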

\frac{\partial L}{\partial P_{u}} = - 2 (S_{u e} - S_{u e}^{''}) \cdot Q_{e} + 2 λ \cdot P_{u}

\frac{\partial L}{\partial Q_{e}} = - 2 (S_{u e} - S_{u e}^{''}) \cdot P_{u} + 2 λ \cdot Q_{e}

\frac{\partial L}{\partial μ_{{ee}_{j}}} = - 2 (s_{u e} - s_{u e}^{''}) \cdot {| N^{k} (u, e) |}^{- \frac{1}{2}} (s_{{ue}_{j}} - \bar{s}_{u}) + 2 λ μ_{{ee}_{j}}

Next, stochastic gradient descent (SGD) is used to learn the optimal values of the parameters P_u, Q_e, and the influence weights:

P_{u} = P_{u} - η \cdot \frac{\partial L}{\partial P_{u}}

Q_{e} = Q_{e} - η \cdot \frac{\partial L}{\partial Q_{e}}

μ_{{ee}_{j}} = μ_{{ee}_{j}} - η \cdot \frac{\partial L}{\partial μ_{{ee}_{j}}}

When training the MF-EUN model, the elements of P_u, Q_e, and the influence weights are first initialized randomly and then updated iteratively along the negative gradient direction until they converge, where η is the learning rate. Finally, from the feature vectors and influence weights of the M users and N events, the optimized user feature vector matrix P, event feature vector matrix Q, user correlation influence factor matrix W, and event correlation influence factor matrix Z are obtained, and the value of user influence is calculated with the MF-EUN prediction formula S''_ue = P_u^TQ_e + W_u + Z_e.
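How the three terms of the prediction combine can be sketched as follows. This is an illustration under stated assumptions, not the exact formulation of this description: the event term is taken as |N|^(-1/2) times the weighted sum of (s_ue_j minus the mean of user u's known influence values), a form inferred from the partial derivative with respect to μ_{ee_j}. Only neighbors with a known s_ue_j are counted, and the user term W_u is passed in as a precomputed number because its explicit formula is not reproduced here. All function and variable names are hypothetical.

```python
import math


def event_neighborhood_term(S, u, neighbors, mu_e):
    """Assumed form of Z_e for user u, from the neighbor events of e.

    S: nested list with None for unknown entries.
    neighbors: indices of the neighbor events of the target event.
    mu_e: dict mapping neighbor event index -> influence weight on the target event.
    """
    known_row = [s for s in S[u] if s is not None]
    if not known_row:
        return 0.0
    mean_u = sum(known_row) / len(known_row)  # assumption: mean of known values
    used = [j for j in neighbors if S[u][j] is not None]
    if not used:
        return 0.0
    total = sum(mu_e[j] * (S[u][j] - mean_u) for j in used)
    return total / math.sqrt(len(used))


def predict_mf_eun(p_u, q_e, w_u, z_e):
    """MF term plus user neighborhood term plus event neighborhood term."""
    return sum(p * q for p, q in zip(p_u, q_e)) + w_u + z_e


S_row = [[0.5, 0.1, None, 0.3]]  # one user, four events; event 2 is unknown
z = event_neighborhood_term(S_row, u=0, neighbors=[0, 1], mu_e={0: 0.2, 1: 0.4})
s_pred = predict_mf_eun([0.5, 0.2], [0.4, 0.5], w_u=0.05, z_e=z)
```

In training, the weights in `mu_e` would be updated by SGD together with the feature vectors rather than fixed by hand as in this toy call.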

The user influence prediction method based on social network events provided by this embodiment builds the user influence matrix S from the influence of M users in N events in a social network, builds the user correlation matrix U from the feature information of users, and builds the event correlation matrix E from the feature information of events. It then integrates event correlation and user correlation into the MF prediction model according to S, U, and E, and uses the resulting MF-EUN prediction model to predict user influence based on social network events, which improves the accuracy of the prediction results. In addition, the method can predict user influence in social network events more comprehensively.

In the above embodiment, the user influence matrix S is sparse, and any two rows or any two columns of S share few known elements, so classical similarity measures (such as cosine similarity and the Pearson correlation coefficient) have difficulty finding reliable neighbors. Another embodiment of the present invention therefore proposes a neighbor discovery method based on feature information, which is used to determine the user correlation matrix U and the event correlation matrix E.

Fig. 2 is a schematic diagram of the MF-EUN prediction model framework provided by an embodiment of the present invention. As shown in Fig. 2, the model includes three parts:

Part 1, construction of the social influence matrix, which can be built with the method for building the user influence matrix S in the embodiment shown in Fig. 1;

Part 2, neighbor discovery based on extra information, which uses the characteristics of event-based social networks to propose a user neighbor discovery method and an event neighbor discovery method;

Part 3, the prediction model MF-EUN, which incorporates user neighbors and event neighbors into the MF prediction model. The principle and prediction process of the MF-EUN model are the same as in the embodiment shown in Fig. 1 and are not repeated here.

This embodiment describes the discovery methods for user neighbors and event neighbors in detail.

In the neighbor discovery method based on extra information, we consider the feature information unique to users in event-based social networks (a user's influence on topics, on regions, and on organizers) and the feature information of events (event content, event location, and event organizer).

First aspect: the user neighbor discovery method.

Let U(u,u′) denote the correlation between user u and user u′, and let U_t(u,u′), U_r(u,u′), and U_o(u,u′) denote the influence similarity of the two users on topics, on regions, and on organizers respectively. On this basis, a similarity calculation based on linear fusion is proposed: U(u,u′)=β₁U_t(u,u′)+β₂U_r(u,u′)+β₃U_o(u,u′), where β₁, β₂, and β₃ are the weights of the users' influence similarity on topics, on regions, and on organizers respectively. Finally, the user correlation matrix U is built by calculating the similarity between every pair of users.
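The linear fusion can be sketched as follows. This is a minimal illustration, assuming the three similarity matrices have already been computed as square nested lists. The weight values and the name `fuse_matrices` are hypothetical; this description does not state specific values for β₁, β₂, and β₃.

```python
def fuse_matrices(U_t, U_r, U_o, betas):
    """Element-wise weighted sum of three same-sized similarity matrices."""
    b1, b2, b3 = betas
    n = len(U_t)
    return [[b1 * U_t[i][j] + b2 * U_r[i][j] + b3 * U_o[i][j]
             for j in range(n)] for i in range(n)]


U_t = [[1.0, 0.5], [0.5, 1.0]]  # topic influence similarity
U_r = [[1.0, 1.0], [1.0, 1.0]]  # regional influence similarity
U_o = [[1.0, 0.0], [0.0, 1.0]]  # organizer influence similarity
U = fuse_matrices(U_t, U_r, U_o, betas=(0.5, 0.25, 0.25))
```

In practice the weights would be tuned on held-out data; the values here only make the arithmetic easy to follow.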

For any user u, the t users whose correlation with user u in the user correlation matrix U exceeds a preset value can be found and used as the neighbor set N^t(e,u) of user u.

The methods for determining the users' influence similarity on topics, on regions, and on organizers are described below by way of example.

1) Influence similarity of users on topics;

According to related techniques, a user's influence differs across topics, so the similarity between users is measured at the topic level. For example, the topic model Latent Dirichlet Allocation (LDA) can be used to obtain the topic distributions of all events.

Let st_u denote the influence of user u on topics. st_u is calculated by the following formula:

{st}_{u} = \frac{\sum_{e_{i} \in {HE}_{u}} s_{{ue}_{i}} \cdot θ_{e_{i}}}{| {HE}_{u} |}

where θ_{e_i} is the topic distribution of event e_i, HE_u is the set of all events that user u has attended, and |HE_u| is the number of events that user u has attended.

Then the JS divergence, which is built on the KL divergence, is used to calculate the topic influence similarity of any two users: U_t(u,u′)=1-D_JS(st_u,st_u′), where D_JS(st_u,st_u′) is the JS divergence between st_u and st_u′. For two distributions P and Q,

D_{JS} (P \| Q) = \frac{1}{2} D_{KL} (P \| M) + \frac{1}{2} D_{KL} (Q \| M), \quad M = \frac{1}{2} (P + Q)

Note that in probability theory and statistics, the JS (Jensen-Shannon) divergence measures the distance (degree of similarity) between probability distributions, and the KL (Kullback-Leibler) divergence describes the difference between two probability distributions P and Q:

D_{KL} (P \| Q) = \sum_{i} P (i) \log \frac{P (i)}{Q (i)}
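The topic-level similarity can be sketched as follows. This is an illustration using the standard definitions D_KL(P‖Q) = Σ P(i) log(P(i)/Q(i)) and D_JS(P‖Q) = ½ D_KL(P‖M) + ½ D_KL(Q‖M) with M = ½(P+Q), and natural logarithms. Normalizing st_u to sum to 1 before comparing is an assumption added here so that the divergence is applied to proper distributions. The event topic distributions are toy values, not LDA output, and all names are hypothetical.

```python
import math


def topic_influence(s_row, theta, attended):
    """st_u: influence-weighted average of the topic distributions of attended events."""
    n_topics = len(theta[0])
    st = [0.0] * n_topics
    for e in attended:  # assumes s_row[e] is known for every attended event
        for k in range(n_topics):
            st[k] += s_row[e] * theta[e][k]
    return [v / len(attended) for v in st]


def kl(p, q):
    """KL divergence; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js(p, q):
    """JS divergence via the midpoint distribution."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def topic_similarity(st_u, st_v):
    """U_t = 1 - D_JS after normalizing both vectors (normalization is an assumption)."""
    p = [x / sum(st_u) for x in st_u]
    q = [x / sum(st_v) for x in st_v]
    return 1.0 - js(p, q)


theta = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]  # toy topic distributions per event
st_a = topic_influence([0.5, None, 0.25], theta, attended=[0, 2])
sim_same = topic_similarity(st_a, st_a)
sim_disjoint = topic_similarity([1.0, 0.0], [0.0, 1.0])
```

With natural logarithms the JS divergence is at most ln 2, so this similarity ranges from about 0.307 for disjoint distributions to 1 for identical ones.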

2) Influence similarity of users on regions;

Analysis of the Douban Events data set shows that each user's influence differs across regions. Fig. 3 is a schematic diagram of the influence distribution of a randomly selected user across different regions. Based on this, user influence can be measured at the regional level.

First, based on the locations of the events a user has attended, the influence of user u on a region is defined as the mean influence of the events that user u attended in that region:

s_{{uR}_{m}} = \frac{\sum_{e_{i} \in {HE}_{u} \cap E_{R_{m}}} s_{{ue}_{i}}}{n_{u} (R_{m})}

where s_{uR_m} is the influence of user u on region R_m, HE_u is the set of all events that user u has attended, E_{R_m} is the set of events held in region R_m, and n_u(R_m) is the number of events that user u attended in region R_m.

Then let sr_u denote the vector of user u's influence on all regions: sr_u = (s_{uR_1}, s_{uR_2}, ..., s_{uR_f}), where f is the total number of regions.

Finally, from the influence of any two users on all regions, cosine similarity is used to calculate their regional influence similarity:

U_{r} (u, u') = \frac{{sr}_{u} \cdot {sr}_{u'}}{\| {sr}_{u} \| \, \| {sr}_{u'} \|}
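The regional influence vector and its cosine similarity can be sketched as follows. This is a minimal illustration, assuming each event is tagged with a region index. Setting the influence to 0.0 for regions where the user attended no events, and returning 0.0 when either vector is all zeros, are assumptions of this sketch, and the function names are hypothetical.

```python
import math


def regional_influence(s_row, event_region, attended, num_regions):
    """sr_u: mean influence of the attended events, per region."""
    sums = [0.0] * num_regions
    counts = [0] * num_regions
    for e in attended:  # assumes s_row[e] is known for every attended event
        r = event_region[e]
        sums[r] += s_row[e]
        counts[r] += 1
    return [sums[r] / counts[r] if counts[r] else 0.0
            for r in range(num_regions)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: no information means no similarity
    return dot / (norm_a * norm_b)


# Events 0 and 1 took place in region 0, event 2 in region 1; region 2 is unused.
sr_a = regional_influence([0.5, 0.25, 0.75], [0, 0, 1], [0, 1, 2], num_regions=3)
sim_parallel = cosine([1.0, 2.0], [2.0, 4.0])
sim_orthogonal = cosine([1.0, 0.0], [0.0, 1.0])
```

The same two helpers cover the organizer-level similarity by mapping each event to an organizer index instead of a region index.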

3) Influence similarity of users on organizers;

Similarly, a user's influence also differs across organizers, so user influence can also be measured at the organizer level.

First, based on the organizers of the events a user has attended, the influence of a user on an organizer is defined as the mean influence of the events organized by that organizer that the user attended:

s_{{uO}_{j}} = \frac{\sum_{e_{i} \in {HE}_{u} \cap E_{O_{j}}} s_{{ue}_{i}}}{n_{u} (O_{j})}

where s_{uO_j} is the influence of user u on organizer O_j, HE_u is the set of all events that user u has attended, E_{O_j} is the set of events organized by O_j, and n_u(O_j) is the number of events organized by O_j that user u attended.

Then let so_u denote the vector of user u's influence on all organizers: so_u = (s_{uO_1}, s_{uO_2}, ..., s_{uO_l}), where l is the total number of organizers.

Finally, from the influence of any two users on all organizers, cosine similarity is used to calculate their organizer influence similarity:

U_{o} (u, u') = \frac{{so}_{u} \cdot {so}_{u'}}{\| {so}_{u} \| \, \| {so}_{u'} \|}

Second aspect: the event neighbor discovery method.

Let E(e,e′) denote the correlation between event e and event e′, and let E_c(e,e′), E_l(e,e′), and E_o(e,e′) denote the content similarity, location similarity, and organizer similarity of the two events respectively. Similarly, the correlation between event e and event e′ is calculated with a similarity calculation based on linear fusion: E(e,e′)=α₁E_c(e,e′)+α₂E_l(e,e′)+α₃E_o(e,e′), where α₁, α₂, and α₃ are the weights of event content similarity, event location similarity, and event organizer similarity respectively. Finally, the event correlation matrix E is built by calculating the similarity between every pair of events.

For any event e, the k events whose correlation with event e in the event correlation matrix E exceeds a preset value can be found and used as the neighbor set N^k(u,e) of event e.

The methods for determining event content similarity, event location similarity, and event organizer similarity are described below by way of example.

1) Event content similarity;

The classical topic model LDA is first used to obtain the topic distributions of all events; the topic distribution represents the type of an event. The JS divergence is then used to calculate the content similarity between any two events.

Let θ_e and θ_e′ be the topic distributions of event e and event e′ respectively. The content similarity of any two events is calculated with the JS divergence: E_c(e,e′)=1-D_JS(θ_e,θ_e′), where D_JS(θ_e,θ_e′) is the JS divergence between θ_e and θ_e′.

2) event location similarity；

Data analysis was carried out on the Douban Events data set, and the distances between all events attended by each user were calculated. The probability density of these distances was found to follow a power-law distribution; Fig. 4 is a schematic diagram of the probability distribution of the distances between all events attended by randomly selected users. In other words, the social network events that a user participates in tend to be close to one another. It is therefore assumed that the closer the locations of two events, the higher the similarity between them.

Therefore, the location similarity E_l(e, e') of two events can be defined using a Gaussian function of the distance between them, where l_e and l_e′ are the venues of event e and event e', respectively, and dis(l_e, l_e′) is the distance between l_e and l_e′.
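One way to realize a distance-based similarity of this kind is a Gaussian kernel, sketched below. The exact formula is not recoverable from the text, so the kernel form exp(-d² / (2σ²)), the bandwidth parameter `sigma_km` and its default value are all assumptions for illustration.

```python
import math

def location_similarity(distance_km, sigma_km=5.0):
    """Gaussian kernel of the distance: 1 at distance 0, decaying towards 0."""
    if distance_km < 0:
        raise ValueError("distance must be non-negative")
    return math.exp(-(distance_km ** 2) / (2.0 * sigma_km ** 2))

# Hypothetical distances between pairs of event venues, in kilometres.
near = location_similarity(1.0)
far = location_similarity(20.0)
```

A smaller bandwidth makes the similarity drop off faster, which matches the observation that a user's events tend to lie close together.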

3) Event organizer similarity

In an event-based social network, every event has an organizer, and whether a user participates in an event is also affected by the event's organizer. Meanwhile, one organizer may organize multiple events. Therefore, the organizer similarity E_o(e, e') of two events is defined in terms of their organizers, where O(e) and O(e') are the organizers of event e and event e', respectively.

To better illustrate the advantages of the user influence prediction method based on social network events provided by the present invention, in another embodiment of the present invention we evaluate it with two widely used metrics, root mean square error (RMSE) and mean absolute error (MAE). The two metrics are calculated as follows:

\mathrm{RMSE} = \sqrt{\frac{1}{|S|} \sum_{(u, e) \in S} \left( s_{ue} - s_{ue}^{''} \right)^{2}}

\mathrm{MAE} = \frac{1}{|S|} \sum_{(u, e) \in S} \left| s_{ue} - s_{ue}^{''} \right|
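The two metrics can be computed as in the sketch below, which assumes that the second term in each formula is the predicted influence and that true and predicted values are held in dictionaries keyed by (user, event) pairs. The function names and example values are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean square error over the (user, event) pairs in `actual`."""
    squared = [(actual[key] - predicted[key]) ** 2 for key in actual]
    return math.sqrt(sum(squared) / len(squared))

def mae(actual, predicted):
    """Mean absolute error over the (user, event) pairs in `actual`."""
    absolute = [abs(actual[key] - predicted[key]) for key in actual]
    return sum(absolute) / len(absolute)

# Hypothetical test set S with true and predicted influence values.
actual = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 1.0}
predicted = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.6}

rmse_value = rmse(actual, predicted)
mae_value = mae(actual, predicted)
```

RMSE squares each error before averaging and so penalizes large individual errors more heavily than MAE; reporting both gives a fuller picture of prediction quality.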

Specifically, we used a real data set crawled from Douban Events for experimental verification. The event participation records in the data set span 2013/02/01 to 2014/10/31. We removed users who participated in fewer than 5 events (about 5% of all users) and events with fewer than 8 participants (about 3% of all events). The resulting data set contains 11123 users, 29342 events and 356052 user-event pairs. The sparsity of the influence matrix S formed by the whole data set is 99.9%.
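The reported sparsity can be checked from the counts above, taking sparsity to mean the fraction of user-event cells of S with no known value (an assumption about the definition used):

```python
# Figures quoted in the text above.
num_users = 11123
num_events = 29342
num_known_pairs = 356052

total_cells = num_users * num_events        # size of the influence matrix S
density = num_known_pairs / total_cells     # fraction of known entries
sparsity = 1.0 - density                    # fraction of unknown entries

print(f"sparsity = {sparsity:.1%}")         # prints "sparsity = 99.9%"
```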

It is worth noting that the participants of an event on Douban Events are listed in chronological order. Therefore, if user u_f clicked "I want to participate" later than user u, user u_f's participation in the event is considered to be influenced by user u.
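The rule above can be turned into an influence value as in the following sketch. It assumes that u_f ranges over the friends of u and that the influence of u in an event is the share of u's friends who joined after u; the choice of denominator (all friends of u) is an assumption, and all names are hypothetical.

```python
def influence_in_event(user, participants_in_order, friends):
    """Share of `friends` who joined the event after `user` did."""
    if user not in participants_in_order or not friends:
        return 0.0
    position = participants_in_order.index(user)
    joined_later = set(participants_in_order[position + 1:])
    influenced = joined_later & set(friends)
    # Assumption: normalize by the total number of friends of the user.
    return len(influenced) / len(friends)

# Hypothetical event: participants listed in the order they signed up.
participants = ["a", "b", "c", "d"]
friends_of_b = {"a", "c", "d", "x"}
s_b = influence_in_event("b", participants, friends_of_b)
```

Here friends "c" and "d" joined after "b", while "a" joined earlier and "x" did not join, so two of the four friends count as influenced.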

In the experiments, the 11123 users were randomly divided into data sets of different sizes: a data set of 1000 users, a data set of 5000 users and the full data set of 11123 users. Further, for each data set we randomly selected 50%, 70% and 90% of the known data as the training set, and used the remaining elements as the test set.
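A random split of the known entries can be sketched as below. The function name, the fixed seed and the toy entries are assumptions for illustration; the text does not describe how the random selection was implemented.

```python
import random

def split_known_entries(entries, train_fraction, seed=0):
    """Shuffle the known (user, event) entries and cut them into train/test."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(entries)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical known entries: 10 users x 10 events.
known = [(u, e) for u in range(10) for e in range(10)]

splits = {
    fraction: split_known_entries(known, fraction)
    for fraction in (0.5, 0.7, 0.9)    # training proportions used in the text
}
```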

We tuned the parameters of the model to their optimal values according to experimental performance. The performance of the MF-EUN prediction model is described below by analyzing the experimental data.

First, we compare the prediction performance of the user neighbor discovery method based on additional information (Additional Information User Neighborhood, AI-UN) and the event neighbor discovery method based on additional information (Additional Information Event Neighborhood, AI-EN) proposed by the present invention against that of other neighbor discovery methods.

Fig. 5 is a schematic diagram comparing the performance of the AI-UN method provided by the embodiment of the present invention with that of other neighbor discovery methods. As shown in Fig. 5, the other neighbor discovery methods include: the topic-based user neighbor discovery method (Topic-User Neighborhood, T-UN); the region-based user neighbor discovery method (Region-User Neighborhood, R-UN); the organizer-based user neighbor discovery method (Organizer-User Neighborhood, O-UN); the user neighbor discovery method based on topic and region (Topic-Region User Neighborhood, TR-UN); the user neighbor discovery method based on topic and organizer (Topic-Organizer User Neighborhood, TO-UN); the user neighbor discovery method based on region and organizer (Organizer-Region Neighborhood, RO-UN); and the user neighbor discovery method based on Pearson similarity (Pearson-User Neighborhood, P-UN).

Fig. 6 is a schematic diagram comparing the performance of the AI-EN method provided by the embodiment of the present invention with that of other neighbor discovery methods. As shown in Fig. 6, the other neighbor discovery methods include: the event neighbor method based on event content (Content-Event Neighborhood, C-EN); the event neighbor method based on event location (Location-Event Neighborhood, L-EN); the event neighbor method based on event organizer (Organizer-Event Neighborhood, O-EN); the event neighbor method based on event content and location (Content-Location-Event Neighborhood, CL-EN); the event neighbor method based on event content and organizer (Content-Organizer-Event Neighborhood, CO-EN); the event neighbor method based on event location and organizer (Location-Organizer-Event Neighborhood, LO-EN); and the event neighbor method based on Pearson similarity (Pearson-Event Neighborhood, P-EN).

As can be seen from Figs. 5 and 6, the AI-UN and AI-EN methods proposed by the embodiment of the present invention clearly outperform the other neighbor discovery methods. Because the other methods (T-UN, R-UN, O-UN, TR-UN, TO-UN and RO-UN; C-EN, L-EN, O-EN, CL-EN, CO-EN and LO-EN) consider only one or two of the features proposed by the embodiment of the present invention, their prediction accuracy is lower than that of the neighbor discovery methods that fuse all three features. In addition, because the neighbor discovery methods based on additional information (AI-UN and AI-EN) take into account the unique characteristics of event-based social networks, their prediction accuracy is higher than that of the traditional neighbor methods (P-EN and P-UN).

Next, we compare the performance of the methods involved in the above embodiments on data sets of different sizes. Table 1 shows the results of experimental verification on the real data set crawled from Douban Events.

As can be seen from the experimental results in Table 1, MF-EUN performs best across all data sets and training set proportions. Because MF-EUN integrates neighbor discovery into matrix factorization, it combines the advantages of both, and its prediction performance is better than that of a pure neighbor discovery method or a pure matrix factorization method. Moreover, MF-EUN integrates both event neighbors and user neighbors into matrix factorization, so its prediction accuracy is higher than that of MF-EN and MF-UN, which integrate only event neighbors or only user neighbors, respectively.

Furthermore, it should be noted that a related technical document (P. Cui, F. Wang, S. Liu, M. Ou, S. Yang, and L. Sun, "Who should share what?: item-level social influence prediction for users and posts ranking," in SIGIR, 2011, pp. 185-194) studies influence at the item level, that is, it holds that the same user has different influence on different items. That document proposes a method called HF-NMF (Hybrid Factor Non-Negative Matrix Factorization) to predict a user's influence on his or her friends, and solves it with a projected matrix factorization method. HF-NMF integrates the features of users and of microblog posts into non-negative matrix factorization, and its prediction performance is better than that of plain matrix factorization (MF). However, the MF-EUN method in the embodiment of the present invention combines the advantages of matrix factorization and neighbor discovery: matrix factorization captures the global information of the influence matrix S, while the neighborhood model (neighbor sets) captures the neighbor information of users and events. As a result, MF-EUN achieves higher prediction accuracy than HF-NMF.

Table 1: Results of experimental verification on the real data set crawled from Douban Events

In addition, it is worth noting that comparing the experimental results across different training set sizes shows that the larger the training set, the higher the prediction accuracy. In other words, in our prediction model, the lower the sparsity of the matrix, the better the algorithm performs.

A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk or an optical disc.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A user influence prediction method based on social network events, characterized by comprising:

establishing a user influence matrix S according to the influence of M users on N events, wherein an element s_ue of the user influence matrix S represents the proportion of friends that user u influences in event e, 1≤u≤M and u is an integer, 1≤e≤N and e is an integer, M is an integer greater than 1, and N is an integer greater than 1;

establishing a user correlation matrix U according to feature information of the M users, wherein an element u_uu′ of the user correlation matrix U represents the correlation between user u and user u′, 1≤u′≤M and u′ is an integer;

establishing an event correlation matrix E according to feature information of the N events, wherein an element e_ee′ of the event correlation matrix E represents the correlation between event e and event e′, 1≤e′≤N and e′ is an integer;

determining, according to the user influence matrix S, the user correlation matrix U and the event correlation matrix E, a user feature vector matrix P, an event feature vector matrix Q, a user correlation influence factor matrix W and an event correlation influence factor matrix Z, wherein P and Q are respectively the user feature vector matrix and the event feature vector matrix obtained by performing matrix factorization on the user influence matrix S, and W and Z are respectively the influence factor matrices of the user correlation and of the event correlation on user influence in social network events; and

determining the user influence in social network events according to the user feature vector matrix P, the event feature vector matrix Q, the user correlation influence factor matrix W and the event correlation influence factor matrix Z.

2. The method according to claim 1, characterized in that the feature information of the users includes the influence of a user on topics, the influence of a user on regions and the influence of a user on organizers;

establishing the user correlation matrix U according to the feature information of the M users comprises:

establishing a user influence similarity matrix U_t on topics according to the influence of the M users on topics;

establishing a user influence similarity matrix U_r on regions according to the influence of the M users on regions;

establishing a user influence similarity matrix U_o on organizers according to the influence of the M users on organizers; and

establishing the user correlation matrix U according to U(u, u′) = β₁U_t(u, u′) + β₂U_r(u, u′) + β₃U_o(u, u′), wherein β₁, β₂ and β₃ are respectively the weights of the user influence similarity on topics, the user influence similarity on regions and the user influence similarity on organizers.

3. The method according to claim 2, characterized in that establishing the user influence similarity matrix U_t on topics according to the influence of the M users on topics comprises:

determining the influence st_u of the user on topics on the basis of the topic distributions of the events in HE_u, wherein θ_{e_i} is the topic distribution of event e_i and HE_u denotes the set of all events in which user u has participated; and

determining the influence similarity of any two users on topics according to U_t(u, u′) = 1 - D_JS(st_u, st_u′), wherein D_JS(st_u, st_u′) is the KL-JS divergence between st_u and st_u′.

4. The method according to claim 2, characterized in that establishing the user influence similarity matrix U_r on regions according to the influence of the M users on regions comprises:

determining the influence of user u on a region R_m on the basis of: HE_u, the set of all events in which user u has participated; the set of events held in region R_m; and n_u(R_m), the number of events in region R_m in which user u has participated;

determining the influence vector of the user over all regions, wherein f is the total number of regions; and

determining the influence similarity of any two users on regions from their influence vectors over all regions.

5. The method according to claim 2, characterized in that establishing the user influence similarity matrix U_o on organizers according to the influence of the M users on organizers comprises:

determining the influence of user u on an organizer O_j on the basis of: HE_u, the set of all events in which user u has participated; the set of events organized by organizer O_j; and n_u(O_j), the total number of events organized by O_j in which user u has participated;

determining the influence vector of the user over all organizers, wherein l is the total number of organizers; and

determining the influence similarity of any two users on organizers from their influence vectors over all organizers.

6. The method according to any one of claims 1 to 5, characterized in that the feature information of the events includes event content, event location and event organizer;

establishing the event correlation matrix E according to the feature information of the N events comprises:

establishing an event content similarity matrix E_c according to the content of the N events;

establishing an event location similarity matrix E_l according to the locations of the N events;

establishing an event organizer similarity matrix E_o according to the organizers of the N events; and

establishing the event correlation matrix E according to E(e, e′) = α₁E_c(e, e′) + α₂E_l(e, e′) + α₃E_o(e, e′), wherein α₁, α₂ and α₃ are respectively the weights of the event content similarity, the event location similarity and the event organizer similarity.

7. The method according to claim 6, characterized in that establishing the event content similarity matrix E_c according to the content of the N events comprises:

determining the content similarity of any two events according to E_c(e, e′) = 1 - D_JS(θ_e, θ_e′), wherein θ_e and θ_e′ are respectively the topic distributions of event e and event e′, and D_JS(θ_e, θ_e′) is the KL-JS divergence between θ_e and θ_e′.

8. The method according to claim 6, characterized in that establishing the event location similarity matrix E_l according to the locations of the N events comprises:

determining the location similarity of any two events from the distance between their venues, wherein l_e and l_e′ are respectively the venues of event e and event e′, and dis(l_e, l_e′) is the distance between l_e and l_e′.

9. The method according to claim 6, characterized in that establishing the event organizer similarity matrix E_o according to the organizers of the N events comprises:

determining the organizer similarity of any two events from their organizers, wherein O(e) and O(e′) are respectively the organizers of event e and event e′.

10. The method according to claim 1, characterized in that determining the user feature vector matrix P, the event feature vector matrix Q, the user correlation influence factor matrix W and the event correlation influence factor matrix Z according to the user influence matrix S, the user correlation matrix U and the event correlation matrix E comprises:

initializing the user feature vector matrix P, the event feature vector matrix Q, the user correlation influence factor matrix W and the event correlation influence factor matrix Z; and

updating the user feature vector matrix P, the event feature vector matrix Q, the user correlation influence factor matrix W and the event correlation influence factor matrix Z according to the user influence matrix S, the user correlation matrix U and the event correlation matrix E, until the user feature vector matrix P, the event feature vector matrix Q, the user correlation influence factor matrix W and the event correlation influence factor matrix Z converge.