CN109242710A

CN109242710A - Method and system for ranking node influence in social networks

Info

Publication number: CN109242710A
Application number: CN201810931729.9A
Authority: CN
Inventors: 熊菲; 杨佳佩; 刘云; 张振江
Original assignee: Beijing Jiaotong University
Current assignee: Beijing Jiaotong University
Priority date: 2018-08-16
Filing date: 2018-08-16
Publication date: 2019-01-18
Anticipated expiration: 2038-08-16
Also published as: CN109242710B

Abstract

The present invention provides a method and system for ranking node influence in social networks, and relates to the field of digital information processing. The method first collects user homepage information, user posting information, and user-pair information, and preprocesses them to form a training set and a test set; then, based on the training set, it builds a transfer matrix model of post forwarding and runs simulations on the transfer matrix model to obtain the optimal training parameter; finally, based on the test set and the optimal training parameter, it builds a test transfer matrix of post forwarding and computes on the test transfer matrix to obtain the social network node influence ranking. The invention can discover the potential follow relationships of hidden nodes, so that influence ranking can be carried out on networks whose dynamic information is incomplete or heavily missing, and social network node influence can be analyzed more accurately despite missing social network data.

Description

Method and system for ranking node influence in social networks

Technical field

The present invention relates to the field of digital information processing, and in particular to a method and system for ranking node influence in social networks.

Background art

In the information age, the analysis of interpersonal relationships is one of the cornerstones for measuring individual value, promoting products, monitoring public opinion, and planning related development. According to statistics, as of the end of March 2017 the Android and Apple application markets together offered about 5 million apps. Recommending apps to users more effectively is critical to improving user experience and increasing enterprise revenue, but existing app recommendation strategies rely mainly on users' personal information and do not consider the influence that groups of users in the social network exert on a user. Yet users are inevitably influenced by their social network friends when downloading or purchasing apps. How to incorporate the social influence between network users into personalized recommendation algorithms, and what gain this brings, are therefore questions of concern in social network analysis. Social networks include both offline networks of relationships among people nearby and online social networks built on social applications; the latter can be further divided into weakly linked networks based on one-way following and strongly linked networks based on two-way friendship.

Methods for studying interpersonal relationships have gradually spread from traditional sociology and psychology into the field of information networks, where large-scale information collection, data mining algorithms, and relevance ranking algorithms make the analysis of online social networks possible.

Online influence between users is revealed through their mutual activity, that is, a network user's actions and thinking change under the influence of other people's actions and thinking. Users with greater influence play a key role in many stages such as network construction, network expansion, and reposting. Evaluating the influence of social network users, ranking the users in a social network, and identifying highly influential user nodes is therefore the most basic problem in the study of online individual influence. Node influence and its ranking often form the basis for subsequent social network research.

Early methods for analyzing and ranking network node influence mainly relied on non-networked, non-digital means such as questionnaires and telephone polls. These methods yield little data with long delays, among other problems.

With the rapid development of Internet technology and personal mobile network technology, massive online social network data now serves as data support. Typical approaches analyze the structure of the follow-relationship network, reposting records, and the semantics of user activity and content in order to extract the likelihood that a user reposts a message; they count the number of successful propagation pairs to learn the influence between users and rank accordingly, and they estimate the propagation probability between users as a measure of influence using the Bernoulli model and the Jaccard index model.

During this period, many excellent algorithms were proposed and applied. The PageRank algorithm, for example, assigns every node the same initial value and then iterates for several rounds until the values remain essentially unchanged; the node values at that point are the basis for the final ranking, and the larger the value, the greater the node's influence. Because PageRank applies a correction to deal with problems such as non-unique rankings, the matrix structure is considerably distorted; the improved LeaderRank algorithm was therefore proposed, which better reduces the effect of the correction and keeps the result reliable. The analysis of social networks is a complex problem that no single method can solve; various factors must be considered together and combined optimally to identify the final social roles and influence.

As network security and the risk of information leakage receive increasing attention, earlier algorithms run into problems at the information collection stage. Take crawling Sina Weibo as an example: to protect user information and avoid leakage, Weibo currently does not allow followee and follower information to be crawled by conventional means, so collecting and reconstructing users' follow relationships is complicated. Protection of users' historical post content is also increasingly standardized and strict, so the posting and reposting information of many users may be unobtainable. Yet this information is often essential data for classical influence analysis algorithms.

Summary of the invention

The object of the present invention is to provide a method and system that integrate social network user information and accurately determine and rank the influence of social user nodes, so as to solve the technical problem described in the background above, namely that existing methods for analyzing social network node influence consider too few factors and give inaccurate results.

To achieve the above object, the present invention adopts the following technical solutions:

In one aspect, the present invention provides a method for ranking node influence in social networks, comprising the following steps:

Step S110: collect user homepage information, user posting information, and user-pair information; preprocess the homepage information, the posting information, and the user-pair information to form a training set and a test set;

Step S120: based on the training set, build a transfer matrix model of post forwarding, and run simulations on the transfer matrix model to obtain the optimal training parameter;

Step S130: based on the test set and the optimal training parameter, build a test transfer matrix of post forwarding, and compute on the test transfer matrix to obtain the social network node influence ranking.

Further, step S110 specifically includes:

collecting homepage information, user posting information, and user-pair information to form a data set, wherein the homepage information includes at least the user ID, the user's total number of posts, the user's active duration, the user's number of followers, and the number of users the user follows;

the user posting information includes at least the number of times each post was forwarded and the number of comments it received;

the user-pair information includes the follow relationships between users;

the data set is split as required into a training set and a test set; the training set includes training-set user personal information and training-set user-pair information, and the test set includes test-set user personal information and test-set user-pair information.

Further, in step S120, building the transfer matrix model of post forwarding based on the training set specifically includes:

Step S121: determine the user-pair impact factor f₁ of post forwarding in the training set:

where I_U denotes the number of followers of user U, and S_V denotes the number of users that user V follows;

Step S122: determine the user's own impact factor f₂ of post forwarding in the training set:

where X denotes the social network's weighting between the importance of reposts and comments for user U, M_U denotes the total number of posts by user U, T_U denotes the active duration of user U, Z_U denotes the number of times user U's posts were reposted, and P_U denotes the number of comments on user U's posts;

Step S123: determine the total impact factor f_uv of post forwarding in the training set:

f_uv = 1 - exp(-(f₁)^m × (f₂)^(1-m)), where m denotes the training parameter, i.e. the trade-off parameter between f₁ and f₂;
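As a minimal sketch of the total impact factor above: the function below takes f₁ and f₂ as already-computed inputs, because their own formulas are not reproduced in this text. The function name and the sample values are hypothetical, and the exponent on f₂ is read as (1-m).

```python
import math


def total_impact_factor(f1: float, f2: float, m: float) -> float:
    """Return f_uv = 1 - exp(-(f1**m) * (f2**(1 - m))).

    f1 and f2 are assumed to be non-negative and computed elsewhere;
    m is the trade-off parameter in [0, 1].
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must lie in [0, 1]")
    if f1 < 0 or f2 < 0:
        raise ValueError("f1 and f2 must be non-negative")
    return 1.0 - math.exp(-(f1 ** m) * (f2 ** (1.0 - m)))


# Hypothetical inputs, only to show the call.
example = total_impact_factor(2.0, 0.5, 0.5)
```

With m = 1 the result depends only on f₁, and with m = 0 only on f₂, which is what makes m a trade-off between the two factors.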

Step S124: using the K-shell decomposition algorithm, obtain the probability p_uu that user U in the training set forwards his or her own post:

where n denotes the number of user nodes, and K_su denotes the K-shell value of user U;

Step S125: determine the probability P_uv that a post of user U in the training set is forwarded:

Step S126: from p_uu and P_uv, obtain the training transfer matrix P:

Further, in step S120, running simulations on the transfer matrix model to obtain the optimal training parameter specifically includes:

selecting C users from the training set in order of their average number of forwards per post, together with their true repost counts M_c; multiple different values of m yield multiple different training transfer matrices P; running a propagation simulation on each P with the independent cascade model to obtain the expected average repost count F_c of each of the C users;

determining the error MAPE value:

where c = {1, ..., C};

selecting the training parameter corresponding to the P with the smallest MAPE value as the optimal training parameter.
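The selection step can be sketched as follows. The MAPE formula itself is not reproduced in this text, so the sketch assumes the standard mean absolute percentage error over the C users, expressed as a fraction; the function names and the sample numbers are hypothetical.

```python
def mape(true_counts, predicted_counts):
    """Standard mean absolute percentage error (as a fraction).

    Assumption: MAPE = (1/C) * sum(|M_c - F_c| / M_c); the source's exact
    formula is not shown, so this is the textbook definition.
    """
    if not true_counts or len(true_counts) != len(predicted_counts):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) / t for t, p in zip(true_counts, predicted_counts))
    return total / len(true_counts)


def select_best_m(true_counts, predicted_by_m):
    """Return the m whose simulated repost counts give the smallest MAPE."""
    return min(predicted_by_m, key=lambda m: mape(true_counts, predicted_by_m[m]))


# Hypothetical true repost counts M_c and simulated counts F_c per m.
M = [100, 50, 20]
F_by_m = {0.0: [80, 40, 10], 0.5: [95, 52, 19], 1.0: [150, 70, 40]}
best_m = select_best_m(M, F_by_m)
```

In this made-up example the simulated counts for m = 0.5 are closest to the true counts, so that value is selected.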

Further, in step S130, building the test transfer matrix of post forwarding based on the test set and the optimal training parameter specifically includes:

based on the test set, taking the optimal training parameter as the trade-off parameter between f₁ and f₂, and building the test transfer matrix by the method of steps S121 to S126.

Further, in step S130, computing on the test transfer matrix to obtain the social network node influence ranking specifically includes:

setting the initial value of each entry of the user value vector St to 1 and obtaining a stable converged value by Markov iteration, computed as follows:

St = (1 ... 1)_(1×n) × P_m,

repeating the following process and stopping the iteration when the Euclidean norm of the difference between two successive user value vectors is less than a predetermined precision, which yields the stable converged values S:

S = St_(1×n) × P_m,

taking each entry of the stable converged vector S as the value of the corresponding user and comparing the magnitudes to obtain the social network node influence ranking.
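The iteration described above can be sketched in plain Python. The sketch assumes P is a square list-of-lists whose rows sum to 1 (the source does not show the matrix, so this normalization is an assumption), and the 3×3 matrix is purely illustrative.

```python
import math


def rank_by_iteration(P, tol=1e-9, max_iter=10_000):
    """Iterate S <- S x P from the all-ones row vector until the Euclidean
    norm of the change between two successive vectors drops below tol."""
    n = len(P)
    s = [1.0] * n
    for _ in range(max_iter):
        nxt = [sum(s[i] * P[i][j] for i in range(n)) for j in range(n)]
        err = math.sqrt(sum((a - b) ** 2 for a, b in zip(nxt, s)))
        s = nxt
        if err < tol:
            break
    return s


# Illustrative row-stochastic matrix (hypothetical values).
P = [[0.5, 0.5, 0.0],
     [0.2, 0.6, 0.2],
     [0.1, 0.3, 0.6]]
scores = rank_by_iteration(P)
# Node indices sorted from most to least influential.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

For a row-stochastic matrix the entries of the vector keep summing to n, so the converged values are a rescaled stationary distribution and only their relative order matters for the ranking.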

In another aspect, the present invention provides a system for ranking node influence in social networks, comprising:

a data preprocessing module for collecting user homepage information, user posting information, and user-pair information, and preprocessing the homepage information and the posting information to form a training set and a test set;

a training module for building a transfer matrix model of post forwarding based on the training set and running simulations on the transfer matrix model to obtain the optimal training parameter;

a test module for building a test transfer matrix of post forwarding based on the test set and the optimal training parameter, and computing on the test transfer matrix to obtain the social network node influence ranking.

Further, the data preprocessing module specifically includes:

Further, the training module includes:

a user-pair impact factor determination unit for determining the user-pair impact factor of post forwarding from the number of followers of one user in the user pair and the number of users followed by the other user;

a user's own impact factor determination unit for determining the user's own impact factor of post forwarding from the one user's total number of posts, active duration, number of forwards of posts, and number of comments on posts;

a total impact factor determination unit for determining the total impact factor of post forwarding from the trade-off parameter between the user-pair impact factor and the user's own impact factor;

a user self-forwarding probability determination unit for obtaining, using the K-shell decomposition algorithm, the probability that the one user forwards his or her own post;

a post forwarded-probability determination unit for determining the probability that a post of the one user is forwarded by other users;

a transfer matrix model building unit for building the transfer matrix model of post forwarding from the probability that the user forwards his or her own post and the probability that the post is forwarded;

an optimal training parameter determination unit for running propagation simulations on each transfer matrix model using the independent cascade model to obtain the expected average repost count, determining the error MAPE value, and selecting the training parameter corresponding to the transfer matrix with the smallest MAPE value as the optimal training parameter;

Further, the test module includes:

a test transfer matrix building unit for building the test transfer matrix based on the test set and the optimal training parameter;

an influence ranking unit for computing on the test transfer matrix to obtain the social network node influence ranking.

Beneficial effects of the invention: it can discover the potential follow relationships of hidden nodes, so that influence ranking can be carried out on networks whose dynamic information is incomplete or heavily missing; it provides a supplementary scheme for general algorithms that cannot perform the analysis when data is missing, and analyzes social network node influence more accurately.

Additional aspects and advantages of the invention will be set forth in part in the following description, and in part will become apparent from that description or be learned through practice of the invention.

Brief description of the drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are evidently only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a functional block diagram of the social network node influence ranking system according to Embodiment one of the present invention.

Fig. 2 is a flowchart of the social network node influence ranking method according to Embodiment two of the present invention.

Fig. 3 is a functional block diagram of the social network node influence ranking system according to Embodiment three of the present invention.

Fig. 4 is a plot of training-set K-shell values against the corresponding users' repost counts according to Embodiment four of the present invention.

Fig. 5 shows the calculated result for the optimal transfer matrix parameter m according to Embodiment four of the present invention.

Fig. 6 is a plot of test-set K-shell values against the corresponding users' repost counts according to Embodiment four of the present invention.

Fig. 7 shows the image-based comparison verification of results according to Embodiment five of the present invention.

Fig. 8 shows the comparative Kendall test verification according to Embodiment five of the present invention.

Fig. 9 shows the comparative verification of specific rankings according to Embodiment five of the present invention.

Fig. 10 is a flowchart of the social network node influence ranking method according to Embodiment four of the present invention.

Detailed description of the embodiments

Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements or modules having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the invention and are not to be construed as limiting the claims.

Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by a person of ordinary skill in the art to which the invention belongs. It should also be understood that terms such as those defined in general dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the prior art and, unless defined as here, are not to be interpreted in an idealized or overly formal sense.

To facilitate understanding of the embodiments of the present invention, specific embodiments are further explained below with reference to the drawings; these embodiments do not limit the embodiments of the present invention.

A person of ordinary skill in the art should understand that each drawing is a schematic diagram of one embodiment, and that the components or devices in the drawings are not necessarily required to implement the invention.

Embodiment one

As shown in Fig. 1, Embodiment one of the present invention provides a system for ranking node influence in social networks, the system comprising:

In Embodiment one of the present invention, the data preprocessing module specifically includes:

In practical applications, the data preprocessing module of Embodiment one is mainly used to obtain the data set, which contains three classes of data. The first class is homepage information, including at least the user ID, the user's total number of posts, the user level or active duration, the user's number of followers (if user A follows user B, A is called a follower of B), and the user's number of followees (if user A follows user B, B is called a followee of A). The second class is user posting information, including at least the number of forwards and the number of comments for some of the posts. The third class is the follow relationships between users, including at least the follow relationships among some of the users;

The data set is given simple preprocessing to produce the required form. Simple cleaning such as advertisement filtering is applied to the data set, which is then split as required into a training set and a test set, and the following tables are generated for each: the homepage statistics table is processed by merging the third class of data with the first class and adding to the table two columns, the user's average repost count and average comment count; the user follow-relationship table is cleaned to ensure that, for every follow pair, both the follower and the followee appear in the user personal information table.
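The two table operations described above can be sketched as follows. The record layouts (tuples of author, repost count, comment count, and tuples of follower, followee) and the function names are hypothetical; the source does not specify a data format at this point.

```python
def average_counts(posts):
    """Compute per-user average repost and comment counts.

    posts: iterable of (author_id, repost_count, comment_count) tuples
    (a hypothetical layout). Returns {author_id: (avg_reposts, avg_comments)}.
    """
    totals = {}
    for author, reposts, comments in posts:
        r, c, k = totals.get(author, (0, 0, 0))
        totals[author] = (r + reposts, c + comments, k + 1)
    return {a: (r / k, c / k) for a, (r, c, k) in totals.items()}


def clean_follow_pairs(known_users, follow_pairs):
    """Keep only the (follower, followee) pairs in which both users appear
    in the user personal information table."""
    known = set(known_users)
    return [(f, g) for f, g in follow_pairs if f in known and g in known]


# Hypothetical sample records.
averages = average_counts([("a", 10, 2), ("a", 20, 4), ("b", 5, 1)])
pairs = clean_follow_pairs({"a", "b"}, [("a", "b"), ("a", "x"), ("y", "b")])
```

Dropping pairs with unknown users keeps the follow table consistent with the personal information table, at the cost of discarding relationships to users outside the collected set.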

In Embodiment one of the present invention, the training module includes:

In Embodiment one of the present invention, the test module includes:

Embodiment two

As shown in Fig. 2, Embodiment two of the present invention provides a method for ranking social network node influence using the system of Embodiment one, the method comprising the following steps:

Step S110: collect user homepage information, user posting information, and user-pair information; preprocess the homepage information and the posting information to form a training set and a test set;

In Embodiment two of the present invention, step S110 specifically includes:

In Embodiment two of the present invention, in step S120, building the transfer matrix model of post forwarding based on the training set specifically includes:

Step S126: from p_uu and P_uv, obtain the training transfer matrix P:

In Embodiment two of the present invention, in step S120, running simulations on the transfer matrix model to obtain the optimal training parameter specifically includes:

determining the error MAPE value:

where c = {1, ..., C};

In Embodiment two of the present invention, in step S130, building the test transfer matrix of post forwarding based on the test set and the optimal training parameter specifically includes:

based on the test set, taking the optimal training parameter as the trade-off parameter between f₁ and f₂, and building the test transfer matrix according to steps S121 to S126.

In Embodiment two of the present invention, in step S130, computing on the test transfer matrix to obtain the social network node influence ranking specifically includes:

St = (1 ... 1)_(1×n) × P_m,

S = St_(1×n) × P_m,

Embodiment three

As shown in Fig. 3, Embodiment three of the present invention provides a system for ranking node influence in social networks, the system comprising:

a data preprocessing unit 21, which obtains the Sina Weibo data set MicroblogPCU from the network and applies simple preprocessing to generate the four training-set and test-set tables required by the design, namely: the training-set user personal information table, the training-set user follow information table, the test-set user personal information table, and the test-set user follow information table;

a first training unit 22, which generates the transfer matrix Pm of post forwarding from the training set, where m takes 11 equally spaced sample values from 0 to 1 in steps of 0.1, each producing a different transfer matrix Pm;

a second training unit 23, which runs post-reposting propagation simulations on the 11 different transfer matrices Pm, screens for the optimal transfer matrix Pm according to the MAPE value, and obtains the trained value m;

a first test unit 24, which generates the transfer matrix P from the test set and the training result m, and produces the ranking of the algorithm of this design;

a second test unit 25, which produces the rankings of the other algorithms from the test set;

a third test unit 26, which checks consistency using the ranking of this algorithm and the rankings of the other algorithms, to demonstrate the superiority of this algorithm's ranking;

The data preprocessing unit 21 specifically includes:

a data set acquisition subunit 211, which obtains the Sina Weibo data set MicroblogPCU; this data set was collected from Sina Weibo by Jun Liu et al. on 2015.3.17. The data set mainly contains four files: weibo_user.csv (user personal information), followe-followee.csv (user follow information), user_post.csv (post content information), and post.csv (post content information). weibo_user.csv contains 700+ user IDs together with name, gender, level, personal information, postcode, number of followers, total number of followees, and similar fields; followe-followee.csv contains about 140,000 follower-followee relationship pairs, including users not listed in weibo_user.csv; user_post.csv and post.csv record the content posted by these users, the post ID, the poster ID, the number of reposts, the number of comments, and so on. The data set has undergone simple cleaning that removed interfering content such as zombie accounts and secondary accounts, but several missing values remain and must be removed manually;

a data set preprocessing subunit 212, which applies simple preprocessing and generates the training set and the test set. The user personal information table is processed by adding the average repost count and the average comment count, ensuring that each user has the following fields: ID, name, user level, number of posts, number of followers, number of followees, and the average number of reposts and comments per post. The user follow information table is cleaned to ensure that, for every follow pair, both the follower and the followee appear in the user personal information table. This yields four tables: the training-set user personal information table, the training-set user follow information table, the test-set user personal information table, and the test-set user follow information table.

The first training unit 22 specifically includes:

The transfer matrix Pm describes the repost probabilities that may exist for posts. When a repost probability is greater than 0, a future repost remains possible even if no repost record or follow relationship between the users has been observed so far;

a kshell value calculation subunit 221, which calculates the training-set kshell values. The kshell values are calculated as follows: first, all nodes of degree 0 in the network are peeled off; then all nodes of degree 1 are peeled off, after which the degrees of some users in the resulting network change, so nodes that now have degree 1 continue to be peeled off, and this is repeated until the resulting network has no more nodes that can be peeled; all nodes peeled at degree 1 are called the 1-shell. The above steps are repeated to obtain the 2-shell and so on, until all nodes have been peeled off, so that each node has its own integer kshell value;
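The peeling procedure described above can be sketched as follows. The sketch treats the follow graph as undirected, which is an assumption (the source does not say whether in-degree, out-degree, or total degree is used), and the function name and sample graph are hypothetical.

```python
def k_shell(edges, nodes=()):
    """Assign each node its k-shell index by repeated peeling.

    edges: iterable of (u, v) pairs, treated as undirected (an assumption).
    nodes: optional extra nodes, so that isolated nodes are included.
    """
    adj = {u: set() for u in nodes}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    shell = {}
    k = 0
    while adj:
        # Nodes whose remaining degree is at most k are peeled at level k.
        removable = [u for u, nbrs in adj.items() if len(nbrs) <= k]
        if not removable:
            k += 1  # nothing left to peel at this level; move to the next shell
            continue
        for u in removable:
            for v in adj[u]:
                adj[v].discard(u)  # removing u lowers its neighbours' degrees
            shell[u] = k
            del adj[u]
    return shell


# Triangle a-b-c, a pendant node d attached to a, and an isolated node e.
shells = k_shell([("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")], nodes=["e"])
```

In the sample graph the isolated node falls in the 0-shell, the pendant node in the 1-shell, and the triangle in the 2-shell.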

Fig. 4 plots the training-set kshell values of the present invention against the corresponding users' repost counts. The kshell value shows no obvious correlation with the repost count and performs poorly, so it cannot serve on its own as a method for ranking this kind of network.

a training parameter setting subunit 222, which takes equally spaced values of m from 0 to 1, for example at intervals of 0.1, so that the following process is completed 11 times;

an fuv and Puu calculation subunit 223, which calculates fuv and Puu;

Definition 1, the user-pair impact factor f1:

where Iu denotes the number of followers of user u, and Sv denotes the number of users that user v follows;

Definition 2, the user's own impact factor f2:

where x is a parameter between 0 and 1 that expresses the social application's weighting between reposts and comments; without loss of generality, it can be set to 0.5;

Definition 3, the transition probability fuv:

f_uv = 1 - exp{-(f1)^m × (f2)^(1-m)}

where m is a parameter sampled from 0 to 1 at intervals of 0.1 that expresses the trade-off between f1 and f2; all 11 values need to be trained;

Definition 4, the transition probability Puu:

where n is the number of user nodes and ksu is the kshell value.

a Puv calculation subunit 224, which calculates Puv;

Definition 5, the forwarded probability Puv:

a transfer matrix P generation subunit 225, which generates the transition probability matrix P, denoted Pm for each different m;

From Puv and Puu, the transfer matrix Pm is obtained:

The second training unit 23 specifically includes:

a data extraction subunit 231, which extracts from the data set the 20 users whose posts rank highest by repost count, obtaining the user list and the corresponding repost counts;

a propagation experiment subunit 232: different values of m give different transfer matrices P; for each transfer matrix Pm, an independent cascade propagation experiment is run with each of the 20 users as the single starting point, each experiment is repeated 10 times, and the average value Fc is obtained;

a MAPE value calculation subunit 233, which calculates the MAPE value from Fc and Mc for each Pm;

where C is the number of users, c is a specific user, and MAPE expresses the error between the predicted data and the true data;

a trained value selection subunit 234, which selects the minimum MAPE value; the corresponding optimal m is the training result;

Fig. 5 shows the calculated optimal trained value m for Embodiment two of the present invention: m takes 11 sample values from 0 to 1 at intervals of 0.1, i.e. m = 0, 0.1, ..., 1, giving 11 transition probability matrices Pm; the MAPE value is computed for each and the minimum is taken to obtain the corresponding optimal m. The best m obtained in this embodiment is 0.5.

The first test unit 24 specifically includes:

a kshell value calculation subunit 241, which calculates the test-set kshell values;

Fig. 6 plots the test-set kshell values against the corresponding users' repost counts for Embodiment two of the present invention; the kshell value performs poorly and cannot serve on its own as a method for ranking this kind of network.

a transfer matrix calculation subunit 242, which obtains the test-set transfer matrix P from the trained value m, the test-set data, and the transfer matrix model formulas;

a ranking calculation subunit 243, which sets the initial value vector to the all-ones vector, multiplies it by the transition probability matrix P, iterates until convergence to obtain the algorithm values, and obtains the top-10 ranking. Markov iteration can be used to obtain the stable converged values. With the iteration initial values set to 1, compute:

St = (1 ... 1)_(1×n) × P_(n×n)

Repeat the following process until the error Δ meets the precision requirement, obtaining the stable converged value vector S:

St = St_(1×n) × P_(n×n)

The 2-norm (Euclidean norm) can be used to compute Δ: when the length of the difference vector between two successive St meets the requirement, the iteration is considered converged.

Second test cell 25 specifically includes:

Degree centrality serves as the representative reference for local influence algorithms, and betweenness centrality and closeness centrality serve as the representative references for global influence algorithms. Embodiment 2 therefore applies these three indices to the test set data and obtains the top-10 ranking under each, for subsequent comparison.

The third test cell 26 specifically includes:

The contrast image verification subunit 261 plots verification images of the different algorithms against the real data and compares them. First, after the test set has been computed with this algorithm, the top 10 ranked users are obtained and their average repost counts per post are read from the data set. A plot is then drawn with each user's algorithm value as the abscissa and the corresponding average repost count as the ordinate, to observe whether the two are positively correlated or consistent. For the other algorithms, such as degree centrality, betweenness centrality and closeness centrality, the same consistency plot can be drawn with the algorithm or index value on the horizontal axis and the repost count of the corresponding user on the vertical axis, to observe whether these algorithms show positive correlation or consistency. The images show directly whether this method is consistent with the real data and whether it outperforms the other methods；

Figure 7 shows the contrast image verification results of Embodiment 2 of the present invention: consistency with the real repost data for the top 10 of the present invention (res.myAlgo), of degree centrality (res.inDgCent), of closeness centrality (res.closeCent) and of betweenness centrality (res.betweenCent). The comparison shows that closeness centrality performs very poorly, with no evident consistency. Betweenness centrality and degree centrality are acceptable: the repost count does not decrease as the index value increases, that is, the relation is monotonic. However, degree centrality lacks discrimination, and betweenness centrality has high complexity, so its cost in engineering applications is an evident drawback. The algorithm values of the present invention are consistent with the repost counts: high-influence users are not hidden among the user groups with low algorithm values (considered to have a low influence rank) or with medium algorithm values (considered to have a medium influence rank). The algorithm can therefore narrow the screening range for highly influential users to the region of large algorithm values, achieving a good ranking effect；

The Kendall test verification subunit 262 computes the Kendall test between each algorithm and the real data and examines the resulting values. (A Kendall consistency check applies to different rankings of the same sample: for every two rankings it measures how similar they are. The calculation mainly counts concordant and discordant pairs. If one user ranks above another user in method A and also ranks above that user in method B, the pair is concordant and its sign is positive; otherwise the pair is discordant and its sign is negative. The signs of the concordant and discordant pairs are summed: the larger the sum, the more concordant pairs there are and the closer the two rankings. The larger the Kendall value between a method and the real ranking, the more accurate the ranking produced by that method.) The Kendall consistency check is applied to the top-10 ranking computed on the test set by this algorithm, the top-10 rankings produced by the other algorithms, and the top-10 ranking of the real data (the 10 users with the highest repost counts). The resulting value expresses the consistency between two vectors: the larger the Kendall value, the more consistent the order of the two vectors. The larger the Kendall coefficient computed against the real data, the more consistent the algorithm's ranking is with the real data, so the effectiveness of each method can be seen clearly from the numbers；
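The pair counting described for the Kendall consistency check can be sketched as follows. This is a minimal illustration only: the function name `kendall_score` and the sample rankings are hypothetical, and dividing the sign sum by the number of pairs (so that the result lies between -1 and 1) is an assumption of this sketch, since the text only describes summing the signs.

```python
from itertools import combinations


def kendall_score(rank_a, rank_b):
    """Signed pair count between two rankings, normalised by the pair count.

    rank_a and rank_b map each user ID to its rank position (1 = highest).
    A pair ordered the same way in both rankings counts +1 (concordant),
    a pair ordered oppositely counts -1 (discordant), ties count 0.
    The normalisation is an assumption of this sketch.
    """
    users = sorted(set(rank_a) & set(rank_b))
    total = 0
    pairs = 0
    for x, y in combinations(users, 2):
        product = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if product > 0:
            total += 1
        elif product < 0:
            total -= 1
        pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical rankings: the real ranking and one algorithm's ranking
real = {"u1": 1, "u2": 2, "u3": 3, "u4": 4}
algo = {"u1": 1, "u2": 3, "u3": 2, "u4": 4}
reversed_rank = {"u1": 4, "u2": 3, "u3": 2, "u4": 1}
```

In this toy data only the pair (u2, u3) is ordered differently, so five of the six pairs are concordant and one is discordant.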

Figure 8 shows the Kendall test verification results of Embodiment 2 of the present invention. Closeness centrality (clCent) is very poorly consistent with the real repost ranking (realRepo); a correct ranking cannot be obtained from closeness centrality at all. Betweenness centrality (bwCent) has high complexity, so it is harder and costlier to implement in engineering. This algorithm differs little from degree centrality (dgCent) and captures the relevant trend well, so it can narrow the screening range for high-influence users.

The specific ranking verification subunit 263 outputs the top 10 users of each algorithm and of the real data, with their algorithm values or real values, for detailed comparison. The top 10 computed on the test set by this algorithm, by the other algorithms and from the real data are listed. The listed items include the user IDs ranked by each algorithm or by the real data, and the value of each algorithm or the real average repost count. The effectiveness of the algorithm is analysed from the specific rankings and algorithm values.

Figure 9 shows the specific ranking verification results of Embodiment 2 of the present invention. Analysis of the figure shows that users ranked in the top 3 in the real data are also ranked in the top 3 by the algorithm of the present invention. This indicates that high-influence users have high algorithm values and that the present invention is effective. In addition, the algorithm of the present invention has higher resolution, and the differences between the algorithm values of different users are relatively steady, which is a clear advantage over the results of the other algorithms.

The timing subunit 264 outputs the total run time of the program so that its acceptability can be judged; timestamps are taken at the start and at the end of the run and their difference is computed. As can be seen from Fig. 8, analysing a network of more than 700 interconnected users took Embodiment 2 42 seconds, which agrees with the O(n^2) complexity of the theoretical analysis. Because Embodiment 2 uses Python, a language with a relatively slow run time, and spends considerable time outputting the comparison test images, this run time can still be optimised and reduced further.

Embodiment 4

As shown in Figure 10, Embodiment 4 of the present invention provides a method of ranking social network node influence using the system described in Embodiment 3. The method mainly comprises the following steps:

Step 11: obtain the data set and perform simple preprocessing to generate the training set and test set in the form required by this design；

Step 12: generate the post-forwarding transfer matrix Pm from the training set, where different values of the training parameter m generate different transfer matrices Pm；

Step 13: carry out post propagation simulation experiments with the different transfer matrices Pm and screen out the optimal one；

Step 14: generate the transfer matrix P from the test set and the training result m, and generate the ranking of this algorithm.

The step 11 includes:

Step 111: obtain the data set. The data comprise three classes. The first class is personal homepage information, including at least the user ID, the user's total number of posts, the user's level or active duration, the user's number of fans (if user A follows user B, A is called a fan of B), and the number of people the user follows (if user A follows user B, B is called a followee of A). The second class is user posting information, including at least the repost count and the comment count of some posts. The third class is the follow relations between users, including at least the follow relations among some users；

Step 112: perform simple preprocessing on the data set to generate the required form. Simple cleaning such as advertisement filtering is applied to the data set, which is then split into a training set and a test set as required. For each, the following tables are generated. The personal homepage statistics table is processed by merging the third-class data with the first-class data and adding two fields, the user's average repost count and average comment count. The user follow-relation table is cleaned so that, for every follow pair, both the fan and the followee appear in the user personal information table.
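The cleaning of the follow-relation table in step 112 can be illustrated with a short Python sketch. The function name `clean_follow_pairs`, the tuple layout (fan, followee) and the sample IDs are hypothetical choices made for the illustration; the source does not specify a storage format.

```python
def clean_follow_pairs(pairs, known_users):
    """Keep only follow pairs whose fan and followee both appear in the user table.

    pairs is an iterable of (fan_id, followee_id) tuples (assumed layout);
    known_users is the collection of user IDs present in the personal
    information table.
    """
    known = set(known_users)
    return [
        (fan, followee)
        for fan, followee in pairs
        if fan in known and followee in known
    ]


# Hypothetical example: user "x" was never crawled, so its pair is dropped
user_table_ids = ["a", "b", "c"]
raw_pairs = [("a", "b"), ("a", "x"), ("c", "a")]
cleaned = clean_follow_pairs(raw_pairs, user_table_ids)
```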

Step 12 is specifically as follows:

The transfer matrix Pm is generated as follows:

Definition 1, user-pair influence factor f1: the more fans I a user has, the stronger the user's influence. The more followees S a user has, the more susceptible the user is, but the larger S is, the more the influence of any single node on that user is diluted. The parameter is therefore set as:

where Iu is the number of fans of user u, Sv is the number of users followed by user v, and f1 describes the user-pair influence of user u on user v. Division by zero must be avoided, and the side effects of f1 being 0 must be considered；

Definition 2, user self-influence factor f2: on the one hand, the activity level of a node can be taken as the number of posts divided by the total active time, and the total active time can be reflected by the user level. On the other hand, post quality can be reflected by the repost count and the comment count. The parameter is therefore set as:

where x is a parameter between 0 and 1 expressing how the social application weighs reposts against comments; without loss of generality, x can be set to 0.5. If several posts of one user were crawled, the average repost count and average comment count are used. Division by zero and the side effects of f2 being 0 must also be considered；

Definition 3, transition probability fuv: the influence of user u on a different user v, that is, the probability that v reposts a post of u:

$$f_{uv} = 1 - \exp\left\{-(f_1)^{m} \, (f_2)^{1-m}\right\}$$

The exponential form is used because fuv then increases as f1 and f2 increase. m is the key training parameter, with a value between 0 and 1, which allocates the proportions of the user-pair influence factor and the user self-influence factor in the propagation process, that is, the trade-off between f1 and f2；
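As a small worked illustration of Definition 3, the transition probability can be evaluated in Python as sketched below. The function name `transition_probability` is hypothetical, and f1 and f2 are taken as already computed inputs, because their defining formulas are not reproduced in this text.

```python
import math


def transition_probability(f1, f2, m):
    """Evaluate f_uv = 1 - exp(-(f1 ** m) * (f2 ** (1 - m))).

    f1: user-pair influence factor (non-negative, computed elsewhere)
    f2: user self-influence factor (non-negative, computed elsewhere)
    m:  training parameter in [0, 1] weighing f1 against f2
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must lie in [0, 1]")
    if f1 < 0 or f2 < 0:
        raise ValueError("f1 and f2 must be non-negative")
    return 1.0 - math.exp(-(f1 ** m) * (f2 ** (1.0 - m)))


# With m = 1 only f1 matters; with m = 0 only f2 matters
only_f1 = transition_probability(2.0, 5.0, 1.0)
only_f2 = transition_probability(2.0, 5.0, 0.0)
```

Because the exponent is non-negative, the result always lies in [0, 1), which is what allows it to be read as a probability.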

Definition 4, transition probability Puu: the probability that a user points to itself. The higher the probability that a user's posts are reposted by the other users, the smaller the probability that the user points to itself. Such users are usually core users of the network, that is, users with larger kshell decomposition values. Therefore:

where n is the number of user nodes and Ksu is the kshell value of user u. The larger the kshell value, the closer the node is to the centre of the network and the less likely the node is to point to itself；
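For readers unfamiliar with the kshell value, a minimal k-shell decomposition is sketched below in Python. It is only an illustration under stated assumptions: the follow graph is treated as undirected (the text does not say how direction is handled), the function name `kshell` and the adjacency-dictionary input are hypothetical, and the formula that maps the kshell value to Puu is not reproduced here.

```python
def kshell(adjacency):
    """Return {node: k-shell index} for an undirected graph.

    adjacency maps each node to the set of its neighbours and must be
    symmetric. Nodes are peeled in order of increasing degree: a node
    removed while the threshold is k belongs to shell k.
    """
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    remaining = set(adjacency)
    shell = {}
    k = 0
    while remaining:
        # Nodes whose current degree is at most k are peeled into shell k
        peel = [n for n in remaining if degree[n] <= k]
        if not peel:
            k += 1
            continue
        while peel:
            node = peel.pop()
            if node not in remaining:
                continue  # already peeled via a duplicate entry
            remaining.discard(node)
            shell[node] = k
            for nbr in adjacency[node]:
                if nbr in remaining:
                    degree[nbr] -= 1
                    if degree[nbr] <= k:
                        peel.append(nbr)
    return shell


# Hypothetical graph: a triangle a-b-c with a pendant node d attached to a
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
shells = kshell(graph)
```

In this toy graph the pendant node falls in shell 1 and the triangle in shell 2, matching the intuition that larger values mark nodes nearer the core.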

Definition 5, forwarding probability Puv: the form of the matrix can be obtained from Definitions 3 and 4, but by the definition of a probability forwarding matrix it must satisfy:

$$\sum_{v \neq u} P_{uv} + P_{uu} = 1$$

Therefore:

Because Puv is then generally too small, which hinders subsequent training, fuv is set to a minimal value when Sv < average(Sv) (the mean of Sv).
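The row-sum constraint of Definition 5 can be illustrated with the Python sketch below. The expression for Puv is not reproduced in this text, so the proportional scaling used here (each off-diagonal entry of a row is fuv rescaled so that the row sums to 1 together with Puu) is an assumption made for illustration, not the formula of the invention. The function name `build_transfer_matrix` and the sample numbers are likewise hypothetical.

```python
def build_transfer_matrix(f, p_self):
    """Assemble a row-stochastic matrix from f_uv values and self-probabilities.

    f:      n x n list of lists, f[u][v] for u != v (diagonal ignored)
    p_self: list of n self-transition probabilities P_uu in [0, 1]

    Assumed rule: P_uv = (1 - P_uu) * f_uv / sum over w != u of f_uw.
    A row with no outgoing weight keeps all its mass on the diagonal.
    """
    n = len(f)
    matrix = [[0.0] * n for _ in range(n)]
    for u in range(n):
        off_diag = sum(f[u][v] for v in range(n) if v != u)
        if off_diag <= 0:
            matrix[u][u] = 1.0
            continue
        matrix[u][u] = p_self[u]
        for v in range(n):
            if v != u:
                matrix[u][v] = (1.0 - p_self[u]) * f[u][v] / off_diag
    return matrix


# Hypothetical 3-user example
f_values = [
    [0.0, 0.2, 0.6],
    [0.5, 0.0, 0.5],
    [0.1, 0.3, 0.0],
]
self_probs = [0.2, 0.1, 0.4]
P_example = build_transfer_matrix(f_values, self_probs)
```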

According to Puv and Puu, transfer matrix Pm is obtained:

The step 13 includes:

Post propagation simulation experiments are carried out to screen the optimal transfer matrix Pm. The simulation uses the independent cascade model. The top 20 users by average repost count per post are taken from the corresponding data set, giving the user list and the corresponding real repost counts Mc；

A brief introduction to the independent cascade model: each node has two states, active and inactive, where active means that the node has received or propagated some information (for example by reposting or liking on a microblog) [Li Guoliang, Chu Yaping, Feng Jianhua, Xu Yaoqiang. Influence maximization analysis in multiple social networks [J]. Chinese Journal of Computers, 2016, 39(04): 643-656]. The independent cascade model describes the following situation. When a node u is infected, it attempts to infect each neighbour v with probability Puv. Between a given pair of users this attempt is made only once in a given direction. The attempts of u on its different neighbours v do not interfere with one another, and neither do the attempts of different users on the same v. Once u has attempted to infect all of its neighbours v, the newly infected users proceed in the same way in turn. An activated node cannot be activated again, that is, the same user cannot repost a piece of information twice. The reposting process of a message under the independent cascade model is:

I. One or more initial users are given as the initially infected starting points. If user u is infected, u gets exactly one chance to infect each of its friends, each with infection probability Puv, and each attempt is independent. The larger Puv is, the more likely u is to infect v；

II. If node w is not infected at time t, every infected neighbour of w attempts to infect w, excluding neighbours that have already made their attempt. If w is infected, it moves to the infected state at time t+1；

III. The above process is repeated until all possible infection attempts have been made, that is, until the maximum infection range is reached. The infection range at this point is the maximum propagation range of the information from the starting nodes, and the results are averaged；

Different values of m give different transfer matrices Pm. For each, the independent cascade propagation experiment is run 10 times, and the average gives the repost count Fc；
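Steps I to III and the averaging over repeated runs can be sketched in Python as below. This is an illustrative sketch, not the implementation of the embodiment: the function names, the dense list-of-lists matrix, the explicit random generator and the choice to count the spread as the number of infected users other than the seed are all assumptions.

```python
import random


def independent_cascade(P, seeds, rng):
    """Run one independent cascade and return the set of infected node indices.

    P[u][v] is the probability that an infected u infects v.
    Each infected node gets exactly one attempt on each uninfected neighbour,
    made in the round after it was infected.
    """
    infected = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, prob in enumerate(P[u]):
                if v == u or v in infected:
                    continue  # a node cannot be activated twice
                if rng.random() < prob:
                    infected.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return infected


def average_spread(P, seed, runs=10, rng=None):
    """Average number of users infected from one seed over several runs."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    total = 0
    for _ in range(runs):
        total += len(independent_cascade(P, [seed], rng)) - 1  # exclude the seed
    return total / runs


# Hypothetical deterministic chain: 0 always infects 1, 1 always infects 2
chain = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]
```

The frontier list ensures that each node makes its attempts only once, which is the "one chance per pair" rule of step I.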

Definition 6, error MAPE value: MAPE expresses the error between the predicted data and the real data. The smaller the calculated result, the smaller the error and the better and more acceptable the corresponding P and m；

where C is the number of users and c indexes a specific user；

Fc and MAPE are calculated for each m, and the m with the smallest MAPE is selected as the optimal training value.
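The selection of the optimal m can be sketched as follows. The MAPE expression itself is not reproduced in this text, so the sketch assumes the standard mean absolute percentage error, the mean over the C users of |Fc - Mc| / Mc. Skipping users whose real count Mc is 0 is a further assumption, as are the function names and the sample numbers.

```python
def mape(simulated, real):
    """Mean absolute percentage error between simulated and real counts.

    Assumes the standard definition: mean of |F_c - M_c| / M_c over users.
    Users with a real count of 0 are skipped to avoid division by zero.
    """
    terms = [abs(f - m) / m for f, m in zip(simulated, real) if m != 0]
    if not terms:
        raise ValueError("no user with a non-zero real count")
    return sum(terms) / len(terms)


def select_best_m(simulated_by_m, real):
    """Return the candidate m whose simulated counts give the smallest MAPE.

    simulated_by_m maps each candidate m to its list of simulated counts F_c.
    """
    return min(simulated_by_m, key=lambda m: mape(simulated_by_m[m], real))


# Hypothetical numbers for two users and three candidate values of m
real_counts = [10, 20]
simulated_counts = {
    0.0: [5, 10],
    0.5: [9, 22],
    1.0: [20, 40],
}
best_m = select_best_m(simulated_counts, real_counts)
```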

The step 14 includes:

Step 141: generate the transfer matrix P from the test set data and the trained optimal m, using the method of step 12；

Step 142: generate the ranking of this algorithm. The initial value of each entry of the user value vector St is set to 1, and Markov iteration is used to obtain stable converged values. The calculation is:

$$S_1 = (1, \ldots, 1)_{1 \times n} \, P_{n \times n}$$

$$S_{t+1} = (S_t)_{1 \times n} \, P_{n \times n}$$

The Euclidean norm error Δ between the user value vectors of two successive iterations is computed. When Δ is less than the predetermined precision, the iteration stops. Each entry of the resulting user value vector St is taken as the algorithm value of the corresponding user, and the values are compared to obtain the user ranking of this algorithm；
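The iteration of step 142 can be sketched in Python as below. The function names and the default values of `tol` and `max_iter` are hypothetical; the source only states that iteration stops once the Euclidean norm of the change falls below a predetermined precision.

```python
import math


def markov_rank(P, tol=1e-9, max_iter=1000):
    """Iterate S <- S * P from the all-ones row vector until the change is small.

    P is a row-stochastic n x n matrix as a list of lists.
    Returns the converged value vector (one algorithm value per user).
    """
    n = len(P)
    s = [1.0] * n
    for _ in range(max_iter):
        # Row vector times matrix: entry v collects s[u] * P[u][v] over all u
        nxt = [sum(s[u] * P[u][v] for u in range(n)) for v in range(n)]
        delta = math.sqrt(sum((a - b) ** 2 for a, b in zip(nxt, s)))
        s = nxt
        if delta < tol:
            break
    return s


def top_k(values, k=10):
    """Indices of the k largest algorithm values, highest first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return order[:k]


# Hypothetical 2-user matrix; its stationary distribution is (2/7, 5/7)
P_small = [
    [0.5, 0.5],
    [0.2, 0.8],
]
values = markov_rank(P_small)
```

Because every row of P sums to 1, the sum of the entries of the value vector stays equal to n throughout the iteration, so the converged vector is n times the stationary distribution.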

To prove that this process converges for the transfer matrix P, three conditions must be met:

P is a stochastic matrix；

P is irreducible；

P is aperiodic；

For the first condition, stochastic matrix: let Pij be the entry in row i and column j of P. For any i = 1, 2, ..., n and j = 1, 2, ..., n, Pij >= 0, and for any i = 1, 2, ..., n, the sum of Pij over j is 1. Clearly, matrix P is non-negative and every row sums to 1；

For the second condition, matrix P is irreducible if and only if the directed graph of the network corresponding to P is strongly connected (any two nodes are mutually reachable), that is, a path can be found between any two points. Since all elements of the transfer matrix P in this algorithm are strictly positive, such a path certainly exists, so matrix P satisfies irreducibility；

For the third condition, periodicity means that the iterated values change in a regularly repeating way. It is known that being aperiodic is equivalent to being a primitive matrix, and a primitive matrix is one for which some power is a positive matrix. Because all elements of P are positive, P necessarily satisfies this equivalent condition, that is, the third condition of aperiodicity is met；
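The argument above reduces to two properties that can be checked directly on a given matrix: every row is non-negative and sums to 1, and every entry is strictly positive (which is sufficient for both irreducibility and aperiodicity). A minimal Python sketch of these checks follows; the function names and the tolerance are hypothetical.

```python
def is_row_stochastic(P, tol=1e-9):
    """True if all entries are non-negative and every row sums to 1."""
    return all(
        all(x >= 0 for x in row) and abs(sum(row) - 1.0) <= tol
        for row in P
    )


def is_strictly_positive(P):
    """True if every entry is > 0, a sufficient condition for the matrix
    to be irreducible and aperiodic (it is then trivially primitive)."""
    return all(x > 0 for row in P for x in row)


def iteration_converges(P):
    """Sufficient check combining the three conditions discussed above."""
    return is_row_stochastic(P) and is_strictly_positive(P)


good = [[0.5, 0.5], [0.2, 0.8]]
periodic = [[0.0, 1.0], [1.0, 0.0]]  # stochastic but alternates forever
```

The second example shows why positivity matters: the matrix is stochastic and irreducible, yet the check correctly refuses it because it is periodic. Note that the check is sufficient, not necessary: a matrix with some zero entries may still be primitive.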

The time of this method is spent mainly on the transfer matrix computation, the independent cascade model computation and the Markov iteration, which together take O(n^2) time. In the algorithm, every element of the transition probability matrix is computed independently. Each element computation is itself simple, since the data can be read directly from the tables and only simple addition, division and exponentiation are needed, so the time complexity of generating the transfer matrix is O(n^2). In the independent cascade model, the worst case is that one user is infected at a time until all are infected, which also takes time of order O(n^2), so its time complexity is O(n^2). The Markov iteration takes relatively long, but the process in this method is somewhat similar to the classic PageRank transfer matrix: in the PageRank algorithm the transition probabilities of a user are also close to an even split, and studies have shown that PageRank generally converges within 50-75 iterations. In sum, the total time complexity of the present invention is O(n^2), which is acceptable.

In summary, the social network node influence ranking system provided by the embodiments of the present invention is suitable for social networks with incomplete dynamic information. A data set comprising the personal information, posting information and follow-relation information of social network users is obtained and given simple preprocessing such as data cleaning and field merging, to generate the training set and test set in the form required by the system. From the training set, a model for generating and screening the social network post transfer matrix is established, which considers in turn the user's position in the network, the user's local network influence, the user's own post forwarding probability and the post forwarding probability between users, to generate the post-forwarding transfer matrix. With the different transfer matrices, post propagation simulation experiments are carried out, the relative error between the actual and the simulated propagation range is compared, and the smallest error is used to select the optimal transfer matrix and the corresponding training parameter. From the test set and the training parameter, the transfer matrix is generated by the same modelling method, and a stable influence ranking is finally obtained.

As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that the present invention can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments of the present invention or in certain parts of the embodiments.

The apparatus and system embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.

The foregoing is only a preferred embodiment of the present invention, but scope of protection of the present invention is not limited thereto, In the technical scope disclosed by the present invention, any changes or substitutions that can be easily thought of by anyone skilled in the art, It should be covered by the protection scope of the present invention.Therefore, protection scope of the present invention should be with scope of protection of the claims Subject to.

Claims

1. A social network node influence ranking method, characterized in that the method comprises the following steps:

Step S110: collect users' personal homepage information, user posting information and user-pair information, and preprocess the personal homepage information, the user posting information and the user-pair information to form a training set and a test set；

Step S120: establish a transfer matrix model of posts according to the training set, and perform simulation calculation on the transfer matrix model to obtain an optimal training parameter；

Step S130: establish a test transfer matrix for post forwarding according to the test set in combination with the optimal training parameter, and calculate on the test transfer matrix to obtain the social network node influence ranking result.

2. The method according to claim 1, characterized in that step S110 specifically comprises:

collecting personal homepage information, user posting information and user-pair information to form a data set, wherein the personal homepage information includes at least the user ID, the user's total number of posts, the user's active duration, the user's number of fans and the number of users the user follows；

splitting the data set into a training set and a test set as required, wherein the training set includes training set user personal information and training set user-pair information, and the test set includes test set user personal information and test set user-pair information.

3. The method according to claim 2, characterized in that, in step S120, establishing the transfer matrix model of posts according to the training set specifically comprises:

where X indicates the social network's trade-off between the importance of reposts and of comments for user U, M_U is the total number of posts of user U, T_U is the active duration of user U, Z_U is the number of times the posts of user U are reposted, and P_U is the number of comments on the posts of user U；

Step S124: using the K-shell decomposition algorithm, obtain the probability p_uu that user U in the training set forwards his or her own post:

Step S126: obtain the training transfer matrix P from p_uu and P_uv:

4. The method according to claim 3, characterized in that, in step S120, performing simulation on the transfer matrix model to obtain the optimal training parameter specifically comprises:

selecting C users from the training set in order of average post repost count, together with their corresponding real repost counts M_c, wherein multiple different values of m correspond to multiple different training transfer matrices P; and carrying out a propagation simulation experiment on each P using the independent cascade model, to obtain the expected average repost counts F_c of the C users；

determining the error MAPE value:

where c = {1, ..., C}；

5. The method according to claim 4, characterized in that, in step S130, establishing the test transfer matrix for post forwarding according to the test set in combination with the optimal training parameter specifically comprises:

according to the test set, selecting the optimal training parameter as the trade-off parameter between f₁ and f₂, and establishing the test transfer matrix by the method of step S121 to step S126.

6. The method according to claim 5, characterized in that, in step S130, calculating on the test transfer matrix to obtain the social network node influence ranking result specifically comprises:

setting the initial value of each entry of the user value vector St to 1, and obtaining stable converged values by Markov iteration, the calculation being:

$$S_t = (1, \ldots, 1)_{1 \times n} \times P_m,$$

repeating the following process, and stopping the iteration when the Euclidean norm error between the user value vectors of two successive iterations is less than a predetermined precision, to obtain the stable converged algorithm value S:

$$S = (S_t)_{1 \times n} \times P_m,$$

taking each entry of the resulting stable converged algorithm value S as the algorithm value of the corresponding user, and comparing the values to obtain the social network node influence ranking result.

7. A social network node influence ranking system, characterized in that the system comprises:

a data preprocessing module, for collecting users' personal homepage information, user posting information and user-pair information, and preprocessing the personal homepage information and the user posting information to form a training set and a test set；

a training module, for establishing a transfer matrix model of posts according to the training set, and performing simulation calculation on the transfer matrix model to obtain an optimal training parameter；

a test module, for establishing a test transfer matrix for post forwarding according to the test set in combination with the optimal training parameter, and calculating on the test transfer matrix to obtain the social network node influence ranking result.

8. The system according to claim 7, characterized in that the data preprocessing module specifically comprises:

9. The system according to claim 8, characterized in that the training module comprises:

a user-pair influence factor determining unit, for determining the user-pair influence factor of post forwarding according to the number of fans of one user in the user pair and the number of users followed by the other user；

a user self-influence factor determining unit, for determining the user self-influence factor of post forwarding according to the one user's total number of posts, active duration, number of times the posts are forwarded and number of comments on the posts；

a total influence factor determining unit, for determining the total influence factor of post forwarding according to the trade-off parameter between the user-pair influence factor and the user self-influence factor；

a user self-forwarding probability determining unit, for obtaining, using the K-shell decomposition algorithm, the probability that the one user forwards his or her own post；

a post forwarded-probability determining unit, for determining the probability that the post of the one user is forwarded by other users；

a transfer matrix model establishing unit, for establishing the transfer matrix model of posts according to the probability that the user forwards his or her own post and the probability that the post is forwarded；

an optimal training parameter establishing unit, for carrying out propagation simulation experiments on the transfer matrix models using the independent cascade model to obtain the expected average repost counts, determining the error MAPE value, and selecting the training parameter corresponding to the transfer matrix with the smallest MAPE value as the optimal training parameter.

10. The system according to claim 9, characterized in that the test module comprises:

a test transfer matrix establishing unit, for establishing the test transfer matrix according to the test set in combination with the optimal training parameter；

an influence ranking establishing unit, for calculating on the test transfer matrix to obtain the social network node influence ranking result.