A method for selecting initial nodes for influence maximization based on node attributes
Technical Field
The invention belongs to the field of computer technology and relates to a method for selecting initial nodes for influence maximization based on node attributes.
Background
The development of the Internet has not only brought us a more convenient way of life but has also greatly changed the way we exchange and communicate. With the growth of online social networks, the ways in which we make friends and share knowledge have become richer and more diverse. As more and more people use convenient data exchange services such as mobile terminals, our social structure and social relationship networks have become more complex and more tightly connected. Relations between people in a social group are usually modeled with a graph structure, in which nodes represent individuals and edges or arcs represent the relations between them. Through the relations between users in an online social network, information can spread at great speed and minimal cost. For this reason, the propagation and distribution of influence in social networks brings unprecedented opportunities and challenges to viral marketing, and how to find an initial group of users that maximizes the final spread of information has become an active research topic.
For the influence maximization problem, most current research either optimizes the classical influence cascade models or improves the accuracy of heuristic algorithms. Influence is assessed mainly from network structure and node degree, while the characteristics of users themselves and the behavioral similarity between users are seldom mined and applied to the assessment of node influence.
To address these shortcomings, an evaluation scheme for initial node influence is proposed. The method evaluates the influence between nodes by combining user characteristics, the relations between users and behaviors, and the behavioral similarity between users. Credit is then allocated according to the proposed inter-node influence evaluation criterion, and the initial node set that maximizes influence is obtained with a greedy algorithm.
Summary of the Invention
The invention proposes a more authentic and effective method for selecting initial nodes for influence maximization based on node attributes. When evaluating the influence between nodes, the method combines user activity, user sensitivity, and user closeness, and computes the size of the user influence between nodes according to its temporal characteristics. It combines the network structure and the user behavior log to model credit distribution and the influence propagation process, and finally uses a greedy algorithm to select the nodes with the largest marginal gain, yielding the initial node set that maximizes influence. The specific steps are as follows:
Step 1: Process the online social network data set to obtain a real user behavior log and a network structure file;
Step 2: Traverse the user behavior log and, for each node in the network, compute the user activity, user sensitivity, and user closeness. For a node u, the user activity act(u) is defined as:
[formula not reproduced]
The quantities in the formula are the number of behaviors performed by node u, the number of behaviors that node u performed passively under the influence of its neighbor nodes, and the total number of behaviors recorded in the training set. The parameter λ weights the two behavior counts and takes values in (0, 1). The user sensitivity is defined as follows:
[formula not reproduced]
The formula uses the moment at which behavior a is first performed among all neighbor nodes of node u; t_u(a), the moment at which node u is finally influenced and performs the same behavior a; and τ_u, the average delay time between node u and its neighbor nodes. The longer the time span between the two moments, the smaller the sensitivity value. The user closeness p_{v,u} is calculated as follows:
[formula not reproduced]
The formula uses the union and the intersection of the sets of behavior types performed by node u and node v;
Step 3: Normalize the user sensitivity and the user closeness separately. The average user sensitivity and the average user closeness of node u are computed as follows:
[formula not reproduced]
The averages are taken over the behaviors recorded in the training set; N(u) denotes the set of nodes adjacent to node u, with v ∈ N(u). The initial user influence is defined as:
[formula not reproduced]
Given the average delay time τ_{v,u} between node u and its neighbor node v and the initial user influence Iniful(u) of node u, the influence between adjacent nodes v and u is transformed using a continuous decay function, calculated as follows:
[formula not reproduced]
The result is the user influence of node v on node u for behavior a. The formula uses the moment at which node v performs behavior a; N_out(v), the set of out-neighbors of node v, with u ∈ N_out(v); and t_u(a), the moment at which node u is finally influenced and performs the same behavior a;
Step 4: Define the credit assigned to node v for influencing node u. For any two nodes v and u, the total credit given to node v for influencing node u is defined as:
[formula not reproduced]
Here N_in(u) denotes the set of in-neighbors of node u, node w is an in-neighbor of node u (w ∈ N_in(u)), and γ_{w,u}(a) denotes the direct credit given to node w for influencing its adjacent node u. The formula states that, for any two nodes v and u, the total credit given to node v for influencing node u equals the sum, taken over all in-neighbors w of node u as intermediate nodes, of the products of the credit given to node v for influencing w and the direct credit γ_{w,u}(a). γ_{w,u}(a) is set equal to the user influence of node w on node u for behavior a. Similarly, the total credit given to a node set S for influencing node u is calculated as follows:
[formula not reproduced]
Step 5: Traverse the user behavior propagation log, assign credit backwards along the behavior propagation paths, and compute the marginal gain of node x over all behaviors:
[formula not reproduced]
σ_cd(x) is the influence propagation function for node x, S is the current initial node set, and V is the set of all nodes in the network. The formula uses the credit given to node x, within the node set V-S, for influencing node u through behavior a; the total credit given to node v for influencing the nodes outside the current initial node set S; and Γ_{S,x}(a), the credit value given to the current initial node set S for behavior a. A higher credit value for a node indicates greater influence. Using the greedy algorithm, the node with the largest marginal gain is repeatedly selected and inserted into the initial node set S;
Step 6: Determine whether the number of elements in the initial node set has reached the required number k. If it has, the final initial node set is obtained; if not, the credit distribution among the nodes outside the current initial node set is updated and the procedure returns to step 5.
The invention is an evaluation scheme for selecting initial nodes for influence maximization based on node attributes. The method combines user characteristics, the relations between users and behaviors, and the behavioral similarity between users to evaluate the influence between nodes more realistically. The evaluation criteria for influence and credit value are also refined according to temporal characteristics, so that they reflect the decay of inter-node influence over time, and the network structure and user behavior log are combined to model credit distribution and the influence propagation process. By learning from real user behavior records, the method can select initial nodes more accurately and effectively and obtain a better influence propagation effect.
Brief Description of the Drawings
Fig. 1 is a flow chart of the method for selecting initial nodes for influence maximization based on node attributes proposed by the invention;
Fig. 2 shows the key steps of the method for selecting initial nodes for influence maximization based on node attributes proposed by the invention;
Fig. 3 compares the run time consumed by the different methods in Embodiment 1 to select initial nodes. The left panel compares the run time of 4 methods over a span of 0 to 160 minutes; the right panel compares the run time of 2 methods over a span of 1 to 3 minutes;
Fig. 4 compares the memory consumed by the different methods in Embodiment 1 to select initial nodes. The left panel compares the memory consumed by 4 methods over a range of 0 to 100 MB; the right panel compares the memory consumed by 2 methods over a range of 0 to 50 MB;
Fig. 5 compares the influence propagation effect of the initial node sets selected by the different methods in Embodiment 2. The left panel shows, as curves, the influence propagation results of the initial node sets selected by 4 methods; the right panel shows, as a histogram, the influence propagation results of the initial node sets selected by 2 methods;
Fig. 6 compares the influence prediction results for the test-set user behaviors in Embodiment 2. The left panel is a scatter plot of the behavior influence predictions of 2 methods on the test set; the right panel is a histogram of the behavior influence predictions of the 2 methods.
Detailed Description
The invention is described in further detail below with reference to the drawings, theoretical analysis, and simulation experiments.
The invention models the social network as an undirected graph G = (V, E), where V is the set of all nodes in the graph, E is the set of relations between individuals, S is the initial node set, and σ(S) is the influence propagation function. In the invention, influence is expressed as the size of the credit value assigned to a node: the higher the credit value assigned to a node, the greater its influence. Accordingly, σ_cd(S) is defined as the total credit given to the current node set S for influencing the remaining nodes. The influence maximization problem is to find an initial node set of size k such that the expected number of nodes successfully influenced in the whole network is maximal.
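The selection problem stated above can be written compactly as a constrained maximization. The symbol S^* for the chosen set is introduced here for illustration and is not part of the original notation:

```latex
S^{*} \;=\; \underset{S \subseteq V,\ |S| = k}{\arg\max}\ \sigma(S)
```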
The weight of a node reflects its specificity. On the one hand, different individuals tend to have different interests and preferences; when different people face the same thing, whether they accept it depends on their inner evaluation of it and on their cognitive acceptance threshold. On the other hand, each individual's influence on others also differs. For example, a recommendation from a friend or relative generally carries more influence than one from a stranger; the degree of closeness or trust between individuals is used to capture this characteristic. The method builds and simulates the behavior propagation process from records of information propagation in a real network, and evaluates the influence between adjacent nodes from 3 aspects: user activity, user sensitivity, and user closeness. Fig. 1 and Fig. 2 are, respectively, the flow chart and the key steps of the proposed method for selecting initial nodes for influence maximization based on node attributes. The specific implementation steps are as follows.
Step 1: Process the online social network data set to obtain a real user behavior log and a network structure file;
Step 2: Traverse the user behavior log and, for each node in the network, compute the user activity, user sensitivity, and user closeness. For a node u, the user activity act(u) is defined as:
[formula not reproduced]
The quantities in the formula are the number of behaviors performed by node u, the number of behaviors that node u performed passively under the influence of its neighbor nodes, and the total number of behaviors recorded in the training set. The parameter λ weights the two behavior counts and takes values in (0, 1). The user sensitivity is defined as follows:
[formula not reproduced]
The formula uses the moment at which behavior a is first performed among all neighbor nodes of node u; t_u(a), the moment at which node u is finally influenced and performs the same behavior a; and τ_u, the average delay time between node u and its neighbor nodes. The longer the time span between the two moments, the smaller the sensitivity value. The user closeness p_{v,u} is calculated as follows:
[formula not reproduced]
The formula uses the union and the intersection of the sets of behavior types performed by node u and node v;
Step 3: Normalize the user sensitivity and the user closeness separately. The average user sensitivity and the average user closeness of node u are computed as follows:
[formula not reproduced]
The averages are taken over the behaviors recorded in the training set; N(u) denotes the set of nodes adjacent to node u, with v ∈ N(u). The initial user influence is defined as:
[formula not reproduced]
Given the average delay time τ_{v,u} between node u and its neighbor node v and the initial user influence Iniful(u) of node u, the influence between adjacent nodes v and u is transformed using a continuous decay function, calculated as follows:
[formula not reproduced]
The result is the user influence of node v on node u for behavior a. The formula uses the moment at which node v performs behavior a; N_out(v), the set of out-neighbors of node v, with u ∈ N_out(v); and t_u(a), the moment at which node u is finally influenced and performs the same behavior a;
Step 4: Define the credit assigned to node v for influencing node u. For any two nodes v and u, the total credit given to node v for influencing node u is defined as:
[formula not reproduced]
Here N_in(u) denotes the set of in-neighbors of node u, node w is an in-neighbor of node u (w ∈ N_in(u)), and γ_{w,u}(a) denotes the direct credit given to node w for influencing its adjacent node u. The formula states that, for any two nodes v and u, the total credit given to node v for influencing node u equals the sum, taken over all in-neighbors w of node u as intermediate nodes, of the products of the credit given to node v for influencing w and the direct credit γ_{w,u}(a). γ_{w,u}(a) is set equal to the user influence of node w on node u for behavior a. Similarly, the total credit given to a node set S for influencing node u is calculated as follows:
[formula not reproduced]
Step 5: Traverse the user behavior propagation log, assign credit backwards along the behavior propagation paths, and compute the marginal gain of node x over all behaviors:
[formula not reproduced]
σ_cd(x) is the influence propagation function for node x, S is the current initial node set, and V is the set of all nodes in the network. The formula uses the credit given to node x, within the node set V-S, for influencing node u through behavior a; the total credit given to node v for influencing the nodes outside the current initial node set S; and Γ_{S,x}(a), the credit value given to the current initial node set S for behavior a. A higher credit value for a node indicates greater influence. Using the greedy algorithm, the node with the largest marginal gain is repeatedly selected and inserted into the initial node set S;
Step 6: Determine whether the number of elements in the initial node set has reached the required number k. If it has, the final initial node set is obtained; if not, the credit distribution among the nodes outside the current initial node set is updated and the procedure returns to step 5.
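The loop formed by steps 5 and 6 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: `greedy_select` and the caller-supplied `marginal_gain` function are hypothetical names, and the credit update required after each selection is assumed to happen inside `marginal_gain`, which receives the nodes selected so far.

```python
def greedy_select(candidates, k, marginal_gain):
    """Pick up to k initial nodes, each time adding the node with the largest gain.

    marginal_gain(x, selected) is a caller-supplied (hypothetical) function that
    returns the gain of adding node x to the list of already selected nodes.
    """
    selected = []
    remaining = set(candidates)
    # Step 6: stop once k nodes have been chosen (or no candidates are left).
    while len(selected) < k and remaining:
        # Step 5: choose the node whose marginal gain is currently largest.
        best = max(remaining, key=lambda x: marginal_gain(x, selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Recomputing every candidate's gain in each round keeps the sketch short; an implementation concerned with run time would cache or lazily update the gains.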
To verify the validity of the invention, the marginal gain of a node is derived theoretically from the influence function below.
The total credit Γ_{S,u}(a) given to a node set S for influencing node u equals the sum, over all direct in-neighbors w of node u, of the credit given to S for first influencing w and then influencing node u, that is:
[formula not reproduced]
The in-neighbor set of node u is denoted N_in(u); node w is an in-neighbor of node u, with w ∈ N_in(u) ∩ w ∈ S. Γ_{S,w}(a) is the credit given to the set S for influencing w for behavior a, and γ_{w,u}(a) is the credit given to w for influencing its direct neighbor node u.
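The recursion just described, in which the credit of S on u is accumulated from the credit of S on each in-neighbor w multiplied by the direct credit γ_{w,u}(a), can be sketched for a single behavior as follows. The function name, the argument layout, and the base case (a node in S receives credit 1) are assumptions made for illustration; the text above does not state the base case.

```python
def set_credit(order, in_neighbors, gamma, seeds):
    """Return credit[u]: total credit given to the set `seeds` for influencing u.

    order        -- nodes that performed the behavior, earliest first
    in_neighbors -- dict: node -> set of its in-neighbors
    gamma        -- dict: (w, u) -> direct credit of w on u for this behavior
    seeds        -- the current initial node set S
    """
    credit = {}
    for u in order:
        if u in seeds:
            credit[u] = 1.0  # assumed base case, not stated in the text
            continue
        # Sum over in-neighbors w that performed the behavior before u.
        credit[u] = sum(
            credit[w] * gamma.get((w, u), 0.0)
            for w in in_neighbors.get(u, ())
            if w in credit
        )
    return credit
```

Processing nodes in chronological order means each in-neighbor's credit is already available when it is needed, which relies on the propagation paths being free of loops.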
After a node x is inserted into the initial node set S because its marginal gain is the largest, the credit distribution of the other nodes in the network must be updated. Once node x is chosen as an initial node, it acts as a sender of information from the initial moment, and its role as a relay in influence propagation is blocked. Therefore, when computing the credit distribution between any two nodes outside the current initial node set S, the credit assigned along all paths that pass through node x must be subtracted:
[formula not reproduced]
After node x is inserted into the initial set S, the difference in credit is:
[formula not reproduced]
Because node x is known not to be in the original initial node set, the credit difference is the difference between the credit given to node x for influencing node u and the total credit given to S along distribution paths that do not pass through node x, i.e.:
[formula not reproduced]
Formula (2) allows the expression above to be simplified. Because behavior propagation paths obey temporal order, no loop can exist, so node v cannot lie on the distribution path between node x and node u. Formula (4) therefore simplifies to:
[formula not reproduced]
From formula (5), the marginal gain of node x over all behaviors is obtained:
[formula not reproduced]
Extracting the common factor on the right-hand side simplifies this to:
[formula not reproduced]
Rearranging the terms in the formula gives:
[formula not reproduced]
The analysis above shows that computing the marginal gain of node x requires only the total credit given to node x for influencing the nodes outside the current initial node set S and, for behavior a, the credit value Γ_{S,x}(a) given to the current initial node set S. This demonstrates the accuracy and authenticity of the method.
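The conclusion of the derivation, that the marginal gain depends only on the credit of x over the nodes outside S and on Γ_{S,x}(a), can be illustrated with a short Python sketch. The product form used here, in which the credit of x for each behavior is scaled by (1 - Γ_{S,x}(a)), is the form used in the standard credit distribution model and is assumed here because the final formula is not reproduced above; any per-node normalization is also omitted. All names are hypothetical.

```python
def marginal_gain(credit_from_x, seed_credit_on_x):
    """Marginal gain of adding node x to the current initial node set S.

    credit_from_x    -- dict: behavior -> {node u: credit given to x for
                        influencing u within V-S}
    seed_credit_on_x -- dict: behavior -> credit given to S for influencing x
    """
    gain = 0.0
    for behavior, reach in credit_from_x.items():
        # Share of x's credit already attributed to S (assumed 0 if absent).
        already_claimed = seed_credit_on_x.get(behavior, 0.0)
        gain += (1.0 - already_claimed) * sum(reach.values())
    return gain
```

Under this assumed form, a node that S already influences strongly for a behavior contributes little additional gain for that behavior, which matches the intuition behind the derivation.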
In the experiments, the data set was collected from the real photo-sharing website Flickr and contains 105938 photos in total. The data set is divided into 4 parts according to the source of the photos, and one part is selected as the experimental subject; it contains 2602 nodes, 222292 edges, and 24648 photos. Because credit distribution obeys time constraints and the network structure, the raw data is processed into two files: the graph file records the relations between the authors of the photos, and the user behavior log file contains the user behavior records in chronological order. The user behavior records are further divided into two parts: a training set of user behavior records covering 2724 kinds of behaviors and a test set of user behavior records covering 1816 kinds of behaviors. The user behavior log file contains 75269 records.
Embodiment 1:
In this embodiment, the parameter λ is set to 0.5 in order to balance the share of behaviors that node u initiates actively against the share of behaviors that it performs passively under the influence of neighbor nodes.
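The role of λ can be made concrete with a small Python sketch of the node attributes from steps 2 and 3. The formula bodies are not reproduced in this text, so the forms below are assumptions chosen to match the verbal definitions: activity as a λ-weighted mix of active and passive behavior counts divided by the training-set total, sensitivity and inter-node influence as exponential decays in the delay, and closeness as the ratio of the intersection to the union of two nodes' behavior sets. All function and parameter names are hypothetical.

```python
import math

def activity(n_active, n_passive, n_total, lam=0.5):
    """Assumed form: lambda-weighted share of active and passive behaviors."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie in (0, 1)")
    return (lam * n_active + (1.0 - lam) * n_passive) / n_total

def sensitivity(t_first_neighbor, t_u, tau_u):
    """Assumed form: shrinks as the delay before u follows its neighbors grows."""
    return math.exp(-(t_u - t_first_neighbor) / tau_u)

def closeness(behaviors_u, behaviors_v):
    """Intersection over union of the behavior sets of nodes u and v."""
    union = behaviors_u | behaviors_v
    if not union:
        return 0.0
    return len(behaviors_u & behaviors_v) / len(union)

def decayed_influence(iniful_u, t_v, t_u, tau_vu):
    """Assumed exponential decay of v's influence on u for one behavior.

    Returns 0 when v did not act strictly before u (an assumption).
    """
    if t_u <= t_v:
        return 0.0
    return iniful_u * math.exp(-(t_u - t_v) / tau_vu)
```

With `lam=0.5`, active and passive behaviors are weighted equally; moving λ toward 1 would weight the behaviors that the node initiates itself more heavily under this assumed form.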
As shown in Fig. 3, the run time needed by the method of the invention (CD2) to select initial nodes is slightly higher than that of node selection on the traditional credit distribution model (CD). The run-time distribution recorded in the figure shows that run time grows linearly as the number of initial nodes increases, and that it is far lower than the time needed to select the same number of initial nodes on the independent cascade model (IC) and the linear threshold model (LT). The experimental results show that the method of the invention is efficient and scalable in terms of run time.
As shown in Fig. 4, as the number of nodes in the initial set increases, the memory consumed by the method of the invention (CD2) to select initial nodes is slightly higher than that of the traditional credit distribution model (CD), because the method adds the assessment of node attributes and the computation of user influence. However, when selecting the same number of initial nodes, the method consumes less memory than the independent cascade model (IC) and the linear threshold model (LT), and this advantage becomes more pronounced as more initial nodes are selected. The experimental results show that, for selecting initial nodes for influence maximization, the method of the invention is more efficient in both run time and memory consumption.
Embodiment 2:
As shown in Fig. 5, the method of this experiment (CD2) describes and propagates influence more strongly than the traditional model (CD). Compared with the independent cascade model (IC) and the linear threshold model (LT), a further advantage of the method of the invention is that it learns from real user behavior records and evaluates user influence by combining user characteristics rather than relying on network structure alone. It can therefore reflect user behavior and influence propagation more realistically, with higher authenticity and reliability.
The method of the invention (CD2) and CD are compared on influence propagation prediction for the behaviors recorded in the test set, which contains all 1816 kinds of behaviors. After the experiment, the behaviors are sorted by their real influence propagation results, and the predictions are compared with the true values. As shown in Fig. 6, both the method of the invention and the traditional method predict influence propagation values for the test-set behavior samples that are below the real values. However, the comparison shows that, relative to the traditional method (CD), the method of the invention improves the prediction of user behavior to a certain degree and achieves higher influence prediction accuracy.
The experiments above show that the method of the invention is efficient in both run time and memory consumption. By learning from real behavior propagation records, it reflects the propagation of user behavior and influence more realistically. In addition, the experiments show that the method selects initial nodes with higher accuracy and reliability.