Summary of the invention
The present invention proposes a more realistic and effective method, based on node characteristics, for selecting the start nodes in influence maximization. When evaluating the influence between nodes, the method combines user activity, user sensitivity, and user cohesion, and computes the strength of the influence between nodes according to temporal characteristics. Using the network structure and the user action log, it models credit distribution and the propagation of influence, and finally applies a greedy algorithm that selects the node with the largest marginal gain to obtain the influence-maximizing start node set. The specific steps are as follows:
Step 1: Process the online social network data set to obtain the real user action log and the network structure file;
Step 2: Traverse the user action log and, for each node in the network, compute the user activity, the user sensitivity, and the user cohesion. For a node u, the user activity act(u) is defined in terms of three quantities: the number of behaviors that node u performs, the number of behaviors that node u performs passively under the influence of its neighbor nodes, and the total number of behaviors recorded in the training set. The parameter λ weights the two behavior counts and takes values in (0, 1). The user sensitivity of node u is defined from two moments: the moment at which behavior a is first performed among all neighbor nodes of node u, and $t_u(a)$, the moment at which node u is finally influenced and performs the same behavior a. $\tau_u$ denotes the average delay time between node u and its neighbor nodes. The longer the time span between the two moments, the smaller the user sensitivity. The user cohesion $p_{v,u}$ is computed from the union and the intersection of the sets of behavior types performed by node u and node v;
Step 3: Normalize the user sensitivity and the user cohesion separately. The computation rules for the average user sensitivity and the average user cohesion of node u involve the behaviors recorded in the training set and N(u), the set of nodes adjacent to node u, with v ∈ N(u). The initial user influence iniful(u) of node u is defined on this basis. Given the average delay time $\tau_{v,u}$ between node u and its neighbor node v and the initial user influence iniful(u) of node u, a continuous decay function is used to convert them into the influence between the adjacent nodes v and u. The result is the user influence of node v on node u for behavior a, computed from the moment at which node v performs behavior a and from $t_u(a)$, the moment at which node u is finally influenced and performs the same behavior a. $N_{out}(v)$ denotes the set of out-neighbors of node v, with $u \in N_{out}(v)$;
Step 4: Define the credit given to node v for influencing node u. For any two nodes v and u, the total credit given to node v for influencing node u is obtained by taking each in-neighbor of node u as an intermediate node and summing the products of the corresponding credits. Here $N_{in}(u)$ denotes the set of in-neighbors of node u, node w is an in-neighbor of node u with $w \in N_{in}(u)$, and $\gamma_{w,u}(a)$ is the direct credit given to node w for influencing its adjacent node u. $\gamma_{w,u}(a)$ is set equal to the user influence of node w on node u for behavior a. The total credit given to a node set S for influencing node u is computed analogously;
Step 5: Traverse the user action propagation log and assign credit backward along the behavior propagation paths to compute the marginal gain of node x over all behaviors. Here $\sigma_{cd}(x)$ is the influence propagation function for node x, S is the current start node set, and V is the set of all nodes in the network. The computation uses the credit given to node x, through behavior a, for influencing a node u in the node set V-S; the total credit given to node x for influencing the other nodes outside the current start node set S; and $\Gamma_{S,x}(a)$, the credit value given to the current start node set S for behavior a. The higher the credit value of a node, the greater its influence. Using the greedy algorithm, the node with the largest marginal gain is repeatedly selected and added to the start node set S;
Step 6: Check whether the number of elements in the start node set has reached the required number k. If it has, the final start node set is obtained. If it has not, update the credit distribution among the nodes outside the current start node set and return to Step 5.
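Steps 5 and 6 amount to a greedy loop. The following is a minimal sketch of that loop, not the patented implementation: `greedy_select`, `toy_gain`, and the `cover` data are hypothetical names, and the toy gain function (set coverage) merely stands in for the credit-based marginal gain of Step 5.

```python
from typing import Callable, Hashable, Iterable, List, Set


def greedy_select(
    nodes: Iterable[Hashable],
    k: int,
    marginal_gain: Callable[[Hashable, Set[Hashable]], float],
) -> List[Hashable]:
    """Repeatedly add the node with the largest marginal gain until k are chosen."""
    candidates = list(nodes)
    seeds: List[Hashable] = []
    chosen: Set[Hashable] = set()
    while len(seeds) < k:
        remaining = [x for x in candidates if x not in chosen]
        if not remaining:
            break  # fewer than k nodes are available
        # The gain is re-evaluated against the current start node set each round.
        best = max(remaining, key=lambda x: marginal_gain(x, chosen))
        seeds.append(best)
        chosen.add(best)
    return seeds


# Toy stand-in for the marginal gain (an assumption for illustration only):
# each node "covers" some items, and the gain is the number of new items.
cover = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}}


def toy_gain(x: str, seeds: Set[str]) -> float:
    covered = set().union(*(cover[s] for s in seeds))
    return len(cover[x] - covered)


selected = greedy_select(cover, 2, toy_gain)
```

In the method itself, the `marginal_gain` argument would be the credit-based quantity of Step 5, and the credit distribution would be updated between rounds as described in Step 6.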
The present invention is a scheme, based on node characteristics, for selecting and evaluating start nodes for influence maximization. The method combines user characteristics, the relationships between users, and the behavior similarity between users to evaluate the influence between nodes more realistically. At the same time, the evaluation criteria for influence and credit value are improved according to temporal characteristics, so that they reflect the decay of influence between nodes over time. Using the network structure and the user action log, the method models credit distribution and the propagation of influence. By learning from the real behavior records of users, the method can select start nodes more accurately and effectively and achieve a better influence propagation effect.
Embodiment
The present invention is described in further detail below with reference to the accompanying drawings, theoretical analysis, and simulation experiments.
The present invention models the social network as an undirected graph G = (V, E), where V is the set of all nodes in the graph, E is the set of relationships between individuals, S is the start node set, and σ(S) is the influence propagation function. In the present invention, influence is expressed as the size of the credit value given to a node: the higher the credit value assigned to a node, the greater its influence. Accordingly, $\sigma_{cd}(S)$ is defined as the total credit given to the current node set S for influencing the remaining nodes. The influence maximization problem is to find a start node set of size k that maximizes the expected number of nodes successfully influenced in the whole network.
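Stated compactly, and using only the symbols defined above (the arg max notation and the name $S^{*}$ for the chosen set are the only additions), the selection problem can be written as:

```latex
S^{*} = \arg\max_{S \subseteq V,\ |S| = k} \sigma_{cd}(S)
```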
Node weights capture the individuality of nodes. On the one hand, different individuals have different interests and preferences; when different people face the same thing, whether they accept it depends on each individual's own evaluation of it and on the size of their cognitive acceptance threshold. On the other hand, each individual's influence on others also differs. For example, a recommendation from a friend or relative generally carries more influence than one from a stranger, and the degree of closeness or trust between individuals is used to capture this. The method of the invention builds and simulates the behavior propagation process from the information propagation records of a real network, and evaluates the influence between adjacent nodes from three aspects: user activity, user sensitivity, and user cohesion. Fig. 1 and Fig. 2 show, respectively, the flow chart and the key steps of the proposed method for selecting start nodes for influence maximization based on node characteristics. The specific implementation steps are as follows.
Step 1: Process the online social network data set to obtain the real user action log and the network structure file;
Step 2: Traverse the user action log and, for each node in the network, compute the user activity, the user sensitivity, and the user cohesion. For a node u, the user activity act(u) is defined in terms of three quantities: the number of behaviors that node u performs, the number of behaviors that node u performs passively under the influence of its neighbor nodes, and the total number of behaviors recorded in the training set. The parameter λ weights the two behavior counts and takes values in (0, 1). The user sensitivity of node u is defined from two moments: the moment at which behavior a is first performed among all neighbor nodes of node u, and $t_u(a)$, the moment at which node u is finally influenced and performs the same behavior a. $\tau_u$ denotes the average delay time between node u and its neighbor nodes. The longer the time span between the two moments, the smaller the user sensitivity. The user cohesion $p_{v,u}$ is computed from the union and the intersection of the sets of behavior types performed by node u and node v;
Step 3: Normalize the user sensitivity and the user cohesion separately. The computation rules for the average user sensitivity and the average user cohesion of node u involve the behaviors recorded in the training set and N(u), the set of nodes adjacent to node u, with v ∈ N(u). The initial user influence iniful(u) of node u is defined on this basis. Given the average delay time $\tau_{v,u}$ between node u and its neighbor node v and the initial user influence iniful(u) of node u, a continuous decay function is used to convert them into the influence between the adjacent nodes v and u. The result is the user influence of node v on node u for behavior a, computed from the moment at which node v performs behavior a and from $t_u(a)$, the moment at which node u is finally influenced and performs the same behavior a. $N_{out}(v)$ denotes the set of out-neighbors of node v, with $u \in N_{out}(v)$;
Step 4: Define the credit given to node v for influencing node u. For any two nodes v and u, the total credit given to node v for influencing node u is obtained by taking each in-neighbor of node u as an intermediate node and summing the products of the corresponding credits. Here $N_{in}(u)$ denotes the set of in-neighbors of node u, node w is an in-neighbor of node u with $w \in N_{in}(u)$, and $\gamma_{w,u}(a)$ is the direct credit given to node w for influencing its adjacent node u. $\gamma_{w,u}(a)$ is set equal to the user influence of node w on node u for behavior a. The total credit given to a node set S for influencing node u is computed analogously;
Step 5: Traverse the user action propagation log and assign credit backward along the behavior propagation paths to compute the marginal gain of node x over all behaviors. Here $\sigma_{cd}(x)$ is the influence propagation function for node x, S is the current start node set, and V is the set of all nodes in the network. The computation uses the credit given to node x, through behavior a, for influencing a node u in the node set V-S; the total credit given to node x for influencing the other nodes outside the current start node set S; and $\Gamma_{S,x}(a)$, the credit value given to the current start node set S for behavior a. The higher the credit value of a node, the greater its influence. Using the greedy algorithm, the node with the largest marginal gain is repeatedly selected and added to the start node set S;
Step 6: Check whether the number of elements in the start node set has reached the required number k. If it has, the final start node set is obtained. If it has not, update the credit distribution among the nodes outside the current start node set and return to Step 5.
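As an illustration of the node characteristics used in Steps 2 and 3, the sketch below makes two assumptions that the text does not state explicitly: that user cohesion is the ratio of the intersection to the union of the two users' behavior-type sets, and that the continuous decay function is exponential in the delay divided by the average delay time. The function names and arguments are hypothetical.

```python
import math
from typing import Set


def cohesion(actions_u: Set[str], actions_v: Set[str]) -> float:
    """Assumed form: |intersection| / |union| of the behavior-type sets."""
    union = actions_u | actions_v
    if not union:
        return 0.0  # neither user performed any behavior
    return len(actions_u & actions_v) / len(union)


def decayed_influence(initial_influence: float, t_v: float, t_u: float,
                      tau_vu: float) -> float:
    """Assumed form: initial influence times exp(-(t_u - t_v) / tau_vu).

    t_v is the moment node v performs the behavior, t_u the moment node u
    performs the same behavior, tau_vu the average delay between the two nodes.
    """
    if t_u <= t_v or tau_vu <= 0:
        return 0.0  # v cannot have influenced u if it did not act first
    return initial_influence * math.exp(-(t_u - t_v) / tau_vu)


p = cohesion({"a1", "a2"}, {"a2", "a3"})
near = decayed_influence(1.0, t_v=0.0, t_u=1.0, tau_vu=2.0)
far = decayed_influence(1.0, t_v=0.0, t_u=4.0, tau_vu=2.0)
```

Under these assumptions a longer delay between the two moments yields a smaller influence, which matches the time-decay behavior the method describes.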
To verify the validity of the present invention, the marginal gain of a node is derived theoretically below from the influence function.
The total credit $\Gamma_{S,u}(a)$ given to a node set S for influencing node u equals the sum, over all direct in-neighbors w of node u, of the credit for S first influencing w and then influencing node u. The in-neighbor set of node u is denoted $N_{in}(u)$; node w is an in-neighbor of node u, with w ∈ $N_{in}(u)$ ∩ w ∈ S; $\Gamma_{S,w}(a)$ is the credit given to the set S for influencing w for behavior a; and $\gamma_{w,u}(a)$ is the credit given to w for influencing its direct neighbor node u.
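The recursion just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the propagation graph of one behavior is assumed to be acyclic, a node inside S is assumed to hold credit 1 for itself (the usual base case in credit-distribution models), and the names `set_credit`, `in_neighbors`, and `gamma` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple


def set_credit(
    seeds: FrozenSet[str],
    u: str,
    in_neighbors: Dict[str, List[str]],
    gamma: Dict[Tuple[str, str], float],
    memo: Optional[Dict[str, float]] = None,
) -> float:
    """Total credit given to `seeds` for influencing u, for one behavior."""
    if memo is None:
        memo = {}
    if u in seeds:
        return 1.0  # assumed base case: a start node is fully credited
    if u not in memo:
        # Sum over in-neighbors w of (credit of seeds for w) * (direct credit w -> u).
        memo[u] = sum(
            set_credit(seeds, w, in_neighbors, gamma, memo) * gamma[(w, u)]
            for w in in_neighbors.get(u, ())
        )
    return memo[u]


# Toy propagation graph for a single behavior: v -> w, v -> u, w -> u.
in_neighbors = {"w": ["v"], "u": ["v", "w"]}
gamma = {("v", "w"): 0.5, ("v", "u"): 0.2, ("w", "u"): 0.4}

credit_v = set_credit(frozenset({"v"}), "u", in_neighbors, gamma)
credit_vw = set_credit(frozenset({"v", "w"}), "u", in_neighbors, gamma)
```

With a single start node v, the credit for u combines the direct path (0.2) and the path through w (0.5 × 0.4).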
After a node x is added to the start node set S because its marginal gain is the largest, the credit distribution of the other nodes in the network must be updated. Once node x is chosen as a start node, it acts as a sender of information from the initial moment, and its role as a relay in the propagation of influence is blocked. Therefore, when computing the credit distribution between any two nodes outside the current start node set S, the credit assigned along all paths that pass through node x must be deducted.
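The deduction can be illustrated with a small sketch. It assumes an acyclic propagation graph for a single behavior, in which the credit carried from v to u along paths through x equals the credit from v to x times the credit from x to u. The names `pairwise_credit` and `remove_paths_through` and the toy data are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Credit = Dict[Tuple[str, str], float]


def pairwise_credit(
    order: List[str],
    in_neighbors: Dict[str, List[str]],
    gamma: Dict[Tuple[str, str], float],
    excluded: FrozenSet[str] = frozenset(),
) -> Credit:
    """Credit between all ordered pairs, using only paths that avoid `excluded`.

    `order` must list the nodes in propagation order (earlier actors first).
    """
    credit: Credit = {}
    for v in order:
        if v in excluded:
            continue
        for u in order:
            if u in excluded:
                continue
            if u == v:
                credit[(v, u)] = 1.0
            else:
                credit[(v, u)] = sum(
                    credit.get((v, w), 0.0) * gamma[(w, u)]
                    for w in in_neighbors.get(u, ())
                    if w not in excluded
                )
    return credit


def remove_paths_through(credit: Credit, x: str) -> Credit:
    """Deduct from every remaining pair the credit assigned via paths through x."""
    return {
        (v, u): c - credit[(v, x)] * credit[(x, u)]
        for (v, u), c in credit.items()
        if v != x and u != x
    }


order = ["v", "x", "w", "u"]
in_neighbors = {"x": ["v"], "w": ["v", "x"], "u": ["v", "x", "w"]}
gamma = {("v", "x"): 0.5, ("v", "w"): 0.3, ("x", "w"): 0.6,
         ("v", "u"): 0.2, ("x", "u"): 0.4, ("w", "u"): 0.1}

full = pairwise_credit(order, in_neighbors, gamma)
updated = remove_paths_through(full, "x")
```

On this toy graph the incremental update agrees with recomputing the credits from scratch on the graph with x removed, which is the point of updating rather than recomputing.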
The change in credit after node x is added to the initial set S is then considered. Because node x is known not to be in the original start node set, the credit difference is expressed through the credit given to node x for influencing node u and the total credit given to S along the distribution paths that do not pass through node x. This expression can be simplified according to formula (2). Because behavior propagation paths obey temporal order, no loop can occur, so node v cannot lie on a distribution path between node x and node u; formula (4) can therefore be simplified further. From formula (5), the marginal gain of node x over all behaviors is obtained. Factoring out the common term on the right-hand side simplifies the formula, and rearranging the positions of its parameters gives the final form.
The above theoretical analysis shows that computing the marginal gain of node x requires only two quantities: the total credit given to node x for influencing the other nodes outside the current start node set S and, for each behavior a, the credit value $\Gamma_{S,x}(a)$ given to the current start node set S. This demonstrates the accuracy and validity of the method of the invention.
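A minimal sketch of how the two quantities might be combined is given below. It assumes the product form used in the standard credit-distribution model, in which the contribution of each behavior a is the total credit of x outside S scaled by (1 - Γ_{S,x}(a)); the text above names the two inputs but not this exact combination, so the form, the function name, and the data are assumptions.

```python
from typing import Dict, Iterable


def marginal_gain(
    x: str,
    behaviors: Iterable[str],
    credit_of_seeds_for: Dict[str, Dict[str, float]],
    total_credit_of_x: Dict[str, float],
) -> float:
    """Assumed form: sum over behaviors a of (1 - Gamma_{S,x}(a)) * total credit of x.

    credit_of_seeds_for[a][x] is the credit given to the current start node set S
    for influencing x under behavior a (0 if absent).
    total_credit_of_x[a] is the total credit given to x for influencing the nodes
    outside S under behavior a.
    """
    return sum(
        (1.0 - credit_of_seeds_for.get(a, {}).get(x, 0.0)) * total_credit_of_x.get(a, 0.0)
        for a in behaviors
    )


# Toy values for two behaviors (illustrative only).
credit_of_seeds_for = {"a1": {"x": 0.25}, "a2": {}}
total_credit_of_x = {"a1": 2.0, "a2": 1.5}
gain = marginal_gain("x", ["a1", "a2"], credit_of_seeds_for, total_credit_of_x)
```

The more credit the current start node set already holds for x, the less x adds when it is selected, which is why both quantities are needed.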
In the experiments, the data set we use was collected from the real photo-sharing website Flickr; the full data set contains 105938 photos. It is divided into 4 parts according to the source of the photos, and we select one of them as the experimental subject, comprising 2602 nodes, 222292 edges, and 24648 photos. Because credit distribution obeys time constraints and the network structure, we process the raw data into two files: a graph file that records the relationships between the authors of the photos, and a user action log file that contains the user behavior records in chronological order. We further split the user behavior records into two parts: a training set containing 2724 kinds of behavior and a test set containing 1816 kinds of behavior. The user action log file contains 75269 records.
Embodiment 1:
In this embodiment, the parameter λ is set to 0.5, so as to balance the share of behaviors that node u initiates actively against the share of behaviors that it performs passively under the influence of its neighbor nodes.
As shown in Fig. 3, the running time that the method of the invention (CD2) needs to select start nodes is slightly higher than that of the traditional credit distribution model (CD). The running times recorded in the figure show that the running time grows linearly as the number of start nodes increases, and that it is clearly less than the time needed to select the same number of start nodes under the independent cascade model (IC) and the linear threshold model (LT). The experimental results show that the method of the invention is efficient and scalable in terms of running time.
As shown in Fig. 4, as the number of nodes in the initial set increases, the method of the invention (CD2) consumes slightly more memory than the traditional credit distribution model (CD) when selecting start nodes, because it adds the assessment of node characteristics and the computation of user influence. However, its memory consumption is lower than that of the independent cascade model (IC) and the linear threshold model (LT) when selecting the same number of start nodes, and this advantage becomes more pronounced as more start nodes are selected. The experimental results show that, for selecting start nodes for influence maximization, the method of the invention has an advantage in both running time and memory consumption.
Embodiment 2:
As shown in Fig. 5, the method of the invention (CD2) describes and propagates influence more strongly than the traditional model (CD). Compared with the independent cascade model (IC) and the linear threshold model (LT), another advantage of the method of the invention is that it learns from the real behavior records of users and evaluates user influence using user characteristics rather than the network structure alone. It can therefore reflect user behavior and the propagation of influence more realistically, and is more authentic and reliable.
The method of the invention (CD2) and CD are compared on influence propagation prediction for the behaviors recorded in the test set, which contains all 1816 kinds of behavior. After the experiment, we sort the behaviors according to their real influence propagation results and compare the predicted results with the actual values. As shown in Fig. 6, the predictions of both the method of the invention and the traditional method for the test set behavior samples are lower than the real influence propagation values. However, the comparison shows that, relative to the traditional method (CD), the method of the invention improves user behavior prediction to some degree and achieves higher influence prediction accuracy.
The above experiments show that the method of the invention is efficient in both running time and memory consumption, and that by learning from real behavior propagation records it reflects the propagation of user behavior and influence more realistically. In addition, the experiments show that the method of the invention selects start nodes with higher accuracy and reliability.