CN106202614A - Method for discovering abnormal structural evolution in a dynamic network - Google Patents

Method for discovering abnormal structural evolution in a dynamic network

Info

Publication number
CN106202614A
Authority
CN
China
Prior art keywords
role
node
network
frequent pattern
dynamic network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610474974.2A
Other languages
Chinese (zh)
Inventor
李川
李艳梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University
Priority to CN201610474974.2A
Publication of CN106202614A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/10 Geometric CAD
    • G06F30/18 Network design, e.g. design based on topological or interconnect aspects of utility systems, piping, heating ventilation air conditioning [HVAC] or cabling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Geometry (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Human Resources & Organizations (AREA)
  • General Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Marketing (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the technical field of data processing and provides a method for discovering abnormal structural evolution in a dynamic network. The method comprises the steps of: given a dynamic network, mining the frequent patterns by which roles evolve over time across the whole network; and detecting role evolution anomalies by comparing the degree to which each node differs from the frequent patterns. The invention uses roles to characterize the structural features of a network, proposes the concept of role evolution anomalies for the first time, and, drawing on the temporal characteristics of dynamic networks, proposes a role evolution anomaly detection algorithm based on pattern mining. Analyzing and mining the anomalies that arise as network structure changes over time helps in understanding the dynamic behavior of complex systems.

Description

Method for discovering abnormal structural evolution in a dynamic network
Technical field
The invention belongs to the technical field of data processing, and in particular relates to a method for discovering abnormal structural evolution in a dynamic network.
Background art
A dynamic network is a network whose nodes and edges can change over time, and many complex systems in the real world have this character: from large social systems such as social networks, research collaboration networks and email networks, down to small ecosystems such as protein networks. Because of practical constraints such as the protection of trade secrets, enough additional information cannot be obtained for many networks, so the analysis of the underlying network structure has always been a research focus. As time advances, however, the structure of a dynamic network keeps changing. Mining the nodes whose structural evolution is abnormal is significant for understanding the dynamic behavior of complex systems, and is also a new problem in the field of complex network research.
For the problem of representing network structure, Document 1 (K. Henderson, B. Gallagher, L. Li, L. Akoglu, T. Eliassi-Rad, H. Tong, and C. Faloutsos. RolX: Role Extraction and Mining in Large Networks. In KDD, 2012.) first proposed describing the structural behavior of nodes with latent "roles". A role represents a type of network structure, and nodes with similar structure belong to the same role, for example center nodes or edge nodes. This differs from community discovery: nodes in a community are usually close to one another, whereas a role describes a node's structure within the network, so nodes with the same role may be located anywhere in the network. In fact, the role discovery step yields a distribution of role values for each node, and the number of roles determines the dimension of that vector. For example, if three roles R1 to R3 represent center nodes, bridge nodes and edge nodes respectively, the role distribution of a node might be {R1:0.8, R2:0.1, R3:0.1}, meaning that the node takes the values 0.8, 0.1 and 0.1 on the three roles. Since this node's role values clearly lean toward R1, the center-node role R1 can be taken directly as the role of the node.
From the viewpoint of an abstract mathematical model, a dynamic network can be regarded as a sequence of graph snapshots. Given a dynamic network, the role evolution trends of the network can be analyzed from the role distribution of each node at every snapshot time. Consider only three simple role types R1 to R3, representing center nodes, bridge nodes and edge nodes respectively. Each node obtains a role distribution at each snapshot, and concatenating these in the time order of the dynamic network gives a role distribution sequence, which can express a certain evolution trend of roles in the network. For example, the sequence <1:{R1:0, R2:0.1, R3:0.8}, 2:{R1:0.2, R2:0.3, R3:0.5}, 3:{R1:0.6, R2:0.3, R3:0.1}> may represent the process by which some edge nodes in the network gradually evolve into center nodes. In practice, however, there are always nodes whose evolution is abnormal. In a social network, for example, a newly joined user has few followers and sits at the edge of the network, while important users with many followers are usually its center nodes. In general a user at the edge has to build up connections continuously before gradually becoming an important central user, and this evolution can be represented by the sequence above. If, however, an important social event or deliberate promotion by the user occurs at some moment, the node's role distribution sequence is bound to change abruptly from edge node to center node, for example <1:{R1:0, R2:0.1, R3:0.8}, 2:{R1:0, R2:0.1, R3:0.8}, 3:{R1:0.8, R2:0.1, R3:0.1}>. The present invention defines nodes with such abnormal role evolution trends as role evolving outliers (Role Evolving Outliers, abbreviated REOutliers). By definition, REOutliers are based purely on the structural roles of nodes and do not depend on any additional network information. Since real-world network systems are often short of information for many reasons, mining REOutliers may help reveal anomalous events hidden inside a dynamic network, and the outliers themselves also deserve attention.
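The contrast between the two example sequences can be made concrete with a small sketch. It is only an illustration of the data, not the detection method of the invention: the function name `step_changes` and the choice of measuring the Euclidean change between consecutive snapshots are assumptions introduced here.

```python
import math

# Role distribution sequences copied from the text (one dict per snapshot).
gradual = [
    {"R1": 0.0, "R2": 0.1, "R3": 0.8},
    {"R1": 0.2, "R2": 0.3, "R3": 0.5},
    {"R1": 0.6, "R2": 0.3, "R3": 0.1},
]
abrupt = [
    {"R1": 0.0, "R2": 0.1, "R3": 0.8},
    {"R1": 0.0, "R2": 0.1, "R3": 0.8},
    {"R1": 0.8, "R2": 0.1, "R3": 0.1},
]


def step_changes(sequence):
    """Euclidean distance between each pair of consecutive role distributions."""
    return [
        math.sqrt(sum((cur[r] - prev[r]) ** 2 for r in prev))
        for prev, cur in zip(sequence, sequence[1:])
    ]


gradual_steps = step_changes(gradual)  # two moderate changes
abrupt_steps = step_changes(abrupt)    # no change, then one large jump
```

The abrupt sequence shows no movement followed by a single jump that is larger than any step of the gradual sequence, which is the kind of behavior the text describes as abnormal.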
Summary of the invention
[Technical problem to be solved]
The object of the invention is to provide a method for discovering abnormal structural evolution in a dynamic network, which studies the evolution of node roles through the role distributions of nodes and mines the set of nodes whose evolution is abnormal, the REOutliers.
[Technical solution]
The invention is realized by the following technical solution.
The invention relates to a method for discovering abnormal structural evolution in a dynamic network, comprising the steps of:
A. given a dynamic network, mining the frequent patterns by which roles evolve over time in the whole network;
B. detecting role evolution anomalies by comparing the degree to which each node differs from the frequent patterns.
In a preferred embodiment, step A specifically comprises the steps of:
A1. performing role discovery on each graph snapshot separately to obtain the role distribution sequence of each node;
A2. mining the frequent pattern set of the role distribution sequences according to the Apriori algorithm, and taking the mined frequent pattern set as the role evolution trends of the network.
In another preferred embodiment, step A1 performs role discovery on each graph snapshot to obtain the role distribution sequence of each node as follows:
For each node, basic features are first obtained from its ego-network structure, and the basic features are then aggregated iteratively to obtain recursive features; the basic features and recursive features together form the role distribution sequence of the node.
In another preferred embodiment, step A2 mines the role distribution sequences according to the Apriori algorithm as follows:
The role distributions at each time point are first clustered separately, and all resulting cluster centers are taken as Length-1 patterns. A support measure is designed to screen the Length-1 frequent patterns, the downward closure property is verified according to the Apriori algorithm, and longer frequent patterns are then computed.
In another preferred embodiment, the support is set according to the distance from a node to the cluster center.
In another preferred embodiment, step B specifically comprises the steps of:
merging all frequent patterns that cover the same time points to obtain a time configuration;
for each configuration, computing the frequent pattern that best matches the node, which gives the set of best-matching frequent patterns of the node over all configurations, and then accumulating the node's degree of abnormality with respect to all best-matching frequent patterns to obtain the final anomaly score.
In another preferred embodiment, the dynamic network is a node set or an edge set.
The invention is described in detail below.
Many complex systems in the real world have the character of dynamic networks, and analyzing and mining the anomalies that arise as network structure changes over time helps in understanding the dynamic behavior of complex systems. Therefore, given the snapshot sequence of a dynamic network, the invention studies the evolution of node roles; the object of study is the role distribution of nodes, and the goal is to mine the set of nodes whose evolution is abnormal, the REOutliers. Based on the definition of REOutliers, the invention proposes a pattern-based role evolving outlier detection algorithm (Pattern-based Role Evolving Outliers Detection, abbreviated P-REOD).
Specifically, the method of the invention comprises the steps of:
A. given a dynamic network, mining the frequent patterns by which roles evolve over time in the whole network;
B. detecting role evolution anomalies by comparing the degree to which each node differs from the frequent patterns.
Step A specifically comprises two processes:
Step A1: role discovery is performed on each graph snapshot separately to obtain the role distribution sequence of each node.
In step A1, roles are the foundation of the invention. The role discovery process is as follows.
The input to the problem defined in step A1 is a dynamic network, so the invention first defines a dynamic network.
Definition 1. Dynamic network (Dynamic Networks): let D = <N, E> denote a dynamic network, where N = <N_1, N_2, …, N_T> is the node set and E = <E_1, E_2, …, E_T> is the edge set. The invention considers only the structural features of the network, so D is an undirected graph. D = <N, E> is divided by time period into D = <S_1, S_2, …, S_T>, where T is the number of snapshots and S_t = <N_t, E_t> is the network snapshot at time t; N_t is the node set of the network at time t and E_t is its edge set. S_t is likewise an undirected graph.
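One possible in-memory form of Definition 1 is sketched below. The names `Snapshot` and `make_snapshot` are hypothetical, and deriving the node set from the edge list (so that isolated nodes are dropped) is a simplifying assumption, not something the definition requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """One undirected snapshot S_t = <N_t, E_t>."""
    nodes: frozenset
    edges: frozenset  # each edge is a frozenset of two node ids, so (a, b) == (b, a)


def make_snapshot(edge_list):
    """Build a snapshot from (u, v) pairs, ignoring self-loops and duplicates."""
    edges = frozenset(frozenset(e) for e in edge_list if e[0] != e[1])
    # Assumption: nodes are taken from the edges, so isolated nodes are not kept.
    nodes = frozenset(n for e in edges for n in e)
    return Snapshot(nodes, edges)


# D = <S_1, S_2>: a dynamic network with T = 2 snapshots.
D = [
    make_snapshot([("a", "b"), ("b", "c")]),
    make_snapshot([("a", "b"), ("b", "a"), ("c", "d")]),
]
```

Storing each edge as an unordered pair reflects the statement that every snapshot is an undirected graph.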
Role discovery aims to describe the types of structure that appear in a network with a set of latent roles. Taking the three network structures of center node, bridge node and edge node as an example, one would hope to discover three roles corresponding to these three structures. Role discovery for nodes comprises two steps: feature extraction and role discovery.
Feature extraction aims to represent a node by its basic structural features (such as its degree and the number of triangles it participates in). For each node v, basic features are first obtained from its ego-network structure, and the basic features are then aggregated iteratively to obtain recursive features. The basic features and recursive features together form the feature vector of the node, and the number of features is determined automatically by the algorithm.
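The two stages of feature extraction can be sketched as follows. This is a minimal illustration under stated assumptions: the basic features are limited to the two named in the text (degree and triangle count), the aggregation uses the sum and mean over neighbours, and the function names are hypothetical. The automatic choice of the number of features mentioned in the text is not modelled.

```python
from itertools import combinations


def basic_features(adj):
    """Degree and number of triangles for every node of an undirected graph."""
    feats = {}
    for v, nbrs in adj.items():
        triangles = sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
        feats[v] = [float(len(nbrs)), float(triangles)]
    return feats


def recursive_features(adj, feats, rounds=1):
    """Append the sum and mean of each neighbour feature, `rounds` times."""
    for _ in range(rounds):
        new = {}
        for v, nbrs in adj.items():
            width = len(feats[v])
            sums = [sum(feats[u][i] for u in nbrs) for i in range(width)]
            means = [s / len(nbrs) if nbrs else 0.0 for s in sums]
            new[v] = feats[v] + sums + means
        feats = new
    return feats


# Toy snapshot: a triangle a-b-c with a pendant node d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
features = recursive_features(adj, basic_features(adj))
```

Each extra round widens the feature vector, so in practice some rule for stopping or pruning is needed; that rule is outside this sketch.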
Definition 2. Node-feature matrix sequence V = <V_1, V_2, …, V_T>: applying the feature extraction above to S_t yields a node-feature matrix V_t of size n_t × f, where n_t is the number of nodes at time t and f is the number of features extracted from S_t. Performing feature extraction on each snapshot separately gives the node-feature matrix sequence V = <V_1, V_2, …, V_T>.
Definition 3. Node-role matrix sequence G = <G_1, G_2, …, G_T>: given a node-feature matrix V_t of size n_t × f and a positive integer r < min(n_t, f), non-negative matrix factorization (NMF) is used to find a non-negative matrix G_t of size n_t × r and a non-negative matrix F of size r × f such that G_t F ≈ V_t, where r is the number of roles and G_t is a node-role matrix. Taking again the three network structures of center node, bridge node and edge node as an example, r = 3 and row n of G_t is the role distribution of node n at time t, for example {R1:0.1, R2:0.1, R3:0.8}. From V = <V_1, V_2, …, V_T> the whole node-role matrix sequence G = <G_1, G_2, …, G_T> is obtained.
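The factorization G_t F ≈ V_t can be sketched with the classical multiplicative update rules for NMF. The text does not say which NMF solver is used, so the update rule, the random initialization, the iteration count and the small constant `eps` are all assumptions made for this illustration, and the toy matrix `V_t` is invented.

```python
import random


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(v, r, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative n x f matrix v into g (n x r) and f_mat (r x f)."""
    rng = random.Random(seed)
    n, f = len(v), len(v[0])
    g = [[rng.random() + 0.1 for _ in range(r)] for _ in range(n)]
    f_mat = [[rng.random() + 0.1 for _ in range(f)] for _ in range(r)]
    for _ in range(iters):
        # Update F: F <- F * (G^T V) / (G^T G F), element-wise.
        gt = transpose(g)
        num, den = matmul(gt, v), matmul(matmul(gt, g), f_mat)
        f_mat = [[f_mat[i][j] * num[i][j] / (den[i][j] + eps) for j in range(f)]
                 for i in range(r)]
        # Update G: G <- G * (V F^T) / (G F F^T), element-wise.
        ft = transpose(f_mat)
        num, den = matmul(v, ft), matmul(g, matmul(f_mat, ft))
        g = [[g[i][j] * num[i][j] / (den[i][j] + eps) for j in range(r)]
             for i in range(n)]
    return g, f_mat


def frobenius_error(v, g, f_mat):
    approx = matmul(g, f_mat)
    return sum((x - y) ** 2 for rv, ra in zip(v, approx) for x, y in zip(rv, ra)) ** 0.5


# Invented 4-node, 3-feature matrix; r = 2 roles satisfies r < min(n_t, f).
V_t = [[2.0, 1.0, 5.0], [2.0, 1.0, 5.0], [3.0, 1.0, 4.0], [1.0, 0.0, 3.0]]
G_t, F = nmf(V_t, r=2)
```

Row n of `G_t` then plays the part of the role values of node n; whether and how rows are normalized before being read as a distribution is not specified in the text.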
After the node-role matrices are obtained, the role with the largest value could be taken directly as the role of the node, ignoring the values of all other roles. For example, the edge-node role sequence given above, <1:{R1:0.1, R2:0.1, R3:0.8}, 2:{R1:0.2, R2:0.3, R3:0.5}, 3:{R1:0.6, R2:0.3, R3:0.1}>, could be written directly as <1:{R3}, 2:{R3}, 3:{R1}>. Although this achieves a degree of simplification, it fails to show how the role of the node really changes: the simplified role sequence clearly cannot reflect the details of an edge node evolving toward a center node. The invention therefore keeps the complete role distribution and treats the role of a node as a soft probability distribution, with all roles jointly representing the structure of the node.
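The loss of detail caused by keeping only the largest role can be seen in a few lines. The helper name `hard_roles` is hypothetical; the numbers are those of the example in the text.

```python
sequence = [
    {"R1": 0.1, "R2": 0.1, "R3": 0.8},
    {"R1": 0.2, "R2": 0.3, "R3": 0.5},
    {"R1": 0.6, "R2": 0.3, "R3": 0.1},
]


def hard_roles(seq):
    """Keep only the role with the largest value at each snapshot."""
    return [max(dist, key=dist.get) for dist in seq]


hard = hard_roles(sequence)                # the simplified sequence
r1_values = [d["R1"] for d in sequence]    # the soft values on role R1
```

The simplified sequence stays at R3 for two snapshots and then switches to R1, while the soft values on R1 rise steadily across all three snapshots.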
Step A2: the frequent pattern set P of the role distribution sequences is mined following the idea of the Apriori algorithm and taken as the role evolution trends of the network.
In step A2, after the node-role matrix sequence G = <G_1, G_2, …, G_T> is obtained, the node-role matrix G_t at each time t is first clustered separately, and all resulting cluster centers are taken as Length-1 patterns (analogous to items in frequent pattern mining). A support measure is designed to screen the Length-1 frequent patterns; the downward closure property is then verified following the Apriori algorithm in Document 1 (Han J, Kamber M, Pei J. Data mining: concepts and techniques [M]. Morgan Kaufmann, 2006.), and longer frequent patterns are computed. The rest of this section describes the pattern mining process and the design of the support in detail.
Definition 4. Length-1 pattern: each G_t is clustered using the X-means method of Document 2 (Ruhnau B. Eigenvector-centrality a node-centrality [J]. Social Networks, 2000, 22(4): 357-365.). This method does not require the number of clusters to be given in advance, which suits the needs of the invention. Each cluster center obtained is still a sequence of values over the roles, and the cluster centers at all time points are Length-1 patterns.
For example, if two cluster centers are obtained at time point 1, denoted <1:1> and <1:2>, and two cluster centers are likewise obtained at time point 2, denoted <2:1> and <2:2>, then <1:1>, <1:2>, <2:1> and <2:2> are all Length-1 patterns.
Definition 5. Support of a Length-1 pattern: the definition is analogous to that of support in traditional frequent pattern mining. A Length-1 pattern is a cluster center at time t. The distance from node n to the pattern is the distance between the role probability distribution vector of node n at time t and that cluster center (the invention uses the Euclidean distance), and the maximum distance is the largest such distance over all nodes at time t. The support of the Length-1 pattern is then given by formula (1):
The reasonableness of this definition of support can be shown by analysis: (1) the closer a node is to the cluster center, the more it contributes to the support of that center; (2) the more nodes are close to the cluster center, the larger the support; (3) the support takes a value between 0 and 1.
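Formula (1) itself is not reproduced in this text, so the following is only a sketch of one support function that has the properties listed above. The specific form, the mean over nodes of 1 minus the node's distance divided by the maximum distance, is an assumption, as are the function names and the toy data.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def length1_support(role_vectors, center):
    """Assumed support: mean over nodes of 1 - d(node, center) / max distance."""
    dists = [euclidean(v, center) for v in role_vectors]
    d_max = max(dists)
    if d_max == 0.0:  # every node sits exactly on the center
        return 1.0
    return sum(1.0 - d / d_max for d in dists) / len(dists)


# Invented role vectors of four nodes at one time point (three near the edge role).
G_t = [[0.1, 0.1, 0.8], [0.0, 0.1, 0.9], [0.2, 0.1, 0.7], [0.8, 0.1, 0.1]]
edge_center = [0.1, 0.1, 0.8]
hub_center = [0.8, 0.1, 0.1]

edge_support = length1_support(G_t, edge_center)
hub_support = length1_support(G_t, hub_center)
```

Under this assumed form a center with many nearby nodes receives a higher support than a center with few, and the value always stays within [0, 1].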
Definition 6. Support of a Length-k (k > 1) pattern, support(p): a Length-k pattern p is a sequence of k cluster centers, each corresponding to one snapshot. For example, p = <1:1, 2:1, 3:1> is a Length-3 pattern consisting of the three cluster centers <1:1>, <2:1> and <3:1>. Let T_p be the set of time points covered by pattern p, and consider the cluster center of p at each time t in T_p. The support of the Length-k pattern p is then given by formula (2):
Its reasonableness can be shown by analysis:
(1) the more time points at which a node is close to the corresponding cluster center of p, the more it contributes to the support of p; (2) the more nodes of the kind described in (1) a pattern contains, the larger its support; (3) the support takes a value between 0 and 1.
Proof 1. Downward closure property: the downward closure property states that if a Length-k pattern p meets the minimum support, i.e. is a frequent pattern, then all of its non-empty sub-patterns must also meet the minimum support. For the Length-3 pattern p = <1:1, 2:1, 3:1>, for example, p has 6 non-empty sub-patterns: <1:1, 2:1>, <2:1, 3:1>, <1:1, 3:1>, <1:1>, <2:1> and <3:1>. Formula (2) computes the support of a pattern by multiplying together, over the time point set, the supports of the individual Length-1 patterns; every sub-pattern of p covers fewer time points than p; and the support of any Length-1 pattern lies between 0 and 1. It follows that the above definition of support satisfies the downward closure property.
Longer patterns can be mined following the Apriori algorithm. Let p1 and p2 be two Length-L frequent patterns. If the result of removing the first time point from p1 is exactly the same as the result of removing the last time point from p2, then p1 and p2 can be merged to obtain a Length-(L+1) pattern. Example: with p1 = <1:1, 2:3, 3:2> and p2 = <2:3, 3:2, 5:6>, a Length-4 candidate pattern <1:1, 2:3, 3:2, 5:6> is obtained.
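The merging rule can be written down directly. In this sketch a pattern is represented as a tuple of (time point, cluster index) pairs; that representation and the function name `join` are assumptions made for illustration. The extra check on the last time points keeps the result in time order when two Length-1 patterns are merged.

```python
def join(p1, p2):
    """Merge two Length-L patterns into a Length-(L+1) candidate, or return None."""
    same_length = len(p1) == len(p2)
    overlap = p1[1:] == p2[:-1]          # p1 without its first point == p2 without its last
    in_time_order = p1[-1][0] < p2[-1][0]
    if same_length and overlap and in_time_order:
        return p1 + p2[-1:]
    return None


p1 = ((1, 1), (2, 3), (3, 2))   # <1:1, 2:3, 3:2>
p2 = ((2, 3), (3, 2), (5, 6))   # <2:3, 3:2, 5:6>
candidate = join(p1, p2)        # <1:1, 2:3, 3:2, 5:6>
```

A candidate produced this way still has to pass the minimum support test before it counts as a frequent pattern.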
Step B: role evolution anomalies are detected by comparing the degree to which each node differs from the frequent patterns.
Step B discusses how to define the anomaly score of a node from the frequent pattern set P obtained above and so find the nodes whose role evolution is abnormal, the REOutliers. Because a Length-1 frequent pattern is a cluster center on a single snapshot and cannot describe a "trend", the invention removes all Length-1 frequent patterns from P.
There are two fairly simple ways to derive the REOutliers set from the frequent pattern set P: (1) design a measure of the difference between a node and a frequent pattern, and accumulate the node's anomaly values with respect to every frequent pattern as its anomaly score; (2) consider only the longest frequent patterns, those covering the most time points, and use the node's anomaly value with respect to the longest frequent patterns as its anomaly score. Neither method is reasonable, however: (1) a node may match only a few frequent patterns well and deviate from all the others, which gives normal nodes a high outlier score; (2) considering only longer patterns ignores the influence of short patterns on REOutlier discovery, when in real dynamic networks few long patterns are mined and short patterns are far more common.
Definition 7. Configuration: a configuration is a set of time points; merging all frequent patterns that cover the same time points gives one time configuration. For example, the patterns p1 = <1:1, 2:2> and p2 = <1:2, 2:2> both correspond to the configuration c = <1, 2>. Merging all frequent patterns in P that share the same time points gives the configuration set C.
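Grouping patterns by the time points they cover can be sketched as below. Patterns are again written as tuples of (time point, cluster index) pairs, which is an assumed representation, and the function name `configurations` is hypothetical.

```python
from collections import defaultdict


def configurations(patterns):
    """Group patterns by the tuple of time points they cover."""
    groups = defaultdict(list)
    for p in patterns:
        groups[tuple(t for t, _ in p)].append(p)
    return dict(groups)


P = [
    ((1, 1), (2, 2)),   # p1 = <1:1, 2:2>
    ((1, 2), (2, 2)),   # p2 = <1:2, 2:2>
    ((2, 1), (3, 1)),   # an extra invented pattern covering time points 2 and 3
]
C = configurations(P)
```

The keys of `C` are the configurations, and each value lists the frequent patterns among which a best match for a node would be sought.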
The concept of a configuration resolves the problems of the two methods above. Each c ∈ C corresponds to a number of frequent patterns that share the same time points. When computing the anomaly score of a node, the frequent pattern that best matches the node is first computed for each configuration c separately, which gives the set of best-matching frequent patterns of the node over all configurations; the node's degrees of abnormality with respect to all patterns in this set are then accumulated to obtain the final anomaly score. The matching score between a node and a pattern is defined below.
Definition 8. Matching score of node n and pattern p: the set T_p of time points covered by frequent pattern p is divided into two parts, the set of time points at which pattern p and node n match, and θ_pn, the set of time points at which pattern p and node n do not match; together the two parts make up T_p. For example, for the frequent pattern p = <1:1, 2:1, 3:2> with T_p = {1, 2, 3}, if node n belongs to cluster center 1 at each of the first three time points, then the matched set is {1, 2} and θ_pn = {3}.
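The split of T_p into matched and unmatched time points can be sketched as follows. It assumes, as the example suggests, that a node matches a pattern at time t when the cluster the node belongs to at t is the cluster named by the pattern at t; the function name and the dictionary of cluster memberships are hypothetical.

```python
def split_time_points(pattern, node_clusters):
    """Return (matched, unmatched) time points of `pattern` for one node."""
    matched = {t for t, c in pattern if node_clusters.get(t) == c}
    unmatched = {t for t, _ in pattern} - matched
    return matched, unmatched


p = ((1, 1), (2, 1), (3, 2))          # p = <1:1, 2:1, 3:2>, T_p = {1, 2, 3}
node_clusters = {1: 1, 2: 1, 3: 1}    # node n is in cluster 1 at times 1, 2 and 3
matched, theta_pn = split_time_points(p, node_clusters)
```

Only the split is shown here; the matching score built on top of it depends on formula (3), which is not reproduced in this text.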
The matching score of node n and pattern p is then defined as follows:
Intuitively: (1) the more time points at which node n matches pattern p, the higher the matching score; (2) the more important the cluster centers matched by n (the higher their support), the higher the matching score; (3) the more compact the clusters represented by the cluster centers matched by n (the smaller the average cluster distance), the higher the matching score.
Definition 9. Difference score between node n and its best-matching frequent pattern p_c: intuitively, only the time points at which n and p_c do not match should be considered, and the farther node n is from the corresponding cluster centers at these unmatched time points, the higher the difference value. The score should likewise be weighted by the support of pattern p_c. The resulting anomaly score of the node is given by formula (4):
Definition 10. Node anomaly score: once the anomaly score of each node with respect to each p_c has been obtained, the final anomaly score of node n is given by formula (5):
The REOutliers detection algorithm P-REOD of the invention can be summarized as follows:
Algorithm 1: Pattern-based Role Evolving Outliers Detection Framework (P-REOD)
Input: the dynamic network D = <S_1, S_2, …, S_T>
Output: Top-K REOutliers set
Parameters: r: number of roles; minSup: the threshold of the support
Steps:
1. Input(D)
2. V = <V_1, V_2, …, V_T> ← FeatureExtract(D)
3. G = <G_1, G_2, …, G_T> ← RoleDiscover(V, r)
4. P ← PatternMining(G, minSup)
5. C ← ConfigurationMining(P)
6. For each configuration c ∈ C
7.     For each n ∈ N
8.         p_c ← FindBestMatchingPattern(c, n) using (3)
9.         S ← compute the outlierScore for node n using (4) and (5)
10.     endFor
    endFor
11. Output(Top-K REOutliers in S)
The PatternMining algorithm used in P-REOD is shown in Algorithm 2:
Algorithm 2: PatternMining
Input: the node-role matrix sequence G = <G_1, G_2, …, G_T>
Output: frequent pattern set P
Parameters: minSup: the threshold of the support
Steps:
1. Frequent pattern set P ← ∅
2. Let C_k be the set of length-k candidate patterns
3. C_1 ← all the clusters in each timestamp
4. Let L_k be the set of length-k frequent patterns
5. L_1 ← {f | f ∈ C_1 and sup(f) ≥ minSup}
6. For k = 2 to T
7.     C_k ← getCandidates(L_{k-1})
8.     L_k ← {f | f ∈ C_k and sup(f) ≥ minSup}
9.     P ← P ∪ L_k
10. endFor
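The level-wise loop of Algorithm 2 can be sketched in Python under two assumptions. First, the Length-1 supports are taken as given in a dictionary instead of being computed from clusters. Second, the support of a longer pattern is taken to be the product of the supports of its Length-1 parts, following the description of formula (2) in Proof 1; the exact formula is not reproduced in this text. The function name `mine_patterns` and the toy supports are invented.

```python
from math import prod


def mine_patterns(length1_support, min_sup, T):
    """Level-wise mining over patterns written as tuples of (time, cluster) pairs."""

    def support(p):
        # Assumed form of formula (2): product of the Length-1 supports.
        return prod(length1_support[item] for item in p)

    # L_1: frequent Length-1 patterns (screened, but not added to P).
    level = [(item,) for item, s in sorted(length1_support.items()) if s >= min_sup]
    frequent = []
    for _ in range(2, T + 1):
        # C_k: join two patterns of L_{k-1} that overlap on all but one time point.
        candidates = {
            p1 + p2[-1:]
            for p1 in level
            for p2 in level
            if p1[1:] == p2[:-1] and p1[-1][0] < p2[-1][0]
        }
        # L_k: candidates that meet the minimum support.
        level = sorted(c for c in candidates if support(c) >= min_sup)
        if not level:
            break
        frequent.extend(level)
    return frequent


sup1 = {(1, 1): 0.9, (1, 2): 0.3, (2, 1): 0.8, (3, 1): 0.9}
P = mine_patterns(sup1, min_sup=0.6, T=3)
```

With these toy numbers the cluster <1:2> is dropped at the first level, so no longer pattern containing it is ever generated, which is the saving that the downward closure property provides.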
The algorithms above have two important parameters: the number of roles r and the minimum support threshold minSup. The number of roles r determines how many classes of network structure are distinguished. A larger r describes node structure types in finer detail, but makes the model more complex and brings greater overhead; an r that is too small simplifies the model but reduces the discrimination between structure types. The invention used the minimum description length to examine the choice of r and found that r = 3 or 4 is best. The number of roles used in the invention is 4.
The minimum support threshold minSup determines the number of final frequent patterns. For a given dynamic network data set, too high a minSup may prevent some patterns that actually occur frequently from being mined, so that some outliers cannot be found; there may even be nodes that find no matching pattern and so have an anomaly value of 0. Too low a minSup, on the other hand, may yield frequent patterns that do not represent important trends in the data, which also affects the final result. The invention ran multiple experiments with different minSup values and selected minSup on the condition that all nodes obtain a non-zero anomaly value.
[Beneficial effects]
The technical solution proposed by the invention has the following beneficial effects:
(1) The invention is the first to propose the concept of role evolving outliers (REOutliers), that is, nodes whose role evolution runs counter to the overall role evolution trends of the network. Such outliers are defined purely from the structural roles of nodes and need not depend on any additional network information. Mining such nodes can help reveal anomalous events hidden in the network, which is of great significance for studying real-world network systems, which commonly lack sufficient information.
(2) Based on the definition of REOutliers, the invention proposes a pattern-based REOutliers detection method, abbreviated P-REOD. The method divides role evolution anomaly detection into two parts, pattern extraction and anomaly detection: the frequent patterns that occur in the role evolution of all nodes are first computed according to the Apriori algorithm and taken as the trends of the whole network, and REOutliers are then detected according to the network trends obtained.
(3) The invention was tested on three real data sets (Enron, Facebook, DBLP), and reasonable explanations and analyses were made of the REOutliers obtained.
Description of the drawings
Fig. 1 is a flowchart of the method for discovering abnormal structural evolution in a dynamic network provided by Embodiment 1 of the invention.
Detailed description of the embodiments
To make the object, technical solution and advantages of the invention clearer, specific embodiments of the invention are described clearly and completely below.
Embodiment 1
Fig. 1 is a flowchart of the method for discovering abnormal structural evolution in a dynamic network provided by Embodiment 1 of the invention. As shown in Fig. 1, the method comprises steps S1 and S2; for the detailed process of the following steps, refer to the content of the description above.
Step S1: given a dynamic network, mine the frequent patterns by which roles evolve over time in the whole network.
Step S2: detect role evolution anomalies by comparing the degree to which each node differs from the frequent patterns.
Steps S1 and S2 are described in detail below.
Step S1 specifically comprises the following steps:
Role discovery is performed on each graph snapshot separately to obtain the role distribution sequence of each node. The method for doing so is as follows:
For each node, basic features are first obtained from its ego-network structure, and the basic features are then aggregated iteratively to obtain recursive features; the basic features and recursive features together form the role distribution sequence of the node.
The frequent pattern set of the role distribution sequences is mined according to the Apriori algorithm, and the mined frequent pattern set is taken as the role evolution trends of the network. The method of mining the role distribution sequences according to the Apriori algorithm is as follows:
The role distributions at each time point are first clustered separately, and all resulting cluster centers are taken as Length-1 patterns. A support measure is designed to screen the Length-1 frequent patterns, the downward closure property is verified according to the Apriori algorithm, and longer frequent patterns are then computed.
As the above embodiment shows, the embodiment of the invention divides role evolution anomaly detection into two parts, pattern extraction and anomaly detection: the frequent patterns that occur in the role evolution of all nodes are first computed following the idea of the Apriori algorithm and taken as the trends of the whole network, and REOutliers are then detected according to the network trends obtained.
It should be understood that the embodiment described above is only one of the embodiments of the invention rather than all of them, and does not limit the invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the invention without creative effort fall within the protection scope of the invention.
Below by the method experiment Analysis of embodiment one.
Experimental analysis
In experimental analysis part, the result mainly P-REOD algorithm tested on truthful data collection, and effectively Property analyze, specifically include procedure below:
1. Dataset description and feature extraction
The present invention is tested on three datasets of different scales, briefly described as follows:
(1) Enron: the e-mail network of Enron Corporation employees. Enron was once a renowned U.S. energy company, but it went bankrupt suddenly in 2001, which attracted the interest of many researchers. The present invention takes the data from January to December 2001 to construct a dynamic network for analysis, with one network snapshot per month; nodes represent employees and edges represent e-mail contact between employees.
(2) Facebook-wall (Facebook for short): wall data of Facebook users. Each user has his or her own wall, on which other users can leave messages. The present invention takes the data from January to December 2008, with one snapshot per month; each node of the network represents a Facebook user and each edge represents a message relation between users.
(3) DBLP: a research collaboration network in which authors who have jointly published a paper have a collaboration relation. The present invention takes the 10 years of data from 2002 to 2011 to construct a dynamic network, with one snapshot per year; nodes represent authors and edges represent collaboration between authors. Authors with fewer than 10 published papers in total are removed.
Experiments show that, after the dynamic network is divided into snapshots, the snapshots do not all contain the same number of nodes; that is, some nodes appear or disappear midway. Because such nodes can match very few patterns, they are likely to end up as outliers. For the purposes of the analysis, the present invention retains the complete network structure but, when computing outliers, considers only the set of nodes that are present in the network throughout. The sizes of these persistent node sets in the three datasets are 2114, 5111 and 29747, respectively. Details of the datasets are given in Table 1.
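Restricting the analysis to nodes present in every snapshot amounts to a set intersection. The following is a minimal sketch, assuming each snapshot is given as an edge list and that a node counts as present when it has at least one edge; the function name `persistent_nodes` is hypothetical.

```python
def persistent_nodes(snapshots):
    """Return the set of nodes that occur in every snapshot.

    `snapshots` is a list of edge lists, one per time step; each edge is a
    (u, v) pair. Assumption: a node is present if it has at least one edge.
    """
    node_sets = [{n for edge in edges for n in edge} for edges in snapshots]
    if not node_sets:
        return set()
    return set.intersection(*node_sets)

# Toy dynamic network with three snapshots; node "d" appears only once.
snapshots = [
    [("a", "b"), ("b", "c")],
    [("a", "b"), ("c", "d")],
    [("a", "c"), ("b", "c")],
]
print(sorted(persistent_nodes(snapshots)))
```

The full edge lists stay untouched, so structural features can still be computed on the complete network while outlier scoring is limited to the returned set.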
The present invention computes edge weights with a time-decay method: when computing the weight between two nodes at the current time, the weights at all historical times are also taken into account, and a time further from the current time is considered to have less influence; see formula (6). Here w_i denotes the weight of the relation between nodes a and b at time t_i. A weight threshold w* is set, and all edges whose weight is less than w* are filtered out. The present invention takes λ = 1 and w* = 0.1.
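A minimal sketch of time-decayed edge weighting follows. It assumes an exponential decay kernel exp(-λ·Δt) summed over the history of an edge; the exact form of formula (6) may differ, and the names `decayed_weight` and `filter_edges` are hypothetical. The defaults λ = 1 and w* = 0.1 are the values stated in the text.

```python
import math

def decayed_weight(history, lam=1.0):
    """Weight of an edge at the latest time step.

    `history` lists the raw weights w_1..w_T of one node pair, oldest first.
    Assumption: each past weight is damped by exp(-lam * age in steps).
    """
    last = len(history) - 1
    return sum(w * math.exp(-lam * (last - i)) for i, w in enumerate(history))

def filter_edges(edge_histories, lam=1.0, w_star=0.1):
    """Keep only the edges whose decayed weight is at least w_star."""
    weights = {e: decayed_weight(h, lam) for e, h in edge_histories.items()}
    return {e: w for e, w in weights.items() if w >= w_star}

# Toy histories over three time steps.
edge_histories = {
    ("a", "b"): [0.0, 0.0, 1.0],  # contact at the current step
    ("a", "c"): [1.0, 0.0, 0.0],  # contact two steps ago
    ("b", "c"): [0.2, 0.0, 0.0],  # weak, old contact
}
for edge, weight in filter_edges(edge_histories).items():
    print(edge, round(weight, 4))
```

With these settings a unit-weight contact two steps in the past still passes the threshold, while the weaker old contact is filtered out.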
Feature extraction is performed on each of the three datasets to obtain the node feature matrices V = <V1, V2, …, VT>. The number of extracted features need not be specified manually; the larger the network, the more features are extracted.
Table 1. Dataset details
2. Role extraction and role interpretation
Although reference 3 (K. Henderson, B. Gallagher, L. Li, L. Akoglu, T. Eliassi-Rad, H. Tong, and C. Faloutsos. RolX: Role Extraction and Mining in Large Networks. In KDD, 2012.) gives a strong explanation and justification of the reasonableness of roles, in order to better explain the REOutliers of the present invention this experiment first extends the method of reference 3 and interprets the roles using the Enron dataset as an example (the other two datasets are similar). The number of roles r is set to 4, and role extraction is performed separately on the snapshots corresponding to the 12 months of 2001 in the Enron dataset, yielding the node-role matrices G = <G1, G2, …, GT>, where each column of a matrix Gt corresponds to one role.
To interpret the 4 extracted roles reasonably, the present invention extends the method of reference 3, which interprets roles with common node metrics, and selects 6 common metrics: node degree, weighted degree, betweenness, eigenvector centrality, weighted eigenvector centrality and PageRank value. The values of the 6 metrics are first computed for every node at time t, giving the node-metric matrix Mt, where m is the number of metrics (set to 6 in Embodiment 1 of the present invention). Non-negative matrix factorization is then applied again to find a matrix Et such that Gt Et ≈ Mt, where Gt is still the node-role matrix at time t. Each row of Et corresponds to one role and gives that role's values on the 6 metrics. Computing the role-metric matrices of the Enron dataset in this way gives E = <E1, E2, …, ET>; averaging them gives the mean values of the 4 roles. The results show that the 4 roles each have distinct characteristics: R1 has relatively large values on all 6 metrics; R2 is nonzero only on eigenvector centrality; R3 takes values on all 6 metrics that are close to [value missing in the source]; and R4 is equal to 0.
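The step of finding Et with Gt held fixed can be sketched with the standard multiplicative update for non-negative factorization. This is an illustration under assumptions rather than the solver used in the experiments: the update rule, the iteration count, the toy matrices and the names `solve_role_metrics`, `matmul` and `frob_error` are all hypothetical choices.

```python
import random

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob_error(G, E, M):
    """Frobenius norm of G E - M."""
    GE = matmul(G, E)
    return sum((x - y) ** 2 for r1, r2 in zip(GE, M) for x, y in zip(r1, r2)) ** 0.5

def solve_role_metrics(G, M, iters=500, eps=1e-9, seed=0):
    """Find a non-negative E with G E ~ M while G stays fixed.

    Uses the multiplicative update E <- E * (G^T M) / (G^T G E), which keeps
    every entry non-negative and does not increase the Frobenius error.
    """
    rng = random.Random(seed)
    r, m = len(G[0]), len(M[0])
    E = [[rng.random() + 0.1 for _ in range(m)] for _ in range(r)]
    Gt = transpose(G)
    GtM, GtG = matmul(Gt, M), matmul(Gt, G)
    for _ in range(iters):
        GtGE = matmul(GtG, E)
        E = [[E[i][j] * GtM[i][j] / (GtGE[i][j] + eps) for j in range(m)]
             for i in range(r)]
    return E

# Toy example: 3 nodes, 2 roles, 3 metrics.
G = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]          # node-role memberships
M = matmul(G, [[2.0, 1.0, 0.5], [0.5, 3.0, 1.0]])  # synthetic node metrics
E = solve_role_metrics(G, M)
print([[round(x, 3) for x in row] for row in E])
print(round(frob_error(G, E, M), 6))
```

Each row of the returned matrix can then be read as the metric profile of one role, which is how the roles are given an interpretation.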
3. REOutlier analysis
Because background knowledge of the specific application is lacking, evaluating REOutliers is relatively difficult. To demonstrate the effectiveness of the algorithm of the present invention, the mean role distribution over all nodes is first computed as a baseline, and the nodes with the largest and smallest anomaly scores are then selected for analysis. The reasonableness of the algorithm is again analyzed mainly on the Enron dataset.
4. Enron dataset
With minSup set to 0.1 there are 5511 frequent patterns; merging the patterns that cover the same time points yields 555 structures. The present invention selects the two nodes with the largest anomaly scores (nodes 17172 and 29659) for analysis, computes the global average of all nodes' role values and takes the node with the smallest anomaly score as a comparison, and draws the role-evolution diagram of each. The analysis shows that the structural roles of nodes 17172 and 29659 did undergo anomalies during the 12-month evolution in 2001.
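Merging the frequent patterns that cover the same time points can be sketched as a group-by on the set of time points. The pattern representation (a tuple of (time, label) items) and the name `merge_by_time_points` are assumptions made for illustration.

```python
from collections import defaultdict

def merge_by_time_points(patterns):
    """Group frequent patterns by the time points they cover.

    Each pattern is a tuple of (time, label) items. Patterns that cover
    exactly the same time points fall into one time structure.
    """
    structures = defaultdict(list)
    for pattern in patterns:
        key = tuple(sorted(t for t, _ in pattern))
        structures[key].append(pattern)
    return dict(structures)

# Toy patterns: two cover times (1, 2), one covers (2, 3), one covers (1,).
patterns = [
    ((1, "c0"), (2, "c1")),
    ((1, "c2"), (2, "c0")),
    ((2, "c1"), (3, "c1")),
    ((1, "c0"),),
]
structures = merge_by_time_points(patterns)
for times, members in structures.items():
    print(times, len(members))
```

A node can then be compared against each time structure once, using whichever member pattern matches it best, instead of against every frequent pattern individually.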
As shown above, the four roles R1 to R4 of this dataset represent, respectively, central nodes, nodes with an important direct neighbor, ordinary nodes and edge nodes, so the anomalous structural changes of the detected outliers can be analyzed further on this basis. Taking node 17172 as an example, the analysis shows that the structure of this node underwent an anomaly in June: before June the node had consistently been a central node of the network; in June it suddenly became a node with an important neighbor; and after August it became an edge node. To further verify the anomalous structural change of node 17172, additional analysis shows that node 17172 was indeed a node in a central position in May, and in June suddenly had only one direct neighbor, node 3286. The analysis shows that node 3286 was a particularly important node in June, which confirms the analysis above. In fact, Enron did experience anomalous events such as the replacement of its CEO between June and August 2001, which confirms from another angle the reasonableness of the outliers found by the present invention.
5. Facebook and DBLP datasets
For both the Facebook and DBLP datasets minSup is set to 0.11, giving 6286 and 4723 frequent patterns respectively. The node with the highest anomaly score is again selected for analysis, with the global mean of the nodes' role values and the node with the lowest anomaly score as comparisons, and the node role-evolution diagrams are drawn. The role-evolution diagrams show that, compared with the global average and the node with the lowest anomaly score, the outliers obtained by the P-REOD algorithm do exhibit obvious anomalies.
In summary, the role-evolution outliers proposed in Embodiment 1 of the present invention can characterize the nodes in a dynamic network dataset whose structure becomes anomalous, together with the times at which the anomalies occur. Mining such anomalous nodes can help uncover hidden events and special nodes in the data and, if sufficient node information is available, can further support decision-making.

Claims (7)

1. A method for discovering anomalous structure evolution in a dynamic network, characterized by comprising the steps of:
A. given a dynamic network, mining the frequent patterns of role evolution over time in the whole network;
B. discovering role-evolution anomalies by comparing the degree to which nodes differ from the frequent patterns.
2. The method for discovering anomalous structure evolution in a dynamic network according to claim 1, characterized in that step A specifically comprises the steps of:
A1. performing role discovery on each graph snapshot separately to obtain the role distribution sequence of each node;
A2. mining the set of frequent patterns of the role distribution sequences according to the Apriori algorithm, and taking the mined set of frequent patterns as the role-evolution trend of the network.
3. The method for discovering anomalous structure evolution in a dynamic network according to claim 2, characterized in that, in step A1, the method of performing role discovery on each graph snapshot separately to obtain the role distribution sequence of each node is:
for each node, first obtaining basic features from its ego-network structure, then iteratively aggregating the basic features to obtain recursive features, the basic features and the recursive features together constituting the role distribution sequence of the node.
4. The method for discovering anomalous structure evolution in a dynamic network according to claim 1, characterized in that, in step A2, the method of mining the role distribution sequences according to the Apriori algorithm is:
first clustering the role distributions at each time point separately, and taking all the resulting cluster centers as Length-1 patterns; screening the Length-1 frequent patterns by a designed support measure and verifying the downward-closure property according to the Apriori algorithm; and then computing the longer frequent patterns.
5. The method for discovering anomalous structure evolution in a dynamic network according to claim 4, characterized in that the support is set according to the distance from a node to the cluster center.
6. The method for discovering anomalous structure evolution in a dynamic network according to claim 1, characterized in that step B specifically comprises the steps of:
merging all frequent patterns that cover the same time points to obtain a time structure;
for each time structure, computing the frequent pattern that best matches the node, thereby obtaining the set of best-matching frequent patterns of the node over all structures, and then summing the anomaly degrees of the node with respect to all best-matching frequent patterns to obtain the final anomaly score.
7. The method for discovering anomalous structure evolution in a dynamic network according to claim 1, characterized in that the dynamic network is a node set or an edge set.
CN201610474974.2A 2016-06-24 2016-06-24 The method that anomalous structure evolution in dynamic network finds Pending CN106202614A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610474974.2A CN106202614A (en) 2016-06-24 2016-06-24 The method that anomalous structure evolution in dynamic network finds

Publications (1)

Publication Number Publication Date
CN106202614A true CN106202614A (en) 2016-12-07

Family

ID=57461279

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610474974.2A Pending CN106202614A (en) 2016-06-24 2016-06-24 The method that anomalous structure evolution in dynamic network finds

Country Status (1)

Country Link
CN (1) CN106202614A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104091206A (en) * 2014-06-18 2014-10-08 北京邮电大学 Social network information transmission prediction method based on evolutionary game theory
CN105243448A (en) * 2015-10-13 2016-01-13 北京交通大学 Method and device for predicting evolution trend of internet public opinion

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
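Zhang Yonghui et al.: "Community trend prediction in information networks based on structural analysis" (基于结构分析的信息网络社团趋势预测), Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》) *
Li Yanmei et al.: "Role evolution outliers in dynamic information networks and their discovery" (动态信息网络中的角色演化异常及其发现), Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》) *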

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107423743A (en) * 2017-03-20 2017-12-01 重庆邮电大学 The dynamic network state evolution method for visualizing of introduced feature component similarity
CN107705212A (en) * 2017-07-07 2018-02-16 江苏开放大学 A kind of role recognition method based on population random walk
CN107705212B (en) * 2017-07-07 2021-06-15 江苏开放大学 Role identification method based on particle swarm random walk
CN107609330B (en) * 2017-08-31 2019-12-06 中国人民解放军国防科技大学 Access log mining-based internal threat abnormal behavior analysis method
CN107609330A (en) * 2017-08-31 2018-01-19 中国人民解放军国防科技大学 Access log mining-based internal threat abnormal behavior analysis method
CN109446836A (en) * 2018-10-09 2019-03-08 上海交通大学 A kind of social networks personal information propagation access control method
CN109446836B (en) * 2018-10-09 2022-02-15 上海交通大学 Social network personal information propagation access control method
CN109818892A (en) * 2019-01-18 2019-05-28 华中科技大学 Construct Cyclic Spectrum characteristic parameter extraction model and signal modulation mode recognition methods
CN110557294A (en) * 2019-09-25 2019-12-10 南昌航空大学 PSN (packet switched network) time slicing method based on network change degree
CN111626891A (en) * 2020-06-03 2020-09-04 四川大学 Dynamic sale network community discovery method based on extended nodes
CN111626891B (en) * 2020-06-03 2023-08-01 四川大学 Dynamic sales network community discovery method based on extension node
CN112527784A (en) * 2020-12-08 2021-03-19 天津大学 Abnormal mode mining and incremental abnormal detection method based on complex network
CN115114488A (en) * 2022-07-15 2022-09-27 中国西安卫星测控中心 Dynamic information network abnormal evolution node detection method based on role discovery
CN115114488B (en) * 2022-07-15 2024-03-26 中国西安卫星测控中心 Dynamic information network abnormal evolution node detection method based on role discovery

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20161207