CN106202614A - Method for discovering anomalous structural evolution in a dynamic network - Google Patents
- Publication number
- CN106202614A (application CN201610474974.2A)
- Authority
- CN
- China
- Prior art keywords
- role
- node
- network
- frequent mode
- dynamic network
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/10—Geometric CAD
- G06F30/18—Network design, e.g. design based on topological or interconnect aspects of utility systems, piping, heating ventilation air conditioning [HVAC] or cabling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Geometry (AREA)
- Business, Economics & Management (AREA)
- General Business, Economics & Management (AREA)
- Computer Networks & Wireless Communication (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- Tourism & Hospitality (AREA)
- Human Resources & Organizations (AREA)
- General Health & Medical Sciences (AREA)
- Economics (AREA)
- Health & Medical Sciences (AREA)
- Marketing (AREA)
- Computational Mathematics (AREA)
- Computing Systems (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Computer Hardware Design (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to the technical field of data processing and provides a method for discovering anomalous structural evolution in a dynamic network. The method comprises the steps of: given a dynamic network, mining the frequent patterns of how node roles evolve over time across the whole network; and detecting role-evolution anomalies by comparing the degree to which each node differs from the frequent patterns. The invention uses roles to characterize the structural features of a network, proposes for the first time the concept of role-evolution anomalies and, taking the temporal characteristics of dynamic networks into account, proposes a pattern-mining-based algorithm for detecting role-evolution anomalies. Analyzing and mining the anomalies that occur as network structure changes over time helps in understanding the dynamic behavior of complex systems.
Description
Technical field
The invention belongs to the technical field of data processing, and in particular relates to a method for discovering anomalous structural evolution in a dynamic network.
Background technology
A dynamic network is a network whose nodes and edges change over time, and many complex systems in the real world have the characteristics of dynamic networks, ranging from large ones such as social networks, scientific collaboration networks and e-mail networks, to small ones such as protein networks in ecosystems. Because of practical constraints such as the protection of trade secrets, many networks cannot provide enough additional information, so the analysis of the underlying network structure has long been a research focus. As time goes on, however, the structure of a dynamic network keeps changing. Mining the nodes whose structural evolution is anomalous is significant for understanding the dynamic behavior of complex systems, and is also a new problem in the field of complex network research.
For the problem of representing network structure, Document 1 (K. Henderson, B. Gallagher, L. Li, L. Akoglu, T. Eliassi-Rad, H. Tong, and C. Faloutsos. RolX: Role Extraction and Mining in Large Networks. In KDD, 2012.) first proposed characterizing the structural behavior of nodes with latent "roles". A role represents a type of network structure, and nodes with similar structure belong to the same role, e.g. central nodes or periphery nodes. Unlike community discovery, where nodes in the same community are usually close to one another, roles characterize a node's structural position in the network, and nodes with the same role may be located anywhere in the network. In fact, role discovery yields a role distribution vector for each node, and the number of roles determines the dimension of that vector. For example, if three roles R1 to R3 denote central nodes, bridge nodes and periphery nodes respectively, the role distribution of a node might be {R1:0.8, R2:0.1, R3:0.1}, meaning that the node's values on the three roles are 0.8, 0.1 and 0.1. Since this node's role values clearly lean towards R1, R1 (central node) can be taken directly as the node's role.
From the viewpoint of an abstract mathematical model, a dynamic network can be regarded as a sequence of graph snapshots. Given a dynamic network, the evolution trends of node roles can be analyzed through the role distributions of the network at each snapshot time. If only three simple role types R1 to R3 are considered, denoting central nodes, bridge nodes and periphery nodes respectively, each node obtains a corresponding role distribution at each snapshot; concatenating these in the time order of the dynamic network gives a role distribution sequence. Such a sequence can represent an evolution trend of network roles. For example, the sequence <1:{R1:0, R2:0.1, R3:0.8}, 2:{R1:0.2, R2:0.3, R3:0.5}, 3:{R1:0.6, R2:0.3, R3:0.1}> may represent the process by which some periphery nodes in the network gradually evolve into central nodes. In reality, however, there are always nodes whose evolution is anomalous. In a social network, for example, newly joined users with few fans usually sit at the edge of the network, whereas key users with many fans are often its central nodes. In general, a user at the edge must keep accumulating connections before gradually becoming an important central user, so this evolution can be represented by the sequence above. If, however, the user is suddenly involved in an important social event at some moment, or deliberately engages in propagation behavior, the user's role distribution sequence will change abruptly from periphery node to central node, e.g. <1:{R1:0, R2:0.1, R3:0.8}, 2:{R1:0, R2:0.1, R3:0.8}, 3:{R1:0.8, R2:0.1, R3:0.1}>. The present invention defines nodes with such anomalous role evolution trends as role-evolution outliers (Role Evolving Outliers, abbreviated REOutliers). By definition, REOutliers are based purely on the structural roles of nodes and do not depend on any additional network information. Since real-world network systems are limited by various factors and often lack information, mining REOutliers may help reveal anomalous events hidden in dynamic networks, and these outliers themselves also deserve attention.
Summary of the invention
[Technical problem to be solved]
The object of the present invention is to provide a method for discovering anomalous structural evolution in a dynamic network which, by studying how node roles evolve and using the role distributions of nodes, mines the set of REOutliers, i.e. the nodes whose evolution is anomalous.
[Technical solution]
The present invention is achieved by the following technical solutions.
The present invention relates to a method for discovering anomalous structural evolution in a dynamic network, comprising the steps of:
A. given a dynamic network, mining the frequent patterns of role evolution over time in the whole network;
B. detecting role-evolution anomalies by comparing the degree to which each node differs from the frequent patterns.
As a preferred embodiment, step A specifically comprises the steps of:
A1. performing role discovery on each graph snapshot separately to obtain the role distribution sequence of each node;
A2. mining the frequent pattern set of the role distribution sequences with the Apriori algorithm, and taking the mined frequent pattern set as the role evolution trends of the network.
As another preferred embodiment, in step A1 role discovery is performed on each graph snapshot to obtain the role distribution sequence of each node as follows:
for each node, basic features are first obtained from the network structure, and the basic features are then aggregated iteratively to obtain recursive features; the basic and recursive features together form the node's features, from which its role distribution sequence is obtained.
As another preferred embodiment, in step A2 the role distribution sequences are mined with the Apriori algorithm as follows:
first, the role distributions at each time point are clustered separately, and every resulting cluster center is taken as a Length-1 pattern; Length-1 frequent patterns are screened with a designed support measure; the downward closure property is verified in accordance with the Apriori algorithm, and longer frequent patterns are then computed.
As another preferred embodiment, the support is defined according to the distances from nodes to the cluster center.
As another preferred embodiment, step B specifically comprises the steps of:
merging all frequent patterns that cover identical sets of time points into one configuration;
for each configuration, computing the frequent pattern that best matches the node, thereby obtaining the node's set of best-matching frequent patterns over all configurations, and then summing the node's degrees of anomaly with respect to all best-matching frequent patterns to obtain its final outlier score.
As another preferred embodiment, the dynamic network consists of a sequence of node sets and a sequence of edge sets.
The present invention is described in detail below.
Many real-world complex systems have the characteristics of dynamic networks, and analyzing and mining the anomalies that occur as network structure changes over time helps in understanding the dynamic behavior of complex systems. Therefore, given the snapshot sequence of a dynamic network, the present invention studies how node roles evolve; the object of study is the role distribution of nodes, and the goal is to mine the set of REOutliers, i.e. the nodes whose evolution is anomalous. Based on the definition of REOutliers, the present invention proposes a pattern-based role-evolution outlier detection algorithm (Pattern-based Role Evolving Outliers Detection, abbreviated P-REOD).
Specifically, the method of the invention comprises the steps of:
A. given a dynamic network, mining the frequent patterns of role evolution over time in the whole network;
B. detecting role-evolution anomalies by comparing the degree to which each node differs from the frequent patterns.
Step A specifically comprises two sub-steps:
Step A1: perform role discovery on each graph snapshot separately to obtain the role distribution sequence of each node.
Roles are the basis of the present invention. The role discovery process of step A1 is as follows.
The input of step A1 is a dynamic network, so the present invention first defines a dynamic network.
Definition 1. Dynamic network (Dynamic Networks): let $D = \langle N, E \rangle$ denote a dynamic network, where $N = \langle N_1, N_2, \ldots, N_T \rangle$ is the sequence of node sets and $E = \langle E_1, E_2, \ldots, E_T \rangle$ is the sequence of edge sets. The present invention considers only the structural features of the network, so $D$ is an undirected graph. Dividing $D = \langle N, E \rangle$ by time period gives $D = \langle S_1, S_2, \ldots, S_T \rangle$, where $T$ is the number of snapshots and $S_t = \langle N_t, E_t \rangle$ is the network snapshot at time $t$; $N_t$ is the node set and $E_t$ the edge set of the network at time $t$, and $S_t$ is likewise an undirected graph.
The goal of role discovery is to characterize the structure types that appear in the network with a set of latent roles. Taking the three network structures of central nodes, bridge nodes and periphery nodes as an example, one would expect the three discovered roles to correspond to these three structures respectively. Role discovery for nodes consists of two steps: feature extraction and role discovery.
Feature extraction aims to represent each node by its basic structural features (e.g. its degree and the number of triangles it participates in). For each node $v$, basic features are first obtained from the network structure and then aggregated iteratively to obtain recursive features; together, the basic and recursive features form the node's feature vector. The number of features is determined automatically by the algorithm.
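The feature extraction step can be sketched as follows. This is a simplified, hypothetical illustration in the spirit of RolX: it assumes two base features (degree and triangle count, the examples named above) and forms recursive features as neighbour means and sums. RolX's egonet features and the pruning that determines the number of features automatically are omitted.

```python
def extract_features(adj, iterations=1):
    """Recursive structural features for one undirected snapshot.

    adj: dict node -> set of neighbours (node ids must be mutually comparable).
    Base features: degree and number of triangles through the node.
    Each iteration appends, for every existing feature, its mean and its sum
    over the node's neighbours.
    """
    nodes = sorted(adj)
    feats = {}
    for v in nodes:
        nbrs = adj[v]
        # a triangle through v is a pair of neighbours that are themselves adjacent
        tri = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        feats[v] = [float(len(nbrs)), float(tri)]
    for _ in range(iterations):
        new = {}
        for v in nodes:
            nbrs = adj[v]
            k = len(feats[v])
            sums = [sum(feats[u][i] for u in nbrs) for i in range(k)]
            means = [s / len(nbrs) if nbrs else 0.0 for s in sums]
            new[v] = feats[v] + means + sums
        feats = new
    return nodes, [feats[v] for v in nodes]
```

With one iteration each node carries 6 features: degree, triangles, their neighbour means and their neighbour sums.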
Definition 2. Node-feature matrix sequence $V = \langle V_1, V_2, \ldots, V_T \rangle$: applying the above feature extraction to $S_t$ yields the node-feature matrix $V_t \in \mathbb{R}^{n_t \times f}$, where $n_t$ is the number of nodes at time $t$ and $f$ is the number of features extracted from $S_t$. Performing feature extraction on every snapshot gives the node-feature matrix sequence $V = \langle V_1, V_2, \ldots, V_T \rangle$.
Definition 3. Node-role matrix sequence $G = \langle G_1, G_2, \ldots, G_T \rangle$: given a node-feature matrix $V_t \in \mathbb{R}_{\ge 0}^{n_t \times f}$ and a positive integer $r < \min(n_t, f)$, non-negative matrix factorization (NMF, Non-negative Matrix Factorization) is used to find non-negative matrices $G_t \in \mathbb{R}_{\ge 0}^{n_t \times r}$ and $F \in \mathbb{R}_{\ge 0}^{r \times f}$ such that $G_t F \approx V_t$, where $r$ is the number of roles and $G_t$ is a node-role matrix. Again taking central nodes, bridge nodes and periphery nodes as an example, $r = 3$, and the $n$-th row of $G_t$ is the role distribution of node $n$ at time $t$, e.g. {R1:0.1, R2:0.1, R3:0.8}. From $V = \langle V_1, V_2, \ldots, V_T \rangle$ the complete node-role matrix sequence $G = \langle G_1, G_2, \ldots, G_T \rangle$ is obtained.
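Definition 3 only requires some NMF solver. As an assumption, the sketch below uses the Lee-Seung multiplicative updates for the squared-error objective, written in plain Python so it has no dependencies, and then normalises each row of $G_t$ so it reads as a role distribution such as {R1:0.1, R2:0.1, R3:0.8}. The row normalisation and all function names are illustrative, not taken from the patent.

```python
import random


def matmul(A, B):
    """Matrix product of list-of-lists matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, r, iters=300, seed=0, eps=1e-9):
    """Find non-negative G (n x r) and F (r x f) with G F ~= V (n x f)."""
    rng = random.Random(seed)
    n, f = len(V), len(V[0])
    G = [[rng.random() + eps for _ in range(r)] for _ in range(n)]
    F = [[rng.random() + eps for _ in range(f)] for _ in range(r)]
    for _ in range(iters):
        # F <- F * (G^T V) / (G^T G F)
        Gt = transpose(G)
        num, den = matmul(Gt, V), matmul(matmul(Gt, G), F)
        F = [[F[i][j] * num[i][j] / (den[i][j] + eps) for j in range(f)] for i in range(r)]
        # G <- G * (V F^T) / (G F F^T)
        Ft = transpose(F)
        num, den = matmul(V, Ft), matmul(G, matmul(F, Ft))
        G = [[G[i][j] * num[i][j] / (den[i][j] + eps) for j in range(r)] for i in range(n)]
    return G, F


def role_distributions(G):
    """Normalise each row of G so a node's role values sum to 1 (assumed step)."""
    out = []
    for row in G:
        s = sum(row)
        out.append([x / s for x in row] if s > 0 else [1.0 / len(row)] * len(row))
    return out
```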
Once the node-role matrix has been obtained, one could simply take the role with the largest value as the node's role and ignore the values of all other roles. For example, the role sequence of the periphery node given above, <1:{R1:0.1, R2:0.1, R3:0.8}, 2:{R1:0.2, R2:0.3, R3:0.5}, 3:{R1:0.6, R2:0.3, R3:0.1}>, could be expressed directly as <1:{R3}, 2:{R3}, 3:{R1}>. Although this achieves some simplification, it fails to show the node's actual role changes: the simplified role sequence clearly cannot reflect the details of how the periphery node evolves into a central node. The present invention therefore keeps the complete role distribution, regarding a node's role as a soft probability distribution in which all roles together represent the node's structure.
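The information lost by keeping only the arg-max role can be seen on the example sequence from the text. This small sketch (hypothetical helper name) reproduces the collapse to <1:{R3}, 2:{R3}, 3:{R1}>.

```python
def hard_roles(seq):
    """Collapse each soft role distribution to its arg-max role."""
    return [max(dist, key=dist.get) for dist in seq]


edge_to_hub = [{"R1": 0.1, "R2": 0.1, "R3": 0.8},
               {"R1": 0.2, "R2": 0.3, "R3": 0.5},
               {"R1": 0.6, "R2": 0.3, "R3": 0.1}]
print(hard_roles(edge_to_hub))   # ['R3', 'R3', 'R1']
```

The drop of R3 from 0.8 to 0.5 at time 2, the first sign of the periphery-to-center transition, is invisible in the hard sequence.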
Step A2: following the idea of the Apriori algorithm, mine the frequent pattern set P of the role distribution sequences as the role evolution trends of the network.
In step A2, after the node-role matrix sequence $G = \langle G_1, G_2, \ldots, G_T \rangle$ is obtained, the node-role matrix $G_t$ at each time $t$ is first clustered separately, and all resulting cluster centers are taken as Length-1 patterns (analogous to items in frequent pattern mining). Length-1 frequent patterns are screened with a designed support measure; then, following the Apriori algorithm in Document 2 (Han J, Kamber M, Pei J. Data mining: concepts and techniques [M]. Morgan Kaufmann, 2006.), the downward closure property is verified and longer frequent patterns are computed. The pattern mining process and the design of the support measure are described in detail below.
Definition 4. Length-1 pattern $c_t^k$: each $G_t$ is clustered with the X-means method of Document 3 (Ruhnau B. Eigenvector-centrality a node-centrality [J]. Social networks, 2000, 22(4): 357-365.). This method does not require the number of clusters to be given in advance, which better suits the needs of the present invention. Each resulting cluster center is still a vector of probabilities over the roles, and the cluster centers at all time points are Length-1 patterns.
For example, if 2 cluster centers are obtained at time point 1, denoted <1:1> and <1:2>, and 2 cluster centers are likewise obtained at time point 2, denoted <2:1> and <2:2>, then <1:1>, <1:2>, <2:1> and <2:2> are all Length-1 patterns.
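Definition 4 clusters each $G_t$ with X-means. The Python standard library has no X-means, so the sketch below substitutes plain k-means with a fixed `k` and a deterministic initialisation, which is an assumption; X-means would additionally choose the number of clusters. It also assumes the same node order in every snapshot. Keys `(t, j)` mirror the `<t:j>` notation of the text, and all names are hypothetical.

```python
import math


def kmeans(points, k, iters=50):
    """Plain k-means with fixed k (a stand-in for X-means)."""
    centers = []
    for p in points:                        # deterministic init: first k distinct points
        if list(p) not in centers:
            centers.append(list(p))
        if len(centers) == k:
            break
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                     # keep the old center if a cluster empties
                centers[j] = [sum(xs) / len(members) for xs in zip(*members)]
    return centers, labels


def length1_patterns(G_seq, k):
    """Cluster every snapshot; return {(t, j): center} and 1-based cluster ids per node."""
    patterns, membership = {}, []
    for t, G_t in enumerate(G_seq, start=1):
        centers, labels = kmeans(G_t, k)
        for j, c in enumerate(centers, start=1):
            patterns[(t, j)] = c
        membership.append([lab + 1 for lab in labels])
    return patterns, membership
```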
Definition 5. Support of a Length-1 pattern $c_t^k$: similar to the definition of support in traditional frequent pattern mining, let $c_t^k$ denote a Length-1 pattern (i.e. the $k$-th cluster center at time $t$), let $d(g_t^n, c_t^k)$ denote the distance (Euclidean distance in the present invention) between the role probability distribution vector $g_t^n$ of node $n$ at time $t$ and the cluster center $c_t^k$, and let $d_t^{\max}(c_t^k)$ denote the maximum distance from all nodes at time $t$ to $c_t^k$. The support of the Length-1 pattern $c_t^k$ is then given by formula (1):
[Formula (1) is not reproduced in this text.]
Analysis shows that this support definition is reasonable: (1) the closer a node is to $c_t^k$, the larger its contribution to the support; (2) the more nodes are close to $c_t^k$, the larger the support; (3) [not reproduced in this text].
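Formula (1) itself is not reproduced here. The sketch below shows one form that is consistent with the two stated properties and keeps the support within [0, 1], a bound the multiplicative downward-closure argument relies on: the mean over nodes of $1 - d(g_t^n, c_t^k)/d_t^{\max}$. It is an assumption, not the patent's formula, and the function name is hypothetical.

```python
import math


def length1_support(center, G_t):
    """Assumed form of formula (1): mean over nodes of 1 - d(g, c) / d_max."""
    dists = [math.dist(g, center) for g in G_t]   # Euclidean distance, as in the text
    d_max = max(dists)
    if d_max == 0:
        return 1.0                                # every node sits exactly on the center
    return sum(1 - d / d_max for d in dists) / len(dists)
```

Each node contributes between 0 (the farthest node) and 1 (a node on the center), so nodes closer to the center add more, and a center surrounded by many close nodes gets a higher support.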
Definition 6. Support of a Length-k ($k > 1$) pattern, $\mathrm{sup}(p)$: a Length-k pattern $p$ is in fact a sequence of $k$ cluster centers, each corresponding to one snapshot. For example, $p = \langle 1:1, 2:1, 3:1 \rangle$ is a Length-3 pattern composed of the 3 cluster centers $\langle 1:1 \rangle$, $\langle 2:1 \rangle$ and $\langle 3:1 \rangle$. Let $T_p$ be the set of time points covered by pattern $p$ and $c_t^p$ the cluster center of $p$ at time $t$; the support of the Length-k pattern $p$ is then given by formula (2):
[Formula (2) is not reproduced in this text.]
Analysis shows that this definition is reasonable:
(1) the more time points at which a node is close to the corresponding cluster centers of $p$, the larger its contribution to the support of $p$; (2) the more nodes satisfy (1), the larger the support of the pattern; (3) [not reproduced in this text].
Proof 1. Downward closure: the downward closure property states that if a Length-k pattern $p$ satisfies the minimum support, i.e. is a frequent pattern, then all of its non-empty sub-patterns must also satisfy the minimum support. For example, the Length-3 pattern $p = \langle 1:1, 2:1, 3:1 \rangle$ has 6 non-empty proper sub-patterns: $\langle 1:1, 2:1 \rangle$, $\langle 2:1, 3:1 \rangle$, $\langle 1:1, 3:1 \rangle$, $\langle 1:1 \rangle$, $\langle 2:1 \rangle$ and $\langle 3:1 \rangle$. Formula (2) computes the support of a pattern by successively multiplying single Length-1 pattern supports over its time point set; the sub-patterns of $p$ cover fewer time points than $p$; and every Length-1 pattern $c$ satisfies $\mathrm{sup}(c) \le 1$. Hence the above support definition satisfies the downward closure property.
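Formula (2) is likewise not reproduced. One reading consistent with the text multiplies, for each node, its per-snapshot closeness to the pattern's cluster centers and averages over nodes; since every factor lies in [0, 1], dropping time points can only raise the support, which is exactly downward closure. The sketch below is an assumed form with hypothetical names, and it assumes a fixed node order across snapshots.

```python
import math


def pattern_support(pattern, G_seq, centers):
    """Assumed form of formula (2).

    pattern: tuple of (t, j) pairs, e.g. ((1, 1), (2, 1), (3, 1)) for <1:1,2:1,3:1>.
    G_seq[t - 1][n]: role distribution of node n at time t.
    centers[(t, j)]: the j-th cluster center at time t.
    """
    n_nodes = len(G_seq[0])
    contrib = [1.0] * n_nodes
    for t, j in pattern:
        c = centers[(t, j)]
        dists = [math.dist(g, c) for g in G_seq[t - 1]]
        d_max = max(dists) or 1.0            # guard: all nodes on the center
        for n, d in enumerate(dists):
            contrib[n] *= 1 - d / d_max      # each factor lies in [0, 1]
    return sum(contrib) / n_nodes
```

For a node to contribute much, it has to stay close to the pattern's center at every covered time point, matching property (1) above.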
Longer patterns can be mined with the Apriori algorithm. Let $p_1$ and $p_2$ be two Length-L frequent patterns. If the result of removing the first time point from $p_1$ is exactly the same as the result of removing the last time point from $p_2$, then $p_1$ and $p_2$ can be merged into a Length-(L+1) pattern. Example: for $p_1 = \langle 1:1, 2:3, 3:2 \rangle$ and $p_2 = \langle 2:3, 3:2, 5:6 \rangle$, the Length-4 candidate pattern $\langle 1:1, 2:3, 3:2, 5:6 \rangle$ is obtained.
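The join rule, together with the Apriori pruning step that checks every sub-pattern is frequent, can be sketched as below. Patterns are represented (an assumption) as tuples of `(t, j)` pairs in increasing time order; the extra time-order check keeps Length-1 joins from pairing two centers of the same snapshot. The function name is hypothetical.

```python
from itertools import combinations


def get_candidates(frequent):
    """Apriori join + prune over length-L patterns given as tuples of (t, j) pairs."""
    frequent = set(frequent)
    cands = set()
    for p1 in frequent:
        for p2 in frequent:
            # join: p1 minus its first element equals p2 minus its last element,
            # and p2's last time point comes strictly after p1's last one
            if p1[1:] == p2[:-1] and p2[-1][0] > p1[-1][0]:
                cand = p1 + p2[-1:]
                # prune: every length-L sub-pattern must itself be frequent
                if all(sub in frequent for sub in combinations(cand, len(cand) - 1)):
                    cands.add(cand)
    return cands
```

With only $p_1$ and $p_2$ from the example, the candidate is pruned because sub-patterns such as <1:1, 2:3, 5:6> are not known to be frequent; it survives once those sub-patterns are frequent too.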
Step B: detect role-evolution anomalies by comparing the degree to which each node differs from the frequent patterns.
Step B discusses how to define a node's outlier score from the previously obtained frequent pattern set P and thereby find the REOutliers, i.e. the nodes with anomalous role evolution. Since a Length-1 frequent pattern is a single cluster center on a single snapshot and cannot describe a "trend", the present invention removes all Length-1 frequent patterns from P.
There are two relatively simple ways to obtain the REOutliers set from the frequent pattern set P: (1) design a measure of the difference between a node and a frequent pattern, and sum the node's differences from all frequent patterns as its outlier score; (2) consider only the longest frequent pattern, i.e. the one covering the most time points, and use the node's difference from that pattern as its outlier score. Neither method is reasonable: (1) a normal node may match some frequent patterns well while deviating from all other patterns, which would give normal nodes high outlier scores; (2) considering only longer patterns ignores the influence of short patterns on finding REOutliers, whereas in real dynamic networks few long patterns are mined and short patterns are far more common.
Definition 7. Configuration: a configuration is a set of time points; all frequent patterns covering the same time points are merged into one configuration. For example, patterns $p_1 = \langle 1:1, 2:2 \rangle$ and $p_2 = \langle 1:2, 2:2 \rangle$ correspond to the same configuration $c = \langle 1, 2 \rangle$. Merging the frequent patterns in P that share the same time points yields the configuration set C.
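Definition 7 amounts to grouping patterns by the time points they cover. A minimal sketch, assuming patterns are tuples of `(t, j)` pairs (the function name is hypothetical), follows; it skips Length-1 patterns, since step B discards them.

```python
from collections import defaultdict


def configurations(patterns):
    """Map each configuration (tuple of covered time points) to its frequent patterns."""
    conf = defaultdict(list)
    for p in patterns:
        if len(p) > 1:                       # Length-1 patterns carry no trend
            conf[tuple(t for t, _ in p)].append(p)
    return dict(conf)
```

For the example in Definition 7, <1:1, 2:2> and <1:2, 2:2> both land under the configuration (1, 2).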
The concept of configurations solves the problems of the two methods above. Each $c \in C$ corresponds to several frequent patterns with the same time points. When computing the outlier score of a node, the frequent pattern that best matches the node is first computed for each configuration $c$, giving the node's set $\{p_c\}_{c \in C}$ of best-matching frequent patterns over all configurations; the node's degrees of anomaly with respect to all these patterns are then summed to obtain its final outlier score. The matching score between a node and a pattern is defined below.
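Definition 8. Matching score between node $n$ and pattern $p$: the set $T_p$ of time points covered by frequent pattern $p$ is divided into two parts, $\bar{\theta}_{pn}$ and $\theta_{pn}$, where $\bar{\theta}_{pn}$ is the set of time points at which pattern $p$ matches node $n$ and $\theta_{pn}$ is the set of time points at which they do not match, so that $T_p = \bar{\theta}_{pn} \cup \theta_{pn}$ and $\bar{\theta}_{pn} \cap \theta_{pn} = \emptyset$. For example, for the frequent pattern $p = \langle 1:1, 2:1, 3:2 \rangle$ with $T_p = \{1, 2, 3\}$, if node $n$ belongs to cluster 1 at each of the first three time points, then $\bar{\theta}_{pn} = \{1, 2\}$ and $\theta_{pn} = \{3\}$.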
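The partition of $T_p$ into matched and unmatched time points can be sketched as follows, assuming `node_clusters` maps each time point to the 1-based cluster that node $n$ belongs to. The names are hypothetical; the example reproduces the one in Definition 8.

```python
def split_time_points(pattern, node_clusters):
    """Return (matched, unmatched) time points of pattern p for one node.

    pattern: tuple of (t, j) pairs; node_clusters[t]: the node's cluster at time t.
    """
    matched = {t for t, j in pattern if node_clusters.get(t) == j}
    unmatched = {t for t, _ in pattern} - matched      # theta_pn
    return matched, unmatched


print(split_time_points(((1, 1), (2, 1), (3, 2)), {1: 1, 2: 1, 3: 1}))  # ({1, 2}, {3})
```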
The matching score between node $n$ and pattern $p$ is then defined by formula (3):
[Formula (3) is not reproduced in this text.]
Intuitively: (1) the more time points at which node $n$ matches pattern $p$, the higher the matching score; (2) the more important (the higher the support of) the cluster centers matched by $n$, the higher the matching score; (3) the more compact the clusters represented by the cluster centers matched by $n$ (the smaller the average intra-cluster distance), the higher the matching score.
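Formula (3) is not reproduced in this text. As an assumption, the sketch below uses a score with the three intuitive properties listed: it sums, over matched time points only, the support of the matched cluster center divided by one plus that cluster's average intra-cluster distance. Selecting $p_c$ within a configuration is then an arg-max. All names are hypothetical.

```python
def match_score(pattern, node_clusters, support, avg_dist):
    """Assumed form of formula (3).

    support[(t, j)], avg_dist[(t, j)]: support and average intra-cluster
    distance of the j-th cluster center at time t.
    """
    return sum(support[(t, j)] / (1.0 + avg_dist[(t, j)])
               for t, j in pattern if node_clusters.get(t) == j)


def best_matching_pattern(config_patterns, node_clusters, support, avg_dist):
    """p_c: the pattern in one configuration with the highest match score."""
    return max(config_patterns,
               key=lambda p: match_score(p, node_clusters, support, avg_dist))
```

More matched time points add more terms, higher-support centers add larger terms, and tighter clusters shrink the denominator, mirroring properties (1) to (3).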
Definition 9. Discrepancy score between node $n$ and its best-matching frequent pattern $p_c$: intuitively, only the time points at which $n$ and $p_c$ do not match should be considered, and the farther node $n$ is from the corresponding cluster centers at these unmatched time points, the higher the discrepancy. The score should likewise be weighted by the support of pattern $p_c$. This gives the node's discrepancy score shown in formula (4):
[Formula (4) is not reproduced in this text.]
Definition 10. Node outlier score: after the discrepancy score of each node with respect to each $p_c$ has been obtained, the final outlier score of node $n$ is given by formula (5):
[Formula (5) is not reproduced in this text.]
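Formula (4) is also not reproduced. The assumed form below follows Definition 9: it looks only at the unmatched time points $\theta_{pn}$, sums the Euclidean distance from the node's role distribution to the pattern's cluster center there, and weights the total by the support of $p_c$. The names are hypothetical.

```python
import math


def discrepancy(pattern, node_roles, node_clusters, centers, pattern_sup):
    """Assumed form of formula (4) for one node and its best-matching pattern p_c.

    node_roles[t]: the node's role distribution at time t.
    node_clusters[t]: the node's cluster at time t.
    centers[(t, j)]: cluster centers; pattern_sup: sup(p_c).
    """
    return pattern_sup * sum(math.dist(node_roles[t], centers[(t, j)])
                             for t, j in pattern if node_clusters.get(t) != j)
```

Matched time points contribute nothing, so a node that follows its best pattern everywhere scores 0 for that configuration.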
The REOutliers detection algorithm P-REOD of the present invention can be summarized as follows:
Algorithm 1: Pattern-based Role Evolving Outliers Detection Framework (P-REOD)
Input: the dynamic network D = <S_1, S_2, ..., S_T>
Output: Top-K REOutliers set
Parameters: r: number of roles; minSup: support threshold
Steps:
  Input(D)
  V = <V_1, V_2, ..., V_T> ← FeatureExtract(D)
  G = <G_1, G_2, ..., G_T> ← RoleDiscover(V, r)
  P ← PatternMining(G, minSup)
  C ← ConfigurationMining(P)
  for each configuration c ∈ C:
      for each node n ∈ N:
          p_c ← FindBestMatchingPattern(c, n) using (3)
          S ← compute the outlier score of node n using (4) and (5)
  Output(Top-K REOutliers in S)
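The outer loop of Algorithm 1, from the configuration loop to the Top-K output, can be sketched as below. The best-match and discrepancy computations are passed in as callables standing in for formulas (3) and (4), and summing each node's scores over configurations is an assumed reading of formula (5), following the summation described for step B. All names are hypothetical.

```python
import heapq


def top_k_outliers(nodes, configs, best_match, score, k):
    """Sketch of Algorithm 1's scoring loop.

    configs: dict configuration -> list of frequent patterns.
    best_match(patterns, n) -> p_c; score(p_c, n) -> discrepancy of n from p_c.
    Returns the k (node, total score) pairs with the highest totals.
    """
    totals = {n: 0.0 for n in nodes}
    for patterns in configs.values():
        for n in nodes:
            p_c = best_match(patterns, n)
            totals[n] += score(p_c, n)
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])
```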
The PatternMining algorithm in P-REOD is implemented as shown in Algorithm 2:
Algorithm 2: PatternMining
Input: the node-role matrix sequence G
Output: frequent pattern set P
Parameters: minSup: support threshold
Steps:
  P ← ∅
  let C_k be the set of length-k candidate patterns
  C_1 ← all cluster centers at every time point
  let L_k be the set of length-k frequent patterns
  L_1 ← {f | f ∈ C_1 and sup(f) ≥ minSup}
  for k = 2 to T:
      C_k ← getCandidates(L_{k-1})
      L_k ← {f | f ∈ C_k and sup(f) ≥ minSup}
      P ← P ∪ L_k
  return P
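Algorithm 2 translates into a short level-wise loop. In this hypothetical sketch, `support` is any callable implementing formulas (1) and (2), patterns are tuples of `(t, j)` pairs, and `join` performs the candidate join rule described for longer patterns. The loop stops early once a level is empty, which cannot change the result because an empty level yields no further candidates.

```python
def join(frequent):
    """Join length-L patterns whose overlaps agree; time points strictly increase."""
    return {p1 + p2[-1:] for p1 in frequent for p2 in frequent
            if p1[1:] == p2[:-1] and p2[-1][0] > p1[-1][0]}


def pattern_mining(length1, support, min_sup, T):
    """Level-wise frequent pattern mining (sketch of Algorithm 2).

    length1: iterable of 1-tuples ((t, j),), one per cluster center.
    Returns P, the frequent patterns of length >= 2.
    """
    P = set()
    L = {p for p in length1 if support(p) >= min_sup}       # L_1
    for _ in range(2, T + 1):
        C = join(L)                                         # C_k
        L = {p for p in C if support(p) >= min_sup}         # L_k
        if not L:
            break
        P |= L
    return P
```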
As the above algorithms show, there are two important parameters: the number of roles r and the minimum support threshold minSup. The number of roles r determines how finely network structure types are classified: a larger r characterizes node structure types more finely but makes the model more complex and brings relatively large overhead, while a smaller r simplifies the model but reduces the discrimination between structure types. The present invention uses the minimum description length principle to determine the value of r and finds that r = 3 or 4 is optimal. The number of roles used in the present invention is 4.
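The patent states that the minimum description length principle selects r but does not give the cost function. As an assumption loosely following the MDL criterion described for RolX, the sketch below scores a factorization as a model cost of `bits * r * (n + f)` plus a generalized KL-divergence reconstruction error; one would compute it for each candidate r and keep the smallest. `W` stands for the reconstruction $G_t F$, and the names are hypothetical.

```python
import math


def description_length(V, W, r, bits=32):
    """MDL-style cost of approximating V (n x f) by W = G F with r roles."""
    n, f = len(V), len(V[0])
    model = bits * r * (n + f)                # cost of encoding G and F
    err = 0.0
    for v_row, w_row in zip(V, W):
        for v, w in zip(v_row, w_row):
            w = max(w, 1e-12)                 # avoid log(0) / division by zero
            err += (v * math.log(v / w) if v > 0 else 0.0) - v + w
    return model + err
```

A larger r lowers the error term but raises the model term linearly, which is the trade-off described above.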
The minimum support threshold minSup determines the number of final frequent patterns. For a given dynamic network data set, too high a minSup may prevent some patterns that actually occur frequently from being mined, so that some outliers cannot be detected; some nodes may even find no matching pattern and receive an outlier score of 0. Too small a minSup, however, may yield frequent patterns that do not represent important trends in the data, which likewise affects the final result. The present invention ran multiple experiments with different minSup values and selected minSup on the condition that all nodes obtain non-zero outlier scores.
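The selection rule for minSup can be sketched as follows. Choosing the largest candidate that still gives every node a non-zero outlier score is an assumption that follows from the trade-off described above (too high a value zeroes some scores, too low a value admits unrepresentative patterns); `outlier_scores` is a hypothetical callable that reruns the pipeline for a given threshold.

```python
def choose_min_sup(candidates, outlier_scores):
    """Largest candidate minSup for which every node's outlier score is non-zero.

    outlier_scores(min_sup) -> dict node -> score (hypothetical pipeline wrapper).
    Returns None if no candidate qualifies.
    """
    for ms in sorted(candidates, reverse=True):
        if all(s != 0 for s in outlier_scores(ms).values()):
            return ms
    return None
```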
[Beneficial effects]
The technical solution proposed by the present invention has the following beneficial effects:
(1) The present invention proposes for the first time the concept of role-evolution outliers (REOutliers), i.e. nodes whose role evolution runs counter to the overall role evolution trends of the network. These outliers are based purely on the structural roles of nodes and do not depend on any additional network information. Mining such nodes can help reveal anomalous events hidden in the network, which is of great significance for studying the many real-world network systems that lack sufficient information.
(2) Based on the definition of REOutliers, the present invention proposes a pattern-based REOutliers detection method, abbreviated P-REOD. The method divides role-evolution anomaly detection into two parts, pattern extraction and anomaly detection: it first uses the Apriori algorithm to compute the frequent patterns that occur in the role evolution of all nodes as the trends of the whole network, and then detects REOutliers according to the obtained network trends.
(3) The present invention was tested on three real data sets (Enron, Facebook and DBLP), and reasonable explanations and analyses are given for the REOutliers obtained on each.
Description of the drawings
Fig. 1 is a flow chart of the method for discovering anomalous structural evolution in a dynamic network provided by Embodiment 1 of the present invention.
Detailed description of the invention
To make the objects, technical solutions and advantages of the present invention clearer, specific embodiments of the invention are described clearly and completely below.
Embodiment 1
Fig. 1 is a flow chart of the method for discovering anomalous structural evolution in a dynamic network provided by Embodiment 1 of the present invention.
As shown in Fig. 1, the method comprises steps S1 and S2; for the details of each step, refer to the content of the description above.
Step S1: given a dynamic network, mine the frequent patterns of role evolution over time in the whole network.
Step S2: detect role-evolution anomalies by comparing the degree to which each node differs from the frequent patterns.
Steps S1 and S2 are described in detail below.
Step S1 specifically comprises the following steps.
Role discovery is performed on each graph snapshot separately to obtain the role distribution sequence of each node, as follows:
for each node, basic features are first obtained from the network structure, and the basic features are then aggregated iteratively to obtain recursive features; the basic and recursive features together form the node's features, from which its role distribution sequence is obtained.
The frequent pattern set of the role distribution sequences is mined with the Apriori algorithm, and the mined frequent pattern set is taken as the role evolution trends of the network. The role distribution sequences are mined with the Apriori algorithm as follows:
first, the role distributions at each time point are clustered separately, and every resulting cluster center is taken as a Length-1 pattern; Length-1 frequent patterns are screened with a designed support measure; the downward closure property is verified in accordance with the Apriori algorithm, and longer frequent patterns are then computed.
As can be seen from the above embodiment, the embodiment of the present invention divides role-evolution anomaly detection into two parts, pattern extraction and anomaly detection: following the idea of the Apriori algorithm, it first computes the frequent patterns that occur in the role evolution of all nodes as the trends of the whole network, and then detects REOutliers according to the obtained network trends.
It should be understood that the embodiment described above is only one of the embodiments of the present invention, not all of them, and does not limit the invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The method of Embodiment 1 is analyzed experimentally below.
Experimental analysis
This part mainly tests the P-REOD algorithm on real datasets and analyzes its effectiveness, as follows:
1. Dataset description & feature extraction
The present invention was tested on three datasets of different scales, briefly described as follows:
(1) Enron: the email network of Enron Corporation employees. Enron was once a highly reputed U.S. energy company but went bankrupt suddenly in 2001, which attracted the interest of many researchers. The present invention builds a dynamic network from the data of January to December 2001, with one network snapshot per month. Nodes represent employees and edges represent email contact between employees.
(2) Facebook-wall (Facebook for short): user wall data from Facebook. Each user has their own wall, on which other users can leave messages. The present invention uses the data of January to December 2008, with one snapshot per month. Nodes represent Facebook users and edges represent message relations between users.
(3) DBLP: a scientific collaboration network in which authors who have co-authored a paper are collaborators. The present invention builds a dynamic network from the 10 years of data from 2002 to 2011, with one snapshot per year. Nodes represent authors and edges represent collaboration between authors. Authors who published fewer than 10 papers in total are removed.
After the dynamic network is divided into snapshots, the number of nodes differs from snapshot to snapshot: some nodes appear or disappear midway. Such nodes are likely to end up among the final outliers simply because few patterns can match them. It should be noted that the present invention preserves the complete network structure but considers only the set of nodes present throughout the network when computing outliers. The sizes of these always-present node sets in the three datasets are 2114, 5111 and 29747, respectively. Details of the datasets are given in Table 1.
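Restricting outlier scoring to the always-present nodes amounts to intersecting the node sets of all snapshots. The full graphs are still used upstream. The following is a small sketch under two assumptions. Each snapshot is given as an edge set, and a node counts as present in a snapshot when it has at least one edge there. Both helper names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def persistent_nodes(snapshots: List[Set[Edge]]) -> Set[int]:
    """Nodes incident to at least one edge in every snapshot."""
    node_sets = [{u for e in snap for u in e} for snap in snapshots]
    return set.intersection(*node_sets) if node_sets else set()


def restrict_scores(scores: Dict[int, float], keep: Set[int]) -> Dict[int, float]:
    """Features and roles come from the full graphs; only persistent nodes are ranked."""
    return {v: s for v, s in scores.items() if v in keep}
```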
The present invention computes edge weights with time decay. When the weight between two nodes is computed at the current time, the weights at all historical times are also taken into account, and times farther from the present have less influence (see formula (6)). Here $w_i$ denotes the weight of the relation between nodes a and b at time $t_i$. A weight threshold $w^*$ is set, and all edges with weight below $w^*$ are filtered out. The present invention takes $\lambda = 1$ and $w^* = 0.1$.
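Formula (6) itself is not reproduced in this text. The sketch below therefore assumes a common exponential-decay form, $W_{ab}(t_k) = \sum_{i \le k} w_i \, e^{-\lambda (t_k - t_i)}$, with unit spacing between snapshots. It then applies the threshold $w^*$. The form of the decay, the function name and its parameters are assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def decayed_weights(history: List[Dict[Edge, float]], lam: float = 1.0,
                    w_min: float = 0.1) -> List[Dict[Edge, float]]:
    """Time-decayed edge weights for every snapshot.

    history[i][(a, b)] is the raw weight w_i of edge (a, b) at time t_i = i.
    ASSUMED decay form: W_ab(t_k) = sum_{i <= k} w_i * exp(-lam * (t_k - t_i)).
    Edges whose decayed weight is below w_min are dropped from that snapshot.
    """
    result = []
    running: Dict[Edge, float] = {}
    decay = math.exp(-lam)
    for snapshot in history:
        # Age all earlier contributions by one step, then add this step's weights.
        running = {e: w * decay for e, w in running.items()}
        for e, w in snapshot.items():
            running[e] = running.get(e, 0.0) + w
        result.append({e: w for e, w in running.items() if w >= w_min})
    return result
```

The running sum is updated as $W(t_k) = e^{-\lambda} W(t_{k-1}) + w_k$, which equals the assumed sum without revisiting old snapshots. Edges are keyed by tuple, so each undirected edge must always be written in the same (a, b) order.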
Feature extraction is performed on each of the three datasets, yielding the node feature matrices $V = \langle V_1, V_2, \ldots, V_T \rangle$. The number of extracted features need not be specified manually: the larger the network, the more features are extracted.
Table 1 Dataset details
2. Role extraction and role interpretation
Reference 3 (K. Henderson, B. Gallagher, L. Li, L. Akoglu, T. Eliassi-Rad, H. Tong, and C. Faloutsos. RolX: Role Extraction and Mining in Large Networks. In KDD, 2012.) gives a strong explanation and justification of why its roles are meaningful. To better explain the REOutliers of the present invention, this experiment first extends the method of Reference 3 and interprets the roles using the Enron dataset as an example (the other two datasets are similar). The number of roles r is set to 4. Role extraction is performed separately on the 12 monthly snapshots of the Enron dataset for 2001, yielding the node-role matrices $G = \langle G_1, G_2, \ldots, G_T \rangle$, where $G_t \in \mathbb{R}^{n \times r}$ (n nodes, r roles) and each column corresponds to one role.
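RolX obtains the node-role matrix by non-negative factorization of the node feature matrix, $V_t \approx G_t F_t$. The sketch below uses plain Lee-Seung multiplicative updates for a fixed number of iterations, with r chosen by hand (r = 4 in this experiment). RolX instead selects r by minimum description length. The function names are hypothetical, and the code is not presented as the patent's implementation.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def nmf(v: Matrix, r: int, iters: int = 500, seed: int = 0,
        eps: float = 1e-12) -> Tuple[Matrix, Matrix]:
    """Factor non-negative V (n x d) as V ~ G F with G (n x r) and F (r x d) >= 0.

    Lee-Seung multiplicative updates for the squared Frobenius loss.
    Row i of G holds node i's memberships in the r roles.
    """
    rng = random.Random(seed)
    n, d = len(v), len(v[0])
    g = [[rng.random() + eps for _ in range(r)] for _ in range(n)]
    f = [[rng.random() + eps for _ in range(d)] for _ in range(r)]
    for _ in range(iters):
        gt = transpose(g)
        # F <- F * (G^T V) / (G^T G F), elementwise
        num, den = matmul(gt, v), matmul(matmul(gt, g), f)
        f = [[f[i][j] * num[i][j] / (den[i][j] + eps) for j in range(d)]
             for i in range(r)]
        ft = transpose(f)
        # G <- G * (V F^T) / (G F F^T), elementwise
        num, den = matmul(v, ft), matmul(g, matmul(f, ft))
        g = [[g[i][k] * num[i][k] / (den[i][k] + eps) for k in range(r)]
             for i in range(n)]
    return g, f
```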
To interpret the four roles obtained, the present invention extends the approach of Reference 3, which explains roles with common node metrics. It selects six common metrics: node degree, weighted degree, betweenness, eigenvector centrality, weighted eigenvector centrality, and PageRank value. First, the values of the 6 metrics are computed for each node at time t, giving the node metric matrix $M_t \in \mathbb{R}^{n \times m}$, where m is the number of metrics (set to 6 in Embodiment 1). Non-negative matrix factorization is then used to find the matrix $E_t \in \mathbb{R}^{r \times m}$ such that $G_t E_t \approx M_t$, where $G_t$ is again the node-role matrix at time t. Each row of $E_t$ corresponds to one role and gives that role's values on the 6 metrics. Computing the role-metric matrices of the Enron dataset in this way gives $E = \langle E_1, E_2, \ldots, E_T \rangle$, and averaging them yields the mean metric values of the 4 roles. The results show that each of the 4 roles has a distinct characteristic. R1 has large values on all 6 metrics. R2 is nonzero only on eigenvector centrality. R3 has similar values across the 6 metrics. R4 is 0 on all of them.
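With $G_t$ already known, only $E_t$ has to be estimated. The sketch below keeps G fixed and applies the multiplicative update for the Frobenius loss to E alone. A per-column non-negative least-squares solver would be an equally valid choice. The function name and iteration count are assumptions for illustration.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def role_metric_matrix(g: Matrix, m: Matrix, iters: int = 2000,
                       eps: float = 1e-12) -> Matrix:
    """Non-negative E (r x m) with G E ~ M, keeping the node-role matrix G fixed.

    E[k][j] reads as role k's typical value on metric j.
    """
    r, cols = len(g[0]), len(m[0])
    e = [[1.0] * cols for _ in range(r)]
    gt = transpose(g)
    num = matmul(gt, m)   # G^T M is constant because G is fixed
    gtg = matmul(gt, g)   # G^T G likewise
    for _ in range(iters):
        den = matmul(gtg, e)
        # E <- E * (G^T M) / (G^T G E), elementwise
        e = [[e[k][j] * num[k][j] / (den[k][j] + eps) for j in range(cols)]
             for k in range(r)]
    return e
```

Consider the case where G holds hard (one-hot) memberships, as in the check below. Each row of E then reduces to the average metric values of the nodes assigned to that role, which is what makes the rows readable as role profiles.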
3. REOutlier analysis
Without background knowledge of the specific applications, evaluating REOutliers is difficult. To demonstrate the effectiveness of the algorithm, the present invention first computes the average role distribution of all nodes as a baseline. It then selects the nodes with the largest and smallest anomaly scores for analysis. The Enron dataset remains the main case for analyzing the soundness of the algorithm.
4. Enron dataset
With minSup set to 0.1 there are 5511 frequent patterns, and merging the patterns that share identical time points yields 555 structures. The present invention selects the two nodes with the largest anomaly scores (nodes 17172 and 29659) for analysis. For comparison it uses the global average of all nodes' role values and the node with the smallest anomaly score, and plots a role evolution diagram for each. The analysis shows that the structural roles of nodes 17172 and 29659 did change abnormally during the 12 months of 2001.
As shown above, the four roles R1 to R4 of this dataset represent, respectively, central nodes, nodes with important direct neighbors, ordinary nodes, and peripheral nodes. On this basis, the abnormal structural changes of the detected outliers can be analyzed further.

Taking node 17172 as an example, the analysis shows that its structure changed abnormally in June. Before June the node had always been one of the central nodes of the network. In June it suddenly became a node with important neighbors, and after August it became a peripheral node. Further analysis was carried out to verify this anomalous structural change. In May, node 17172 was indeed a central node. In June it suddenly had only one direct neighbor, node 3286, and node 3286 was found to be a particularly important node in June, which confirms the above analysis.

Enron did in fact experience anomalous events such as a change of CEO between June and August 2001. This further supports, from another angle, the soundness of the outliers found by the present invention.
5. Facebook & DBLP datasets
For the Facebook and DBLP datasets, minSup is set to 0.11, yielding 6286 and 4723 frequent patterns, respectively. Again the nodes with the highest anomaly scores are analyzed, compared against the global average of node roles and the node with the lowest anomaly score, and their role evolution diagrams are drawn. The diagrams show that, compared with the global average and the lowest-scoring node, the outliers found by the P-REOD algorithm do exhibit clear anomalies.
In summary, the role-evolution outliers proposed in Embodiment 1 identify both the nodes of a dynamic network dataset whose structure changes abnormally and the moments at which those anomalies occur. Mining such abnormal nodes can help uncover hidden events and special nodes in the data. If sufficient information about the nodes is available, it can further support decision-making.
Claims (7)
1. A method for discovering anomalous structural evolution in a dynamic network, characterized by comprising the steps of:
A. given a dynamic network, mining the frequent patterns of role evolution over time across the whole network;
B. detecting role-evolution anomalies by comparing the degree to which each node deviates from the frequent patterns.
2. The method for discovering anomalous structural evolution in a dynamic network according to claim 1, characterized in that step A specifically comprises:
A1. performing role discovery on each graph snapshot separately to obtain the role distribution sequence of each node;
A2. mining the frequent pattern set of the role distribution sequences with the Apriori algorithm, and taking the mined frequent pattern set as the role evolution trend of the network.
3. The method for discovering anomalous structural evolution in a dynamic network according to claim 2, characterized in that, in step A1, role discovery is performed on each graph snapshot to obtain the role distribution sequence of each node as follows:
for each node, first obtaining basic features from its ego-network structure, then iteratively aggregating the basic features to obtain recursive features, the basic features and recursive features together forming the node's role distribution sequence.
4. The method for discovering anomalous structural evolution in a dynamic network according to claim 1, characterized in that, in step A2, the role distribution sequences are mined with the Apriori algorithm as follows:
first clustering the role distributions at each time step separately, all resulting cluster centers being taken as length-1 patterns; screening the length-1 frequent patterns with a designed support measure; verifying the downward-closure property required by the Apriori algorithm; and then computing longer frequent patterns.
5. The method for discovering anomalous structural evolution in a dynamic network according to claim 4, characterized in that the support is set according to the distance from the node to the cluster center.
6. The method for discovering anomalous structural evolution in a dynamic network according to claim 1, characterized in that step B specifically comprises:
merging all frequent patterns that cover identical time points to obtain a time structure;
for each time structure, computing the frequent pattern that best matches the node, thereby obtaining the node's set of best-matching frequent patterns over all structures, and then summing the node's degrees of anomaly with respect to all best-matching frequent patterns to obtain its final anomaly score.
7. The method for discovering anomalous structural evolution in a dynamic network according to claim 1, characterized in that the dynamic network is a node set or an edge set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610474974.2A CN106202614A (en) | 2016-06-24 | 2016-06-24 | The method that anomalous structure evolution in dynamic network finds |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610474974.2A CN106202614A (en) | 2016-06-24 | 2016-06-24 | The method that anomalous structure evolution in dynamic network finds |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106202614A true CN106202614A (en) | 2016-12-07 |
Family
ID=57461279
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610474974.2A Pending CN106202614A (en) | 2016-06-24 | 2016-06-24 | The method that anomalous structure evolution in dynamic network finds |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106202614A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107423743A (en) * | 2017-03-20 | 2017-12-01 | 重庆邮电大学 | The dynamic network state evolution method for visualizing of introduced feature component similarity |
CN107609330A (en) * | 2017-08-31 | 2018-01-19 | 中国人民解放军国防科技大学 | Access log mining-based internal threat abnormal behavior analysis method |
CN107705212A (en) * | 2017-07-07 | 2018-02-16 | 江苏开放大学 | A kind of role recognition method based on population random walk |
CN109446836A (en) * | 2018-10-09 | 2019-03-08 | 上海交通大学 | A kind of social networks personal information propagation access control method |
CN109818892A (en) * | 2019-01-18 | 2019-05-28 | 华中科技大学 | Construct Cyclic Spectrum characteristic parameter extraction model and signal modulation mode recognition methods |
CN110557294A (en) * | 2019-09-25 | 2019-12-10 | 南昌航空大学 | PSN (packet switched network) time slicing method based on network change degree |
CN111626891A (en) * | 2020-06-03 | 2020-09-04 | 四川大学 | Dynamic sale network community discovery method based on extended nodes |
CN112527784A (en) * | 2020-12-08 | 2021-03-19 | 天津大学 | Abnormal mode mining and incremental abnormal detection method based on complex network |
CN115114488A (en) * | 2022-07-15 | 2022-09-27 | 中国西安卫星测控中心 | Dynamic information network abnormal evolution node detection method based on role discovery |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104091206A (en) * | 2014-06-18 | 2014-10-08 | 北京邮电大学 | Social network information transmission prediction method based on evolutionary game theory |
CN105243448A (en) * | 2015-10-13 | 2016-01-13 | 北京交通大学 | Method and device for predicting evolution trend of internet public opinion |
- 2016-06-24 CN CN201610474974.2A patent/CN106202614A/en active Pending
Non-Patent Citations (2)
Title |
---|
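Zhang Yonghui (张永辉) et al.: "Community trend prediction for information networks based on structural analysis", Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》) * |
Li Yanmei (李艳梅) et al.: "Role evolution anomalies in dynamic information networks and their discovery", Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》) * |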
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107423743A (en) * | 2017-03-20 | 2017-12-01 | 重庆邮电大学 | The dynamic network state evolution method for visualizing of introduced feature component similarity |
CN107705212A (en) * | 2017-07-07 | 2018-02-16 | 江苏开放大学 | A kind of role recognition method based on population random walk |
CN107705212B (en) * | 2017-07-07 | 2021-06-15 | 江苏开放大学 | Role identification method based on particle swarm random walk |
CN107609330B (en) * | 2017-08-31 | 2019-12-06 | 中国人民解放军国防科技大学 | Access log mining-based internal threat abnormal behavior analysis method |
CN107609330A (en) * | 2017-08-31 | 2018-01-19 | 中国人民解放军国防科技大学 | Access log mining-based internal threat abnormal behavior analysis method |
CN109446836A (en) * | 2018-10-09 | 2019-03-08 | 上海交通大学 | A kind of social networks personal information propagation access control method |
CN109446836B (en) * | 2018-10-09 | 2022-02-15 | 上海交通大学 | Social network personal information propagation access control method |
CN109818892A (en) * | 2019-01-18 | 2019-05-28 | 华中科技大学 | Construct Cyclic Spectrum characteristic parameter extraction model and signal modulation mode recognition methods |
CN110557294A (en) * | 2019-09-25 | 2019-12-10 | 南昌航空大学 | PSN (packet switched network) time slicing method based on network change degree |
CN111626891A (en) * | 2020-06-03 | 2020-09-04 | 四川大学 | Dynamic sale network community discovery method based on extended nodes |
CN111626891B (en) * | 2020-06-03 | 2023-08-01 | 四川大学 | Dynamic sales network community discovery method based on extension node |
CN112527784A (en) * | 2020-12-08 | 2021-03-19 | 天津大学 | Abnormal mode mining and incremental abnormal detection method based on complex network |
CN115114488A (en) * | 2022-07-15 | 2022-09-27 | 中国西安卫星测控中心 | Dynamic information network abnormal evolution node detection method based on role discovery |
CN115114488B (en) * | 2022-07-15 | 2024-03-26 | 中国西安卫星测控中心 | Dynamic information network abnormal evolution node detection method based on role discovery |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106202614A (en) | The method that anomalous structure evolution in dynamic network finds | |
Cherifi et al. | On community structure in complex networks: challenges and opportunities | |
CN109005055B (en) | Complex network information node importance evaluation method based on multi-scale topological space | |
Takaffoli et al. | Incremental local community identification in dynamic social networks | |
US8605092B2 (en) | Method and apparatus of animation planning for a dynamic graph | |
Preciado et al. | Moment-based spectral analysis of large-scale networks using local structural information | |
CN110322356B (en) | Medical insurance abnormity detection method and system based on HIN mining dynamic multi-mode | |
Marcus et al. | Efficient counting of network motifs | |
WO2018203956A1 (en) | Systems and methods to detect clusters in graphs | |
Guillaume et al. | Relevance of massively distributed explorations of the internet topology: Simulation results | |
CN110909173A (en) | Non-overlapping community discovery method based on label propagation | |
Shang | Generalized k-core percolation in networks with community structure | |
Chang et al. | Relative centrality and local community detection | |
Gogoi et al. | A rough set–based effective rule generation method for classification with an application in intrusion detection | |
Yin et al. | Measuring directed triadic closure with closure coefficients | |
Kim et al. | Relational flexibility of network elements based on inconsistent community detection | |
Corneli et al. | The dynamic stochastic topic block model for dynamic networks with textual edges | |
Flossdorf et al. | Change detection in dynamic networks using network characteristics | |
Han et al. | On the complexity of counterfactual reasoning | |
Godziszewski et al. | Attacking similarity-based sign prediction | |
CN103164487B (en) | A kind of data clustering method based on density and geological information | |
Porter et al. | Analytical models for motifs in temporal networks | |
CN107276807B (en) | Hierarchical network community tree pruning method based on community dynamic compactness | |
Wienöbst et al. | Recovering causal structures from low-order conditional independencies | |
CN116668105A (en) | Attack path reasoning system combined with industrial control safety knowledge graph |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20161207 |