CN110263831A - A local higher-order graph clustering method based on differential privacy - Google Patents
- Publication number
- CN110263831A (application CN201910490628.7A)
- Authority
- CN
- China
- Prior art keywords
- node
- digraph
- vector
- heat kernel
- motif
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
Abstract
The invention discloses a local higher-order graph clustering method based on differential privacy. The method proceeds as follows. Using the structure of the local higher-order network subgraph (motif), the original social network is converted into a motif-based adjacency matrix. Because motif subgraph structures are diverse, a threshold is set on the weights of the resulting motif adjacency matrix, and Laplace noise is added to the weights within the threshold range, which protects the privacy of the social network's subgraphs. To make the random walk more efficient, an approximate heat kernel PageRank seed algorithm performs random walks on the perturbed motif matrix; the cut ratio of each candidate partition set is computed from the heat kernel vector obtained after the walk, and the cluster sets are output.
Description
Technical field
The invention belongs to the technical field of data security, and in particular relates to a local higher-order graph clustering method based on differential privacy.
Background art
With the rise of social media such as blogs and microblogs, social networks that take users as nodes and user relationships as edges are growing rapidly. Relationships among users, such as shared interests, behaviors, and functions, give rise to multiple communities or clusters within a social network. Efficient clustering methods let users understand more readily how closely knit their own community is, but the clustering process carries a risk of infringing user privacy. On the one hand, users worry that clustering draws on too much content and may reveal their private information. On the other hand, given the characteristics of network subgraphs, dividing social users into communities involves many uncertain factors, which limits how far the technique can be improved. Processing private data therefore usually requires a balance between data utility and privacy protection.
Social networks are an important class of graph data and are widely used in clustering methods. Clustering draws on low-level features of the network, such as edge directionality and the structure of higher-order subgraphs, and on high-level features, such as densely linked node relationships and node functional characteristics. The network nodes are then partitioned, the cut ratio (Cheeger ratio) of each subset is computed, and the optimal cut is finally chosen according to the Cheeger ratio of the subsets. However, current clustering methods based on higher-order subgraphs do not protect the privacy of the data, which may leak network data and cannot guarantee the security of user data.
Summary of the invention
In view of the privacy protection problem in the prior art described above, the object of the invention is to provide a local higher-order graph clustering method based on differential privacy. Using the structural characteristics of social network subgraphs together with differential privacy, the invention designs a privacy protection model. Based on the heat kernel PageRank random walk principle, it solves the local higher-order graph clustering problem for complex social networks by limiting the number of random walk steps. Specifically, based on the structure of social network subgraphs, the original social network is first converted into a motif weight matrix; the motif weight matrix so constructed then determines the strength of privacy protection applied to it; finally, an approximate heat kernel PageRank seed algorithm performs random walks on the perturbed motif weight matrix, so that the social network still yields good clustering results under privacy protection.
Technical principle of the invention: using the structure of the local higher-order network subgraph (motif), the original social network is converted into a motif-based adjacency matrix. Because motif subgraph structures are diverse, a threshold is set on the weights of the resulting motif adjacency matrix (a weight is the number of motifs formed between two nodes), and the Laplace mechanism adds noise to the weights within the threshold range, which protects the privacy of the social network's subgraphs. To make the random walk more efficient, an approximate heat kernel PageRank seed algorithm performs random walks on the perturbed motif matrix; the cut ratio of each candidate partition set is computed from the heat kernel vector obtained after the walk, and the cluster sets are output.
To achieve the above object, the invention adopts the following technical solution.
A local higher-order graph clustering method based on differential privacy comprises the following steps:
Step 1: obtain a social network data set, i.e. a directed graph G(V, E), where V is the node set and E is the edge set. Select the M7 connection structure of the triangle motif model as the higher-order network subgraph (motif) structure of G(V, E). Construct the motif adjacency matrix, i.e. the weight matrix W_M. Using a differential privacy algorithm, perturb the motif counts in the motif weight matrix of the directed graph to obtain the privacy-protected directed graph G_λ′.
Sub-step 1.1: construct the motif weight matrix W_M. Specifically, count the motif structures formed on all paths between node v_i and node v_j in G(V, E), take this count as the weight w between the corresponding nodes in the adjacency matrix, and so generate W_M.
Sub-step 1.2: using a differential privacy algorithm, perturb the motif counts in W_M to obtain the perturbed weight matrix W_M′, and hence the privacy-protected directed graph G_λ′.
Step 2: using the approximate heat kernel PageRank seed node algorithm, perform heat kernel PageRank random walks on the privacy-protected directed graph G_λ′ to obtain the approximate heat kernel vector of each node.
Sub-step 2.1: convert the privacy-protected directed graph G_λ′ into a random walk transition probability matrix P, computed as follows:
[formula not reproduced in this text]
where v_i ~ v_j denotes that an edge links node v_i and node v_j.
Sub-step 2.2: set the diffusion parameter of the random walk to t and the error parameter to δ, δ ∈ (0, 1).
Select seed nodes and, starting from each seed node, launch parallel heat kernel PageRank random walks on G_λ′ to compute the approximate heat kernel vector of the node:
[formula not reproduced in this text]
where u ∈ V; t is a non-negative real number; the indicator vector of the seed node for walk length k_l has dimension 1 × n; H_t denotes the heat kernel PageRank of G_λ′; k_l denotes the number of random walk steps, a positive integer with k_l ≤ K; the constant c is 1; vol(G) denotes the volume of the directed graph G, i.e. the total number of motif structures in G; the degree is that of seed node u_r; r is the index of a seed node for a given walk length; r_max is the total number of seed nodes for a given walk length; n is the total number of nodes in V; and r_max ≤ n.
Step 3: using a local clustering algorithm, cut each approximate heat kernel vector to obtain its local cluster, which gives the local cluster sets of G(V, E) and completes the local clustering of G(V, E).
Sub-step 3.1: normalize each approximate heat kernel vector to obtain the normalized approximate heat kernel vector. Specifically, first divide each element of the approximate heat kernel vector by r_max; then divide each element of the resulting vector by the degree of its corresponding node; finally, sort the elements of that vector in descending order to obtain the normalized approximate heat kernel vector.
Sub-step 3.2: set the target cut ratio to φ and perform a sweep over each normalized approximate heat kernel vector to find the corresponding target sets S_j. This completes the cut of the heat kernel vector, gives the local cluster sets of G(V, E), and completes the local clustering of G(V, E). The specific steps are:
First, take the 1st element of the normalized approximate heat kernel vector as the first candidate set and test whether it satisfies the local clustering condition. If it does, the first candidate set is the target set. Otherwise, add the 2nd element of the vector to the first candidate set to form the second candidate set and test it in the same way. If the second candidate set satisfies the condition, it is the target set; otherwise, add the 3rd element to form the third candidate set. Continue in this way until the target set S_1 is found.
Next, continue the sweep over the elements of the normalized vector that are not in S_1 until all elements have been scanned, obtaining the target sets S_1, S_2, ..., S_j, ..., S_J.
The local clustering condition is:
[formula not reproduced in this text]
where vol(G) denotes the total number of motif structures in the directed graph G; φ(S_j) denotes the local cut ratio of the target set S_j; cut(S_j) denotes the number of motif structures with at least one endpoint in S_j and another endpoint in the complement of S_j; vol(S_j) denotes the volume of S_j, i.e. the total number of motif structures that the nodes of S_j form in G; the complement of S_j is taken within the approximate heat kernel vector; and the volume of the complement is the total number of motif structures that the nodes of the complement form in G.
Compared with the prior art, the invention has the following beneficial effects. It targets local higher-order network subgraphs, takes full account of the important features of local sub-communities in complex social networks, and protects the privacy of higher-order network subgraphs according to the importance of those features. In addition, whereas approximate PageRank corresponds to a geometric sum of random walks on the graph, heat kernel PageRank is an exponential sum of random walks, which improves the running efficiency of the algorithm. With the above privacy protection steps, the social network can still achieve good local higher-order clustering under privacy protection.
Brief description of the drawings
The invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flow chart of the local higher-order graph clustering method based on differential privacy;
Fig. 2 is a schematic diagram of the M7 connection structure in the triangle motif model;
Fig. 3 shows the average L1 error computed for different numbers of random walk steps;
Fig. 4 shows the average L1 error on the netscience data set after noise is added, for different numbers of random walk steps;
Fig. 5 shows the intersection difference for the three data sets under different numbers of random walk steps;
Fig. 6 shows the intersection difference on the cond-mat-2003 data set after noise is added, for different numbers of random walk steps.
Specific embodiments
The embodiments and effects of the invention are described in further detail below with reference to the drawings.
Referring to Fig. 1, the invention is carried out in the following steps:
Step 1: obtain a social network data set, i.e. a directed graph G(V, E), where V is the node set and E is the edge set. Select the M7 connection structure of the triangle motif model (shown in Fig. 2) as the higher-order network subgraph (motif) structure of G(V, E). Construct the motif weight matrix W_M and, using a differential privacy algorithm, perturb the motif counts in the motif weight matrix of the directed graph to obtain the perturbed weight matrix W_M′.
Sub-step 1.1: construct the motif weight matrix W_M. Specifically, count the motif structures formed on all paths between node v_i and node v_j in G(V, E), take this count as the weight w between the corresponding nodes in the adjacency matrix, and so generate W_M.
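The construction in sub-step 1.1 can be sketched in a few lines. The exact edge pattern of the M7 motif is defined by Fig. 2, which is not reproduced in this text, so the sketch below makes a simplifying assumption: it ignores edge direction and counts plain triangles. The function name `triangle_motif_weights` is hypothetical.

```python
import numpy as np

def triangle_motif_weights(A):
    """Return W where W[i, j] counts the triangles that contain edge (i, j).

    Assumption: undirected triangles stand in for the M7 motif.
    """
    A = np.asarray(A)
    # Symmetrise and binarise: an undirected edge exists if either direction does.
    U = ((A + A.T) > 0).astype(int)
    np.fill_diagonal(U, 0)
    # (U @ U)[i, j] counts common neighbours of i and j; masking by U keeps
    # only pairs that are themselves linked, i.e. closed triangles.
    return (U @ U) * U
```

For a directed motif such as M7, the same idea applies with the adjacency matrix first split into its one-way and two-way parts, so that only triangles with the required edge directions are counted.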
Sub-step 1.2: using a differential privacy algorithm, perturb the motif counts in W_M to obtain the privacy-protected directed graph G_λ′. The specific steps are:
Sub-step 1.2.1: construct a new directed graph G_λ in which the number of motif structures is bounded above by λ.
First, for each node v_i, compute the number of motif structures connected to it, Tri_i(G) = w_i.
Then, for each v_i ∈ V, test whether the number w_i of motif structures connected to v_i reaches the upper limit λ. If w_i ≥ λ, delete the edges between v_i and those connected nodes whose motif count exceeds λ; otherwise, retain the edges between v_i and its connected nodes. This yields the new directed graph G_λ.
Sub-step 1.2.2: add Laplace noise to the motif counts in G_λ to obtain the perturbed weight matrix W_M′:
[formula not reproduced in this text]
where Lap(·) denotes the probability density function of the Laplace mechanism and ε is the privacy budget allocated to the motifs.
This gives the privacy-protected directed graph G_λ′.
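A minimal sketch of sub-steps 1.2.1 and 1.2.2 follows. The noise formula itself is not reproduced in this text, so two points are assumptions: the Laplace scale is taken to be λ/ε (treating the threshold λ as the sensitivity), and noisy weights are clamped at zero so that they remain usable as walk weights. The function name `perturb_motif_weights` is hypothetical.

```python
import numpy as np

def perturb_motif_weights(W, lam, eps, seed=None):
    """Drop weights above the threshold lam, then add Laplace noise.

    Assumptions: W is symmetric with a zero diagonal; noise scale is lam / eps.
    """
    rng = np.random.default_rng(seed)
    W = np.asarray(W, dtype=float)
    # Sub-step 1.2.1: delete edges whose motif count exceeds the upper limit.
    kept = np.where(W > lam, 0.0, W)
    # Perturb each undirected pair once so the result stays symmetric.
    iu = np.triu_indices_from(kept, k=1)
    upper = kept[iu]
    mask = upper > 0
    # Sub-step 1.2.2: Laplace noise on the surviving weights only.
    noise = rng.laplace(loc=0.0, scale=lam / eps, size=int(mask.sum()))
    upper[mask] = np.maximum(upper[mask] + noise, 0.0)
    out = np.zeros_like(kept)
    out[iu] = upper
    return out + out.T
```

A smaller ε gives a larger scale and therefore more noise. Whether this scheme meets a formal ε-differential-privacy guarantee depends on a sensitivity analysis that the text does not give.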
Step 2: using the approximate heat kernel PageRank seed node algorithm, perform heat kernel PageRank random walks on the privacy-protected directed graph G_λ′ to obtain the heat kernel vector ρ corresponding to each seed node.
Sub-step 2.1: convert the privacy-protected directed graph G_λ′ into a random walk transition probability matrix P, computed as follows:
[formula not reproduced in this text]
where v_i ~ v_j denotes that an edge links node v_i and node v_j.
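The formula for P is not reproduced in this text. A form consistent with the definitions above, and the usual choice for a weighted random walk, is P_ij = w_ij / d_i when v_i ~ v_j and 0 otherwise, where d_i is the sum of the weights in row i. The sketch below assumes that form; the name `transition_matrix` is hypothetical.

```python
import numpy as np

def transition_matrix(W):
    """Row-normalise a non-negative weight matrix into walk probabilities."""
    W = np.asarray(W, dtype=float)
    deg = W.sum(axis=1, keepdims=True)
    # Isolated nodes (degree 0) keep an all-zero row instead of dividing by 0.
    return np.divide(W, deg, out=np.zeros_like(W), where=deg > 0)
```

Every row with at least one edge sums to 1, so P can be used directly as the step distribution of the walk.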
Sub-step 2.2: set the diffusion parameter of the random walk to t and the error parameter to δ, δ ∈ (0, 1).
Select seed nodes and, starting from each seed node, launch parallel heat kernel PageRank random walks on G_λ′ to compute the approximate heat kernel vector of the node:
[formula not reproduced in this text]
where u ∈ V; t is a non-negative real number; the indicator vector of the seed node for walk length k_l has dimension 1 × n; H_t denotes the heat kernel PageRank of G_λ′; k_l denotes the number of random walk steps, a positive integer with k_l ≤ K; the constant c is 1; vol(G) denotes the volume of the directed graph G, i.e. the total number of motif structures in G; the degree is that of seed node u_r; r is the index of a seed node for a given walk length; r_max is the total number of seed nodes for a given walk length; n is the total number of nodes in V; and r_max ≤ n.
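The patent's own expression is not reproduced in this text. For orientation, the standard definition of heat kernel PageRank for a seed u is ρ_{t,u} = e^{-t} Σ_{k≥0} (t^k / k!) χ_u P^k, where χ_u is the indicator vector of the seed. The sketch below evaluates that series truncated after K steps, which mirrors the idea of limiting the number of walk steps; it is an illustration under that assumption, not the patent's formula, and the name `heat_kernel_pagerank` is hypothetical.

```python
import math
import numpy as np

def heat_kernel_pagerank(P, seed, t, K):
    """Truncated series e^{-t} * sum_{k=0..K} t^k / k! * chi_seed P^k."""
    P = np.asarray(P, dtype=float)
    walk = np.zeros(P.shape[0])
    walk[seed] = 1.0            # chi_seed: all mass on the seed node
    rho = np.zeros_like(walk)
    coeff = math.exp(-t)        # e^{-t} * t^0 / 0!
    for k in range(K + 1):
        rho += coeff * walk     # add the k-step term
        walk = walk @ P         # distribution after one more step
        coeff *= t / (k + 1)    # e^{-t} * t^(k+1) / (k+1)!
    return rho
```

The weights e^{-t} t^k / k! are Poisson probabilities, so terms beyond a few multiples of t contribute very little, which is why a modest K already gives a close approximation.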
Selecting seed nodes and launching parallel heat kernel PageRank random walks from them on G_λ′ proceeds as follows:
First, compute the degree of each node in G_λ′.
Second, sort all nodes of G_λ′ by degree in descending order.
Finally, set the number of steps of the 1st parallel random walk to k_1 and select r_max seed nodes in descending order of degree. Each time a seed node is selected, start a parallel random walk from it on G_λ′ and obtain the corresponding approximate heat kernel vector.
Set the number of steps of the 2nd parallel random walk to k_2 = k_1 + Δk and again select r_max seed nodes in descending order of degree. Each time a seed node is selected, start a parallel random walk from it on G_λ′ and obtain the corresponding approximate heat kernel vector.
Continue in this way up to the k-th parallel random walk, obtaining the corresponding approximate heat kernel vector.
Here Δk is the increment in the number of steps between one parallel random walk and the previous one; Δk is a positive integer, typically (0.01 to 0.1)·n.
A parallel random walk on the privacy-protected directed graph G_λ′ is carried out as follows. On G_λ′, starting from seed node u, walk in parallel along the edges incident to the seed node to its neighbor nodes (the other endpoints of those edges), which completes one step of the parallel random walk. Then, taking the neighbor nodes as starting points, walk in parallel along the edges incident to each neighbor node to the other endpoints of those edges, and so on, until the number of steps reaches the set walk length k_l. This completes the parallel random walk for one seed node.
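One common way to turn step-limited walks into a heat kernel estimate is Monte Carlo sampling: run many independent walks from the seed, draw each walk's length from a Poisson(t) distribution capped at K, and record where the walks end. The sketch below shows that estimator as an illustration of the idea. It is not the patent's exact parallel procedure, and it assumes every row of P sums to 1 (no isolated nodes). The name `sample_heat_kernel_pagerank` is hypothetical.

```python
import numpy as np

def sample_heat_kernel_pagerank(P, seed, t, K, walks, rng_seed=None):
    """Estimate the heat kernel vector from endpoints of capped random walks."""
    rng = np.random.default_rng(rng_seed)
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    counts = np.zeros(n)
    for _ in range(walks):
        # Walk length ~ Poisson(t), truncated at the step limit K.
        steps = min(int(rng.poisson(t)), K)
        node = seed
        for _ in range(steps):
            # Move to a neighbour with probability given by row `node` of P.
            node = rng.choice(n, p=P[node])
        counts[node] += 1
    # Fraction of walks ending at each node.
    return counts / walks
```

The walks are independent of one another, so they can be distributed across workers and their endpoint counts summed afterwards.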
Step 3: using a local clustering algorithm, cut each approximate heat kernel vector to obtain its local cluster, which gives the local cluster sets of G(V, E) and completes the local clustering of G(V, E).
Sub-step 3.1: normalize each approximate heat kernel vector to obtain the normalized approximate heat kernel vector.
Sub-step 3.2: set the target cut ratio to φ and perform a sweep over each normalized approximate heat kernel vector to find the corresponding target sets S_j. This completes the cut of the heat kernel vector, gives the local cluster sets of G(V, E), and completes the local clustering of G(V, E). The specific steps are:
First, take the 1st element of the normalized approximate heat kernel vector as the first candidate set and test whether it satisfies the local clustering condition. If it does, the first candidate set is the target set. Otherwise, add the 2nd element of the vector to the first candidate set to form the second candidate set and test it in the same way. If the second candidate set satisfies the condition, it is the target set; otherwise, add the 3rd element to form the third candidate set. Continue in this way until the target set S_1 is found.
Next, continue the sweep over the elements of the normalized vector that are not in S_1 until all elements have been scanned, obtaining the target sets S_1, S_2, ..., S_j, ..., S_J.
The normalization is carried out as follows: first divide each element of the approximate heat kernel vector by r_max; then divide each element of the resulting vector by the degree of its corresponding node; finally, sort the elements of that vector in descending order to obtain the normalized approximate heat kernel vector.
The local clustering condition is:
[formula not reproduced in this text]
where vol(G) denotes the total number of motif structures in the directed graph G; φ(S_j) denotes the local cut ratio of the target set S_j; cut(S_j) denotes the number of motif structures with at least one endpoint in S_j and another endpoint in the complement of S_j; vol(S_j) denotes the volume of S_j, i.e. the total number of motif structures that the nodes of S_j form in G; the complement of S_j is taken within the approximate heat kernel vector; and the volume of the complement is the total number of motif structures that the nodes of the complement form in G.
In the invention, the local cut ratio φ(S_j) is also called the conductance of the target set. The smaller the conductance, the better the clustering, so this index can be used to evaluate clustering quality.
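The sweep of sub-step 3.2 can be sketched as follows. The condition's formula is not reproduced in this text, so the sketch assumes the standard conductance φ(S) = cut(S) / min(vol(S), vol(complement of S)) and accepts the first prefix whose conductance is at most the target. It also assumes a symmetric weight matrix with zero diagonal and no isolated nodes. The name `sweep_cut` is hypothetical.

```python
import numpy as np

def sweep_cut(W, rho, phi_target):
    """Return (nodes, conductance) of the first sweep prefix meeting the target."""
    W = np.asarray(W, dtype=float)
    rho = np.asarray(rho, dtype=float)
    deg = W.sum(axis=1)
    total_vol = deg.sum()
    # Rank nodes by degree-normalised heat kernel value, largest first.
    order = np.argsort(-rho / np.maximum(deg, 1e-12), kind="stable")
    inside = np.zeros(len(deg), dtype=bool)
    vol = cut = 0.0
    for pos, v in enumerate(order[:-1]):   # never test the full node set
        # Edges from v into the set stop being cut; its other edges become cut.
        cut += deg[v] - 2.0 * W[v, inside].sum()
        vol += deg[v]
        inside[v] = True
        phi = cut / min(vol, total_vol - vol)
        if phi <= phi_target:
            return [int(x) for x in order[:pos + 1]], phi
    return None                             # no prefix met the target
```

Updating the cut incrementally keeps each step proportional to one row of W, rather than recomputing the cut from scratch for every candidate set.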
Simulation experiments
1. Simulation conditions:
The development environment is TensorFlow 1.4.0 (CPU version), Python 2.7, and Windows Server 2008 R2 Enterprise (64-bit). The social network data sets netscience, polblogs, and cond-mat-2003 are used.
In each simulation experiment, the exact heat kernel vector ρ_{t,u} is computed with the traditional heat kernel PageRank algorithm, and the approximate heat kernel vector is computed with the method of the invention. Two error measures, the mean absolute error and the intersection difference, quantify the similarity between the exact and approximate heat kernel vectors. When the error is very small, the number of random walk steps corresponding to the approximate heat kernel vector is optimal, and the result is closest to the link structure of the original social network.
The node rankings obtained from the traditional exact heat kernel vector and from the approximate heat kernel vector of the invention are compared. Specifically, the upper limit on the number of random walk steps is varied and the change in accuracy is compared.
(1) Mean absolute error: the average L1 error measures the error between the exact heat kernel vector ρ_{t,u} and the approximate heat kernel vector after the random walk. The average L1 error is computed as:
[formula not reproduced in this text]
where ρ_{t,u}(v_i) denotes the value for node v_i in the exact heat kernel vector, and the other term is the value for node v_i in the normalized approximate heat kernel vector.
(2) Intersection difference: the intersection difference measures the similarity of node rankings. Given the node ranking A from the approximate vector and the node ranking B from the exact vector, each of length i, the difference between the two rankings is:
[formula not reproduced in this text]
where dist(A, B) denotes the difference between the approximate ranking A_i and the exact ranking B_i, ⊕ denotes the exclusive-or operation, and |·| denotes the absolute value.
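Neither formula is reproduced in this text, so the sketch below rests on assumptions. The average L1 error is taken to be the mean of |ρ_{t,u}(v_i) − ρ̂(v_i)| over all nodes, where ρ̂ denotes the approximate vector. The intersection difference is taken in a form commonly used for comparing rankings, (1/k) Σ_{i=1..k} |A_i ⊕ B_i| / (2i), where A_i and B_i are the sets of the top i entries of each ranking. The function names are hypothetical.

```python
import numpy as np

def average_l1_error(exact, approx):
    """Mean absolute difference between two vectors of equal length."""
    exact = np.asarray(exact, dtype=float)
    approx = np.asarray(approx, dtype=float)
    return float(np.mean(np.abs(exact - approx)))

def intersection_difference(A, B):
    """Average normalised symmetric difference of the top-i prefixes."""
    k = min(len(A), len(B))
    total = 0.0
    for i in range(1, k + 1):
        # set ^ set is the symmetric difference (the set form of exclusive-or).
        total += len(set(A[:i]) ^ set(B[:i])) / (2 * i)
    return total / k
```

Under this form, identical rankings score 0 and rankings with no common entries score 1, so smaller values mean the approximate ranking is closer to the exact one.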
2. Simulation result 1
The three social network data sets netscience, polblogs, and cond-mat-2003 are selected, and the absolute error between the node vectors given by the exact and approximate heat kernel vectors is compared for different numbers of random walk steps. As Fig. 3 shows, different numbers of random walk steps produce different absolute errors. Overall, the trends in absolute error do not differ greatly across the three data sets. For the sparser data set netscience, the absolute error varies more noticeably with the number of steps. For the two larger data sets polblogs and cond-mat-2003, whose nodes are relatively densely connected and more numerous, the absolute error varies less across different numbers of steps K.
3. Simulation result 2
The sparse social network data set netscience is selected. For different numbers of random walk steps, different privacy budgets ε (0.1, 0.25, 0.5, and 0.75) are allocated to the generated motif adjacency matrix, and the effect on the mean absolute error is compared. The dots in Fig. 4 mark the original mean absolute error, without added noise, for each number of steps. As Fig. 4 shows, when different privacy budgets are allocated to the motif adjacency matrix, the smaller the privacy budget, the larger the noise and the further the mean absolute error lies from the original noise-free value; the larger the privacy budget, the smaller the noise and the closer the mean absolute error lies to the noise-free value. This is consistent with the behavior expected of differential privacy noise.
4. Simulation result 3
The three social network data sets netscience, polblogs, and cond-mat-2003 are selected, and the difference between the node rankings given by the exact and approximate heat kernel vectors is compared for different numbers of random walk steps. As Fig. 5 shows, different numbers of random walk steps produce different ranking differences. For the sparser data set netscience, the ranking difference falls markedly as the number of random walk steps increases. For the other two, relatively dense data sets, the ranking difference falls more gently as the number of steps increases.
5. Simulation result 4
The relatively dense social network data set cond-mat-2003 is selected. For different numbers of random walk steps, different privacy budgets ε (0.1, 0.25, 0.5, and 0.75) are allocated to the generated motif adjacency matrix, and the effect on the ranking difference is compared. The dots in Fig. 6 mark the original intersection difference, without added noise, for each number of steps. As Fig. 6 shows, when different privacy budgets are allocated to the motif adjacency matrix, the smaller the privacy budget, the larger the noise and the further the ranking difference lies from the original noise-free value; the larger the privacy budget, the smaller the noise and the closer the ranking difference lies to the noise-free value. This is consistent with the behavior expected of differential privacy noise.
The above experiments show the following for the local higher-order graph clustering method based on differential privacy. With the number of random walk steps limited and no differential privacy protection applied, when the mean error and the ranking difference between the approximate and exact heat kernel vectors are smallest, the ranking given by the approximate heat kernel vector is optimal and the clustering obtained by the cut is more accurate. When differential privacy is applied by adding Laplace noise to the motif weight matrix, choosing a suitable privacy budget and number of random walk steps both protects the private information of local higher-order subgraphs of complex social networks and keeps the heat kernel vector information generated by the node random walks from leaking, while still achieving local clustering of the complex social network.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk, or an optical disc.
The above is merely a specific embodiment of the invention, and the scope of protection of the invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be determined by the scope of the claims.
Claims (10)
1. A local higher-order graph clustering method based on differential privacy, characterized by comprising the following steps:
Step 1: obtain a social network data set, i.e. a directed graph G(V, E); select the M_7 connection structure of the triangle Motif model as the higher-order network subgraph (Motif) structure of the directed graph G(V, E); construct the Motif adjacency matrix and the weight matrix W_M; and use a differential privacy algorithm to perturb the Motif structure counts in the Motif weight matrix of the directed graph, obtaining a privacy-protected directed graph G_λ';
where V is the node set and E is the edge set;
Step 2: use the approximate heat kernel PageRank seed node algorithm to carry out random walks based on heat kernel PageRank on the privacy-protected directed graph G_λ', obtaining the approximate heat kernel vectors of the nodes;
Step 3: use a local clustering algorithm to cut each approximate heat kernel vector, obtaining the local cluster of each approximate heat kernel vector and hence the set of local clusters of the directed graph G(V, E), which completes the local clustering of the directed graph G(V, E).
2. The local higher-order graph clustering method based on differential privacy according to claim 1, characterized in that step 1 comprises the following sub-steps:
Sub-step 1.1: construct the Motif weight matrix W_M, specifically: count the Motif structures formed on all paths from node v_i to node v_j in the directed graph G(V, E), and use this count as the weight W between the corresponding nodes in the adjacency matrix, thereby generating the Motif weight matrix W_M;
Sub-step 1.2: use a differential privacy algorithm to perturb the Motif counts in the Motif weight matrix W_M of the directed graph, obtaining the perturbed weight matrix W_M' and hence the privacy-protected directed graph G_λ'.
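By way of illustration only, and not as part of the claims, sub-step 1.1 can be sketched as follows. The function name `motif_weight_matrix` and the toy graph are hypothetical. As a simplifying assumption, the sketch counts any three pairwise-linked nodes (ignoring edge direction) as a motif instead of testing for the M_7 pattern.

```python
from itertools import combinations


def motif_weight_matrix(n, edges):
    """Return W where W[i][j] counts the stand-in motifs containing both i and j."""
    # Stand-in motif (an assumption, NOT the M7 pattern of claim 1):
    # three nodes that are pairwise linked, ignoring edge direction.
    linked = {frozenset(e) for e in edges if e[0] != e[1]}
    W = [[0] * n for _ in range(n)]
    for trio in combinations(range(n), 3):
        if all(frozenset(pair) in linked for pair in combinations(trio, 2)):
            for i, j in combinations(trio, 2):
                W[i][j] += 1
                W[j][i] += 1
    return W


# Toy directed graph: two triangles that share the pair of nodes 1 and 2.
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 1)]
W = motif_weight_matrix(4, edges)
```

In this toy graph the pair (1, 2) lies in both triangles, so its weight is 2, while nodes 0 and 3 share no motif and keep weight 0.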
3. The local higher-order graph clustering method based on differential privacy according to claim 2, characterized in that sub-step 1.2 comprises the following sub-steps:
Sub-step 1.2.1: construct a new directed graph G_λ in which the number of Motif structures is bounded above by λ, with the following specific steps:
First, for each node v_i, calculate the number of Motif structures connected to it, Tri_i(G) = w_i;
Then, for each v_i ∈ V, determine whether the number w_i of Motif structures connected to node v_i reaches the upper limit λ: if w_i ≥ λ, delete the edges between v_i and those of its connected nodes whose Motif structure count is greater than λ; otherwise, retain the edges between node v_i and the corresponding nodes; the new directed graph G_λ is thereby obtained;
Sub-step 1.2.2: add Laplace noise to the Motif structure counts in the new directed graph G_λ, obtaining the perturbed weight matrix W_M';
where Lap(·) denotes the probability density function of the Laplace mechanism and ε is the privacy budget allocated to the Motif;
the privacy-protected directed graph G_λ' is thereby obtained.
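By way of illustration only, and not as part of the claims, the truncation and noise steps can be sketched as follows. All names are hypothetical. The sketch makes three assumptions that the text above does not state: the noise scale is taken as λ/ε, the truncation is simplified to dropping an edge when both endpoints reach the cap λ (without recounting motifs afterwards), and negative noisy counts are clamped to zero.

```python
import random


def laplace_sample(scale, rng):
    """Laplace(0, scale) sample: the difference of two Exp(1) draws, scaled."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def truncate_and_perturb(W, lam, epsilon, seed=None):
    """Drop over-cap edges, then add Laplace noise to the remaining motif counts."""
    rng = random.Random(seed)
    n = len(W)
    # For 3-node motifs, each motif through node i adds 1 to two entries of row i.
    per_node = [sum(row) // 2 for row in W]
    scale = lam / epsilon  # ASSUMPTION: sensitivity taken as lam
    noisy = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if W[i][j] == 0:
                continue  # no motif edge between i and j
            if per_node[i] >= lam and per_node[j] >= lam:
                continue  # simplified truncation: both endpoints reach the cap
            value = max(0.0, W[i][j] + laplace_sample(scale, rng))  # clamp at 0
            noisy[i][j] = noisy[j][i] = value
    return noisy


# Motif weight matrix of a toy graph with two triangles sharing nodes 1 and 2.
W = [[0, 1, 1, 0], [1, 0, 2, 1], [1, 2, 0, 1], [0, 1, 1, 0]]
noisy = truncate_and_perturb(W, lam=2, epsilon=1.0, seed=7)
```

A smaller ε gives a larger noise scale and therefore stronger protection at the cost of a less accurate weight matrix.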
4. The local higher-order graph clustering method based on differential privacy according to claim 1, characterized in that step 2 comprises the following sub-steps:
Sub-step 2.1: convert the privacy-protected directed graph G_λ' into random walk probabilities, obtaining the random walk transition probability matrix P:
where v_i ~ v_j indicates that an edge connects node v_i to node v_j;
Sub-step 2.2: set the diffusion parameter of the random walk to t and the error parameter to δ; select seed nodes and, starting from the seed nodes, launch random walks based on heat kernel PageRank in parallel on the privacy-protected directed graph G_λ' to calculate the approximate heat kernel vectors of the nodes;
where δ ∈ (0, 1), u ∈ V, and t is a non-negative real number; the indicator vector of the seed node for a walk of k_l steps has dimension 1 × n; H_t denotes the heat kernel PageRank of the privacy-protected directed graph G_λ'; k_l denotes the number of random walk steps, k_l ≤ K, and k_l is a positive integer; the constant c takes the value 1; vol(G) denotes the volume of the directed graph G, i.e. the total number of Motif structures in the directed graph G; the degree of seed node u_r enters the calculation; r denotes the index of a seed node for a given number of walk steps; r_max denotes the total number of seed nodes for a given number of walk steps; n is the total number of nodes in the node set V, and r_max ≤ n.
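By way of illustration only, and not as part of the claims, the two sub-steps can be sketched as follows. The sketch assumes that P is obtained by dividing each row of the weight matrix by its row sum, and it assumes the standard heat kernel PageRank series e^(-t) Σ_k (t^k / k!) s P^k, truncated at K terms. It evaluates this series directly rather than by the random walks described above, and all names are hypothetical.

```python
import math


def transition_matrix(W):
    """Row-normalise a weight matrix into walk probabilities (assumed form of P)."""
    P = []
    for row in W:
        degree = sum(row)
        P.append([w / degree if degree > 0 else 0.0 for w in row])
    return P


def heat_kernel_vector(P, seed_node, t, K):
    """Truncated series e^-t * sum_{k=0..K} t^k/k! * s P^k for indicator vector s."""
    n = len(P)
    term = [0.0] * n
    term[seed_node] = 1.0  # s: 1 x n indicator vector of the seed node
    rho = [0.0] * n
    for k in range(K + 1):
        coeff = math.exp(-t) * t ** k / math.factorial(k)
        for j in range(n):
            rho[j] += coeff * term[j]
        # Advance the walk distribution by one step: term <- term * P.
        term = [sum(term[i] * P[i][j] for i in range(n)) for j in range(n)]
    return rho


W = [[0, 1, 1, 0], [1, 0, 2, 1], [1, 2, 0, 1], [0, 1, 1, 0]]
P = transition_matrix(W)
rho = heat_kernel_vector(P, seed_node=0, t=2.0, K=30)
```

The diffusion parameter t controls how far mass spreads from the seed: the k-step term is weighted by the Poisson probability with mean t, so larger t favours longer walks.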
5. The local higher-order graph clustering method based on differential privacy according to claim 4, characterized in that selecting seed nodes and, starting from the seed nodes, launching random walks based on heat kernel PageRank in parallel on the privacy-protected directed graph G_λ' comprises the following specific steps:
First, calculate the degree of each node in the privacy-protected directed graph G_λ';
Second, sort all nodes of the privacy-protected directed graph G_λ' in descending order of node degree;
Finally, set the number of steps of the 1st parallel random walk to k_1; select r_max seed nodes one by one in descending order of node degree and, each time a seed node is selected, carry out a parallel random walk on the privacy-protected directed graph G_λ' starting from that seed node, obtaining the corresponding approximate heat kernel vector;
set the number of steps of the 2nd parallel random walk to k_2 = k_1 + Δk; select r_max seed nodes one by one in descending order of node degree and, each time a seed node is selected, carry out a parallel random walk on the privacy-protected directed graph G_λ' starting from that seed node, obtaining the corresponding approximate heat kernel vector; and so on, until the K-th parallel random walk yields the corresponding approximate heat kernel vector;
where Δk is the increment in the number of walk steps of the current parallel random walk over the previous one, and Δk is a positive integer.
6. The local higher-order graph clustering method based on differential privacy according to claim 5, characterized in that carrying out a parallel random walk on the privacy-protected directed graph G_λ' specifically comprises:
On the privacy-protected directed graph G_λ', starting from the seed node, walk in parallel along the edges connected to the seed node to the neighbor nodes of the seed node, completing one step of the parallel random walk; then, starting from those neighbor nodes, walk in parallel along the edges connected to each neighbor node to the other endpoint of each edge; and so on, until the number of steps walked reaches the set number of walk steps k_l, which completes the parallel random walk of one seed node.
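By way of illustration only, and not as part of the claims, seed selection by descending degree and a fixed-length walk can be sketched as follows. All names are hypothetical. As a simplifying assumption, the sketch follows a single walker that picks each next node with probability proportional to the edge weight, rather than spreading along all incident edges at once.

```python
import random


def top_degree_seeds(W, r_max):
    """Return the r_max nodes of highest weighted degree, in descending order."""
    degree = [sum(row) for row in W]
    return sorted(range(len(W)), key=lambda v: -degree[v])[:r_max]


def random_walk(W, seed_node, steps, rng):
    """Walk `steps` edges from seed_node and return the node where the walk ends."""
    node = seed_node
    for _ in range(steps):
        weights = W[node]
        if sum(weights) == 0:
            break  # isolated node: the walker cannot move
        node = rng.choices(range(len(W)), weights=weights)[0]
    return node


W = [[0, 1, 1, 0], [1, 0, 2, 1], [1, 2, 0, 1], [0, 1, 1, 0]]
rng = random.Random(0)
seeds = top_degree_seeds(W, r_max=2)
# One walk of k_1 = 3 steps from each seed; k_2 = k_1 + delta_k would come next.
endpoints = [random_walk(W, s, 3, rng) for s in seeds]
```

Repeating the walk many times per seed and tallying the endpoints gives an empirical estimate of where the walk mass concentrates for that step count.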
7. The local higher-order graph clustering method based on differential privacy according to claim 1, characterized in that step 3 comprises the following sub-steps:
Sub-step 3.1: standardize each approximate heat kernel vector, obtaining the standardized approximate heat kernel vector;
Sub-step 3.2: set the target cut ratio to φ and carry out a sweep scan over each standardized approximate heat kernel vector to find the corresponding target set S_j, which completes the cutting of the heat kernel vector, yields the set of local clusters of the directed graph G(V, E), and completes the local clustering of the directed graph G(V, E).
8. The local higher-order graph clustering method based on differential privacy according to claim 7, characterized in that, in sub-step 3.1, the standardization comprises the following specific steps: first, divide each element of each approximate heat kernel vector by r_max to obtain a scaled vector; next, divide each element of the scaled vector by the degree of its corresponding node to obtain a degree-normalized vector; finally, arrange the elements of the degree-normalized vector in descending order, obtaining the standardized approximate heat kernel vector.
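By way of illustration only, and not as part of the claims, the three standardization steps can be sketched as follows. The function name `standardize` and the sample values are hypothetical. The sketch keeps the node index next to each value so that the ordering can be used by a later sweep, and it assigns 0 to nodes of degree 0, which is an assumption.

```python
def standardize(rho, degree, r_max):
    """Divide by r_max, then by node degree, then sort in descending order.

    Returns a list of (value, node) pairs, largest value first.
    """
    scaled = [x / r_max for x in rho]
    per_degree = [
        (x / d if d > 0 else 0.0, node)  # ASSUMPTION: degree-0 nodes get 0
        for node, (x, d) in enumerate(zip(scaled, degree))
    ]
    return sorted(per_degree, key=lambda pair: -pair[0])


ranked = standardize(rho=[0.4, 0.2, 0.4], degree=[2, 1, 4], r_max=2)
order = [node for _, node in ranked]
```

Dividing by the degree prevents high-degree nodes from dominating the ranking simply because more walks pass through them.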
9. The local higher-order graph clustering method based on differential privacy according to claim 8, characterized in that carrying out a sweep scan over each standardized approximate heat kernel vector to find the corresponding target set S_j comprises the following specific steps:
First, take the 1st element of the standardized approximate heat kernel vector as the first candidate set and determine whether the first candidate set satisfies the local clustering condition; if it does, the first candidate set is the target set; otherwise, add the 2nd element of the standardized approximate heat kernel vector to the first candidate set to form the second candidate set and determine whether the second candidate set satisfies the local clustering condition; if it does, the second candidate set is the target set; otherwise, add the 3rd element of the standardized approximate heat kernel vector to the second candidate set to form the third candidate set; and so on, until the target set S_1 is found;
Second, continue scanning the elements of the standardized approximate heat kernel vector that are not included in the target set S_1, until all elements of the standardized approximate heat kernel vector have been scanned, obtaining the target sets S_1, S_2, ..., S_j, ..., S_J.
10. The local higher-order graph clustering method based on differential privacy according to claim 9, characterized in that the local clustering condition is:
where vol(G) denotes the total number of Motif structures in the directed graph G; φ(S_j) denotes the local cut ratio of the target set S_j; cut(S_j) denotes the number of Motif structures having at least one end node in S_j and another end node in the complement of S_j; vol(S_j) denotes the volume of the target set S_j, i.e. the total number of Motif structures formed in the directed graph G by the nodes in S_j; the complement of S_j is taken within the approximate heat kernel vector; and the volume of the complement is the total number of Motif structures formed in the directed graph G by the nodes in the complement of S_j.
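By way of illustration only, and not as part of the claims, the sweep scan of claim 9 can be sketched as follows. The sketch assumes the cut ratio takes the usual conductance form cut(S) / min(vol(S), vol(complement of S)) and that a candidate set is accepted once its cut ratio is at most the target φ. Both the comparison and the use of weighted degree as volume are assumptions, and all names are hypothetical.

```python
def cut_ratio(W, S):
    """Conductance-style ratio: cut(S) / min(vol(S), vol(complement of S))."""
    S = set(S)
    n = len(W)
    cut = sum(W[i][j] for i in S for j in range(n) if j not in S)
    vol_S = sum(sum(W[i]) for i in S)  # ASSUMPTION: volume = weighted degree sum
    vol_rest = sum(sum(W[i]) for i in range(n) if i not in S)
    smaller = min(vol_S, vol_rest)
    return cut / smaller if smaller > 0 else float("inf")


def sweep(W, order, phi_target):
    """Grow a prefix of `order` until its cut ratio is at most phi_target."""
    prefix = []
    for node in order:
        prefix.append(node)
        if cut_ratio(W, prefix) <= phi_target:
            return prefix
    return None  # no prefix satisfies the condition


# Two triangles (0, 1, 2) and (3, 4, 5) joined by the single edge 2-3.
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
cluster = sweep(W, order=[0, 1, 2, 3, 4, 5], phi_target=0.2)
```

In this toy graph the prefixes [0] and [0, 1] have cut ratios 1 and 0.5, and the first triangle has cut ratio 1/7, so the sweep stops at [0, 1, 2].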
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910490628.7A CN110263831A (en) | 2019-06-06 | 2019-06-06 | A kind of local high-order figure clustering method based on difference privacy |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910490628.7A CN110263831A (en) | 2019-06-06 | 2019-06-06 | A kind of local high-order figure clustering method based on difference privacy |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110263831A true CN110263831A (en) | 2019-09-20 |
Family
ID=67917140
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910490628.7A Pending CN110263831A (en) | 2019-06-06 | 2019-06-06 | A kind of local high-order figure clustering method based on difference privacy |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110263831A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111723399A (en) * | 2020-06-15 | 2020-09-29 | 内蒙古科技大学 | Large-scale social network directed graph privacy protection method based on k-kernel |
CN112199728A (en) * | 2020-11-04 | 2021-01-08 | 同济大学 | Privacy protection method for social network relationship prediction |
CN113095490A (en) * | 2021-06-07 | 2021-07-09 | 华中科技大学 | Graph neural network construction method and system based on differential privacy aggregation |
CN114118407A (en) * | 2021-10-29 | 2022-03-01 | 华北电力大学 | Deep learning-oriented differential privacy usability measurement method |
CN117436130A (en) * | 2023-12-19 | 2024-01-23 | 暨南大学 | Differential privacy-based directed graph data security release method |
-
2019
- 2019-06-06 CN CN201910490628.7A patent/CN110263831A/en active Pending
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111723399A (en) * | 2020-06-15 | 2020-09-29 | 内蒙古科技大学 | Large-scale social network directed graph privacy protection method based on k-kernel |
CN111723399B (en) * | 2020-06-15 | 2023-08-29 | 内蒙古科技大学 | Large-scale social network directed graph privacy protection method based on k-kernel |
CN112199728A (en) * | 2020-11-04 | 2021-01-08 | 同济大学 | Privacy protection method for social network relationship prediction |
CN112199728B (en) * | 2020-11-04 | 2022-07-19 | 同济大学 | Privacy protection method for social network relationship prediction |
CN113095490A (en) * | 2021-06-07 | 2021-07-09 | 华中科技大学 | Graph neural network construction method and system based on differential privacy aggregation |
CN114118407A (en) * | 2021-10-29 | 2022-03-01 | 华北电力大学 | Deep learning-oriented differential privacy usability measurement method |
CN114118407B (en) * | 2021-10-29 | 2023-10-24 | 华北电力大学 | Differential privacy availability measurement method for deep learning |
CN117436130A (en) * | 2023-12-19 | 2024-01-23 | 暨南大学 | Differential privacy-based directed graph data security release method |
CN117436130B (en) * | 2023-12-19 | 2024-04-02 | 暨南大学 | Differential privacy-based directed graph data security release method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110263831A (en) | A kind of local high-order figure clustering method based on difference privacy | |
Wu et al. | Mining scale-free networks using geodesic clustering | |
Shiokawa et al. | Scan++ efficient algorithm for finding clusters, hubs and outliers on large-scale graphs | |
Li et al. | A multi-agent genetic algorithm for community detection in complex networks | |
Xing et al. | A node influence based label propagation algorithm for community detection in networks | |
Lai et al. | Enhanced modularity-based community detection by random walk network preprocessing | |
He et al. | A novel top-k strategy for influence maximization in complex networks with community structure | |
Shi et al. | A genetic algorithm for detecting communities in large-scale complex networks | |
Ma et al. | Decomposition-based multiobjective evolutionary algorithm for community detection in dynamic social networks | |
Burton | The Pachner graph and the simplification of 3-sphere triangulations | |
Froese et al. | The border k-means clustering algorithm for one dimensional data | |
CN103020163A (en) | Node-similarity-based network community division method in network | |
Xu et al. | Finding overlapping community from social networks based on community forest model | |
CN108809697A (en) | Social networks key node recognition methods based on maximizing influence and system | |
Hu et al. | A new algorithm CNM-Centrality of detecting communities based on node centrality | |
Kong et al. | An improved label propagation algorithm based on node intimacy for community detection in networks | |
Yousuf et al. | Guided sampling for large graphs | |
CN103559318B (en) | The method that the object containing heterogeneous information network packet is ranked up | |
CN104580518A (en) | Load balance control method used for storage system | |
Shen et al. | A novel node gravitation-based label propagation algorithm for community detection | |
CN106126681A (en) | A kind of increment type stream data clustering method and system | |
Yao et al. | Community detection based on variable vertex influence | |
Banati et al. | Modeling evolutionary group search optimization approach for community detection in social networks | |
Sheng et al. | Node trust: an effective method to detect non-overlapping community in social networks | |
CN110223125B (en) | User position obtaining method under node position kernel-edge profit algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20190920 |