CN110263831A - A local higher-order graph clustering method based on differential privacy - Google Patents

A local higher-order graph clustering method based on differential privacy

Info

Publication number
CN110263831A
Authority
CN
China
Prior art keywords
node
digraph
vector
heat kernel
motif
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910490628.7A
Other languages
Chinese (zh)
Inventor
李蜀瑜
边锦
曹菡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi Normal University
Original Assignee
Shaanxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shaanxi Normal University
Priority to CN201910490628.7A
Publication of CN110263831A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques

Abstract

The invention discloses a local higher-order graph clustering method based on differential privacy. The method proceeds as follows. Using the structure of the local higher-order network subgraph (Motif), the original social network is converted into a Motif-based adjacency matrix. Because Motif subgraph structures are diverse, a threshold is set on the weights of the generated Motif adjacency matrix, and the weights within the threshold range are perturbed with Laplace noise, which protects the privacy of the social network subgraphs. To make the random walk algorithm more efficient, the approximate heat kernel PageRank seed algorithm is used to run random walks on the perturbed Motif matrix; the cut ratio of each candidate partition set is computed from the heat kernel vectors obtained from the walks, and the cluster sets are output.

Description

A local higher-order graph clustering method based on differential privacy
Technical field
The invention belongs to the technical field of data security, and in particular relates to a local higher-order graph clustering method based on differential privacy.
Background
With the rise of social media such as blogs and microblogs, social networks that take users as nodes and user relationships as edges have grown rapidly. Relationships among users, such as shared interests, behaviors, and functions, give rise to multiple communities or clusters within a social network. Efficient clustering methods help users understand how close-knit their own community is, but the clustering process carries a risk of infringing user privacy. On the one hand, users worry that the clustering process involves too much content and may leak their private information. On the other hand, because of the characteristics of network subgraphs, dividing social users into communities involves many uncertain factors, which limits how far the technique can be improved. Processing private data therefore usually requires balancing data utility against privacy protection.
Throughout graph clustering, social networks, as an important kind of graph data, are widely used in various clustering methods. Clustering draws on low-level features, such as the directionality of the network and the structural characteristics of higher-order subgraphs, and on high-level features, such as dense linkage among nodes and node function characteristics. It then partitions the network nodes, computes the cut ratio (Cheeger ratio) of each vector subset, and finally makes the optimal cut according to the Cheeger ratio of the partitioned subsets. However, current clustering methods based on higher-order subgraphs do not protect the privacy of the data, which may lead to leakage of network data and cannot guarantee the safety of user data.
Summary of the invention
To address the privacy protection problem of the prior art described above, the object of the invention is to propose a local higher-order graph clustering method based on differential privacy. The invention designs a privacy protection model from the structural characteristics of social network subgraphs and differential privacy. It builds on the random walk principle of heat kernel PageRank and, by limiting the number of random walk steps, solves the local higher-order graph clustering problem for complex social networks. Specifically, based on the structure of social network subgraphs, the original social network is first converted into a Motif weight matrix; the constructed Motif weight matrix is then used to determine the strength of the privacy protection applied to it; finally, the approximate heat kernel PageRank seed algorithm runs random walks on the perturbed Motif weight matrix, so that the social network still yields good clustering results under privacy protection.
Technical principle of the invention: using the structure of the local higher-order network subgraph (Motif), the original social network is converted into a Motif-based adjacency matrix. Because Motif subgraph structures are diverse, a threshold is set on the weights of the generated Motif adjacency matrix (a weight is the number of Motifs formed between two nodes), and the weights within the threshold range are perturbed with noise from the Laplace mechanism, which protects the privacy of the social network subgraphs. To make the random walk algorithm more efficient, the approximate heat kernel PageRank seed algorithm is used to run random walks on the perturbed Motif matrix; the cut ratio of each candidate partition set is computed from the heat kernel vectors obtained from the walks, and the cluster sets are output.
To achieve the above object, the invention adopts the following technical solution.
A local higher-order graph clustering method based on differential privacy, comprising the following steps:
Step 1: obtain a social network data set, i.e. a directed graph G(V, E), and choose the M7 connection structure of the triangle Motif model as the higher-order network subgraph (Motif) structure of G(V, E); construct the Motif adjacency matrix, i.e. the weight matrix W_M; use a differential privacy algorithm to perturb the Motif counts in the Motif weight matrix of the directed graph, obtaining the privacy-protected directed graph G_λ'.
Here V is the node set and E is the edge set.
Sub-step 1.1: construct the Motif weight matrix W_M. Specifically, count the Motif structures formed on all paths from node v_i to node v_j in the directed graph G(V, E), and use this count as the weight w between the corresponding nodes in the adjacency matrix, generating the Motif weight matrix W_M.
Sub-step 1.2: use a differential privacy algorithm to perturb the Motif counts in the Motif weight matrix W_M of the directed graph, obtaining the perturbed weight matrix W_M' and hence the privacy-protected directed graph G_λ'.
Step 2: use the approximate heat kernel PageRank seed node algorithm to perform heat kernel PageRank based random walks on the privacy-protected directed graph G_λ', obtaining the approximate heat kernel vector of each node.
Sub-step 2.1: convert the privacy-protected directed graph G_λ' into random walk probabilities, obtaining the transition probability matrix P. The calculation formula is as follows:
Here v_i ~ v_j indicates that there is an edge linking node v_i to node v_j.
Sub-step 2.2: set the diffusion parameter of the random walk to t and the error parameter to δ, δ ∈ (0, 1).
Select seed nodes and, starting from each seed node, launch parallel heat kernel PageRank based random walks on the privacy-protected directed graph G_λ' to compute the approximate heat kernel vector of the node.
Here u ∈ V and t is a non-negative real number; the indicator vector of the seed node for walk length k_l has dimension 1 × n; H_t denotes the heat kernel PageRank of the privacy-protected directed graph G_λ'; k_l denotes the number of random walk steps, with k_l ≤ K and k_l a positive integer; the constant c takes the value 1; vol(G) denotes the volume of the directed graph G, i.e. the total number of Motif structures in G; the degree of the seed node u_r is also used, where r is the index of a seed node for a given walk length and r_max is the total number of seed nodes for a given walk length; n is the total number of nodes in the node set V, and r_max ≤ n.
Step 3: use the local clustering algorithm to cut each approximate heat kernel vector, obtaining the local clusters of the approximate heat kernel vectors and hence the local cluster sets of the directed graph G(V, E), which completes the local clustering of G(V, E).
Sub-step 3.1: normalize each approximate heat kernel vector to obtain the normalized approximate heat kernel vector. Specifically:
First divide each element of each approximate heat kernel vector by r_max; then divide each element of the resulting vector by the degree of its corresponding node; finally, sort the elements of the resulting vector in descending order to obtain the normalized approximate heat kernel vector.
Sub-step 3.2: set the target cut ratio to φ and sweep each normalized approximate heat kernel vector to find the corresponding target sets S_j, which completes the cutting of the heat kernel vector and yields the local cluster sets of the directed graph G(V, E). The specific steps are:
First, take the 1st element of the normalized approximate heat kernel vector as the first candidate set and check whether it satisfies the local clustering condition. If it does, the first candidate set is the target set. Otherwise, add the 2nd element of the normalized approximate heat kernel vector to the first candidate set to form the second candidate set, and check whether it satisfies the local clustering condition; if it does, the second candidate set is the target set. Otherwise, add the 3rd element to the second candidate set to form the third candidate set, and so on, until the target set S_1 is found.
Second, continue sweeping the elements of the normalized approximate heat kernel vector that are not included in the target set S_1 until all elements of the vector have been scanned, obtaining the target sets S_1, S_2, ..., S_j, ..., S_J.
The local clustering condition is:
Here vol(G) denotes the total number of Motif structures in the directed graph G; φ(S_j) denotes the local cut ratio of the target set S_j; cut(S_j) denotes the number of Motif structures with at least one endpoint in S_j and another endpoint in the complement of S_j; vol(S_j) denotes the volume of the target set S_j, i.e. the total number of Motif structures that the nodes in S_j form in the directed graph G; the complement of S_j is taken within the approximate heat kernel vector; and the volume of the complement is the total number of Motif structures that the nodes in the complement of the target set form in the directed graph G.
Compared with the prior art, the invention has the following beneficial effects. Targeting local higher-order network subgraphs, the invention takes full account of the important features of local sub-communities in complex social networks and protects the privacy of higher-order network subgraphs according to their importance. In addition, whereas the approximate PageRank algorithm is a geometric sum of random walks on the graph, heat kernel PageRank is an exponential sum of random walks, which improves the running efficiency of the algorithm. With the privacy protection steps above, the social network can still achieve a good local higher-order clustering effect under privacy protection.
Brief description of the drawings
The invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flow chart of the local higher-order graph clustering method based on differential privacy of the invention;
Fig. 2 is a schematic diagram of the M7 connection structure in the triangle Motif model used in the invention;
Fig. 3 shows the average L1 error computed for different numbers of random walk steps;
Fig. 4 shows the average L1 error of the netscience data set after noise addition, for different numbers of random walk steps;
Fig. 5 shows the intersection difference of the three data sets for different numbers of random walk steps;
Fig. 6 shows the intersection difference of the cond-mat-2003 data set after noise addition, for different numbers of random walk steps.
Detailed description of the embodiments
The embodiments and effects of the invention are described in further detail below with reference to the drawings.
Referring to Fig. 1, the invention is implemented according to the following steps:
Step 1: obtain a social network data set, i.e. a directed graph G(V, E), and choose the M7 connection structure of the triangle Motif model (shown in Fig. 2) as the higher-order network subgraph (Motif) structure of G(V, E); construct the Motif weight matrix W_M; use a differential privacy algorithm to perturb the Motif counts in the Motif weight matrix of the directed graph, obtaining the perturbed weight matrix W_M'.
Here V is the node set and E is the edge set.
Sub-step 1.1: construct the Motif weight matrix W_M. Specifically, count the Motif structures formed on all paths from node v_i to node v_j in the directed graph G(V, E), and use this count as the weight w between the corresponding nodes in the adjacency matrix, generating the Motif weight matrix W_M.
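Sub-step 1.1 can be illustrated with a minimal sketch that counts, for every edge, the triangles it takes part in. The sketch makes a simplification that the text does not: the graph is treated as undirected and the Motif is the plain triangle rather than the directed M7 structure. The function name `triangle_motif_weights` and the toy graph are hypothetical.

```python
import numpy as np

def triangle_motif_weights(adj):
    """Return W where W[i, j] is the number of triangles containing edge (i, j).

    Assumption: adj is a symmetric 0/1 matrix with a zero diagonal, i.e. an
    undirected simple graph. The directed M7 motif of the text is not modeled.
    """
    a = np.asarray(adj, dtype=np.int64)
    # (a @ a)[i, j] counts the common neighbours of i and j; multiplying by a
    # keeps only the pairs that are themselves joined by an edge.
    return (a @ a) * a

# Toy graph: triangle 0-1-2 plus a pendant edge 2-3.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
W_M = triangle_motif_weights(A)
```

In the toy graph the pendant edge 2-3 lies in no triangle, so it receives weight 0 and effectively disappears from the Motif-weighted graph.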
Sub-step 1.2: use a differential privacy algorithm to perturb the Motif counts in the Motif weight matrix W_M of the directed graph, obtaining the privacy-protected directed graph G_λ'. The specific steps are:
Sub-step 1.2.1: construct a new directed graph G_λ in which the number of Motif structures is bounded above by λ.
First, for each node v_i, compute the number of Motif structures connected to it, Tri_i(G) = w_i.
Then, for each v_i ∈ V, check whether the number w_i of Motif structures connected to v_i exceeds the upper limit λ. If w_i ≥ λ, delete the edges to those nodes connected to v_i whose Motif count is greater than λ; otherwise, keep the edges between v_i and its corresponding nodes. This yields the new directed graph G_λ.
Sub-step 1.2.2: add Laplace noise to the Motif counts in the new directed graph G_λ, obtaining the perturbed weight matrix W_M'.
Here Lap(·) denotes the probability density function of the Laplace mechanism, and ε is the privacy budget allocated to the Motifs.
This yields the privacy-protected directed graph G_λ'.
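The truncation and noise steps of sub-step 1.2 might be sketched as follows. Several choices here are assumptions rather than the method of the text: weights above λ are simply dropped, the Laplace scale is taken to be λ/ε, the matrix is treated as symmetric, and negative noisy weights are clipped to zero. The function name `perturb_motif_weights` is hypothetical.

```python
import numpy as np

def perturb_motif_weights(w, lam, eps, seed=None):
    """Truncate motif weights at lam, then add Laplace noise to what remains.

    Assumptions (not stated in the text): entries above lam are dropped,
    the noise scale is lam / eps, noise is applied only to surviving
    nonzero entries, and negative results are clipped to zero.
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w, dtype=float)
    kept = np.where(w > lam, 0.0, w)        # remove edges whose count exceeds lam
    mask = np.triu(kept > 0, k=1)           # perturb each undirected edge once
    noise = rng.laplace(loc=0.0, scale=lam / eps, size=w.shape)
    upper = np.clip((kept + noise) * mask, 0.0, None)
    return upper + upper.T                  # restore symmetry

W = np.array([[0, 1, 5],
              [1, 0, 2],
              [5, 2, 0]])
W_noisy = perturb_motif_weights(W, lam=3, eps=0.5, seed=0)
```

Clipping at zero is a post-processing step, so it does not weaken the differential privacy guarantee of the noise itself, although it does bias the perturbed weights upward.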
Step 2: use the approximate heat kernel PageRank seed node algorithm to perform heat kernel PageRank based random walks on the privacy-protected directed graph G_λ', obtaining the heat kernel vector ρ corresponding to each seed node.
Sub-step 2.1: convert the privacy-protected directed graph G_λ' into random walk probabilities, obtaining the transition probability matrix P. The calculation formula is as follows:
Here v_i ~ v_j indicates that there is an edge linking node v_i to node v_j.
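A common way to obtain such a matrix, assumed here for illustration, is row normalization: P[i, j] equals the weight between v_i and v_j divided by the total weight at v_i when the two nodes are linked, and 0 otherwise. The sketch below implements that assumption; `transition_matrix` is a hypothetical name.

```python
import numpy as np

def transition_matrix(w):
    """Row-normalise a non-negative weight matrix into walk probabilities.

    Assumption: P[i, j] = w[i, j] / sum_k w[i, k]; rows with zero total
    weight are left as all zeros (the walk cannot leave such a node).
    """
    w = np.asarray(w, dtype=float)
    deg = w.sum(axis=1, keepdims=True)
    return np.divide(w, deg, out=np.zeros_like(w), where=deg > 0)

W = np.array([[0.0, 2.0, 2.0, 0.0],
              [2.0, 0.0, 1.0, 0.0],
              [2.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0]])
P = transition_matrix(W)
```

Leaving isolated rows at zero avoids a division by zero for nodes whose Motif weights were all removed by the truncation step.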
Sub-step 2.2: set the diffusion parameter of the random walk to t and the error parameter to δ, δ ∈ (0, 1).
Select seed nodes and, starting from each seed node, launch parallel heat kernel PageRank based random walks on the privacy-protected directed graph G_λ' to compute the approximate heat kernel vector of the node.
Here u ∈ V and t is a non-negative real number; the indicator vector of the seed node for walk length k_l has dimension 1 × n; H_t denotes the heat kernel PageRank of the privacy-protected directed graph G_λ'; k_l denotes the number of random walk steps, with k_l ≤ K and k_l a positive integer; the constant c takes the value 1; vol(G) denotes the volume of the directed graph G, i.e. the total number of Motif structures in G; the degree of the seed node u_r is also used, where r is the index of a seed node for a given walk length and r_max is the total number of seed nodes for a given walk length; n is the total number of nodes in the node set V, and r_max ≤ n.
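For orientation, the standard definition of heat kernel PageRank from the literature is given below. It is offered as an assumption about what the symbols above refer to, not as the formula of the original text. Here \chi_u is a hypothetical name for the 1 × n indicator vector of the seed node u, P is the transition probability matrix of sub-step 2.1, and t is the diffusion parameter.

```latex
\rho_{t,u} = \chi_u H_t, \qquad
H_t = e^{-t} \sum_{k=0}^{\infty} \frac{t^{k}}{k!} P^{k} = e^{-t(I - P)} .
```

Under this definition, a walk of k steps contributes with weight e^{-t} t^k / k!, so cutting the sum off at a maximum walk length K discards only the small tail of the series, which is consistent with the bound k_l ≤ K.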
Selecting seed nodes and launching parallel heat kernel PageRank based random walks from them on the privacy-protected directed graph G_λ' proceeds as follows:
First, compute the degree of each node in the privacy-protected directed graph G_λ'.
Second, sort all nodes of the privacy-protected directed graph G_λ' in descending order of degree.
Finally, set the number of steps of the 1st parallel random walk to k_1 and choose r_max seed nodes one by one in descending order of degree; each time a seed node is chosen, run a parallel random walk on the privacy-protected directed graph G_λ' starting from that seed node, obtaining the corresponding approximate heat kernel vector.
Set the number of steps of the 2nd parallel random walk to k_2 = k_1 + Δk and choose r_max seed nodes one by one in descending order of degree; each time a seed node is chosen, run a parallel random walk on the privacy-protected directed graph G_λ' starting from that seed node, obtaining the corresponding approximate heat kernel vector.
And so on, up to the k-th parallel random walk, obtaining the corresponding approximate heat kernel vector.
Here Δk is the increase in the number of walk steps from one parallel random walk to the next; Δk is a positive integer, typically (0.01-0.1)n.
A parallel random walk on the privacy-protected directed graph G_λ' is carried out as follows: starting from the seed node u, walk in parallel along the edges incident to the seed node to its neighbor nodes (the other endpoints of those edges), which completes one step of the parallel random walk; then, taking the neighbor nodes as starting points, walk in parallel along the edges incident to each neighbor node to the other endpoints of those edges; and so on, until the number of steps reaches the configured walk length k_l, which completes the parallel random walk for one seed node.
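As a rough illustration of approximating a heat kernel vector with bounded-length walks, the sketch below uses sampled walks rather than the parallel walk described above; that substitution is an assumption made for brevity. Each walk takes k steps, with k drawn from a Poisson distribution with mean t and capped at `max_steps`, and the estimate is the fraction of walks that end at each node. All names are hypothetical.

```python
import numpy as np

def approx_heat_kernel_vector(P, seed_node, t, max_steps, n_walks, rng_seed=None):
    """Monte Carlo estimate of the heat kernel vector for one seed node.

    Assumptions: each walk takes k steps with k ~ Poisson(t) capped at
    max_steps, and the estimate is the fraction of walks ending at each node.
    """
    rng = np.random.default_rng(rng_seed)
    n = P.shape[0]
    counts = np.zeros(n)
    for _ in range(n_walks):
        node = seed_node
        k = min(int(rng.poisson(t)), max_steps)
        for _ in range(k):
            if P[node].sum() == 0:      # dangling node: stop this walk early
                break
            node = rng.choice(n, p=P[node])
        counts[node] += 1
    return counts / n_walks

# Row-stochastic toy matrix: triangle 0-1-2 with a pendant node 3 attached to 2.
P = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [1 / 3, 1 / 3, 0.0, 1 / 3],
              [0.0, 0.0, 1.0, 0.0]])
rho_hat = approx_heat_kernel_vector(P, seed_node=0, t=3.0, max_steps=10,
                                    n_walks=2000, rng_seed=1)
```

Capping the walk length trades a small truncation error for a bounded running time per walk, which mirrors the role of the step limit in the text.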
Step 3: use the local clustering algorithm to cut each approximate heat kernel vector, obtaining the local clusters of the approximate heat kernel vectors and hence the local cluster sets of the directed graph G(V, E), which completes the local clustering of G(V, E).
Sub-step 3.1: normalize each approximate heat kernel vector to obtain the normalized approximate heat kernel vector.
Sub-step 3.2: set the target cut ratio to φ and sweep each normalized approximate heat kernel vector to find the corresponding target sets S_j, which completes the cutting of the heat kernel vector and yields the local cluster sets of the directed graph G(V, E). The specific steps are:
First, take the 1st element of the normalized approximate heat kernel vector as the first candidate set and check whether it satisfies the local clustering condition. If it does, the first candidate set is the target set. Otherwise, add the 2nd element of the normalized approximate heat kernel vector to the first candidate set to form the second candidate set, and check whether it satisfies the local clustering condition; if it does, the second candidate set is the target set. Otherwise, add the 3rd element to the second candidate set to form the third candidate set, and so on, until the target set S_1 is found.
Second, continue sweeping the elements of the normalized approximate heat kernel vector that are not included in the target set S_1 until all elements of the vector have been scanned, obtaining the target sets S_1, S_2, ..., S_j, ..., S_J.
The normalization proceeds as follows: first divide each element of each approximate heat kernel vector by r_max; then divide each element of the resulting vector by the degree of its corresponding node; finally, sort the elements of the resulting vector in descending order to obtain the normalized approximate heat kernel vector.
The local clustering condition is:
Here vol(G) denotes the total number of Motif structures in the directed graph G; φ(S_j) denotes the local cut ratio of the target set S_j; cut(S_j) denotes the number of Motif structures with at least one endpoint in S_j and another endpoint in the complement of S_j; vol(S_j) denotes the volume of the target set S_j, i.e. the total number of Motif structures that the nodes in S_j form in the directed graph G; the complement of S_j is taken within the approximate heat kernel vector; and the volume of the complement is the total number of Motif structures that the nodes in the complement of the target set form in the directed graph G.
In the invention, the local cut ratio φ(S_j) is also called the conductance of the target set. The smaller the conductance, the better the clustering, so this index can be used to evaluate clustering quality.
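The sweep of sub-step 3.2 can be sketched as below, under the assumption that the cut ratio takes the usual conductance form cut(S) / min(vol(S), vol(complement of S)) and that a candidate set is accepted once this value is at most φ. Weights are summed from a symmetric weight matrix, and only the first target set is returned; the names `conductance` and `sweep_cut` are hypothetical.

```python
import numpy as np

def conductance(w, members):
    """cut(S) / min(vol(S), vol(complement)) on a symmetric weight matrix."""
    in_s = np.zeros(w.shape[0], dtype=bool)
    in_s[list(members)] = True
    cut = w[in_s][:, ~in_s].sum()            # weight crossing the boundary of S
    denom = min(w[in_s].sum(), w[~in_s].sum())
    return float("inf") if denom == 0 else cut / denom

def sweep_cut(w, rho, phi):
    """Return the first prefix of the degree-normalised ranking with conductance <= phi."""
    deg = w.sum(axis=1)
    score = np.divide(rho, deg, out=np.zeros_like(rho, dtype=float), where=deg > 0)
    order = np.argsort(-score, kind="stable")   # descending by rho / degree
    for size in range(1, len(order)):
        candidate = order[:size].tolist()
        if conductance(w, candidate) <= phi:
            return candidate
    return None                                  # no prefix meets the target

# Two triangles, 0-1-2 and 3-4-5, joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
W = np.zeros((6, 6))
for a, b in edges:
    W[a, b] = W[b, a] = 1.0
rho = np.array([0.30, 0.25, 0.25, 0.10, 0.05, 0.05])
cluster = sweep_cut(W, rho, phi=0.2)
```

On this toy graph the prefixes {0} and {0, 1} have conductance 1 and 0.5, while {0, 1, 2} cuts only the bridging edge and has conductance 1/7, so it is the first set to meet the target of 0.2. Repeating the sweep on the remaining nodes would yield the further target sets described in the text.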
Simulation experiments
1. Simulation conditions:
The development environment is TensorFlow 1.4.0 (CPU version), Python 2.7, and the 64-bit edition of Windows Server 2008 R2 Enterprise; the social network data sets are netscience, polblogs, and cond-mat-2003.
In each simulation experiment, the exact heat kernel vector ρ_{t,u} is computed with the traditional heat kernel PageRank algorithm, and the approximate heat kernel vector is computed with the method of the invention. Two error measures, the mean absolute error and the intersection difference, quantify the similarity between the exact and approximate heat kernel vectors. When the error is very small, the number of random walk steps that produced the approximate heat kernel vector is optimal and best matches the link structure of the original social network.
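The text does not say how the exact vector is computed. One common approach, assumed here, evaluates the series e^{-t} Σ_k (t^k / k!) χ_u P^k directly and stops once the remaining terms are negligible. The function name and the toy matrix are hypothetical.

```python
import numpy as np

def heat_kernel_vector(P, seed_node, t, n_terms=50):
    """Evaluate e^{-t} * sum_k t^k / k! * chi_u P^k, truncated after n_terms terms."""
    n = P.shape[0]
    walk = np.zeros(n)
    walk[seed_node] = 1.0          # chi_u: all probability mass on the seed
    coeff = np.exp(-t)             # e^{-t} * t^0 / 0!
    rho = coeff * walk
    for k in range(1, n_terms):
        walk = walk @ P            # distribution after k steps
        coeff *= t / k             # e^{-t} * t^k / k!
        rho += coeff * walk
    return rho

P = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
rho_exact = heat_kernel_vector(P, seed_node=0, t=3.0)
```

For a row-stochastic P the coefficients are Poisson probabilities, so the entries of the result sum to nearly 1 when enough terms are kept.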
The node rankings obtained from the traditional exact heat kernel vector and from the approximate heat kernel vector of the invention are compared. Specifically, the upper limit on the number of random walk steps is varied and the change in accuracy is observed.
(1) Mean absolute error: the average L1 error is used to compare the exact heat kernel vector ρ_{t,u} after the random walk with the approximate heat kernel vector. The average L1 error is calculated as:
Here ρ_{t,u}(v_i) denotes the value for node v_i in the exact heat kernel vector, and the approximate counterpart is the value for node v_i in the normalized approximate heat kernel vector.
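A minimal sketch of this error measure follows, assuming it is the mean over all nodes of the absolute difference between the exact and approximate values; the function name and sample vectors are hypothetical.

```python
import numpy as np

def average_l1_error(exact, approx):
    """Mean of |exact[i] - approx[i]| over all nodes."""
    exact = np.asarray(exact, dtype=float)
    approx = np.asarray(approx, dtype=float)
    return float(np.abs(exact - approx).mean())

err = average_l1_error([0.5, 0.3, 0.2], [0.4, 0.3, 0.3])
```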
(2) Intersection difference: the intersection difference measures the similarity of node rankings. Given the ranking A of the nodes by the approximate vector and the ranking B by the exact vector, each of length i, the difference between the approximate and exact rankings is:
Here dist(A, B) denotes the difference between the approximate ranking A_i and the exact ranking B_i, ⊕ denotes the exclusive-or operation, and |·| denotes taking the absolute value.
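One plausible reading of this measure, assumed here, treats ⊕ as the symmetric difference of the top-i node sets and normalizes its size by 2i, so that identical top-i sets score 0 and disjoint ones score 1. The averaging over all prefix lengths in the second function is likewise an assumption, and both function names are hypothetical.

```python
def intersection_difference(rank_a, rank_b, i):
    """|A_i xor B_i| / (2 i), where A_i and B_i are the top-i items of each ranking."""
    top_a, top_b = set(rank_a[:i]), set(rank_b[:i])
    return len(top_a ^ top_b) / (2 * i)

def mean_intersection_difference(rank_a, rank_b):
    """Average of the top-i difference over i = 1 .. len(rank_a)."""
    n = len(rank_a)
    return sum(intersection_difference(rank_a, rank_b, i) for i in range(1, n + 1)) / n

d_top1 = intersection_difference([1, 2, 3], [2, 1, 3], 1)
d_mean = mean_intersection_difference([1, 2, 3], [2, 1, 3])
```

In the example the two rankings disagree only about which node comes first, so the top-1 difference is 1 while the top-2 and top-3 differences are 0.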
2. Simulation result one
Three social network data sets, netscience, polblogs, and cond-mat-2003, are chosen and compared under different numbers of random walk steps, using the exact and approximate heat kernel vectors to measure the absolute error of the node vectors. As Fig. 3 shows, different numbers of random walk steps produce different vector absolute errors. Overall, the absolute error trends of the three data sets do not differ greatly. For the sparser data set netscience, the differences in absolute error across walk lengths are more pronounced. For the two larger data sets, polblogs and cond-mat-2003, the nodes are relatively densely connected and the node counts are larger, so the absolute error differs less across the step limits K.
3. Simulation result two
The sparse social network data set netscience is chosen. For different numbers of random walk steps, different privacy budgets ε (0.1, 0.25, 0.5, and 0.75) are allocated to the generated Motif adjacency matrix, and the effect on the mean absolute error is compared across walk lengths. The dots in Fig. 4 mark the original mean absolute error, without noise perturbation, for each step count. As Fig. 4 shows, when different privacy budgets are allocated on the Motif adjacency matrix, a smaller privacy budget produces larger noise, and the mean absolute error lies further from the noise-free value; a larger privacy budget produces smaller noise, and the mean absolute error lies closer to the noise-free value. This is consistent with the behavior expected of differential privacy noise addition.
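The inverse relation between privacy budget and noise size can be checked numerically. The sketch below draws Laplace noise with scale sensitivity/ε for the four budgets used in the experiment and records the mean absolute noise. The unit sensitivity and the sample size are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sensitivity = 1.0                      # assumed sensitivity of one Motif count
budgets = [0.1, 0.25, 0.5, 0.75]       # privacy budgets used in the experiment

mean_abs_noise = []
for eps in budgets:
    noise = rng.laplace(loc=0.0, scale=sensitivity / eps, size=100_000)
    mean_abs_noise.append(float(np.abs(noise).mean()))
```

Because the expected absolute value of Laplace noise equals its scale, the recorded means fall roughly in proportion to 1/ε as the budget grows.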
4. Simulation result three
Three social network data sets, netscience, polblogs, and cond-mat-2003, are chosen and compared under different numbers of random walk steps, using the exact and approximate heat kernel vectors to measure the difference in node vector rankings. As Fig. 5 shows, different numbers of random walk steps produce different ranking differences. For the sparser data set netscience, the ranking difference falls more noticeably as the number of random walk steps increases. For the other two, relatively dense data sets, the ranking difference falls more gently as the number of random walk steps increases.
5. Simulation result four
The relatively dense social network data set cond-mat-2003 is chosen. For different numbers of random walk steps, different privacy budgets ε (0.1, 0.25, 0.5, and 0.75) are allocated to the generated Motif adjacency matrix, and the effect on the ranking difference is compared across walk lengths. The dots in Fig. 6 mark the original intersection difference, without noise perturbation, for each step count. As Fig. 6 shows, when different privacy budgets are allocated on the Motif adjacency matrix, a smaller privacy budget produces larger noise, and the ranking difference lies further from the noise-free value; a larger privacy budget produces smaller noise, and the ranking difference lies closer to the noise-free value. This is consistent with the behavior expected of differential privacy noise addition.
The experiments above show the following for the local higher-order graph clustering method based on differential privacy. With the number of random walk steps limited and no differential privacy protection applied, the ordering of the approximate heat kernel vector is optimal when the mean error and the ranking difference between the approximate and exact heat kernel vectors are smallest, and the resulting cut then yields more accurate clusters. When the Motif weight matrix is perturbed with noise from the Laplace mechanism, choosing a suitable privacy budget and number of random walk steps protects the private information of the local higher-order subgraphs of a complex social network, keeps the heat kernel vector information produced by node random walks from leaking, and still achieves local clustering of the complex social network.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as ROM, RAM, magnetic disks, or optical discs.
The above is only a specific embodiment of the invention, and the scope of protection of the invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be determined by the scope of protection of the claims.

Claims (10)

1. a kind of local high-order figure clustering method based on difference privacy, which comprises the following steps:
Step 1, social network data collection is obtained, i.e. digraph G (V, E) chooses the M in triangle Motif model7Connection structure is made For the high-order network subgraph Motif structure of digraph G (V, E);Construct Motif adjacency matrix, weight matrix WM;It is hidden using difference Private algorithm interferes Motif structure number in the Motif weight matrix of digraph, obtains having the oriented of secret protection Scheme Gλ';
Wherein, V is node collection, and E is side collection;
Step 2, using approximate thermonuclear page rank seed node algorithm to the digraph G with secret protectionλ' carry out based on heat The random walk of the core page obtains the approximate thermonuclear vector of node
Step 3, using Local Clustering algorithm to each approximate thermonuclear vectorIt is cut, obtains approximate thermonuclear vectorOffice Portion's cluster completes the Local Clustering of digraph G (V, E) to get the Local Clustering set for arriving digraph G (V, E).
2. The local high-order graph clustering method based on differential privacy according to claim 1, characterized in that step 1 comprises the following sub-steps:
Sub-step 1.1: construct the Motif weight matrix W_M; specifically, count the Motif structures formed on all paths from node v_i to node v_j in the directed graph G(V, E), and use this count as the weight between the corresponding nodes in the adjacency matrix W, generating the Motif weight matrix W_M;
Sub-step 1.2: using a differential privacy algorithm, perturb the Motif counts in the Motif weight matrix W_M of the directed graph to obtain the perturbed weight matrix W_M', and thereby the directed graph G_λ' with privacy protection.
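As an illustration of sub-step 1.1, the following minimal sketch builds a Motif weight matrix for a small graph. It makes a simplifying assumption that is not in the claim: edge directions are ignored and the plain undirected triangle is counted in place of the directed M7 pattern. The function name `triangle_motif_weights` and the sample matrix are hypothetical.

```python
import numpy as np

def triangle_motif_weights(adj: np.ndarray) -> np.ndarray:
    """Return W_M, where W_M[i, j] is the number of triangles containing edge (i, j).

    Assumption: the graph is symmetrized first, so this counts undirected
    triangles rather than the directed M7 motif named in the claim.
    """
    a = ((adj + adj.T) > 0).astype(int)  # symmetrize: ignore edge direction
    np.fill_diagonal(a, 0)               # drop self-loops
    # (a @ a)[i, j] counts common neighbours of i and j; multiplying by a
    # keeps the count only where i and j are themselves connected.
    return (a @ a) * a

# Hypothetical 4-node directed graph: 0->1, 0->2, 1->2, 2->3
adj = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
])
W_M = triangle_motif_weights(adj)
print(W_M)
```

Nodes 0, 1 and 2 form the only triangle, so each of their mutual edges receives weight 1, while the edge between nodes 2 and 3 receives weight 0 and effectively disappears from the Motif graph.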
3. The local high-order graph clustering method based on differential privacy according to claim 2, characterized in that sub-step 1.2 comprises the following sub-steps:
Sub-step 1.2.1: construct a new directed graph G_λ in which the number of Motif structures is bounded above by λ; the specific steps are:
First, for each node v_i, compute the number of Motif structures connected to it, Tri_i(G) = w_i;
Then, for each v_i ∈ V, determine whether the number w_i of Motif structures connected to node v_i is greater than the upper limit λ; if w_i ≥ λ, delete the edges between v_i and those connected nodes whose Motif structure number is greater than λ; otherwise, retain the edges between node v_i and the corresponding nodes; the new directed graph G_λ is thereby obtained;
Sub-step 1.2.2: add Laplacian noise to the Motif structure counts in the new directed graph G_λ to obtain the perturbed weight matrix W_M';
wherein Lap(·) denotes the probability density function of the Laplace mechanism, and ε is the privacy budget allocated to the Motif counts;
the directed graph G_λ' with privacy protection is thereby obtained.
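The truncation and perturbation of sub-steps 1.2.1 and 1.2.2 can be sketched as follows. Several choices here are assumptions rather than statements of the claimed method: the noise scale is taken as λ/ε (the noise formula is not reproduced in the text above), the weight matrix is treated as symmetric with triangle counts, negative noisy weights are clipped to zero, and the name `truncate_and_perturb` is hypothetical.

```python
import numpy as np

def truncate_and_perturb(w_m, lam, eps, rng):
    """Bound per-node Motif counts by lam, then add Laplace noise to the weights."""
    w = w_m.astype(float).copy()
    # For triangle weights, each triangle at node i adds 2 to row i's sum.
    node_count = w.sum(axis=1) / 2.0
    heavy = node_count > lam                      # nodes whose count is greater than lam
    for i in np.flatnonzero(node_count >= lam):   # nodes that reach the upper limit
        drop = heavy & (w[i] > 0)                 # heavy neighbours of node i
        w[i, drop] = 0.0
        w[drop, i] = 0.0
    # Perturb each surviving edge once (upper triangle), then mirror it.
    mask = np.triu(w > 0, k=1)
    noise = rng.laplace(loc=0.0, scale=lam / eps, size=w.shape)  # assumed scale lam/eps
    noisy = np.clip(np.where(mask, w + noise, 0.0), 0.0, None)   # assumed clipping at 0
    return noisy + noisy.T

# Hypothetical input: complete graph on 4 nodes, every edge lies in 2 triangles.
w_k4 = 2.0 * (np.ones((4, 4)) - np.eye(4))
rng = np.random.default_rng(0)
perturbed = truncate_and_perturb(w_k4, lam=5, eps=1.0, rng=rng)
print(perturbed)
```

A smaller ε yields larger noise and stronger privacy at the cost of clustering accuracy, and a smaller λ removes more edges before any noise is added.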
4. The local high-order graph clustering method based on differential privacy according to claim 1, characterized in that step 2 comprises the following sub-steps:
Sub-step 2.1: convert the directed graph G_λ' with privacy protection into random-walk probabilities, obtaining the random probability transition matrix P:
wherein v_i ~ v_j indicates that there is an edge connecting node v_i to node v_j;
Sub-step 2.2: set the diffusion parameter of the random walk to t and the error parameter to δ; select seed nodes and, starting from the seed nodes, launch heat kernel PageRank based random walks in parallel on the directed graph G_λ' with privacy protection to compute the approximate heat kernel vectors of the nodes
wherein δ ∈ (0, 1), u ∈ V, and t is a non-negative real number; the indicator vector of the seed nodes for a walk of k_l steps has dimension 1 × n; H_t denotes the heat kernel PageRank of the directed graph G_λ' with privacy protection; k_l denotes the number of random walk steps, with k_l ≤ K and k_l a positive integer; the constant c takes the value 1; Vol(G) denotes the volume of the directed graph G, i.e. the total number of Motif structures in the directed graph G; the formula also uses the degree of seed node u_r; r denotes the index of a seed node for a given number of walk steps, and r_max denotes the total number of seed nodes for a given number of walk steps; n is the total number of nodes in the node set V, and r_max ≤ n.
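To make sub-steps 2.1 and 2.2 concrete, the sketch below row-normalizes a weight matrix into a transition matrix P and evaluates heat kernel PageRank from its standard series definition, ρ = e^(-t) Σ_k (t^k / k!) s P^k, truncated after a fixed number of terms. This deterministic truncation is a stand-in chosen for clarity, not the parallel random walk approximation of the claim; the names `transition_matrix`, `heat_kernel_pagerank` and `n_terms` are hypothetical.

```python
import math
import numpy as np

def transition_matrix(w):
    """Row-normalize weights: P[i, j] = w[i, j] / sum_j w[i, j] (zero rows stay zero)."""
    w = np.asarray(w, dtype=float)
    deg = w.sum(axis=1)
    p = np.zeros_like(w)
    nz = deg > 0
    p[nz] = w[nz] / deg[nz, None]
    return p

def heat_kernel_pagerank(p, seed, t, n_terms):
    """Truncated series e^{-t} * sum_{k=0}^{n_terms} t^k / k! * s P^k for one seed."""
    n = p.shape[0]
    s = np.zeros(n)
    s[seed] = 1.0              # indicator vector of the seed node
    rho = np.zeros(n)
    term = s.copy()            # holds s P^k
    for k in range(n_terms + 1):
        rho += math.exp(-t) * t**k / math.factorial(k) * term
        term = term @ p
    return rho

# Hypothetical weights: triangle 0-1-2 with a pendant node 3 attached to node 2.
w = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])
P = transition_matrix(w)
rho = heat_kernel_pagerank(P, seed=0, t=3.0, n_terms=30)
print(rho, rho.sum())
```

The diffusion parameter t controls how far probability mass spreads from the seed: small t keeps it near the seed, large t approaches the stationary distribution of the walk.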
5. The local high-order graph clustering method based on differential privacy according to claim 4, characterized in that selecting seed nodes and, starting from the seed nodes, launching heat kernel PageRank based random walks in parallel on the directed graph G_λ' with privacy protection comprises the following specific steps:
First, compute the degree of each node in the directed graph G_λ' with privacy protection;
Second, sort all nodes of the directed graph G_λ' with privacy protection in descending order of degree;
Finally, set the number of steps of the 1st parallel random walk to k_1, and choose r_max seed nodes in descending order of node degree; each time a seed node is chosen, carry out a parallel random walk on the directed graph G_λ' with privacy protection with that seed node as the starting point, obtaining the corresponding approximate heat kernel vector;
Set the number of steps of the 2nd parallel random walk to k_2 = k_1 + Δk, and choose r_max seed nodes in descending order of node degree; each time a seed node is chosen, carry out a parallel random walk on the directed graph G_λ' with privacy protection with that seed node as the starting point, obtaining the corresponding approximate heat kernel vector; and so on, until the k-th parallel random walk, obtaining the corresponding approximate heat kernel vector;
wherein Δk is the increment in the number of walk steps of the current parallel random walk over the previous parallel random walk, and Δk is a positive integer.
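The seed selection and step schedule of this claim can be sketched in a few lines. The sketch assumes that a node's degree is the row sum of the weight matrix; the helper names `select_seeds` and `step_schedule` and the sample values are hypothetical.

```python
import numpy as np

def select_seeds(w, r_max):
    """Return the r_max node indices with the largest degree, in descending order."""
    deg = np.asarray(w, dtype=float).sum(axis=1)   # assumed degree: row sum of weights
    order = np.argsort(-deg, kind="stable")        # ties keep the lower node index first
    return order[:r_max].tolist()

def step_schedule(k1, delta_k, rounds):
    """Walk lengths k_1, k_1 + Δk, k_1 + 2Δk, ... for the successive parallel walks."""
    return [k1 + l * delta_k for l in range(rounds)]

w = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])
seeds = select_seeds(w, r_max=2)
steps = step_schedule(k1=2, delta_k=3, rounds=4)
print(seeds, steps)
```

Each walk length in the schedule is then run from every selected seed, so the total number of walks is the number of rounds multiplied by r_max.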
6. The local high-order graph clustering method based on differential privacy according to claim 5, characterized in that carrying out a parallel random walk on the directed graph G_λ' with privacy protection is specifically:
On the directed graph G_λ' with privacy protection, with a seed node as the starting point, walk in parallel along the edges connected to the seed node to the neighbor nodes of the seed node, completing one step of the parallel random walk; then, with the neighbor nodes as starting points, walk in parallel along the edges connected to each neighbor node to the other endpoint of each edge; and so on, until the number of steps walked reaches the set number of walk steps k_l, which completes the parallel random walk of one seed node.
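A fixed-length random walk from one seed can be simulated as below. This is a sampling sketch under stated assumptions, not the claimed procedure: it runs independent walkers one after another instead of expanding along all edges in parallel, and it assumes every row of the transition matrix sums to 1. The name `walk_endpoints` and the parameter `n_walks` are hypothetical.

```python
import numpy as np

def walk_endpoints(p, seed, k_steps, n_walks, rng):
    """Estimate where a k_steps-long random walk from `seed` ends.

    Returns a length-n vector of empirical endpoint frequencies.
    """
    n = p.shape[0]
    counts = np.zeros(n)
    for _walk in range(n_walks):
        node = seed
        for _step in range(k_steps):
            node = rng.choice(n, p=p[node])   # move to a neighbour with probability P[node, .]
        counts[node] += 1
    return counts / n_walks

# Hypothetical two-node graph: the walker must alternate between nodes 0 and 1.
p_pair = np.array([[0.0, 1.0],
                   [1.0, 0.0]])
rng = np.random.default_rng(0)
freq = walk_endpoints(p_pair, seed=0, k_steps=3, n_walks=10, rng=rng)
print(freq)
```

On the two-node example the walk is forced, so after an odd number of steps every walker ends at node 1 regardless of the random seed.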
7. The local high-order graph clustering method based on differential privacy according to claim 1, characterized in that step 3 comprises the following sub-steps:
Sub-step 3.1: standardize each approximate heat kernel vector to obtain the standardized approximate heat kernel vector;
Sub-step 3.2: set the target cut ratio to φ, and perform a sweep over each standardized approximate heat kernel vector to find the corresponding target set S_j, completing the cut of the heat kernel vector and obtaining the local clustering set of the directed graph G(V, E), which completes the local clustering of the directed graph G(V, E).
8. The local high-order graph clustering method based on differential privacy according to claim 7, characterized in that, in sub-step 3.1, the specific steps of the standardization are: first, divide each element of each approximate heat kernel vector by r_max to obtain a vector; next, divide each element of that vector by the degree of its corresponding node to obtain a further vector; finally, arrange the elements of that vector in descending order to obtain the standardized approximate heat kernel vector.
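The three standardization steps translate directly into array operations. In this minimal sketch the input vector, the degrees and the name `standardize` are hypothetical, and nodes of degree zero are assumed to receive the value 0.

```python
import numpy as np

def standardize(rho_sum, degrees, r_max):
    """Divide by r_max, divide by node degree, then sort in descending order.

    Returns (order, values): node indices in sweep order and their scores.
    """
    rho_sum = np.asarray(rho_sum, dtype=float)
    degrees = np.asarray(degrees, dtype=float)
    avg = rho_sum / r_max                                   # step 1: divide by r_max
    per_degree = np.divide(avg, degrees,
                           out=np.zeros_like(avg),
                           where=degrees > 0)               # step 2: divide by degree
    order = np.argsort(-per_degree, kind="stable")          # step 3: descending order
    return order, per_degree[order]

order, values = standardize(rho_sum=[1.0, 0.5, 3.0, 0.25],
                            degrees=[2, 2, 3, 1],
                            r_max=2)
print(order, values)
```

Keeping the node indices alongside the sorted values matters because the subsequent sweep adds nodes, not scores, to the candidate set.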
9. The local high-order graph clustering method based on differential privacy according to claim 8, characterized in that performing a sweep over each standardized approximate heat kernel vector to find the corresponding target set S_j comprises the following specific steps:
First, take the 1st element of the standardized approximate heat kernel vector as the first candidate set and determine whether the first candidate set satisfies the local clustering condition; if it does, the first candidate set is the target set; otherwise, add the 2nd element of the standardized approximate heat kernel vector to the first candidate set to form the second candidate set, and determine whether the second candidate set satisfies the local clustering condition; if it does, the second candidate set is the target set; otherwise, add the 3rd element of the standardized approximate heat kernel vector to the second candidate set to form the third candidate set; and so on, until the target set S_1 is found;
Second, continue the sweep over the elements of the standardized approximate heat kernel vector that are not contained in the target set S_1, until all elements of the standardized approximate heat kernel vector have been scanned, obtaining the target sets S_1, S_2, ..., S_j, ..., S_J.
10. The local high-order graph clustering method based on differential privacy according to claim 9, characterized in that the local clustering condition is:
wherein Vol(G) denotes the total number of Motif structures in the directed graph G; φ(S_j) denotes the local cut ratio of the target set S_j; cut(S_j) denotes the number of Motif structures having at least one endpoint in S_j and another endpoint in the complement of S_j; vol(S_j) denotes the volume of the target set S_j, i.e. the total number of Motif structures formed in the directed graph G by the nodes of S_j; the complement of S_j is taken within the approximate heat kernel vector; and the volume of the complement is the total number of Motif structures formed in the directed graph G by the nodes of the complement of S_j.
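The sweep and the local clustering condition can be sketched together. Because the condition's formula is not reproduced in the text above, this sketch assumes the standard conductance form φ(S) = cut(S) / min(vol(S), vol(complement of S)) and accepts a candidate set once φ(S) ≤ φ. It also measures cut and volume as sums of edge weights rather than Motif counts. The names `conductance` and `sweep` and the sample graph are hypothetical.

```python
import numpy as np

def conductance(w, members):
    """Assumed cut ratio: cut(S) / min(vol(S), vol(complement of S))."""
    in_set = np.zeros(w.shape[0], dtype=bool)
    in_set[list(members)] = True
    cut = w[in_set][:, ~in_set].sum()      # weight crossing from S to its complement
    vol_s = w[in_set].sum()                # total weight incident to S
    vol_rest = w[~in_set].sum()            # total weight incident to the complement
    denom = min(vol_s, vol_rest)
    return float("inf") if denom == 0 else cut / denom

def sweep(w, order, phi):
    """Grow a candidate set along `order`; emit it as a cluster once it meets phi."""
    clusters, current = [], []
    for node in order:
        current.append(int(node))
        if conductance(w, current) <= phi:
            clusters.append(current)
            current = []                   # continue the sweep on the remaining nodes
    return clusters, current               # `current` holds any unassigned leftovers

# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
w = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    w[i, j] = w[j, i] = 1.0
clusters, leftovers = sweep(w, order=[0, 1, 2, 3, 4, 5], phi=0.2)
print(clusters, leftovers)
```

On this example each triangle has cut 1 and volume 7, giving a ratio of 1/7, which is below the target 0.2, so the sweep returns the two triangles as separate clusters.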
CN201910490628.7A 2019-06-06 2019-06-06 A kind of local high-order figure clustering method based on difference privacy Pending CN110263831A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910490628.7A CN110263831A (en) 2019-06-06 2019-06-06 A kind of local high-order figure clustering method based on difference privacy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910490628.7A CN110263831A (en) 2019-06-06 2019-06-06 A kind of local high-order figure clustering method based on difference privacy

Publications (1)

Publication Number Publication Date
CN110263831A true CN110263831A (en) 2019-09-20

Family

ID=67917140

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910490628.7A Pending CN110263831A (en) 2019-06-06 2019-06-06 A kind of local high-order figure clustering method based on difference privacy

Country Status (1)

Country Link
CN (1) CN110263831A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111723399A (en) * 2020-06-15 2020-09-29 内蒙古科技大学 Large-scale social network directed graph privacy protection method based on k-kernel
CN111723399B (en) * 2020-06-15 2023-08-29 内蒙古科技大学 Large-scale social network directed graph privacy protection method based on k-kernel
CN112199728A (en) * 2020-11-04 2021-01-08 同济大学 Privacy protection method for social network relationship prediction
CN112199728B (en) * 2020-11-04 2022-07-19 同济大学 Privacy protection method for social network relationship prediction
CN113095490A (en) * 2021-06-07 2021-07-09 华中科技大学 Graph neural network construction method and system based on differential privacy aggregation
CN114118407A (en) * 2021-10-29 2022-03-01 华北电力大学 Deep learning-oriented differential privacy usability measurement method
CN114118407B (en) * 2021-10-29 2023-10-24 华北电力大学 Differential privacy availability measurement method for deep learning
CN117436130A (en) * 2023-12-19 2024-01-23 暨南大学 Differential privacy-based directed graph data security release method
CN117436130B (en) * 2023-12-19 2024-04-02 暨南大学 Differential privacy-based directed graph data security release method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190920