CN109299615A - A differentially private processing and publishing method for social network data - Google Patents

A differentially private processing and publishing method for social network data

Info

Publication number
CN109299615A
Authority
CN
China
Prior art keywords
community
matrix
node
label
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810705888.7A
Other languages
Chinese (zh)
Other versions
CN109299615B (en)
Inventor
黄海平
汤雄
张东军
张伟
张大成
戴华
徐宁
张凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Abstract

The invention discloses a differentially private processing and publishing method for social network data. When processing the adjacency matrix of a social network graph, the method first identifies structural labels by fast community detection with differentially private noise, producing node labels under which the nodes of each community are clustered together. It then determines the dense regions of the resulting upper triangular adjacency matrix using a data-independent adaptive method and a binary tree structure. Finally, it reconstructs the noisy adjacency matrix by matrix processing and publishes the network. By introducing community grouping, the invention protects the privacy of social network data while preserving good data utility. Partitioning and reconstructing the upper triangular matrix by density improves data processing efficiency, and designing an optimal noisy-edge allocation for each density level safeguards the degree of privacy protection of the scheme.

Description

A differentially private processing and publishing method for social network data
Technical field
The present invention relates to the technical field of differentially private processing for social network data publication, and in particular to a privacy-preserving processing and publishing method for social network data based on community detection and density aggregation.
Background
With the spread of mobile terminal devices and the development of the associated mobile network technologies, social networks and big data technology have become increasingly embedded in the daily life of ordinary users, yet the security of users' social relationships and personal private data in social networks remains a problem that deserves special attention.
A social network can be represented as a graph. It contains central nodes of high degree (celebrities in a microblog network, for example), and a central node has many edges (followers) to nodes of lower degree. A social network graph may contain several central nodes, which may be related to each other directly (for example, celebrity A and celebrity B are spouses or partners) or through a common acquaintance (for example, a director they both know). The close connections between a central node and its surrounding nodes often form a distinct community. To ensure the privacy of individuals in the relationship network, the information must undergo privacy-preserving processing before the social network is published (media reporters often extract news from relationship changes in celebrities' social networks, for example when celebrity A removes an @ mention of celebrity B). The purpose of privacy protection in social network publication is to achieve effective information sharing without leaking individuals' sensitive information, and it is mostly implemented with differential privacy. Current differentially private processing of social networks usually either partitions the network according to node degree or converts it into an adjacency matrix for processing. The former is a clustering-based method: because it hides the attribute and relationship information of the individuals inside each subgraph, it causes considerable information loss, hinders analysis of the local structure of the network, and distorts the overall scale of the network. The latter is based on modifying the network structure: it alters the structure of the social network so that the published network differs structurally from the original, thereby achieving privacy protection. Compared with clustering, structure modification preserves the original scale of the social network, loses relatively little information, and achieves relatively high data utility, but its privacy protection is comparatively weaker.
Meanwhile, existing privacy protection models designed mainly for relational data publication, including k-anonymity, l-diversity, and t-closeness, cannot be applied directly to the privacy leakage problem in social networks. These methods also lack a rigorous attack model and cannot resist attacks based on background knowledge.
In view of the above problems, it has been observed that, in the adjacency matrix structure of social network data, nodes can often be clustered into groups, which in turn can be divided into separate, densely clustered communities. Exploiting this property, community detection can be introduced to design different differentially private publication methods for different network regions, so that high data availability is maintained while the degree of privacy protection is guaranteed.
Summary of the invention
In view of the above shortcomings of the prior art, and for models that require differentially private publication based on converting the network graph structure, the present invention provides a differentially private matrix perturbation method based on community density aggregation that balances privacy and utility. While achieving differential privacy with respect to the degree of correlation in the graph structure, the privacy of the original social network is well protected through grouping, conversion, and noise addition. In addition, introducing a community detection algorithm together with binary-tree partitioned storage of the upper triangular matrix effectively improves the efficiency of private publication and the degree of data privacy protection.
The technical solution adopted by the present invention includes the following steps:
Step 1: obtain an original social network graph G to be published. Every relationship is converted into a (0, 1) element of the adjacency matrix and mapped into the region of a rectangular two-dimensional matrix $A_o$. The initial node labels of the graph to be published are denoted χ, and the array n[k] records the initial labels.
Step 2: apply a fast community detection algorithm to identify labels that capture the density aggregation of network G, so that closely connected nodes group themselves into communities while connections between communities are sparser. Different label orderings produce different density aggregation in the adjacency matrix of the network. The privacy budget allocated to this process is $\varepsilon_1$. The node labels after identification are denoted χ′, and the array N[k] records the processed labels χ′. The procedure consists of the following sub-steps:
2.1) Compute the adjacency matrix of network G = (V, E) and the modularity Q of the network, and initialize each node as a single community with labels 1 to n, where n = |V|;
2.2) The fast community detection algorithm selects two communities at a time and merges them into a new community. Each merge must yield the largest increase (or smallest decrease) in Q, and the order in which communities are joined is recorded. The merging process seeks the partition with maximum Q, and after each merge the rows and columns of the matrix elements $e_{ij}$ belonging to the two selected communities must be updated;
2.3) Initialize the community aggregation matrix clusters[3, n]. For every pair of distinct communities $SC_i$ and $SC_j$ in the community set SC (that is, $SC_i \neq SC_j$), compute $\Delta Q = 2(e_{ij} - a_i a_j)$;
2.4) If [formula omitted], find the communities $SC_i$ and $SC_j$ with the largest change ΔQ together with all of their nodes, and write their node IDs in pairs into the community merging process matrix Z[n-1, 3];
2.5) The labels of the new communities formed by merging range over [|V|+1, 2|V|-1]. Repeat from step 2.3) until the entries of clusters (the community labels) are all identical, i.e. the third row of the clusters matrix is the same for all nodes. The final community merging process matrix Z[n-1, 3] is then obtained;
2.6) Traverse the first two columns of matrix Z[n-1, 3] (these columns record how community labels are merged during community detection), which reveals the order in which communities were merged. Starting from the first row, loop over the first two columns of each row. Whenever a community label satisfies Z[i, j] ≤ |V|, it is an initial community (node) label, which is re-ordered and renamed. The array N[k] records the processed node labels χ′.
Further, the modularity in this step is $Q = \sum_{i} (e_{ii} - a_i^2)$. That is, when a social network is decomposed into m communities, there is a corresponding m × m symmetric matrix e whose element $e_{ij}$ is the fraction of all edges that connect nodes in community i to nodes in community j, and $e_{ii}$ is the fraction of all edges that lie entirely inside community i. $a_i = \sum_{j} e_{ij}$ is the fraction of all edges attached to nodes in community i.
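The modularity definition above can be checked numerically. The following is a minimal Python sketch, not the patent's implementation: the function names and the toy graph (two triangles joined by one edge) are hypothetical, and it assumes the usual convention that each cross-community edge contributes half of its weight to $e_{ij}$ and half to $e_{ji}$, which keeps e symmetric and makes $\Delta Q = 2(e_{ij} - a_i a_j)$ equal to the change in Q when communities i and j are merged.

```python
def community_matrix(edges, community_of, m):
    """Build the m x m symmetric matrix e from an undirected edge list."""
    e = [[0.0] * m for _ in range(m)]
    total = len(edges)
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        # Half of each edge goes to e[cu][cv] and half to e[cv][cu];
        # an edge inside one community therefore adds 1/total to e[c][c].
        e[cu][cv] += 0.5 / total
        e[cv][cu] += 0.5 / total
    return e


def modularity(e):
    """Q = sum_i (e_ii - a_i^2) with a_i = sum_j e_ij."""
    a = [sum(row) for row in e]
    return sum(e[i][i] - a[i] ** 2 for i in range(len(e)))


def delta_q(e, i, j):
    """Change in Q if communities i and j were merged."""
    a = [sum(row) for row in e]
    return 2.0 * (e[i][j] - a[i] * a[j])


# Hypothetical toy graph: two triangles joined by the edge (3, 4).
toy_edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_communities = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
one_community = {node: 0 for node in two_communities}

e_split = community_matrix(toy_edges, two_communities, 2)
e_merged = community_matrix(toy_edges, one_community, 1)
q_split = modularity(e_split)
q_merged = modularity(e_merged)
dq = delta_q(e_split, 0, 1)
```

In this toy case merging the two triangles lowers Q, so a greedy merge would prefer to keep them apart for as long as other merges are available.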
Step 2.2) must be executed iteratively t = |V| - 1 times. At each merge, the change in the scoring function q(ΔQ, SC) for the community to which the current node belongs is computed from ΔQ. That is, the exponential mechanism of differential privacy is introduced to evaluate the change in [formula omitted], so that Q obtains the largest increase (or smallest decrease).
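The formula used here is missing from the source, so the following Python sketch only illustrates the generic exponential mechanism, in which a candidate with score q is chosen with probability proportional to exp(ε·q / (2·Δq)). The function names, the candidate scores, the per-merge budget, and the sensitivity value are all assumptions made for illustration, not values taken from the patent.

```python
import math
import random


def exponential_mechanism_probs(scores, epsilon, sensitivity):
    """Selection probabilities proportional to exp(epsilon * q / (2 * sensitivity))."""
    top = max(scores)
    # Subtracting the maximum score leaves the probabilities unchanged
    # and avoids overflow in exp().
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def sample_index(probs, rng):
    """Draw one candidate index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical Delta-Q scores of three candidate community pairs.
candidate_delta_q = [-0.05, 0.02, 0.10]
probs = exponential_mechanism_probs(candidate_delta_q, epsilon=1.0, sensitivity=1.0)
chosen = sample_index(probs, random.Random(0))
```

A larger budget concentrates the probability on the pair with the largest ΔQ, while a budget near zero makes the choice almost uniform.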
In step 2.3), clusters is a 3 × n matrix describing the community partition of the network: row 1 holds the node indices, row 2 the transitions of community labels, and row 3 the current community labels. When the labels in row 3 are all identical at the end of partitioning, all nodes belong to the same community and detection ends. During label detection, every iteration assigns the affected nodes to a new community label, so the matrix records how each node's community membership changes.
The matrix Z in step 2.5) is an (n-1) × 3 matrix. Row i represents the i-th merge, columns 1 and 2 hold the labels of the merged communities, and column 3 holds the modularity after the merge. Community labels are initialized to the node labels, from 1 up to n = |V|. The new community labels formed by the successive merges range over [|V|+1, 2|V|-1], because merging |V| nodes takes |V| - 1 merges.
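The traversal of Z described in step 2.6) can be sketched as follows. This is a literal reading of the text, written in Python with hypothetical names: rows are visited in merge order, and every entry in the first two columns that is an original node label (at most |V|) is appended to the new ordering. The merge matrix below is made up for a 4-node example, and its third column (the modularity after each merge) is filled with placeholder zeros.

```python
def relabel_from_merges(z_rows, n):
    """Map each original node label to its new label in merge order."""
    order = []
    for row in z_rows:
        for label in row[:2]:          # only the two merged community labels
            if label <= n:             # labels 1..n are original nodes
                order.append(int(label))
    # New labels are assigned 1, 2, ... in the order nodes were first merged.
    return {old: new for new, old in enumerate(order, start=1)}


# Hypothetical merge history: nodes 3 and 4 merge into community 5,
# nodes 1 and 2 into community 6, then communities 5 and 6 into 7.
toy_z = [(3, 4, 0.0), (1, 2, 0.0), (5, 6, 0.0)]
new_label = relabel_from_merges(toy_z, n=4)
```

Nodes that were merged early end up with adjacent labels, which is what concentrates the ones of each community into a contiguous block of the adjacency matrix.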
Step 3: explore and identify the density regions of the upper triangular adjacency matrix A of the labelled network, and design a differentially private method that partitions regions independently of the data and perturbs them with noise. The privacy budget allocated to this process is $\varepsilon_2 = \varepsilon - \varepsilon_1$, where ε is the total privacy budget. A standard binary tree structure BT stores the upper triangular adjacency matrix by density. The procedure consists of the following sub-steps:
3.1) [formula omitted], i.e. initialize the binary tree structure and compute the tree height h of BT from the number of communities found in step 2 when Q is maximal;
3.2) While i < h, execute the following. When i = 0, treat the upper triangular adjacency matrix A as the initial non-leaf node of BT and divide it into 3 regions;
3.3) For each non-leaf node U ∈ lev(i, BT): compute the privacy budgets [formula omitted] and [formula omitted], and divide U into 3 sub-regions [formula omitted]; otherwise terminate;
3.4) For the set R of 3 sub-regions formed in each step: compute the Laplace-noised count of each sub-region in R, represent each sub-region by a node V, and insert V into the binary tree BT. If node V satisfies the termination condition (region density greater than or equal to a given value), make V a leaf region of BT;
3.5) Increment i and go back to step 3.2), repeating until termination. Finally return the noisy community binary tree BT.
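The noisy counting in step 3.4) can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function names are hypothetical, the count is assumed to have sensitivity 1 (adding or removing one edge changes a region count by 1), the budget passed in is whatever share of the counting budget the region receives, and the density test is applied to whichever count the caller supplies, since the source does not say whether the true or the noisy count is used.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponential samples."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_count(region, epsilon_c, rng):
    """Number of ones in a 0/1 region plus Laplace noise of scale 1/epsilon_c."""
    true_count = sum(map(sum, region))
    return true_count + laplace_noise(1.0 / epsilon_c, rng)


def is_dense_leaf(count, cells, threshold):
    """Termination test: the region density reaches the given threshold."""
    return cells > 0 and count / cells >= threshold


# Hypothetical 3 x 3 upper triangular region with 3 edges.
toy_region = [[0, 1, 1],
              [0, 0, 1],
              [0, 0, 0]]
rng = random.Random(42)
samples = [noisy_count(toy_region, 1.0, rng) for _ in range(20000)]
mean_noisy = sum(samples) / len(samples)
```

Because the noise has zero mean, the average of many noisy counts stays close to the true count, while any single released count hides whether a particular edge is present.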
Further, the privacy budget $\varepsilon_2$ in step 3 consists of two parts, $\varepsilon_2 = \varepsilon_c + \varepsilon_p$, where $\varepsilon_c$ is used to add noise to region counts and $\varepsilon_p$ is used to select community division points, subject to [formula omitted].
Here, the i-th round of dividing the non-leaf regions (denoted U), and their attachment to the binary tree BT, is expressed as U ∈ lev(i, BT). [formula omitted] denotes dividing region U into 3 sub-regions using privacy budget [formula omitted]. The partition function for region R in step 3.3) is designed as [formula omitted], where ζ denotes the set of sub-regions obtained after selecting a division point p in region R (the division point produces an upper-left region $R_{\Delta l}$, a lower-right region $R_{\Delta r}$, and a sparse rectangular region in between). The division point found by this function minimizes the density den of the middle rectangular region.
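The partition function itself is missing from the source, so the Python sketch below shows only the stated objective: among all division points of an upper triangular region, pick the one whose middle rectangle has the lowest density. It is the deterministic, non-private counterpart of the selection (the source spends the budget $\varepsilon_p$ on this choice), and the function names and the toy matrix are hypothetical.

```python
def rectangle_density(a, p):
    """Density of the rectangle between the two triangles when dividing at p."""
    n = len(a)
    # Rows 0..p-1 and columns p..n-1 form the middle rectangle.
    ones = sum(a[i][j] for i in range(p) for j in range(p, n))
    return ones / (p * (n - p))


def best_division_point(a):
    """Division point p in 1..n-1 that minimizes the rectangle density."""
    n = len(a)
    return min(range(1, n), key=lambda p: rectangle_density(a, p))


def upper_triangular(n, edges):
    """0/1 matrix with each undirected edge stored once above the diagonal."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        i, j = min(u, v), max(u, v)
        a[i - 1][j - 1] = 1
    return a


# Hypothetical toy graph: two triangles joined by the edge (3, 4).
toy = upper_triangular(6, [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)])
p_star = best_division_point(toy)
```

For this toy matrix the best division point separates the two triangles, leaving a rectangle that contains only the single bridging edge.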
Step 4: for each node in the BT tree (each corresponding to a noisy matrix region), optimally allocate the edges after noise addition, and reconstruct the noisy matrix A′ so that its difference from the adjacency matrix $A_o$ of the original social network graph to be published is minimal. The procedure consists of the following sub-steps:
4.1) Traverse the constructed noisy binary tree BT from top to bottom. For each non-leaf node region (a sparse rectangular region), compare the noisy count of the region matrix with the true count c and, according to which is larger, allocate edges probabilistically in the immediate neighbourhood of the positions of the original c ones;
4.2) If a dense leaf region of the binary tree is reached, the leaf region is represented by an upper triangular adjacency matrix. Because the region is densely populated, randomly place as many ones as the noisy count among the [m(m-1)]/2 positions of the upper triangle $R'_{ij}$ of the leaf region;
4.3) Splice the region matrices $R'_{ij}$ together to form the complete upper triangular adjacency matrix A′, and return A′.
Further, in step 4.2) an upper triangular adjacency matrix represents a binary tree leaf node region. Every division and processing step of the invention operates on the upper triangular matrix currently being processed. The elements of this matrix occupy the [m(m-1)]/2 positions of the region's upper triangle $R'_{ij}$, where m is the number of rows (columns) of the matrix.
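The random placement in step 4.2) can be sketched in Python as follows. The function name is hypothetical, and rounding the noisy count and clamping it to the range from 0 to m(m-1)/2 are assumptions added so that the sketch always produces a valid 0/1 matrix; the source does not describe how out-of-range noisy counts are handled.

```python
import random


def rebuild_dense_leaf(m, noisy_count, rng):
    """Place round(noisy_count) ones uniformly at random in the strict upper triangle."""
    positions = [(i, j) for i in range(m) for j in range(i + 1, m)]
    # Clamp to the number of available positions, m * (m - 1) / 2.
    k = max(0, min(len(positions), round(noisy_count)))
    block = [[0] * m for _ in range(m)]
    for i, j in rng.sample(positions, k):
        block[i][j] = 1
    return block


rng = random.Random(7)
leaf = rebuild_dense_leaf(5, 7.6, rng)      # hypothetical noisy count
full = rebuild_dense_leaf(5, 25.0, rng)     # larger than the 10 positions
empty = rebuild_dense_leaf(5, -2.3, rng)    # negative noisy count
```

Uniform placement is reasonable only because the region is dense: when most positions hold a one, a random arrangement differs from the original in relatively few cells.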
Step 5: publish the result, i.e. release the processed social network graph G′ carrying differential privacy noise.
Beneficial effects
To address the shortcomings of current privacy protection for structural data in social network publication, the present invention targets the privacy protection of sensitive structural information in social networks. While achieving differential privacy with respect to the degree of correlation in the graph structure, the original social network is grouped, converted, and perturbed with noise so that its privacy is well protected.
Introducing a community detection algorithm into differentially private protection of social networks makes structural partitioning of the network more feasible. Renaming the node labels according to the partition result also makes the dense and sparse regions of the network easier to distinguish and process.
Compared with other matrix-based approaches to social networks, the invention partitions and stores the upper triangular adjacency matrix in a binary tree, which preserves the partitioned structure of the network regions and the correlations in the original data. In addition, a specific edge restoration and allocation method is designed for each level of density aggregation in the identified regions, which effectively improves the efficiency of private publication, the data utility, and the degree of privacy protection.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the method of an embodiment of the present invention.
Fig. 2 is an example of a simple undirected network used in the method of the embodiment.
Fig. 3(a) and Fig. 3(b) show the node labels of the network of Fig. 2 before and after processing, respectively.
Fig. 4 is a sample of the binary tree structure generated from Fig. 2 in the embodiment.
Fig. 5 shows the binary tree partition result produced by running the program on Fig. 2 in the embodiment.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and an example. The specific example described here only serves to explain the invention and is not intended to limit it.
As shown in Fig. 1, to handle the regions of different density in the adjacency matrix of a social network graph, community structure detection combined with differential privacy is first used to identify labels that capture the density aggregation of the graph, so that closely connected nodes group themselves into communities while connections between communities are sparser. Different label orderings produce different density aggregation in the adjacency matrix. Next, the density regions of the upper triangular adjacency matrix of the labelled network are explored and identified, a differentially private method that partitions regions independently of the data is designed to apply noise perturbation, and a standard binary tree structure stores the adjacency matrix by density, which keeps reconstruction efficient. The final step optimally allocates edges in the noisy matrix. The processing steps of this embodiment are as follows:
Step 1: obtain an original social network graph G to be published; here the example graph of Fig. 2 is the input. Every relationship is converted into a (0, 1) element of the adjacency matrix and mapped into the region of a rectangular two-dimensional matrix $A_o$. The initial node labels of the graph are denoted χ (the labels in Fig. 3(a)), and the array n[k] records the initial labels.
The network is described by G = (V, E), where V is the vertex set and $E \subseteq V \times V$ is the set of relationship edges; V(G) and E(G) denote the vertex set and edge set of graph G. The node labels determine the adjacency matrix A(G) of G, which satisfies $A_{ij} = 1$ if $(v_i, v_j) \in E$ and $A_{ij} = 0$ otherwise. To save storage, the matrix is generally stored in upper triangular form. The example network contains three communities with 14 nodes and 22 edges in total, each community marked by a circle. Inside a community the connections between nodes are dense, whereas connections between communities are relatively sparse.
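The conversion from an edge list to upper triangular storage can be sketched in Python as below. The function name and the 5-node graph are hypothetical (this is not the 14-node network of Fig. 2, whose edge list is not given in the text), and node labels are assumed to be 1-based as in the description.

```python
def upper_triangular_adjacency(n, edges):
    """Return an n x n 0/1 matrix storing each undirected edge once, above the diagonal."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        if u == v:
            continue  # self-loops are ignored in this sketch
        i, j = min(u, v), max(u, v)
        a[i - 1][j - 1] = 1  # labels are 1-based, list indices are 0-based
    return a


# Hypothetical 5-node example graph.
toy_edges = [(1, 2), (2, 3), (1, 3), (4, 5), (3, 4)]
A = upper_triangular_adjacency(5, toy_edges)
edge_count = sum(map(sum, A))
```

Storing only the entries above the diagonal means the number of ones equals the number of edges, which is the quantity the later counting steps work with.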
Step 2: apply a fast community detection algorithm to identify labels that capture the density aggregation of network G, so that closely connected nodes group themselves into communities while connections between communities are sparser. Different label orderings produce different density aggregation in the adjacency matrix of the network. The privacy budget allocated to this process is $\varepsilon_1$. The node labels after identification are denoted χ′, and the array N[k] records the processed labels. The procedure consists of the following sub-steps:
2.1) Compute the adjacency matrix of network G = (V, E) and the modularity Q of the network, and initialize each node as a single community with labels 1 to 14, where n = |V| = 14;
2.2) The fast community detection algorithm selects two communities at a time and merges them into a new community. Each merge must yield the largest increase (or smallest decrease) in Q, and the order in which communities are joined is recorded. The merging process seeks the partition with maximum Q, and after each merge the rows and columns of the matrix elements $e_{ij}$ belonging to the two selected communities must be updated;
2.3) Initialize the community aggregation matrix clusters[3, n]. For every pair of distinct communities $SC_i$ and $SC_j$ in the community set SC (that is, $SC_i \neq SC_j$), compute $\Delta Q = 2(e_{ij} - a_i a_j)$;
2.4) If [formula omitted], find the communities $SC_i$ and $SC_j$ with the largest change ΔQ together with all of their nodes, and write their node IDs in pairs into the community merging process matrix Z[n-1, 3];
2.5) The labels of the new communities formed by merging range over [|V|+1, 2|V|-1]. Repeat from step 2.3) until the entries of clusters (the community labels) are all identical, i.e. the third row of the clusters matrix is the same for all nodes. The final community merging process matrix Z[n-1, 3] is then obtained;
2.6) Traverse the first two columns of matrix Z[n-1, 3] (these columns record how community labels are merged during community detection), which reveals the order in which communities were merged. Starting from the first row, loop over the first two columns of each row. Whenever a community label satisfies Z[i, j] ≤ |V|, it is an initial community (node) label, which is re-ordered and renamed. The array N[k] records the processed node labels χ′, which are the labels shown in Fig. 3(b).
Further, the modularity in this step is $Q = \sum_{i} (e_{ii} - a_i^2)$. That is, when a social network is decomposed into m communities, there is a corresponding m × m symmetric matrix e whose element $e_{ij}$ is the fraction of all edges that connect nodes in community i to nodes in community j, and $e_{ii}$ is the fraction of all edges that lie entirely inside community i. $a_i = \sum_{j} e_{ij}$ is the fraction of all edges attached to nodes in community i.
Step 2.2) must be executed iteratively t = |V| - 1 times. At each merge, the change in the scoring function q(ΔQ, SC) for the community to which the current node belongs is computed from ΔQ. That is, the exponential mechanism of differential privacy is introduced to evaluate the change in [formula omitted], so that Q obtains the largest increase (or smallest decrease).
In step 2.3), clusters is a 3 × n matrix describing the community partition of the network: row 1 holds the node indices, row 2 the transitions of community labels, and row 3 the current community labels. When the labels in row 3 are all identical at the end of partitioning, all nodes belong to the same community and detection ends. During label detection, every iteration assigns the affected nodes to a new community label, so the matrix records how each node's community membership changes.
The matrix Z in step 2.5) is a 13 × 3 matrix. Row i represents the i-th merge, columns 1 and 2 hold the labels of the merged communities, and column 3 holds the modularity after the merge. Community labels are initialized to the node labels, from 1 up to 14. The new community labels formed by the successive merges range over [15, 27], because merging 14 nodes takes 13 merges.
Step 3: explore and identify the density regions of the upper triangular adjacency matrix A of the labelled network, and design a differentially private method that partitions regions independently of the data and perturbs them with noise. The privacy budget allocated to this process is $\varepsilon_2 = \varepsilon - \varepsilon_1$, where ε is the total privacy budget. A standard binary tree structure BT stores the adjacency matrix by density. The procedure consists of the following sub-steps:
3.1) [formula omitted], i.e. initialize the binary tree structure and compute the tree height h of BT from the number of communities found in step 2 when Q is maximal;
3.2) While i < h, execute the following. When i = 0, treat the upper triangular adjacency matrix A as the initial non-leaf node of BT and divide it into 3 regions;
3.3) For each non-leaf node U ∈ lev(i, BT): compute the privacy budgets [formula omitted] and [formula omitted], and divide U into 3 sub-regions [formula omitted]; otherwise terminate;
3.4) For the set R of 3 sub-regions formed in each step: compute the Laplace-noised count of each sub-region in R, represent each sub-region by a node V, and insert V into the binary tree BT. If node V satisfies the termination condition (region density ≥ 70%), make V a leaf region of BT;
3.5) Increment i and go back to step 3.2), repeating until termination. Finally return the noisy community binary tree BT.
In this example, the binary tree BT ends up with 3 leaf nodes, corresponding to the 3 dense regions of the social network graph, each with region density den(R) ≥ 70%, at which point community detection is complete. As shown in Fig. 3(b), the first candidate division point is (9, 9), and the three regions produced by the division are outlined with thick lines. Dividing at row 9, column 9 means that the order-9 upper triangular region in the upper-left corner and the remaining order-5 upper triangular region in the lower-right corner form the left and right children of the binary tree in the current iteration. The regions are $R_1 = A[1, 9; 10, 14]$, $R_2 = A_\Delta[1, 9; 1, 9]$, and $R_3 = A_\Delta[10, 14; 10, 14]$. Every division produces a sparse non-leaf region that becomes the parent node of two child regions in the binary tree. Here $R_1$ is the sparse region with a noisy count, and the noisy-count region $R_3$ is dense, so it is not divided further and becomes a leaf node region of the final binary tree BT. Only region $R_2$ needs to be divided again in the next iteration. Once the tree height h of BT is reached, the algorithm has formed the final binary tree structure with 3 leaf nodes, shown in Fig. 4.
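As a quick arithmetic check of this division, the three regions together must cover every position of the 14 × 14 upper triangle exactly once. The Python sketch below (the function name is hypothetical) counts the positions in each region for a matrix of order n divided at point p.

```python
def region_sizes(n, p):
    """Cell counts of the rectangle, upper-left triangle, and lower-right triangle."""
    rectangle = p * (n - p)                      # rows 1..p, columns p+1..n
    upper_left = p * (p - 1) // 2                # strict upper triangle of order p
    lower_right = (n - p) * (n - p - 1) // 2     # strict upper triangle of order n - p
    return rectangle, upper_left, lower_right


# Division point (9, 9) in the 14-node example.
r1_cells, r2_cells, r3_cells = region_sizes(14, 9)
total_cells = r1_cells + r2_cells + r3_cells
```

Here $R_1$ has 9 × 5 = 45 positions, $R_2$ has 36, and $R_3$ has 10, which sum to 91 = 14 × 13 / 2, the number of node pairs in the network.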
Further, the privacy budget $\varepsilon_2$ in step 3 consists of two parts, $\varepsilon_2 = \varepsilon_c + \varepsilon_p$, where $\varepsilon_c$ is used to add noise to region counts and $\varepsilon_p$ is used to select community division points, subject to [formula omitted]. Here, the i-th round of dividing the non-leaf regions (denoted U), and their attachment to the binary tree BT, is expressed as U ∈ lev(i, BT). [formula omitted] denotes dividing region U into 3 sub-regions using privacy budget [formula omitted]. The partition function for region R in step 3.3) is designed as [formula omitted], where ζ denotes the set of sub-regions obtained after selecting a division point p in region R (the division point produces an upper-left region $R_{\Delta l}$, a lower-right region $R_{\Delta r}$, and a sparse rectangular region in between). The division point found by this function minimizes the density den of the middle rectangular region.
Step 4: for each node in the BT tree (each corresponding to a noisy matrix region), optimally allocate the edges after noise addition, and reconstruct the noisy matrix A′ so that its difference from the adjacency matrix $A_o$ of the original social network graph to be published is minimal. The procedure consists of the following sub-steps:
4.1) Traverse the constructed noisy binary tree BT from top to bottom. For each non-leaf node region (a sparse rectangular region), compare the noisy count of the region matrix with the true count c and, according to which is larger, allocate edges probabilistically in the immediate neighbourhood of the positions of the original c ones;
4.2) If a dense leaf region of the binary tree is reached, the leaf region is represented by an upper triangular adjacency matrix. Because the region is densely populated, randomly place as many ones as the noisy count among the [m(m-1)]/2 positions of the upper triangle $R'_{ij}$ of the leaf region;
4.3) Splice the region matrices $R'_{ij}$ together to form the complete upper triangular adjacency matrix A′, and return A′.
In step 4.2), an upper triangular adjacency matrix represents a binary tree leaf node region. Every division and processing step of the invention operates on the upper triangular matrix currently being processed. The elements of this matrix occupy the [m(m-1)]/2 positions of the region's upper triangle $R'_{ij}$, where m is the number of rows (columns) of the matrix.
Step 5: publish the result, i.e. release the processed social network graph G′ carrying differential privacy noise.
Fig. 5 shows the binary tree partition result obtained by applying this method to the example network of Fig. 2. Columns 1 and 3 are the starting x and y coordinates, columns 2 and 4 are the ending x and y coordinates, column 5 is the actual count c before noise is added, column 6 is the noisy count, column 7 is the level of the node in the binary tree, column 8 equals 1 if the node is a leaf node, and column 9 is the density den(R) of the region.
The technical means disclosed in the embodiments of the present invention are not limited to those disclosed in the above embodiment. They also include technical solutions formed by any combination of the above technical features, all of which fall within the scope of protection claimed by the invention.

Claims (12)

1. A differential privacy processing and publishing method for social network data, characterized by the following steps:
Step 1) obtain an original social network graph G to be released; convert every relationship into a (0, 1) element of the adjacency matrix and map it into the region of the rectangular two-dimensional matrix Ao; mark the initial node labels χ of the social network graph to be released, and record the initial labels in the array n[k];
Step 2) introduce a fast community detection algorithm to identify the labels of dense clusters in the network G, so that closely interconnected nodes form communities while connections between communities are sparser; different label orderings produce different density aggregation patterns in the adjacency matrix of the network; the privacy budget allocated to this process is ε1; the graph node labels after identification are χ′, and the processed labels χ′ are recorded in the array N[k];
Step 3) explore and identify density regions in the labeled network adjacency matrix, and design a differential privacy method that perturbs each independently divided region of the data with noise; the privacy budget allocated to this process is ε2 = ε - ε1, where ε is the total privacy budget; a standard binary tree structure BT is used to store the adjacency matrix by density;
Step 4) perform the optimal allocation of edges after noise addition for each node in the BT tree (each node corresponds to one noise-added edge matrix), and rebuild the noisy matrix A′ so that its difference from the adjacency matrix Ao of the initial social network graph to be released is minimal;
Step 5) result publication: publish the processed network graph G′ carrying differential privacy noise.
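As an illustration of step 1) and of the budget split in step 3), the following minimal sketch stores an undirected edge list as a (0, 1) upper triangular adjacency matrix and splits a total budget ε into ε1 and ε2 = ε - ε1. The function names and the 0-based node numbering are assumptions made for this example only.

```python
def upper_triangular_adjacency(n, edges):
    """Store each undirected edge (u, v) once, as a 1 at row min(u, v), column max(u, v)."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        if u == v:
            continue  # self-loops are ignored in this sketch
        i, j = min(u, v), max(u, v)
        a[i][j] = 1
    return a


def split_budget(epsilon, epsilon1):
    """Return (epsilon1, epsilon2) with epsilon2 = epsilon - epsilon1."""
    if not 0 < epsilon1 < epsilon:
        raise ValueError("epsilon1 must lie strictly between 0 and epsilon")
    return epsilon1, epsilon - epsilon1
```

For example, `split_budget(1.0, 0.25)` reserves 0.25 for community detection and leaves 0.75 for region division and noise addition.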
2. The method according to claim 1, characterized in that, in step 2), the fast community detection algorithm FN is combined with differential privacy to identify the labels of dense clusters in the original social network to be released, the detailed steps of the community detection comprising:
2.1) compute the adjacency matrix corresponding to the network G = (V, E) and the modularity Q of the network, and initialize every node as a single community with labels 1 to n, where n = |V|;
2.2) the fast community detection algorithm selects two communities at a time and merges them into a new community; every merge must increase Q by the largest amount, or decrease it by the smallest amount, and the order in which communities are merged is recorded; the merging process seeks the optimal partition with the maximum Q value, and after each merge the rows and columns of the matrix elements eij of the two selected communities must be updated;
2.3) initialize the community aggregation matrix clusters[3, n]; for every combination of two different communities SCi and SCj in the community set SC, i.e. SCi ≠ SCj, compute ΔQ = 2(eij - ai aj);
2.4) if the selection condition of this step is satisfied, find the communities SCi and SCj with the largest change ΔQ, pair all their corresponding nodes, and write their node numbers into the community merging process matrix Z[n-1, 3];
2.5) after communities merge, the labels of the newly formed communities range over [|V|+1, 2|V|-1]; repeat from step 2.3) until the community labels in clusters are all identical, i.e. the third row of the clusters matrix is the same for all nodes; the final community merging process matrix Z[n-1, 3] is then obtained;
2.6) traverse the first two columns of the matrix Z[n-1, 3] (the first two columns of Z describe how community labels are merged during community detection) to find the order in which communities were merged during detection; loop over the first two columns of each row, starting from the first row; whenever a community label Z[i, j] ≤ |V|, it is an initial node label of a community, which must be re-ordered and renamed; the processed graph node labels χ′ are recorded in the array N[k].
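The merging loop of steps 2.1) to 2.5) can be sketched as follows. This is a plain, non-private greedy version: it always takes the pair with the largest ΔQ = 2(eij - ai aj) and omits the differential privacy selection of the claimed method. The function name `fn_merge_order`, the edge-list input, and the convention that each edge contributes 1/(2m) to eij and to eji are assumptions for illustration.

```python
from collections import defaultdict


def fn_merge_order(n, edges):
    """Greedy merging of communities labelled 1..n; returns rows (label_a, label_b, Q_after)."""
    m = len(edges)
    e = {c: defaultdict(float) for c in range(1, n + 1)}
    for u, v in edges:  # each edge adds 1/(2m) to e[u][v] and to e[v][u]
        e[u][v] += 1 / (2 * m)
        e[v][u] += 1 / (2 * m)
    a = {c: sum(row.values()) for c, row in e.items()}
    q = sum(e[c][c] - a[c] ** 2 for c in e)
    z, new = [], n + 1
    while len(e) > 1:
        # Non-private choice: the pair of communities with the largest delta Q.
        dq, i, j = max((2 * (e[i][j] - a[i] * a[j]), i, j)
                       for i in e for j in e if i < j)
        row = defaultdict(float)
        for c in (i, j):  # fold rows i and j into the row of the new community
            for k, w in e.pop(c).items():
                row[new if k in (i, j) else k] += w
        for k in e:  # fold columns i and j of every remaining community
            e[k][new] = e[k].pop(i, 0.0) + e[k].pop(j, 0.0)
        e[new] = row
        a[new] = a.pop(i) + a.pop(j)
        q += dq
        z.append((i, j, q))
        new += 1
    return z
```

The returned list plays the role of the (n-1) × 3 merge matrix Z: new community labels run from n+1 to 2n-1, and the third entry is the modularity after each merge. The pair search is quadratic per merge, which is acceptable for a sketch but not for large graphs.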
3. The method according to claim 1, characterized in that the modularity in step 2) is Q = Σi (eii - ai^2), i.e. when a social network is decomposed into m communities there is a corresponding m × m symmetric matrix e, whose element eij is the fraction of all edges in the network that connect nodes in community i to nodes in community j, and eii is the fraction of all edges that connect nodes inside community i; ai = Σj eij is the fraction of all edges that are attached to nodes in community i.
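A minimal sketch of a modularity computation in terms of eii and ai follows. It assumes the standard Newman form Q = Σi (eii - ai^2), in which each edge contributes half of its weight to each of its two end communities; the function name `modularity` and the dictionary-based community assignment are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Compute Q = sum_i (e_ii - a_i^2) for an undirected edge list.

    `community` maps each node to its community label (hypothetical input format).
    """
    m = len(edges)
    e_in = defaultdict(float)  # e_ii: fraction of edges inside community i
    a = defaultdict(float)     # a_i: fraction of edge ends attached to community i
    for u, v in edges:
        cu, cv = community[u], community[v]
        if cu == cv:
            e_in[cu] += 1 / m
        a[cu] += 1 / (2 * m)
        a[cv] += 1 / (2 * m)
    return sum(e_in[c] - a[c] ** 2 for c in a)
```

For two triangles joined by a single edge and split into their two natural communities, this gives Q = 5/14; placing all nodes in one community gives Q = 0.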
4. The method according to claim 2, characterized in that step 2.2) is executed iteratively t = |V| - 1 times; each merge requires computing the change in the function q(ΔQ, SC) that relates ΔQ to the community of the current node, i.e. the exponential mechanism of differential privacy is introduced to evaluate that change, so that Q increases by the largest amount, or decreases by the smallest amount.
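For readers unfamiliar with the exponential mechanism, a generic sketch is given below. It samples a candidate with probability proportional to exp(ε · score / (2 · sensitivity)); here the candidates could be community pairs and the scores their ΔQ values. The function name, its parameters, and the choice of sensitivity are assumptions for illustration; the exact scoring function of the claimed method is not reproduced in this text.

```python
import math
import random


def exponential_mechanism(candidates, scores, epsilon, sensitivity, rng=None):
    """Pick one candidate with probability proportional to exp(eps * score / (2 * sensitivity))."""
    rng = rng or random.Random()
    top = max(scores)  # subtract the maximum so the exponent never overflows
    weights = [math.exp(epsilon * (s - top) / (2 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

With a large ε the choice concentrates on the highest score; with a small ε it approaches a uniform draw, which is the trade-off between utility and privacy.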
5. The method according to claim 2, characterized in that, in step 2.3), clusters is a 3 × n matrix describing the community partition of the network: row 1 holds the node numbers, row 2 holds the community label transitions, and row 3 holds the community labels; when the partition ends, identical labels in row 3 indicate that all nodes are in the same community, and detection then terminates; during label detection, every iteration assigns nodes to a new community label, so the matrix records how the community to which each node belongs changes.
6. The method according to claim 2, characterized in that the matrix Z in step 2.5) is an (n-1) × 3 matrix; row i represents the i-th node merge, columns 1 and 2 hold the labels of the merged communities, and column 3 holds the modularity after the merge; community labels are initialized to the node labels, from 1 up to n = |V|, and the labels of the new communities formed by the merge at each step range over [|V|+1, 2|V|-1], because merging |V| nodes requires |V| - 1 merges.
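The relabeling of step 2.6) can be sketched from a merge matrix of this shape. The sketch walks the first two columns row by row and assigns new consecutive labels to original nodes (labels ≤ |V|) in the order they appear. The function name `relabel_from_merges` and the dictionary return type are assumptions for illustration.

```python
def relabel_from_merges(z, n):
    """Map each original node label (1..n) to its new label, in merge order.

    `z` is a sequence of rows (label_a, label_b, modularity); labels greater
    than n denote merged communities and are skipped.
    """
    order = []
    for a, b, *_ in z:
        for label in (a, b):
            if label <= n:  # an original node, not a merged community
                order.append(label)
    return {old: new for new, old in enumerate(order, start=1)}
```

Nodes that merge early sit next to each other in the new ordering, which is what concentrates the ones of the adjacency matrix into dense blocks near the diagonal.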
7. The method according to claim 1, characterized in that the exploration and identification of density regions in the labeled network adjacency matrix in step 3) comprises the following steps:
3.1) initialize the binary tree structure BT, and compute the height h of the BT tree from the number of communities determined when Q is maximal in step 2);
3.2) while i < h, execute the following: when i = 0, the upper triangular adjacency matrix A is taken as the initial non-leaf node of the BT tree and divided into 3 regions;
3.3) for each non-leaf node U ∈ lev(i, BT): compute the privacy budgets and divide U into 3 sub-regions; otherwise the execution terminates;
3.4) for the set R of 3 sub-regions formed at each step: compute the count in R with Laplace noise added; represent each sub-region by a node V and insert it into the binary tree BT; if node V meets the termination condition, i.e. its region density is greater than or equal to a given value, make V a leaf region of BT;
3.5) increment i and jump back to step 3.2), repeating until termination; finally return the noisy community binary tree BT.
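The noisy count and the leaf test of step 3.4) can be sketched as follows. The sketch assumes a count sensitivity of 1 (one edge changes a region count by at most 1), so the Laplace scale is 1/εc; the names `laplace_noise`, `noisy_count`, and `is_dense_leaf` and the threshold parameter are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample, drawn as the difference of two exponential variates."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(region, epsilon_c, rng=None):
    """True number of ones in the region plus Laplace noise of scale 1/epsilon_c."""
    rng = rng or random.Random()
    true_count = sum(map(sum, region))
    return true_count + laplace_noise(1.0 / epsilon_c, rng)


def is_dense_leaf(count, cells, threshold):
    """Termination test: the region becomes a leaf when its density reaches the threshold."""
    return cells > 0 and count / cells >= threshold
```

A smaller εc widens the noise, so density decisions near the threshold become less reliable; this is the cost of the stronger privacy guarantee.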
8. The method according to claim 1, characterized in that the privacy budget ε2 in step 3) consists of two parts, ε2 = εc + εp, where εc is used to add noise to the region counts and εp is used to select the community division points.
9. The method according to claim 7, characterized in that, in step 3.2), the non-leaf region processed in the i-th iteration is denoted U, and U ∈ lev(i, BT) denotes the relation between the region division and the binary tree BT it forms; region U is divided into 3 sub-regions using the privacy budget allocated to the division.
10. The method according to claim 7, characterized in that, in step 3.3), a function is designed for the division within region R, where ζ denotes the set of sub-regions obtained after a division point p is selected; a division point in region R yields the upper-left region RΔl, the lower-right region RΔr, and the sparse rectangular region in between; the division point found by the function is the one that minimizes the density den of the intermediate rectangular region.
11. The method according to claim 1, characterized in that the detailed steps of the optimal allocation of edges after noise is added to the edge matrices in step 4) comprise:
4.1) traverse the constructed noise binary tree BT from top to bottom, the non-leaf node regions visited being rectangular sparse regions; compare the noisy count of the region matrix with the true count c and, according to their relative sizes, allocate ones probabilistically within the regions directly adjacent to the positions of the original c ones;
4.2) if a dense leaf region of the binary tree is visited, the leaf region is represented by an upper triangular adjacency matrix; because its edges are densely distributed, the noisy number of ones is placed at random among the [m(m-1)]/2 positions of the upper triangle R′ij of the leaf region;
4.3) splice the region matrices R′ij into the complete upper triangular adjacency matrix A′, and return A′.
12. The method according to claim 9, characterized in that, in step 4.2), an upper triangular adjacency matrix represents a binary tree leaf node region; the division and processing at every step operate on the upper triangular matrix currently being processed, whose elements occupy the [m(m-1)]/2 positions of the upper triangle R′ij of the region, where m is the number of rows, i.e. the number of columns, of the matrix.
CN201810705888.7A 2017-08-07 2018-06-29 Differential privacy processing and publishing method for social network data Active CN109299615B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710669586 2017-08-07
CN2017106695864 2017-08-07

Publications (2)

Publication Number Publication Date
CN109299615A true CN109299615A (en) 2019-02-01
CN109299615B CN109299615B (en) 2022-05-17

Family

ID=65168289

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810705888.7A Active CN109299615B (en) 2017-08-07 2018-06-29 Differential privacy processing and publishing method for social network data

Country Status (1)

Country Link
CN (1) CN109299615B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858282A (en) * 2019-02-12 2019-06-07 北京信息科技大学 A kind of social network relationships data-privacy guard method and system
CN110210248A (en) * 2019-06-13 2019-09-06 重庆邮电大学 A kind of network structure towards secret protection goes anonymization systems and method
CN111046429A (en) * 2019-12-13 2020-04-21 支付宝(杭州)信息技术有限公司 Method and device for establishing relationship network based on privacy protection
CN112541042A (en) * 2020-12-17 2021-03-23 四川新网银行股份有限公司 Method for generating lightweight social network under ten-million orders of magnitude
CN113254999A (en) * 2021-06-04 2021-08-13 郑州轻工业大学 User community mining method and system based on differential privacy
CN115828312A (en) * 2023-02-17 2023-03-21 浙江浙能数字科技有限公司 Privacy protection method and system for power user social network

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103701469A (en) * 2013-12-26 2014-04-02 华中科技大学 Compression and storage method for large-scale image data
CN106022938A (en) * 2016-06-02 2016-10-12 北京奇艺世纪科技有限公司 Social network user association dividing method and social network user association dividing device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
RUI CHEN 等: "Correlated network data publication via differential privacy", 《THE VLDB JOURNAL》 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858282A (en) * 2019-02-12 2019-06-07 北京信息科技大学 A kind of social network relationships data-privacy guard method and system
CN109858282B (en) * 2019-02-12 2020-12-25 北京信息科技大学 Social network relationship data privacy protection method and system
CN110210248A (en) * 2019-06-13 2019-09-06 重庆邮电大学 A kind of network structure towards secret protection goes anonymization systems and method
CN111046429A (en) * 2019-12-13 2020-04-21 支付宝(杭州)信息技术有限公司 Method and device for establishing relationship network based on privacy protection
CN111046429B (en) * 2019-12-13 2021-06-04 支付宝(杭州)信息技术有限公司 Method and device for establishing relationship network based on privacy protection
CN112541042A (en) * 2020-12-17 2021-03-23 四川新网银行股份有限公司 Method for generating lightweight social network under ten-million orders of magnitude
CN112541042B (en) * 2020-12-17 2022-11-04 四川新网银行股份有限公司 Method for generating lightweight social network under ten million orders of magnitude
CN113254999A (en) * 2021-06-04 2021-08-13 郑州轻工业大学 User community mining method and system based on differential privacy
CN115828312A (en) * 2023-02-17 2023-03-21 浙江浙能数字科技有限公司 Privacy protection method and system for power user social network
CN115828312B (en) * 2023-02-17 2023-06-16 浙江浙能数字科技有限公司 Privacy protection method and system for social network of power user

Also Published As

Publication number Publication date
CN109299615B (en) 2022-05-17

Similar Documents

Publication Publication Date Title
CN109299615A (en) A kind of difference privacy processing dissemination method towards social network data
Wang et al. The evolution of the Internet of Things (IoT) over the past 20 years
Gupta et al. Top-k interesting subgraph discovery in information networks
CN104866781B (en) The community network data publication method for secret protection of Community-oriented detection application
CN103336783B (en) Associating Thiessen polygon and the density map drafting method of inverse distance-weighting
CN103838829B (en) Raster vectorization system based on hierarchical boundary-topology search model
Zhang et al. Local refinement for analysis-suitable++ T-splines
CN105894587B (en) A kind of ridge line and valley route filter method of rule-based constraint
CN104317904B (en) A kind of extensive method of Weight community network
CN107895038A (en) A kind of link prediction relation recommends method and device
Zarate et al. Optimal sankey diagrams via integer programming
CN110134879A (en) A kind of point of interest proposed algorithm based on difference secret protection
CN102136133B (en) A kind of image processing method and image processing apparatus
CN108471382A (en) A kind of complex network clustering algorithm attack method based on node angle value
CN101741623A (en) Method for network visualization
CN103164487B (en) A kind of data clustering method based on density and geological information
CN108898013A (en) A kind of Android application interface similarity-rough set method dividing feature vector based on layout
CN104331883B (en) A kind of image boundary extraction method based on asymmetric inversed placement model
van Dijk et al. Block crossings in storyline visualizations
CN101533525A (en) Method for analyzing the overlay of point and face
CN109741421A (en) A kind of Dynamic Graph color method based on GPU
CN106685893B (en) A kind of authority control method based on social networks group
CN112380267B (en) Community discovery method based on privacy graph
Wang et al. Detecting overlapping communities based on vital nodes in complex networks
CN103984724A (en) Visualization interaction method based on space optimization tree layout

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant