CN114692200A

CN114692200A - Privacy protection distributed graph data feature decomposition method and system

Info

Publication number: CN114692200A
Application number: CN202210341719.6A
Authority: CN
Inventors: 郑宜峰; 王松磊
Original assignee: Shenzhen Graduate School Harbin Institute of Technology
Current assignee: Shenzhen Graduate School Harbin Institute of Technology
Priority date: 2022-04-02
Filing date: 2022-04-02
Publication date: 2022-07-01
Anticipated expiration: 2042-04-02
Also published as: CN114692200B

Abstract

The invention discloses a privacy-preserving distributed graph data feature decomposition method and system. In the method, randomly sampled graph nodes holding local graph data encrypt their own degree information and send it to a first computing terminal and a second computing terminal. The two computing terminals cooperatively compute, in the ciphertext domain, first encryption degree distribution information and second encryption degree distribution information, from which each graph node determines the target interval to which its degree belongs. The graph node then selects an appropriate sampling sensitivity, samples noise, and accordingly adds false edges with weight 0 to the real graph adjacency matrix, which is sparsely represented as a set of matrix triples. The triple set with the false edges added is encrypted, and the resulting ciphertexts are sent to the first computing terminal and the second computing terminal respectively, which perform encrypted feature decomposition. On the premise of protecting node privacy, the sparsity of the graph data is preserved and the effectiveness of the feature decomposition is guaranteed.

Description

Privacy protection distributed graph data feature decomposition method and system

Technical Field

The invention relates to the technical field of information security, and in particular to a privacy-preserving distributed graph data feature decomposition method and system.

Background

Graph data can describe complex interrelationships between entities, and a wide variety of analysis tasks can be performed on such information-rich data. These tasks become more challenging when the graph data is distributed, meaning that each entity can obtain only part of the data about the entire graph (referred to as local graph data). For example, in a phonebook network, each user is a graph node, and each user's phonebook records the contacts (i.e., the edges of the graph) between that user and other users. If the phonebook network is modeled as a graph, no single entity can directly obtain information about the whole graph; instead, each user knows only part of the connection relationships (i.e., the contacts in their own phonebook).

Collecting such distributed graph data for graph analysis raises significant privacy concerns (e.g., few people would be willing to share their phonebooks). If the local graph data owned by each user is not protected, users may be unwilling to participate in such analysis. Privacy protection mechanisms therefore need to be introduced into analysis tasks on distributed graph data, so that valuable graph analysis can be performed without compromising each user's sensitive local graph data.

Feature decomposition (eigendecomposition) is a widely used basic task in graph analysis. Applied to the adjacency matrix of the graph, it produces eigenvalues and eigenvectors, which provide basic information for many graph analysis tasks, such as community structure detection, discovery of important community members, graph partitioning, and web page ranking. However, no existing feature decomposition scheme provides privacy protection for distributed graph data.

Thus, there is a need for improvements and enhancements in the art.

Disclosure of Invention

To address these shortcomings of the prior art, the invention provides a privacy-preserving distributed graph data feature decomposition method and system, aiming to solve the problem that the prior art offers no privacy-preserving feature decomposition scheme for distributed graph data.

To solve the above technical problem, the invention adopts the following technical solution:

In a first aspect of the present invention, a privacy-preserving distributed graph data feature decomposition method is provided, the method including:

a target graph node in a global graph generates an initial set according to its local graph data, where the initial set includes a plurality of triples, and each triple includes the node label of the target graph node, the node label of an adjacent graph node of the target graph node, and the weight of the edge connecting the target graph node and the adjacent graph node;

the target graph node encrypts its degree based on function secret sharing to obtain first encryption degree information and second encryption degree information, sends the first encryption degree information to a first computing terminal, and sends the second encryption degree information to a second computing terminal;

the first computing terminal and the second computing terminal generate first encryption degree distribution information and second encryption degree distribution information of the global graph data according to the first encryption degree information and the second encryption degree information sent by a plurality of target graph nodes;

the target graph node determines, according to the received first encryption degree distribution information and second encryption degree distribution information, a target interval to which its degree belongs; determines a target sampling sensitivity according to the boundary information of the target interval; samples noise from a Laplace distribution according to the target sampling sensitivity; and adds false triples, each with a weight of 0, to the initial set according to the noise to generate a target set;

the target graph node encrypts the target set based on additive secret sharing to obtain a first encryption set and a second encryption set, sends the first encryption set to the first computing terminal, and sends the second encryption set to the second computing terminal;

and the first computing terminal and the second computing terminal carry out feature decomposition on the global graph data according to the first encryption set and the second encryption set corresponding to each node in the global graph.
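The node-side steps above (building the triple set, padding it with zero-weight false triples, and splitting it into two additive shares) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the share modulus RING, the privacy budget epsilon, the rule that the Laplace noise is rounded and clipped at zero to give the number of false triples, and all function names are hypothetical.

```python
import random

RING = 2 ** 32  # hypothetical share modulus; the text does not fix one


def initial_set(node_id, neighbors):
    """Initial set: one (node label, neighbor label, edge weight) triple per real edge."""
    return [(node_id, j, w) for j, w in sorted(neighbors.items())]


def laplace(scale, rng):
    """Laplace(0, scale) sample: the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def add_false_triples(triples, node_id, num_nodes, sensitivity, epsilon, rng):
    """Pad with zero-weight false triples.

    Assumption: the number of false triples is the Laplace noise rounded and
    clipped at 0; `epsilon` is a hypothetical privacy budget."""
    count = max(0, round(laplace(sensitivity / epsilon, rng)))
    taken = {j for _, j, _ in triples} | {node_id}
    free = [j for j in range(num_nodes) if j not in taken]
    fakes = [(node_id, j, 0) for j in rng.sample(free, min(count, len(free)))]
    target = triples + fakes
    rng.shuffle(target)  # positions no longer reveal which triples are real
    return target


def share_set(triples, rng):
    """Additively share every component of every triple between the two terminals."""
    first, second = [], []
    for triple in triples:
        s1 = tuple(rng.randrange(RING) for _ in triple)
        first.append(s1)
        second.append(tuple((v - s) % RING for v, s in zip(triple, s1)))
    return first, second


rng = random.Random(7)
real = initial_set(3, {0: 1, 5: 1, 8: 2})
target = add_false_triples(real, 3, 20, sensitivity=4, epsilon=1.0, rng=rng)
first_set, second_set = share_set(target, rng)
```

Each share on its own is uniformly random, so a single terminal sees neither the neighbor labels nor whether a weight is 0; only the number of triples (real degree plus noise) is visible.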

In the privacy-preserving distributed graph data feature decomposition method, before the target graph node encrypts its degree based on function secret sharing, the method further includes:

the first computing terminal and/or the second computing terminal randomly selects a subset of all nodes of the global graph and sends encryption requests to them;

and after receiving the encryption request, the target graph node encrypts its degree based on function secret sharing.

In the privacy-preserving distributed graph data feature decomposition method, the step in which the target graph node encrypts its degree based on function secret sharing to obtain the first encryption degree information and the second encryption degree information includes:

the target graph node obtains the first encryption degree information and the second encryption degree information output by a first preset algorithm of function secret sharing, where the input of the first preset algorithm includes the degree of the target graph node.

In the privacy-preserving distributed graph data feature decomposition method, the step in which the first and second computing terminals generate the first and second encryption degree distribution information of the global graph data according to the first and second encryption degree information sent by the plurality of target graph nodes includes:

the first computing terminal inputs the first encryption degree information of the target graph node and a target degree into a second preset algorithm of function secret sharing to obtain first encryption degree comparison information between the degree of the target graph node and the target degree, and the second computing terminal inputs the second encryption degree information of the target graph node and the target degree into the second preset algorithm to obtain second encryption degree comparison information between the degree of the target graph node and the target degree;

where the sum of the first encryption degree comparison information and the second encryption degree comparison information is 1 when the degree of the target graph node is equal to the target degree, and 0 otherwise;

the first computing terminal obtains first encryption degree histogram information and the second computing terminal obtains second encryption degree histogram information; the first encryption degree histogram information includes first encryption graph node quantity information for each target degree, each being the sum of all first encryption degree comparison information corresponding to that target degree, and the second encryption degree histogram information includes second encryption graph node quantity information for each target degree, each being the sum of all second encryption degree comparison information corresponding to that target degree;

in other words, the first computing terminal aggregates the first encryption degree comparison information between the degrees of the target graph nodes and each target degree into the first encryption degree histogram information, and the second computing terminal aggregates the second encryption degree comparison information into the second encryption degree histogram information;

and the first computing terminal and the second computing terminal determine the first encryption degree distribution information and the second encryption degree distribution information according to the first encryption degree histogram information and the second encryption degree histogram information.
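The histogram steps above can be checked with a small simulation. As a simplifying assumption, each sampled node here hands the terminals additive shares of the full indicator vector [degree == t] for every target degree t; in the described method each terminal instead evaluates a compact function-secret-sharing key at every target degree, which yields shares with the same sum. The modulus, the example degrees, and the function names are hypothetical.

```python
import random

MOD = 2 ** 32  # hypothetical share modulus


def comparison_shares(degree, max_degree, rng):
    """Shares of [degree == t] for every target degree t = 0..max_degree.

    Simplification: the vector is shared directly instead of via FSS keys."""
    first = [rng.randrange(MOD) for _ in range(max_degree + 1)]
    second = [((1 if t == degree else 0) - s) % MOD for t, s in enumerate(first)]
    return first, second


def histogram_share(per_node_shares):
    """Local step on one terminal: sum the comparison shares of all nodes per target degree."""
    return [sum(column) % MOD for column in zip(*per_node_shares)]


rng = random.Random(1)
degrees = [1, 3, 3, 2, 5, 3, 1]  # example degrees of the sampled nodes
pairs = [comparison_shares(d, 6, rng) for d in degrees]
hist1 = histogram_share([p[0] for p in pairs])  # first encryption degree histogram information
hist2 = histogram_share([p[1] for p in pairs])  # second encryption degree histogram information
histogram = [(a + b) % MOD for a, b in zip(hist1, hist2)]
```

Because the aggregation is purely additive, each terminal computes its histogram share locally, with no communication between the terminals.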

In the privacy-preserving distributed graph data feature decomposition method, each bit of the first encryption degree distribution information and of the second encryption degree distribution information is 0 or 1, and the step in which the first and second computing terminals determine the first and second encryption degree distribution information according to the first and second encryption degree histogram information includes:

the first computing terminal and the second computing terminal determine the target node quantity per interval according to the number of target graph nodes that sent encryption degree information and a preset number of intervals;

the first computing terminal adds each piece of first encryption graph node quantity information in the first encryption degree histogram information to a first accumulator in ascending order of the corresponding target degree, and the second computing terminal likewise adds each piece of second encryption graph node quantity information in the second encryption degree histogram information to a second accumulator in ascending order of the corresponding target degree;

after each piece of first and second encryption graph node quantity information is added to the first and second accumulators respectively, the first computing terminal obtains a first encryption comparison result from the first accumulator and appends a new bit to the first encryption degree distribution information according to it, and the second computing terminal obtains a second encryption comparison result from the second accumulator and appends a new bit to the second encryption degree distribution information according to it; the XOR of the first and second encryption comparison results is 1 when the sum of the first and second accumulators is not less than the target node quantity, and 0 when that sum is less than the target node quantity;

the first computing terminal inverts the latest bit of the first encryption degree distribution information to obtain an inverted bit; the first computing terminal and the second computing terminal then compute, based on additive secret sharing, a first secret share and a second secret share respectively, whose sum is the product of a first value and a second value, where the first value is the XOR of the inverted bit and the latest bit of the second encryption degree distribution information (i.e., the negation of the latest distribution bit), and the second value is the sum of the first accumulator and the second accumulator;

the first computing terminal sets the first accumulator to the first secret share and adds the next first encryption graph node quantity information to it, and the second computing terminal sets the second accumulator to the second secret share and adds the next second encryption graph node quantity information to it. In effect, the accumulated count is reset to zero after an interval boundary bit of 1 is generated and is retained otherwise.
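A plaintext reference for what the accumulator procedure above computes under secret sharing is sketched below, together with the node-side lookup of its target interval. Assumptions, all hypothetical: the target node quantity is the integer quotient of the node count by the interval count (at least 1), intervals are closed on both ends, and a trailing run without a closing bit forms the last interval. How the sampling sensitivity is derived from the interval boundary is not specified here.

```python
def distribution_bits(histogram, num_intervals):
    """Plaintext reference: one bit per target degree, 1 where an interval closes."""
    target = max(1, sum(histogram) // num_intervals)  # target node quantity (rounding assumed)
    bits, acc = [], 0
    for count in histogram:              # ascending target degree
        acc += count                     # add the next node quantity to the accumulator
        bit = 1 if acc >= target else 0  # done with a secure comparison in the protocol
        bits.append(bit)
        acc = (1 - bit) * acc            # (NOT bit) * accumulator: reset after a boundary
    return bits


def target_interval(degree, bits):
    """Node-side step: locate the [low, high] degree interval containing `degree`."""
    low = 0
    for t, bit in enumerate(bits):
        if bit:
            if degree <= t:
                return low, t
            low = t + 1
    return low, len(bits) - 1


bits = distribution_bits([0, 2, 1, 3, 0, 1, 0], num_intervals=3)
```

With 7 nodes and 3 intervals the target quantity is 2, so intervals close at degrees 1 and 3. A node of degree 3 then falls in [2, 3] and a node of degree 5 in [4, 6].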

In the privacy-preserving distributed graph data feature decomposition method, the step in which the first computing terminal obtains the first encryption comparison result from the first accumulator and generates a new bit of the first encryption degree distribution information, and the second computing terminal obtains the second encryption comparison result from the second accumulator and generates a new bit of the second encryption degree distribution information, includes:

a third computing terminal generates a first key, a second key, a first random number, and a second random number according to a third preset algorithm of function secret sharing and the target node quantity, sends the first key and the first random number to the first computing terminal, and sends the second key and the second random number to the second computing terminal;

the first computing terminal sends the sum of the first random number and the first accumulator to the second computing terminal, and the second computing terminal sends the sum of the second random number and the second accumulator to the first computing terminal, so that each of them obtains a masked input value equal to the sum of the first random number, the second random number, the first accumulator, and the second accumulator;

the first computing terminal inputs the masked input value and the first key into a fourth preset algorithm of function secret sharing to obtain a first encryption bit, and the second computing terminal inputs the masked input value and the second key into the fourth preset algorithm to obtain a second encryption bit, where the XOR of the first and second encryption bits is 1 when the sum of the first and second accumulators is less than the target node quantity, and 0 when that sum is not less than the target node quantity;

the first computing terminal takes the first encryption bit as the new bit of the first encryption degree distribution information, and the second computing terminal flips the second encryption bit and takes the result as the new bit of the second encryption degree distribution information; alternatively, the first computing terminal flips the first encryption bit and the second computing terminal takes the second encryption bit unchanged. Either way, flipping exactly one share converts the XOR-shared result [sum < target node quantity] into [sum ≥ target node quantity].
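The masked-input comparison above can be simulated end to end with a toy dealer. The keys below are full lookup tables over the masked-input domain, so the scheme is neither compact nor a real function secret sharing construction; it only reproduces the input/output behaviour that the third computing terminal's keys are meant to have. The 8-bit domain, fresh keys per comparison, and all names are assumptions for illustration.

```python
import random

N_BITS = 8
MOD = 2 ** N_BITS  # toy input domain; accumulator sums must stay below MOD


def dealer_gen(threshold, rng):
    """Toy stand-in for the third terminal (NOT a secure or compact FSS scheme).

    Each key is a full table of XOR shares of [(masked - r) mod MOD < threshold]
    over every masked input, with r = r1 + r2."""
    r1, r2 = rng.randrange(MOD), rng.randrange(MOD)
    r = (r1 + r2) % MOD
    k1, k2 = [], []
    for masked in range(MOD):
        bit = 1 if (masked - r) % MOD < threshold else 0
        s = rng.randrange(2)
        k1.append(s)
        k2.append(s ^ bit)
    return (k1, r1), (k2, r2)


def new_distribution_bits(acc1, acc2, threshold, rng):
    """Return each terminal's new bit; their XOR is [acc1 + acc2 >= threshold]."""
    (k1, r1), (k2, r2) = dealer_gen(threshold, rng)  # fresh keys and masks per comparison
    sent_by_first = (acc1 + r1) % MOD
    sent_by_second = (acc2 + r2) % MOD
    masked = (sent_by_first + sent_by_second) % MOD  # both terminals now hold this value
    b1 = k1[masked]  # evaluation with the first key
    b2 = k2[masked]  # evaluation with the second key; b1 ^ b2 = [acc < threshold]
    return b1, b2 ^ 1  # the second terminal flips its bit
```

Neither terminal learns the accumulator: the value it receives is the accumulator share of the other party plus a random mask, and the output bits are XOR shares.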

Alternatively, in the privacy-preserving distributed graph data feature decomposition method, the step in which the first computing terminal obtains the first encryption comparison result from the first accumulator and generates a new bit of the first encryption degree distribution information, and the second computing terminal obtains the second encryption comparison result from the second accumulator and generates a new bit of the second encryption degree distribution information, includes:

the first computing terminal obtains a first random number and the second computing terminal obtains a second random number, the sum of which equals the target node quantity;

the first computing terminal obtains first bit data, namely the binary representation of the first accumulator minus the first random number, and the second computing terminal obtains second bit data, namely the binary representation of the second accumulator minus the second random number;

the first computing terminal and the second computing terminal feed their respective bit data into a parallel prefix adder circuit and jointly evaluate its XOR and AND gates, obtaining two output bits, referred to below as the most significant bit of the first bit data and the most significant bit of the second bit data, which together are XOR shares of the most significant bit of the sum of the two bit data. Because this sum equals the accumulator total minus the target node quantity, the XOR of the two bits is 1 when the sum of the first and second accumulators is less than the target node quantity, and 0 otherwise;

the first computing terminal takes the most significant bit of the first bit data as the new bit of the first encryption degree distribution information, and the second computing terminal flips the most significant bit of the second bit data and takes the result as the new bit of the second encryption degree distribution information; alternatively, the first computing terminal flips the most significant bit of the first bit data and the second computing terminal takes the most significant bit of the second bit data unchanged.
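The circuit-based comparison above can be sketched as follows: a Kogge-Stone parallel prefix adder, one common parallel prefix design and assumed here since the text does not name one, built only from XOR and AND gates, plus the masking step with r1 + r2 equal to the target node quantity. The circuit is evaluated in plaintext; in the protocol each terminal holds its own bit data, XOR gates are local, and each AND gate would consume one Boolean multiplication triple. Function names and the word size n are hypothetical.

```python
import random


def prefix_adder(a, b, n):
    """n-bit sum of a and b built only from XOR and AND gates (Kogge-Stone prefix adder)."""
    mask = (1 << n) - 1
    p = a ^ b            # propagate bits
    G, P = a & b, p      # group generate / propagate, one bit per position
    d = 1
    while d < n:         # ceil(log2 n) prefix levels
        # The usual OR can be an XOR: G and (P AND shifted G) are never both 1.
        G, P = (G ^ (P & (G << d))) & mask, (P & (P << d)) & mask
        d <<= 1
    return (p ^ (G << 1)) & mask  # sum bit i = p_i XOR carry into position i


def below_target(acc1, acc2, target, n, rng):
    """MSB of (acc1 - r1) + (acc2 - r2) with r1 + r2 = target; equals [acc1 + acc2 < target].

    Valid while |acc1 + acc2 - target| < 2**(n - 1)."""
    mod = 1 << n
    r1 = rng.randrange(mod)
    r2 = (target - r1) % mod
    first_bits = (acc1 - r1) % mod
    second_bits = (acc2 - r2) % mod
    return prefix_adder(first_bits, second_bits, n) >> (n - 1)
```

The prefix structure gives a circuit depth logarithmic in n, which matters here because each layer of AND gates costs one communication round between the terminals.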

In the privacy-preserving distributed graph data feature decomposition method, the step in which the first and second computing terminals perform feature decomposition on the global graph data according to the first and second encryption sets corresponding to each node in the global graph includes:

the first computing terminal obtains a first encryption adjacency matrix from the first encryption sets of all nodes in the global graph, and the second computing terminal obtains a second encryption adjacency matrix from the second encryption sets of all nodes in the global graph;

the first computing terminal and the second computing terminal perform dimensionality reduction, based on additive secret sharing, on the matrix given by the sum of the first and second encryption adjacency matrices, obtaining a dimensionality-reduced matrix;

the first computing terminal and the second computing terminal execute the QR algorithm on the dimensionality-reduced matrix based on additive secret sharing to obtain encrypted eigenvalues and encrypted eigenvectors of the global graph data;

for the square root operations in the dimensionality reduction, the first computing terminal and the second computing terminal compute the reciprocal of the square root iteratively, based on additive secret sharing, using a second calculation formula;

the second calculation formula is:

where y'_n denotes the reciprocal square root obtained in the nth iteration, and x' denotes the number whose square root is to be computed.
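The text does not reproduce the second calculation formula. A common choice that needs only additions and multiplications, and is therefore convenient under additive secret sharing, is the Newton-Raphson iteration y'_{n+1} = y'_n (3 - x' y'_n²) / 2; the sketch below assumes that form. It also assumes a symmetric adjacency matrix (undirected graph) and uses the Lanczos process, the symmetric special case of the Arnoldi process named in the figures, for the dimensionality reduction. Everything runs in plaintext floats; the initial guess y0 and all names are hypothetical.

```python
def inv_sqrt(x, y0, iterations=40):
    """Reciprocal square root using only additions and multiplications.

    Assumed update (standard Newton-Raphson): y <- y * (3 - x * y * y) / 2,
    which converges when 0 < y0 < sqrt(3 / x)."""
    y = y0
    for _ in range(iterations):
        y = y * (3 - x * y * y) / 2
    return y


def lanczos(A, v0, k, y0=0.01):
    """Reduce symmetric A to tridiagonal form (diagonal alphas, off-diagonal betas)."""
    n = len(A)

    def dot(x, y):
        return sum(a * b for a, b in zip(x, y))

    def matvec(x):
        return [dot(row, x) for row in A]

    s = inv_sqrt(dot(v0, v0), y0)  # normalise v0 without an explicit square root
    q, q_prev, beta = [v * s for v in v0], [0.0] * n, 0.0
    alphas, betas, basis = [], [], []
    for j in range(k):
        basis.append(q)
        w = matvec(q)
        alpha = dot(w, q)
        alphas.append(alpha)
        if j == k - 1:
            break
        w = [wi - alpha * qi - beta * pi for wi, qi, pi in zip(w, q, q_prev)]
        norm_sq = dot(w, w)
        if norm_sq < 1e-24:  # Krylov space exhausted
            break
        s = inv_sqrt(norm_sq, y0)  # 1 / ||w||
        beta = norm_sq * s         # ||w|| = ||w||^2 / ||w||
        betas.append(beta)
        q_prev, q = q, [wi * s for wi in w]
    return alphas, betas, basis
```

The division by 2 in the update is by a public constant, which additive shares support locally (with truncation in a fixed-point encoding), whereas a direct square root would not.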

In the privacy-preserving distributed graph data feature decomposition method, the step in which the first and second computing terminals execute the QR algorithm on the dimensionality-reduced matrix based on additive secret sharing includes:

in the ith iteration of the QR algorithm, the first computing terminal obtains a first encryption matrix and the second computing terminal obtains a second encryption matrix, whose sum is a target matrix consisting of the elements at positions (i, i), (i, i+1), (i+1, i), and (i+1, i+1) of the plaintext Givens rotation matrix used in the ith iteration;

for the matrix multiplications in the QR algorithm, the first and second computing terminals treat the first and second encryption matrices as the two secret shares of the Givens rotation matrix and perform the multiplications under additive secret sharing using randomly generated multiplication triple matrices.
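The sketch below illustrates why only the four entries (i, i), (i, i+1), (i+1, i), and (i+1, i+1) of each Givens rotation need to be secret-shared: the rotation changes only rows i and i+1, so the secure step reduces to a 2 × 2 by 2 × n product, computed here with a matrix version of the multiplication-triple technique. It is a plaintext simulation under assumptions: shares are real numbers rather than fixed-point ring elements, the rotation parameters c and s are computed in the clear, results are reconstructed after each product, the right-multiplication that completes R Q is done in plaintext, and the QR iteration is unshifted. All names are hypothetical.

```python
import math
import random


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def madd(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def msub(X, Y):
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def share(M, rng):
    # Real-valued shares for readability only; a deployment would use fixed-point ring elements.
    S1 = [[rng.uniform(-10, 10) for _ in row] for row in M]
    return S1, msub(M, S1)


def beaver_matmul(X_sh, Y_sh, rng):
    """Shares of X @ Y from shares of X and Y, using a random matrix triple (U, V, W = U V)."""
    rows, inner, cols = len(X_sh[0]), len(Y_sh[0]), len(Y_sh[0][0])
    U = [[rng.uniform(-1, 1) for _ in range(inner)] for _ in range(rows)]
    V = [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(inner)]
    U_sh, V_sh, W_sh = share(U, rng), share(V, rng), share(matmul(U, V), rng)
    E = madd(msub(X_sh[0], U_sh[0]), msub(X_sh[1], U_sh[1]))  # opened E = X - U
    F = madd(msub(Y_sh[0], V_sh[0]), msub(Y_sh[1], V_sh[1]))  # opened F = Y - V
    Z = [madd(madd(matmul(E, V_sh[i]), matmul(U_sh[i], F)), W_sh[i]) for i in (0, 1)]
    Z[1] = madd(Z[1], matmul(E, F))  # the public term E F is added by one party only
    return Z


def qr_step(T, rng):
    """One unshifted QR iteration T -> R Q built from Givens rotations on adjacent rows."""
    n = len(T)
    R = [row[:] for row in T]
    blocks = []
    for i in range(n - 1):
        a, b = R[i][i], R[i + 1][i]
        r = math.hypot(a, b)
        c, s = (1.0, 0.0) if r == 0 else (a / r, b / r)
        block = [[c, s], [-s, c]]  # entries (i,i), (i,i+1), (i+1,i), (i+1,i+1) of G_i
        blocks.append(block)
        # Only the 2x2 block and rows i, i+1 take part; both are secret-shared here.
        Z1, Z2 = beaver_matmul(share(block, rng), share([R[i], R[i + 1]], rng), rng)
        R[i], R[i + 1] = madd(Z1, Z2)  # reconstructed only so the sketch can continue
    for i, ((c, s), _) in enumerate(blocks):  # right-multiply by G_0^T, G_1^T, ... (plaintext)
        for row in R:
            row[i], row[i + 1] = c * row[i] + s * row[i + 1], -s * row[i] + c * row[i + 1]
    return R
```

Applying qr_step repeatedly to a small symmetric tridiagonal matrix drives the off-diagonal entries toward zero, leaving approximations of the eigenvalues on the diagonal.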

In a second aspect of the present invention, a privacy-preserving distributed graph data feature decomposition system is provided, including target graph nodes, a first computing terminal, and a second computing terminal, which are configured to execute the corresponding steps of the privacy-preserving distributed graph data feature decomposition method provided in the first aspect of the present invention.

Compared with the prior art, the invention provides a privacy-preserving distributed graph data feature decomposition method and system. In the method, randomly sampled graph nodes holding local graph data encrypt their degree information and send it to the first and second computing terminals, which cooperatively compute first and second encryption degree distribution information in the ciphertext domain. Each graph node then determines the target interval to which its degree belongs, selects an appropriate sampling sensitivity, samples noise, and adds false edges with weight 0 to the real graph adjacency matrix, which is sparsely represented as a set of matrix triples. The graph node encrypts the triple set with the false edges added to obtain a first encryption set and a second encryption set and sends them to the first and second computing terminals respectively, which perform feature decomposition on them. In this way, node privacy is protected, the sparsity of the graph data is preserved, the effectiveness of the feature decomposition is guaranteed, and privacy-preserving feature decomposition of distributed graph data is achieved.

Drawings

FIG. 1 is a flow diagram of an embodiment of a privacy-preserving distributed graph data feature decomposition method provided by the present invention;

FIG. 2 is a schematic diagram of an application scenario of an embodiment of a distributed graph data feature decomposition method for privacy protection provided in the present invention;

FIG. 3 shows the density functions of Laplace distributions with different sensitivities;

FIG. 4 is a schematic diagram of a secure degree histogram estimation algorithm in an embodiment of a distributed graph data feature decomposition method for privacy protection provided by the present invention;

FIG. 5 is a schematic diagram of a secure degree distribution information generation algorithm in an embodiment of a distributed graph data feature decomposition method for privacy protection provided by the present invention;

FIG. 6 is a schematic diagram of a parallel prefix addition circuit in an embodiment of a privacy preserving distributed graph data feature decomposition method provided by the present invention;

FIG. 7 is a schematic diagram of a local graph data encryption algorithm in an embodiment of a privacy preserving distributed graph data feature decomposition method provided by the present invention;

FIG. 8 is a schematic diagram of the Arnoldi algorithm in plaintext;

FIG. 9 is a schematic diagram of the Lanczos algorithm in plaintext;

FIG. 10 is a schematic diagram of an Arnoldi algorithm in the ciphertext domain in an embodiment of a privacy-preserving distributed graph data feature decomposition method provided by the present invention;

FIG. 11 is a diagram illustrating an iterative process of a QR algorithm in the prior art;

FIG. 12 is a diagram illustrating a QR algorithm in the ciphertext domain in an embodiment of a distributed graph data feature decomposition method of privacy protection provided by the present invention;

FIG. 13 is a schematic diagram of an iterative process of a QR algorithm in an embodiment of the privacy-preserving distributed graph data feature decomposition method provided in the present invention.

Detailed Description

To make the objectives, technical solutions, and effects of the present invention clearer, the invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are merely illustrative of the invention and are not intended to limit it.

Example one

This embodiment provides a privacy-preserving distributed graph data feature decomposition method, which aims to perform feature decomposition of distributed graph data in the ciphertext domain in a privacy-preserving manner; the feature decomposition acts on the adjacency matrix of the graph data and produces its eigenvalues and eigenvectors. As shown in FIG. 2, the method involves three types of entities: N users U_i (i ∈ [1, N]), two cloud servers (computing terminals) CS₁ and CS₂, and an analyst. Each user holds part of the graph data (for example, the address book held by each user in an address book scenario), and together these parts form a complete graph. In the graph, each user is a graph node, and the relationships between users are the edges (e.g., the contacts in an address book scenario). The size of the local data held by each user equals the degree of the corresponding node (e.g., the number of contacts of each user in an address book scenario).

The distributed graph can be represented by an N × N adjacency matrix A, where each row A[i, :] (i ∈ [1, N]) represents the local graph data held by user U_i. For example, in an unweighted graph, A[i, j] = 1 may indicate that there is a relationship between users U_i and U_j; in a weighted graph, A[i, j] = v may indicate that U_i and U_j have a relationship with intimacy v. The users allow the cloud servers to perform analysis tasks on the union of their local graph data (i.e., the complete graph data). However, for privacy reasons, no user U_i wants its local graph data A[i, j] to be disclosed at any point during the feature decomposition, because doing so could leak sensitive information. A privacy-preserving feature decomposition method on distributed graphs is therefore provided for this scenario.

In the method provided by this embodiment, the cloud computing service is provided by two participants, denoted CS₁ and CS₂, from different trust domains; in a real-world industrial scenario, they could be two competing cloud providers. CS₁ and CS₂ jointly perform the feature decomposition task without learning any user's sensitive information at any point, and they cannot obtain the plaintext result of the feature decomposition either, because that result is also encrypted. Both cloud servers are assumed to be honest-but-curious and non-colluding: each server, acting as a computing terminal, faithfully executes the designed security protocol, but independently tries to infer users' sensitive data from what it observes during data collection and feature decomposition. In this setting, the method provided by the invention consists of the following two parts:

1. Secure distributed graph data collection: in this stage, the cloud servers collect the encrypted local graph data A[i, :] (i ∈ [1, N]) of each user U_i to form a complete encrypted adjacency matrix that supports the subsequent feature decomposition. In the process, the method provided by this embodiment preserves the sparsity of the graph data, which greatly reduces the computation and communication overhead of the subsequent feature decomposition on the encrypted adjacency matrix.

2. Secure feature decomposition: after collecting the complete encrypted adjacency matrix, the cloud servers cooperatively perform feature decomposition in the ciphertext domain. Specifically, they first reduce the dimensionality of the matrix in the ciphertext domain and then run the QR algorithm in the ciphertext domain to obtain the complete eigenvalues and eigenvectors of the reduced, smaller matrix.

The method provided by this embodiment uses two cryptographic techniques, additive secret sharing and function secret sharing, which are described below.

Additive secret sharing

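An additive secret sharing (ASS) of a private value x is denoted $[\![x]\!]$. It has two forms:

1. Arithmetic secret sharing: $[\![x]\!]^A = (\langle x\rangle_1, \langle x\rangle_2)$, where $x = \langle x\rangle_1 + \langle x\rangle_2$, and $\langle x\rangle_1$ and $\langle x\rangle_2$ are held by the two computing participants, respectively.

2. Boolean secret sharing: $[\![x]\!]^B = (\langle x\rangle^B_1, \langle x\rangle^B_2)$, where $x = \langle x\rangle^B_1 \oplus \langle x\rangle^B_2$, and $\langle x\rangle^B_1$ and $\langle x\rangle^B_2$ are held by the two computing participants, respectively.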

With these secret sharings, the two computing participants can securely perform linear and multiplicative computations without obtaining the plaintext data.
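The two sharing forms defined above can be illustrated with a few lines of Python. The 32-bit word size and the function names are assumptions for illustration; the text does not fix the ring.

```python
import random

ELL = 32
MOD = 2 ** ELL  # assumed ring size; the text does not specify it


def share_arithmetic(x, rng):
    """Arithmetic sharing: <x>_1 + <x>_2 = x (mod 2^ELL); each share alone is uniform."""
    s1 = rng.randrange(MOD)
    return s1, (x - s1) % MOD


def share_boolean(x, rng):
    """Boolean sharing: <x>^B_1 XOR <x>^B_2 = x for an ELL-bit value x."""
    s1 = rng.getrandbits(ELL)
    return s1, x ^ s1


def reconstruct_arithmetic(s1, s2):
    return (s1 + s2) % MOD


def reconstruct_boolean(s1, s2):
    return s1 ^ s2
```

Because the first share is drawn uniformly at random, either share alone is independent of x; only the pair reveals the value.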

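1) Secure linear computation: linear operations on secret-shared values require only local computation by each party. That is, if $\alpha, \beta, \gamma$ are public plaintext constants and $[x]^A$ and $[y]^A$ are secret-shared values, then

$[\alpha x + \beta y + \gamma]^A = (\alpha\langle x\rangle_1 + \beta\langle y\rangle_1 + \gamma,\ \alpha\langle x\rangle_2 + \beta\langle y\rangle_2)$

Each party can therefore compute its share locally from the shares it holds.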
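A minimal sketch of the local linear step, assuming the same 32-bit ring as above; `linear` is a hypothetical helper name. Only the first party adds the public constant $\gamma$, so it is counted exactly once in the reconstructed sum.

```python
import secrets

MOD = 1 << 32  # assumed ring Z_{2^32}


def share(x):
    r = secrets.randbelow(MOD)
    return [(x - r) % MOD, r]


def reveal(sh):
    return sum(sh) % MOD


def linear(alpha, x_sh, beta, y_sh, gamma):
    """Each party computes alpha*<x>_i + beta*<y>_i locally; only the first adds gamma."""
    out = []
    for i in range(2):
        v = alpha * x_sh[i] + beta * y_sh[i]
        if i == 0:          # party P_1 (index 0 here) adds the public constant once
            v += gamma
        out.append(v % MOD)
    return out


x_sh, y_sh = share(7), share(5)
z_sh = linear(3, x_sh, 2, y_sh, 10)   # 3*7 + 2*5 + 10, computed without any communication
print(reveal(z_sh))                   # 41
```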

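2) Secure multiplication: computing the product of two secret-shared values requires one round of communication between the two parties. To compute $[z]^A = [x \cdot y]^A$, the two parties must share a multiplication triple $([u]^A, [v]^A, [w]^A)$ with $w = u \cdot v$ in advance. Each party $P_i$ then locally computes $\langle e\rangle_i = \langle x\rangle_i - \langle u\rangle_i$ and $\langle f\rangle_i = \langle y\rangle_i - \langle v\rangle_i$, and the parties send $\langle e\rangle_i$ and $\langle f\rangle_i$ to each other to reconstruct $e$ and $f$ in the clear. Finally, party $P_i$, $i \in \{1, 2\}$, holds the product share

$\langle z\rangle_i = (i-1) \cdot e \cdot f + f \cdot \langle u\rangle_i + e \cdot \langle v\rangle_i + \langle w\rangle_i$

so that exactly one party adds the public term $e \cdot f$.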
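The triple-based multiplication can be checked with a small sketch. It assumes a 32-bit ring and simulates the offline dealer that hands out the triple; `beaver_triple` and `mul` are hypothetical names. The sum of the output shares is $e f + f u + e v + w = (x-u)(y-v) + (y-v)u + (x-u)v + uv = xy$.

```python
import secrets

MOD = 1 << 32  # assumed ring Z_{2^32}


def share(x):
    r = secrets.randbelow(MOD)
    return [(x - r) % MOD, r]


def reveal(sh):
    return sum(sh) % MOD


def beaver_triple():
    """Offline phase (simulated dealer): shares of random u, v and w = u * v."""
    u, v = secrets.randbelow(MOD), secrets.randbelow(MOD)
    return share(u), share(v), share(u * v % MOD)


def mul(x_sh, y_sh, triple):
    u_sh, v_sh, w_sh = triple
    # Each party masks its shares locally ...
    e_sh = [(x_sh[i] - u_sh[i]) % MOD for i in range(2)]
    f_sh = [(y_sh[i] - v_sh[i]) % MOD for i in range(2)]
    # ... and one round of communication opens e = x - u and f = y - v.
    e, f = reveal(e_sh), reveal(f_sh)
    z_sh = []
    for i in range(2):
        z = f * u_sh[i] + e * v_sh[i] + w_sh[i]
        if i == 1:  # exactly one party adds the public term e * f
            z += e * f
        z_sh.append(z % MOD)
    return z_sh


print(reveal(mul(share(6), share(7), beaver_triple())))  # 42
```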

Linear and multiplication operations in Boolean secret sharing are analogous to those in arithmetic sharing, except that XOR ($\oplus$) replaces addition and AND ($\wedge$) replaces multiplication.

Function secret sharing

Function secret sharing (FSS) is an extension of additive secret sharing that evaluates functions securely with less communication. FSS therefore has a significant performance advantage over ordinary secret sharing in high-latency networks. In general, a two-party FSS scheme for a private function $f$ consists of the following two algorithms:

1. $(k_1, k_2) \leftarrow \mathrm{Gen}(1^\lambda, f)$: given a security parameter $\lambda$ and a description of the function $f$, output two FSS keys $k_1, k_2$, one for each computing participant.

2. $\langle f(x)\rangle_i \leftarrow \mathrm{Eval}(k_i, x)$: given an FSS key $k_i$ and an evaluation point $x$, output an additive secret share $\langle f(x)\rangle_i$ of the evaluation result.

FSS guarantees that an attacker who learns only one of the two FSS keys obtains no information about the target function $f$ or the output $f(x)$.

As shown in Fig. 1, the privacy-preserving distributed graph data feature decomposition method provided by this embodiment includes the following steps:

S100: a target graph node in the global graph generates an initial set from its local graph data. The initial set contains multiple triples, each consisting of the node label of the target graph node, the node label of a neighboring graph node, and the weight of the edge connecting the two.

The target graph node can be any node in the global graph. For the feature decomposition task on distributed graph data, the local graph data of every user must first be collected to form a complete encrypted adjacency matrix A of the graph, on which the feature decomposition in the ciphertext domain is then performed. Specifically, each row A[i,:] of the adjacency matrix represents the local graph data of one user. At this stage, each user U_i shares his local graph data A[i,:] with the two cloud servers CS₁ and CS₂, and, for efficiency, each user U_i encrypts A[i,:] using ASS. However, simply applying ASS to every entry of the local graph data would incur high overhead, because distributed graph data is typically sparse: most elements of each row A[i,:] are 0 and only a small fraction is valid (for example, each user's phone book contains only a few phone numbers). This embodiment therefore uses a sparse representation of the matrix, whose basic idea is to process and submit only the (encrypted) values of the non-zero elements together with their positions. Specifically, each user U_i stores only the positions and weights of the non-zero elements as a set {(i, j, A[i,j])}. Each element of the set is a matrix triple (i, j, A[i,j]), where i is the node label of the target graph node and j is the node label of a neighboring node; that is, each element represents an edge between node (user) U_i and node U_j, and A[i,j] is the weight of that edge. The number of elements in the set is therefore the degree of node U_i. Each user U_i then encrypts the weights A[i,j] using ASS. Specifically, given an edge weight $A[i,j] \in \mathbb{Z}_{2^k}$, the user generates a random number $r \in \mathbb{Z}_{2^k}$, and the two arithmetic shares of the weight are $\langle A[i,j]\rangle_1 = A[i,j] - r$ and $\langle A[i,j]\rangle_2 = r$. Finally, user U_i sends {(i, j, ⟨A[i,j]⟩₁)} to the first computing terminal CS₁ and {(i, j, ⟨A[i,j]⟩₂)} to the second computing terminal CS₂. However, because the number of non-zero elements equals the number of edges (the degree), this simple encryption reveals the degree of each node to the computing terminals, and the existing literature shows that a computing terminal can infer various private information about user U_i from it. Moreover, if the distributed graph is unweighted (the elements of the adjacency matrix are 0 or 1), encrypting only the edges with non-zero weights is meaningless: the mere presence of an edge reveals that its weight is 1, so the computing terminals could recover the complete adjacency matrix. The challenge is therefore how to protect the degree information of each user U_i while using the triple encoding of the sparse matrix, without affecting the effectiveness of the subsequent feature decomposition.
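The naive sparse encoding and its degree leak can be seen in a short sketch. It assumes a 32-bit ring and 0-based node labels; `to_triples`, `share_triples` and the example row are hypothetical.

```python
import secrets

MOD = 1 << 32  # assumed ring Z_{2^32}


def to_triples(i, row):
    """Sparse encoding of row A[i,:]: keep only (i, j, A[i,j]) for non-zero weights."""
    return [(i, j, w) for j, w in enumerate(row) if w != 0]


def share_triples(triples):
    """Share each weight: <A[i,j]>_1 = A[i,j] - r goes to CS1, <A[i,j]>_2 = r to CS2."""
    to_cs1, to_cs2 = [], []
    for i, j, w in triples:
        r = secrets.randbelow(MOD)
        to_cs1.append((i, j, (w - r) % MOD))
        to_cs2.append((i, j, r))
    return to_cs1, to_cs2


row = [0, 3, 0, 0, 1, 0, 0, 0]          # hypothetical local data of user U_0
cs1, cs2 = share_triples(to_triples(0, row))
# Positions (i, j) travel in the clear, so len(cs1) reveals the degree of U_0.
print(len(cs1), [(a[1], (a[2] + b[2]) % MOD) for a, b in zip(cs1, cs2)])
```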

To address this challenge, this embodiment provides a method that strikes a trade-off between protecting user degree information and preserving matrix sparsity. Specifically, each user U_i adds to {(i, j, A[i,j])} some false edges with weight 0, i.e., triples (i, j, 0), at random empty positions, and then encrypts the weights of both the real and the false edges with ASS. Because ASS ciphertexts remain indistinguishable even when the same value (e.g., 0) is encrypted many times, the cloud servers cannot distinguish real edges from false ones; and because false edges have weight 0, they do not affect the result of the subsequent secure feature decomposition. At the same time, the added false edges protect the degree information of user U_i. The remaining challenge is choosing an appropriate number of false edges to balance sparsity and privacy: too many false edges weaken the sparsity of the collected ciphertext adjacency matrix and increase subsequent system overhead, while too few provide poor privacy protection.

In one possible implementation, each user U_i samples a noise value n_i from a discrete Laplace distribution (Definition 2) and shares only its noisy local graph data.

The Laplace distribution is one of the most widely used noise distributions and is defined as follows:

A discrete random variable follows the Laplace distribution Lap(ε, δ, Δ) if its probability density function satisfies the following equation.

where μ is the mean of the Laplace distribution and Δ is the sensitivity of the function f:

$\Delta = \max |f(x) - f(x')|$

which measures how much a single entity's data can change the output in the worst case.

By the definition of the Laplace distribution, if no further measures are taken and each user U_i samples a noise value n_i from a discrete Laplace distribution (Definition 2), the sensitivity must be set to Δ = d_max − d_min, where d_max and d_min are the maximum and minimum possible node degrees in the distributed graph. Each user then adds n_i false edges with weight 0 (i.e., (i, j, 0)) at random positions in its local graph data, encrypts the weights of both real and false edges with ASS, and sends the ciphertexts to the cloud servers. This protects the degree of every node as well as the existence of every edge.

Although effective, this scheme leads to a large sensitivity Δ (theoretically up to N, the number of nodes in the graph), so the sampled Laplace noise n_i can be very large: each user may need to add nearly N false edges, which severely harms the sparsity of the graph and the performance of the subsequent feature decomposition. Fig. 3 shows the probability density function of the discrete Laplace distribution for different values of Δ. It shows that a larger sensitivity Δ makes the density function flatter: the larger Δ is, the more likely user U_i is to draw a large noise |n_i|. Conversely, a small sensitivity Δ (e.g., 50 in Fig. 3) concentrates the density function, so user U_i will most likely draw a small noise |n_i|. Therefore, if every user U_i samples noise from a Laplace distribution with a large sensitivity Δ, each user adds too many false edges to its local graph data, severely degrading the sparsity of the collected graph data.
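The effect of Δ can be quantified with a short sketch. It assumes the standard discrete Laplace (two-sided geometric) shape $P(n) \propto q^{|n|}$ with $q = e^{-\varepsilon/\Delta}$ centred at 0; the text's exact Lap(ε, δ, Δ) definition with its mean μ is not reproduced. Under that assumption $E|n| = 2q/(1-q^2) = 1/\sinh(\varepsilon/\Delta) \approx \Delta/\varepsilon$, so the expected number of false edges grows roughly linearly in Δ.

```python
import math
import random


def sample_discrete_laplace(epsilon, delta_sens, rng):
    """Draw n with P(n) proportional to q**|n|, q = exp(-epsilon / Delta) (assumed shape)."""
    q = math.exp(-epsilon / delta_sens)

    def geometric():
        # Inverse transform: P(G >= g) = q**g for g = 0, 1, 2, ...
        u = 1.0 - rng.random()           # u in (0, 1], avoids log(0)
        return int(math.floor(math.log(u) / math.log(q)))

    return geometric() - geometric()     # difference of two i.i.d. geometrics


def expected_abs_noise(epsilon, delta_sens):
    """Closed form E|n| = 2q / (1 - q^2) for the distribution above."""
    q = math.exp(-epsilon / delta_sens)
    return 2 * q / (1 - q * q)


for d in (50, 1000, 100000):             # a narrow bucket versus Delta close to N
    print(d, round(expected_abs_noise(1.0, d), 1))
```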

To obtain better sparsity, this embodiment uses privacy protection based on the idea of "bucketing": the nodes of the graph are divided into buckets, and the degree information of each node is protected within its bucket. Specifically, the range between the minimum and maximum node degrees is divided into several degree intervals (buckets) such that each bucket contains approximately the same number of users and the degrees of all nodes in a bucket lie within a small interval [d_p, d_q]. All users in the same bucket can therefore use the smaller sensitivity Δ = d_q − d_p. To assign nodes to buckets securely, the method provided by this embodiment further includes the following steps:

S200: the target graph node encrypts its degree based on function secret sharing to obtain first encryption degree information and second encryption degree information, sends the first encryption degree information to the first computing terminal, and sends the second encryption degree information to the second computing terminal;

S300: the first computing terminal and the second computing terminal generate first encryption degree distribution information and second encryption degree distribution information of the global graph data from the first and second encryption degree information sent by the target graph nodes.

The first and second encryption degree distribution information are the two shares of an encrypted bucket map. The first and second computing terminals send the first and second encryption degree distribution information, respectively, to each node, which decrypts them to obtain the bucket map, determines the bucket it belongs to, and sets the sensitivity for sampling noise accordingly. In this embodiment, the bucket map is a bit string in which an element equal to 1 marks a bucket boundary. For example, given d_max = 10, the bucket map inter = 0001000001 divides users into two buckets (intervals): users with degree ∈ [1, 4] and users with degree ∈ [5, 10]. How the bucket map is computed in the ciphertext domain is described in detail below.
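On the user side, reading the decrypted bucket map amounts to locating the interval that contains the user's own degree. The following is a minimal sketch; `bucket_interval` is a hypothetical helper, degrees are 1-based as in the d_max = 10 example, and treating degrees after the last 1 as a final bucket ending at d_max is an assumption.

```python
def bucket_interval(inter, degree):
    """Return (d_p, d_q) for the bucket containing `degree`.

    inter is the decrypted bucket map as a string; inter[i-1] == '1' marks
    degree i as the upper boundary of a bucket.  Assumption: degrees after the
    last '1' form one final bucket ending at d_max.
    """
    d_max = len(inter)
    if not 1 <= degree <= d_max:
        raise ValueError("degree outside [1, d_max]")
    lower = 1
    for i in range(1, d_max + 1):
        if inter[i - 1] == "1" or i == d_max:
            if lower <= degree <= i:
                return lower, i
            lower = i + 1
    raise AssertionError("unreachable")


inter = "0001000001"
for d in (3, 7):
    d_p, d_q = bucket_interval(inter, d)
    print(d, (d_p, d_q), "sensitivity", d_q - d_p)   # Delta = d_q - d_p
```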

To let the first computing terminal CS₁ and the second computing terminal CS₂ securely partition the range of possible node degrees in the global graph into intervals and thereby assign users to buckets, the two terminals first estimate an encrypted degree histogram of all nodes without learning any node's plaintext degree, i.e., they estimate the number of nodes for each possible degree value. Specifically, given a public degree d_i and the degree d_j of a specific user U_j, where d_j is available only as a ciphertext, CS_{1,2} must detect whether d_j = d_i. To this end, this embodiment mainly uses a distributed point function (DPF) based on FSS. A DPF f_{α,β}(x) outputs β if x = α and 0 otherwise.

Following the general FSS framework, an FSS-based two-party DPF consists of the following two algorithms:

1. $(k_1, k_2) \leftarrow \mathrm{Gen}(1^\lambda, \alpha, \beta)$: given a security parameter $\lambda$ and $\alpha, \beta$, output two DPF keys $k_1, k_2$, one for each of the cloud servers CS₁ and CS₂.

2. $\langle f_{\alpha,\beta}(x)\rangle_i \leftarrow \mathrm{Eval}(k_i, x)$: given a DPF key $k_i$ and an evaluation point $x$, output a secret share $\langle f_{\alpha,\beta}(x)\rangle_i$ of the evaluation result.

The pseudo code of the secure degree histogram estimation algorithm in the present embodiment is shown in fig. 4.

Because asking all nodes to send their encrypted degrees d_i directly would impose high system overhead on the computing terminals, the scheme uses a sampling strategy: the computing terminals randomly sample S users (denoted {SU_j}_{j∈[1,S]}) from the entire user population, and only the sampled users send their encrypted degree information. Before the target graph node encrypts its degree based on function secret sharing, the method includes:

The target graph node encrypts the degree of the target graph node based on function secret sharing to obtain first encryption degree information and second encryption degree information, and the method comprises the following steps:

the target graph node obtains the first encryption degree information and the second encryption degree information output by a first preset algorithm in function secret sharing, wherein the input of the first preset algorithm comprises the degree of the target graph node.

The first and second computing terminals generate first and second encryption degree distribution information of global graph data according to the first and second encryption degree information sent by the plurality of target graph nodes, and the method includes:

Specifically, each sampled user SU_j generates DPF keys based on its degree d_j (line 1 of Algorithm 3 in Fig. 4), setting the DPF parameters α and β to d_j and 1, respectively. Each user SU_j then sends key k_{j,1} (the first encryption degree information) to the first computing terminal CS₁ and key k_{j,2} (the second encryption degree information) to the second computing terminal CS₂. After all sampled users have sent their DPF keys, each cloud server CS_t, t ∈ {1, 2}, evaluates Eval(k_{j,t}, i) with every key {k_{j,t}}_{j∈[1,S]} for every possible degree i ∈ [1, d_max]. By summing these evaluation results (line 6 of Algorithm 3), CS_t obtains an encrypted share of exactly how many sampled users have degree d_j equal to i, for each i ∈ [1, d_max]. Correctness follows from

$\langle h[i]\rangle_1 + \langle h[i]\rangle_2 = \sum_{j=1}^{S}\big(\mathrm{Eval}(k_{j,1}, i) + \mathrm{Eval}(k_{j,2}, i)\big) = \sum_{j=1}^{S} f_{d_j,1}(i) = |\{\, j : d_j = i \,\}|$

Through the above steps, the first and second computing terminals each hold a share of the encrypted degree histogram estimate $[h]^A$, where $h[i]$ is the number of sampled users with degree $i$.
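The aggregation can be sketched with a deliberately naive DPF whose keys are additive shares of the whole truth table of $f_{\alpha,\beta}$. Each key alone is uniformly random, so it hides $\alpha$ and $\beta$, but it is as long as the degree domain; practical DPF constructions compress keys to roughly $O(\lambda \log |\text{domain}|)$ and are not shown. The 32-bit output ring and the names `dpf_gen`, `dpf_eval`, `estimate_histogram` are assumptions for illustration.

```python
import secrets

MOD = 1 << 32  # assumed output ring Z_{2^32}


def dpf_gen(alpha, beta, domain):
    """Toy DPF Gen: additive shares of the truth table of f_{alpha,beta} on [0, domain]."""
    k1 = [secrets.randbelow(MOD) for _ in range(domain + 1)]
    k2 = [((beta if x == alpha else 0) - k1[x]) % MOD for x in range(domain + 1)]
    return k1, k2


def dpf_eval(key, x):
    return key[x]


def estimate_histogram(sampled_degrees, d_max):
    keys1, keys2 = [], []
    for d in sampled_degrees:            # sampled user SU_j: alpha = d_j, beta = 1
        k1, k2 = dpf_gen(d, 1, d_max)
        keys1.append(k1)                 # sent to CS1
        keys2.append(k2)                 # sent to CS2
    # Each server sums Eval(k_{j,t}, i) over all keys, for every degree i in [1, d_max].
    h1 = [sum(dpf_eval(k, i) for k in keys1) % MOD for i in range(1, d_max + 1)]
    h2 = [sum(dpf_eval(k, i) for k in keys2) % MOD for i in range(1, d_max + 1)]
    return h1, h2


h1, h2 = estimate_histogram([2, 3, 3, 7, 2, 3], d_max=10)
print([(a + b) % MOD for a, b in zip(h1, h2)])   # [0, 2, 3, 0, 0, 0, 1, 0, 0, 0]
```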

The first and second computing terminals then generate the bucket map in the ciphertext domain; the pseudocode is shown as Algorithm 4 in Fig. 5. Algorithm 4 outputs an encrypted bucket map $[\mathit{inter}]^B$ (a bit string encrypted with Boolean secret sharing), in which an element equal to 1 marks the boundary of a bucket. For example, given d_max = 10, inter = 0001000001 divides users into two buckets: users with degree ∈ [1, 4] and users with degree ∈ [5, 10]. After computing the encrypted bucket map, cloud servers CS₁ and CS₂ send $[\mathit{inter}]^B$ to every user, and each user determines its bucket from its own degree. The implementation of Algorithm 4 is described in detail below.

Algorithm 4 first lets CS_{1,2} compute the plaintext bucket size sizeB, i.e., how many users each bucket should contain (line 1), and initialize an encrypted accumulator $[\mathit{accu}]^A = [0]^A$. CS_{1,2} then add the entries $[h[i]]^A$ of the degree histogram estimated in the previous stage to the accumulator one by one (line 4 of Algorithm 4). After each addition, CS_{1,2} determine in the ciphertext domain whether $\mathit{accu} \ge \mathit{sizeB}$ and append the comparison result, encrypted with Boolean secret sharing, to the bucket map $[\mathit{inter}]^B$ (line 5 of Algorithm 4): if accu ≥ sizeB, then inter[i] = 1, marking a bucket boundary; otherwise inter[i] = 0. Based on this encrypted comparison result, CS_{1,2} then decide in the ciphertext domain whether to reset the encrypted accumulator $[\mathit{accu}]^A$ to 0: if inter[i] = 1, a bucket boundary has occurred, so $[\mathit{accu}]^A$ is reset to 0 to prepare for accumulating the next bucket; if inter[i] = 0, no boundary has occurred and the accumulator keeps its value. This step (line 6 of Algorithm 4) multiplies the accumulator by !inter[i], where "!" denotes the NOT operation, which one of CS_{1,2} performs by locally flipping its share ⟨inter[i]⟩₁ or ⟨inter[i]⟩₂. Finally, CS_{1,2} output the encrypted bucket map $[\mathit{inter}]^B$.
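The control flow of Algorithm 4 can be traced in plaintext. In the protocol, accu is arithmetically shared, each comparison result is Boolean shared and the reset is a secure multiplication; here everything is in the clear only to show the logic. The function name and the example histogram are hypothetical.

```python
def bucket_map_plain(hist, size_b):
    """Plaintext reference for the bucket-map logic of Algorithm 4.

    hist[i-1] is the (estimated) number of users with degree i.
    """
    accu = 0
    inter = []
    for count in hist:                     # line 4: add the next histogram entry
        accu += count
        bit = 1 if accu >= size_b else 0   # line 5: an encrypted comparison in the protocol
        inter.append(bit)
        accu = accu * (1 - bit)            # line 6: accu * !inter[i], reset on a boundary
    return "".join(map(str, inter))


hist = [3, 2, 2, 3, 1, 1, 1, 1, 1, 5]      # hypothetical histogram for d_max = 10
print(bucket_map_plain(hist, size_b=10))   # 0001000001
```

With this histogram the first bucket closes at degree 4 (3 + 2 + 2 + 3 = 10) and the second at degree 10, reproducing the inter = 0001000001 example.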

In Algorithm 4, the additions are supported natively by additive secret sharing, but the ciphertext-domain comparison $\mathit{accu} \ge \mathit{sizeB}$ is not. This embodiment therefore provides two methods for performing this comparison in the ciphertext domain. The first is based on function secret sharing (FSS); it requires the fewest interaction rounds between the servers (at the cost of more local computation) and is therefore better suited to high-latency networks. The second is based on additive secret sharing (ASS); it requires less local computation but more online communication rounds and traffic between the two servers, and is therefore better suited to low-latency networks.

In a first method, the first computing terminal obtains a first encryption comparison result according to the first accumulator, and generates a new one-bit value in the first encryption degree distribution information according to the first encryption comparison result, and the second computing terminal obtains a second encryption comparison result according to the second accumulator, and generates a new one-bit value in the second encryption degree distribution information according to the second encryption comparison result, including:

the first computing terminal inputs the masked input value and the first key into a fourth preset algorithm of function secret sharing to obtain the first encryption bit, and the second computing terminal inputs the masked input value and the second key into the fourth preset algorithm of function secret sharing to obtain the second encryption bit, wherein when the sum of the first accumulator and the second accumulator is smaller than the number of target nodes, the XOR of the first encryption bit and the second encryption bit is 1, and when the sum of the first accumulator and the second accumulator is not smaller than the number of target nodes, the XOR of the first encryption bit and the second encryption bit is 0;

Specifically, the first method mainly uses a distributed comparison function (DCF) from function secret sharing to implement the ciphertext-domain comparison. A DCF g_{α,β}(x) outputs β if the input x < α and 0 otherwise. Following the general FSS framework, an FSS-based two-party DCF consists of the following two algorithms:

1. $(k_1, k_2, r_1, r_2) \leftarrow \mathrm{Gen}(1^\lambda, \alpha, \beta)$: given a security parameter $\lambda$ and $\alpha, \beta$, output two DCF keys $k_1, k_2$ and two random numbers $r_1, r_2$ (where $r_1 + r_2 = r^{in}$), one key and one random number for each of the two parties.

2. $\langle g_{\alpha,\beta}(x)\rangle_i \leftarrow \mathrm{Eval}(k_i, x + r^{in})$: given a DCF key $k_i$ and the masked input $x + r^{in}$, output a secret share $\langle g_{\alpha,\beta}(x)\rangle_i$ of the evaluation result.

Secure evaluation of the DCF requires only one round of online communication: computing terminals CS₁ and CS₂ send $\langle x\rangle_i + r_i$, $i \in \{1, 2\}$, to each other to reveal the masked input $x + r^{in}$ without compromising the privacy of the encrypted input $x$. The following describes how the comparison $\mathit{accu} \ge \mathit{sizeB}$ is performed with a DCF.

To perform this comparison, the relevant parameters are set to α = sizeB and β = 1, with output domain $\mathbb{Z}_2$, and the generated DCF keys are sent to cloud servers CS₁ and CS₂, respectively. Note that in an actual deployment this offline key preparation can be done by a third-party server. After obtaining their DCF keys, CS_t, t ∈ {1, 2}, first exchange ⟨accu⟩_t + r_t to reveal accu + r^{in}. Each then evaluates Eval(k_t, accu + r^{in}) locally; the evaluation outputs $\langle 1\rangle_t$ if accu < sizeB and $\langle 0\rangle_t$ otherwise. Because Algorithm 4 requires CS_{1,2} to output $[1]^B$ if accu ≥ sizeB, one of CS_{1,2} locally flips its evaluation result to take the NOT of it.
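The masked-input evaluation can be checked with a toy DCF whose keys are XOR shares of the truth table of the shifted function $y \mapsto g_{\alpha,1}(y - r^{in})$, so that the parties can evaluate it on the public value $y = \mathit{accu} + r^{in}$. Real DCF keys are compact; the table version, the 8-bit input ring, the $\mathbb{Z}_2$ output domain, and the names `dcf_gen`, `compare_geq`, `geq_bit` are assumptions for illustration.

```python
import secrets

DOMAIN = 1 << 8   # assumed small input ring Z_{2^8}


def share(x):
    r = secrets.randbelow(DOMAIN)
    return [(x - r) % DOMAIN, r]


def dcf_gen(alpha, beta=1):
    """Toy offline dealer for g(x) = beta if x < alpha else 0, with output in Z_2."""
    r1, r2 = secrets.randbelow(DOMAIN), secrets.randbelow(DOMAIN)
    r_in = (r1 + r2) % DOMAIN
    table = [beta if (y - r_in) % DOMAIN < alpha else 0 for y in range(DOMAIN)]
    k1 = [secrets.randbelow(2) for _ in range(DOMAIN)]
    k2 = [t ^ a for t, a in zip(table, k1)]
    return (k1, r1), (k2, r2)


def compare_geq(accu_shares, size_b):
    """Boolean shares of [accu >= sizeB] from arithmetic shares of accu."""
    (k1, r1), (k2, r2) = dcf_gen(alpha=size_b)
    # Online round: each server publishes <accu>_t + r_t, revealing only accu + r_in.
    masked = (accu_shares[0] + r1 + accu_shares[1] + r2) % DOMAIN
    b1, b2 = k1[masked], k2[masked]      # shares of [accu < sizeB]
    return b1 ^ 1, b2                    # one server flips its share: NOT gives [accu >= sizeB]


def geq_bit(accu, size_b):
    b1, b2 = compare_geq(share(accu), size_b)
    return b1 ^ b2                       # reconstructed here only for demonstration


print([geq_bit(v, 10) for v in (3, 9, 10, 14)])   # [0, 0, 1, 1]
```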

In a second method, the first computing terminal obtains a first encryption comparison result according to the first accumulator, and generates a new one-bit value in the first encryption degree distribution information according to the first encryption comparison result, and the second computing terminal obtains a second encryption comparison result according to the second accumulator, and generates a new one-bit value in the second encryption degree distribution information according to the second encryption comparison result, including:

the first computing terminal takes the most significant bit of the first bit data as a new bit value in the first encryption degree distribution information, the second computing terminal turns over the most significant bit of the second bit data to take the most significant bit of the second bit data as a new bit value in the second encryption degree distribution information, or the first computing terminal turns over the most significant bit of the first bit data to take the most significant bit of the first bit data as a new bit value in the first encryption degree distribution information, and the second computing terminal takes the most significant bit of the second bit data as a new bit value in the second encryption degree distribution information.

The second method is based on bit decomposition in the secret-sharing domain. Specifically, the most significant bit (msb) of the two's complement representation of x indicates the sign of x: x ≥ 0 if msb = 0, and x < 0 if msb = 1. Given two numbers a and b in two's complement, which can be viewed as the two shares of a secret-shared value held by computing terminals CS₁ and CS₂, respectively, the most significant bit of a + b can be computed securely with a customized parallel prefix addition circuit. A customized 8-bit parallel prefix addition circuit is shown in Fig. 6.

Given ciphertext shares $\langle x\rangle_1$ and $\langle x\rangle_2$ held by computing terminals CS₁ and CS₂, respectively, the cloud servers first locally decompose ⟨x⟩₁ and ⟨x⟩₂ into bits: ⟨x⟩_i = x_i[1], …, x_i[k], i ∈ {1, 2}. CS₁ and CS₂ then feed their own bits into the customized parallel prefix addition circuit and securely evaluate its XOR ($\oplus$) and AND ($\wedge$) gates. As described above, XOR and AND are natively supported in Boolean secret sharing, so CS₁ and CS₂ can securely compute the most significant bit of a ciphertext and thereby obtain, in encrypted form, the relation between the private input x and 0. Applying the parallel prefix addition circuit to x = accu − sizeB, CS₁ and CS₂ compute $[\mathrm{msb}(\mathit{accu} - \mathit{sizeB})]^B$, i.e., they output $[0]^B$ if accu ≥ sizeB and $[1]^B$ otherwise. Because Algorithm 4 requires CS_{1,2} to output $[1]^B$ if accu ≥ sizeB, one of CS_{1,2} locally flips its result to take the NOT of it.
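The circuit logic can be sketched in plaintext with a Kogge-Stone style prefix adder built only from XOR and AND gates. In the protocol, CS₁ supplies the bits of its share and CS₂ the bits of its own, every XOR is evaluated locally and every AND uses a Boolean multiplication triple; this sketch evaluates the same gates in the clear. The specific prefix structure, the 8-bit width and the helper names are assumptions, not necessarily the exact circuit of Fig. 6.

```python
import secrets

K = 8            # bit width, matching the 8-bit circuit of Fig. 6
MOD = 1 << K


def bits(x):
    """Little-endian bit decomposition of x in Z_{2^K}."""
    return [(x >> i) & 1 for i in range(K)]


def msb_of_sum(a, b):
    """MSB of (a + b) mod 2^K using only XOR and AND gates (Kogge-Stone prefix)."""
    a_bits, b_bits = bits(a), bits(b)
    p = [x ^ y for x, y in zip(a_bits, b_bits)]    # propagate bits
    G = [x & y for x, y in zip(a_bits, b_bits)]    # generate bits
    P = p[:]
    dist = 1
    while dist < K:                                # log2(K) levels of prefix gates
        G_new, P_new = G[:], P[:]
        for i in range(dist, K):
            # G_i and (P_i AND G_{i-dist}) are never both 1, so XOR acts as OR.
            G_new[i] = G[i] ^ (P[i] & G[i - dist])
            P_new[i] = P[i] & P[i - dist]
        G, P = G_new, P_new
        dist *= 2
    return p[K - 1] ^ G[K - 2]                     # sum bit K-1 = p_{K-1} XOR carry-in


def geq_via_msb(accu, size_b):
    """[accu >= sizeB] from shares of accu; valid while |accu - sizeB| < 2^(K-1)."""
    r = secrets.randbelow(MOD)
    a = (accu - r - size_b) % MOD    # CS1 subtracts the public sizeB from its share
    b = r                            # CS2's share
    return 1 ^ msb_of_sum(a, b)      # msb = 1 means accu < sizeB, so one party flips


print([geq_via_msb(v, 10) for v in (3, 9, 10, 14)])   # [0, 0, 1, 1]
```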

As the foregoing shows, each bit of the bucket map is either 0 or 1: when it is 1 the accumulator must be cleared, and when it is 0 it must not be. The first and second accumulators must therefore be updated in the ciphertext domain without the first or second computing terminal learning whether the sum of the two accumulators was cleared. To achieve this, after the first and second encryption degree distribution information are obtained, one of the two computing terminals flips the newest bit of its locally held share. For example, the first computing terminal flips the newest bit of the first encryption degree distribution information to obtain a flipped bit. If the newest bit of the bucket map is 1, the XOR of the flipped bit and the corresponding bit of the second encryption degree distribution information is 0; if the newest bit of the bucket map is 0, this XOR is 1. The first and second computing terminals then compute, in the ciphertext domain, the product of the accumulator and this negated bit, which either clears the sum of the first and second accumulators or leaves it unchanged according to the newest bit of the bucket map, while everything remains encrypted.

Referring again to Fig. 1, the privacy-preserving distributed graph data feature decomposition method provided by this embodiment further includes the following steps:

S400: the target graph node determines the target interval containing its degree from the received first and second encryption degree distribution information, determines a target sampling sensitivity from the boundaries of the target interval, samples noise from the Laplace distribution with that sensitivity, and adds false triples to the initial set according to the noise to generate a target set, where the weight of each false triple is 0;

s500, the target graph node encrypts the target set based on additive secret sharing to obtain a first encryption set and a second encryption set, sends the first encryption set to a first computing terminal, and sends the second encryption set to a second computing terminal.

The sum of the first and second encryption degree distribution information held locally by the first and second computing terminals is the bucket map. The two computing terminals send their locally held shares to the nodes in the graph; each node decrypts them to obtain the plaintext bucket map and then encrypts its own local graph data based on the bucket map. Specifically, as shown in Fig. 7, after decrypting the encrypted bucket map, each user U_i encrypts its local graph data with Algorithm 5. Note that the noise n_i sampled from the Laplace distribution may be negative, which would require user U_i to delete some edges and would seriously affect the accuracy of the subsequent feature decomposition. To solve this problem, in this embodiment each user U_i truncates n_i (line 4 of Algorithm 5). Let $\mathcal{E}_i$ denote the set of real and false edges, and let $\hat{A}[i,:]$ denote the local graph data of user U_i after the false edges with weight 0 have been added. Finally, user U_i applies ASS to the weight A[i,j] of each (real or false) edge to obtain the final ciphertext of its local graph data, $[\hat{A}[i,:]]^A$, and sends the ciphertext shares $\langle\hat{A}[i,:]\rangle_1$ and $\langle\hat{A}[i,:]\rangle_2$ to CS₁ and CS₂, respectively.
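The user-side steps can be sketched as follows. Several details are assumptions rather than the text's Algorithm 5: the noise shape is the standard two-sided geometric, truncation is taken as max(0, n_i), Δ = 0 buckets are clamped to Δ = 1, self-loops are excluded from the candidate positions, and all function and parameter names are hypothetical.

```python
import math
import random
import secrets

MOD = 1 << 32  # assumed ring Z_{2^32}


def discrete_laplace(epsilon, sens, rng):
    """Two-sided geometric noise, P(n) proportional to exp(-epsilon*|n|/sens) (assumed shape)."""
    q = math.exp(-epsilon / sens)

    def geometric():
        return int(math.floor(math.log(1.0 - rng.random()) / math.log(q)))

    return geometric() - geometric()


def encrypt_local_graph(i, neighbours, n_nodes, d_p, d_q, epsilon, rng):
    """Hypothetical user-side sketch of Algorithm 5.

    neighbours: {j: A[i, j]} for the real, non-zero edges of user U_i.
    [d_p, d_q]: the user's bucket interval read from the decrypted bucket map.
    """
    sens = max(d_q - d_p, 1)                  # assumption: avoid Delta = 0
    n_fake = max(0, discrete_laplace(epsilon, sens, rng))  # truncation: keep all real edges
    empty = [j for j in range(n_nodes) if j != i and j not in neighbours]
    fake = rng.sample(empty, min(n_fake, len(empty)))
    edges = dict(neighbours)
    edges.update({j: 0 for j in fake})        # false edges (i, j, 0)
    to_cs1, to_cs2 = [], []
    for j in sorted(edges):                   # real and false edges are shared identically
        r = secrets.randbelow(MOD)
        to_cs1.append((i, j, (edges[j] - r) % MOD))
        to_cs2.append((i, j, r))
    return to_cs1, to_cs2


rng = random.Random(1)
cs1, cs2 = encrypt_local_graph(0, {3: 2, 5: 1}, n_nodes=100, d_p=1, d_q=6,
                               epsilon=1.0, rng=rng)
weights = {a[1]: (a[2] + b[2]) % MOD for a, b in zip(cs1, cs2)}
print(len(weights), weights[3], weights[5])
```

Because every weight, including each 0, is masked by a fresh random r, neither server can tell which of the received triples are false edges.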

Compared with having each user directly encrypt every element of its local graph data A[i,:], the local graph data encryption scheme of this embodiment saves 90% of ciphertext storage space, and saves 80% of online communication and 50% of computation time in the subsequent feature decomposition.

S600, the first computing terminal and the second computing terminal perform feature decomposition on the global graph data according to the first encryption set and the second encryption set corresponding to each node in the global graph.

The first computing terminal receives the first encryption set from each node in the global graph, the second computing terminal receives the second encryption set from each node, and together they perform feature decomposition on the global graph data in the ciphertext domain. The plaintext graph data feature decomposition process is explained first:

Given an N × N adjacency matrix, a complete eigendecomposition costs O(N³). This is unnecessary because most graph analysis applications require only the top-k eigenvalues and eigenvectors (with k much smaller than N). In practice, therefore, to compute the top-k eigenvalues and eigenvectors of a large adjacency matrix A, the first step reduces its dimension from N × N to M × M (M is usually slightly larger than k), yielding a new small matrix $\tilde{A}$ for further processing. The most popular dimensionality reduction algorithms are the Arnoldi algorithm (Algorithm 1, pseudocode in Fig. 8) and the Lanczos algorithm (Algorithm 2, pseudocode in Fig. 9), which apply to asymmetric and symmetric matrices, respectively. After dimensionality reduction, all eigenvalues and eigenvectors of the small matrix $\tilde{A}$ are typically computed with the QR algorithm. The top-k eigenvalues of $\tilde{A}$ approximate the top-k eigenvalues of the original matrix A, and the matrix $\tilde{V}$ of eigenvectors corresponding to the top-k eigenvalues of $\tilde{A}$ (each column of $\tilde{V}$ is an eigenvector of $\tilde{A}$) is converted by

$V = P\tilde{V}$

into the matrix V of eigenvectors corresponding to the top-k eigenvalues of A, where the matrix P is formed at line 11 of Algorithm 1 and Algorithm 2. The plaintext QR algorithm is described below.

The QR algorithm proceeds iteratively. Formally, given a target matrix $L$, let $T_0 = L$. At the k-th iteration ($k \in [1, K]$), the result $T_{k-1}$ of the previous iteration is taken as input, one QR decomposition $T_{k-1} = Q_{k-1}R_{k-1}$ is computed, where $Q_{k-1}$ is an orthogonal matrix and $R_{k-1}$ is an upper triangular matrix, and $T_k = R_{k-1}Q_{k-1}$ is output. When the QR algorithm finishes, the diagonal elements of the last output $T_K$ are the eigenvalues of the target matrix $L$, and the columns of $S = Q_0 Q_1 \cdots Q_{K-1}$ are its eigenvectors, one per column (this holds for a symmetric $L$; for a nonsymmetric $L$ the columns of $S$ are Schur vectors). One QR decomposition can be carried out with Givens rotations. Formally, given an M × M upper Hessenberg matrix $T_{k-1}$, orthogonal Givens rotation matrices $G_i$, $i \in [1, M-1]$, can be constructed as

$$G_i = \operatorname{diag}\!\left(I_{i-1},\ \begin{pmatrix} c_i & -s_i \\ s_i & c_i \end{pmatrix},\ I_{M-i-1}\right) \qquad (1)$$

where $c_i = h_{i,i}/r_i$, $s_i = h_{i+1,i}/r_i$, $r_i = \sqrt{h_{i,i}^2 + h_{i+1,i}^2}$, and $h_{p,q}$ denotes the entries of $H(i)$. The rotations are applied as $H(i+1) = G_i^{\mathsf T} H(i)$, starting from $H(1) = T_{k-1}$, so that each rotation zeroes the subdiagonal entry $h_{i+1,i}$. At the end of this QR decomposition, $R_{k-1} = H(M)$ and $Q_{k-1} = G_1 \cdots G_{M-1}$. FIG. 11 shows the process of performing one QR decomposition on a 4 × 4 upper Hessenberg matrix using a series of Givens rotation matrices.
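A plaintext Python sketch of the QR algorithm described above may help. It assumes the standard Givens rotation whose 2 × 2 block in rows and columns i and i+1 is [[c, −s], [s, c]] with c = h_{i,i}/r, s = h_{i+1,i}/r, r = √(h_{i,i}² + h_{i+1,i}²); it deliberately builds the full M × M rotation matrices (the unoptimized form), runs a fixed number K of unshifted iterations without deflation, and uses 0-based indices. All function names are hypothetical.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def identity(M):
    return [[float(p == q) for q in range(M)] for p in range(M)]


def givens_matrix(H, i):
    """Full M x M rotation G_i (0-based i) chosen to zero H[i+1][i]."""
    a, b = H[i][i], H[i + 1][i]
    r = math.hypot(a, b)
    c, s = (1.0, 0.0) if r == 0.0 else (a / r, b / r)
    G = identity(len(H))
    G[i][i], G[i][i + 1] = c, -s
    G[i + 1][i], G[i + 1][i + 1] = s, c
    return G


def qr_hessenberg(T):
    """One QR decomposition T = Q R of an upper Hessenberg T via Givens rotations."""
    H, Q = [row[:] for row in T], identity(len(T))   # H(1) = T_{k-1}
    for i in range(len(T) - 1):
        G = givens_matrix(H, i)
        H = matmul(transpose(G), H)                   # H(i+1) = G_i^T H(i)
        Q = matmul(Q, G)                              # Q = G_1 ... G_{M-1}
    return Q, H                                       # R = H(M), upper triangular


def qr_algorithm(L, K):
    """Unshifted QR iteration: returns T_K (eigenvalues on the diagonal) and S."""
    T, S = [row[:] for row in L], identity(len(L))   # T_0 = L
    for _ in range(K):
        Q, R = qr_hessenberg(T)
        T = matmul(R, Q)                              # T_k = R_{k-1} Q_{k-1}
        S = matmul(S, Q)                              # S = Q_0 Q_1 ... Q_{K-1}
    return T, S
```

For the symmetric tridiagonal matrix [[2, −1, 0], [−1, 2, −1], [0, −1, 2]], a few hundred iterations drive the diagonal of T_K to 2 + √2, 2 and 2 − √2, and the columns of S to the matching eigenvectors.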

The first computing terminal and the second computing terminal perform feature decomposition on the global graph data according to the first encryption set and the second encryption set corresponding to each node in the global graph, and the feature decomposition includes:

the first computing terminal and the second computing terminal perform, based on additive secret sharing, dimension reduction on the sum of the first encrypted adjacency matrix and the second encrypted adjacency matrix to obtain a dimension-reduced matrix;

the second calculation formula is:

$$y'_{n+1} = \frac{1}{2}\,y'_n\left(3 - x'\,{y'_n}^{2}\right)$$

where $y'_n$ denotes the result of the reciprocal square root computed in the n-th iteration, and $x'$ denotes the number whose square root is to be taken.

The procedure by which the first computing terminal and the second computing terminal perform dimension reduction in the ciphertext domain is described below, taking the Arnoldi algorithm as an example. As FIG. 8 shows, the operations in lines 1-7 consist only of additions and multiplications, which are natively supported in the additive secret sharing domain. However, performing lines 8 and 9 in the ciphertext domain is challenging, because they require a square root operation (the L₂ norm requires a square root) and a division operation, respectively.

In this embodiment, approximate computation is used to decompose the square root operation and the division operation into a series of additions and multiplications supported in the ciphertext domain. Specifically, to compute the square root $\sqrt{x'}$ in the ciphertext domain, the reciprocal of the square root, $1/\sqrt{x'}$, is first approximated by Newton's iteration

$$y'_{n+1} = \frac{1}{2}\,y'_n\left(3 - x'\,{y'_n}^{2}\right)$$

where $y'_n$ denotes the result of the reciprocal square root computed in the n-th iteration and $x'$ denotes the number whose square root is to be taken; the iteration converges to $1/\sqrt{x'}$. Clearly, both subtraction and multiplication are natively supported in the secret-shared domain. In addition, to obtain a faster convergence rate, the initial value

$$y'_0 = 3e^{0.5 - x'} + 0.003$$

may be used. The square root is then obtained as $\sqrt{x'} = x' \cdot y'_n$.
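The iteration can be checked in plaintext with a short Python sketch that uses only subtraction and multiplication inside the loop. It reads the initial value as 3e^(0.5 − x') + 0.003 (an assumption about the extracted formula) and uses a fixed iteration count of 30, also an assumption. Newton's iteration for 1/√x' converges to the positive root only when 0 < y'₀ < √(3/x'), so with this initial value it behaves well for x' between roughly 1.1 and 3.3 × 10⁵; smaller inputs would need rescaling first. Function names are hypothetical.

```python
import math


def inv_sqrt_newton(x, iters=30, y0=None):
    """Approximate 1/sqrt(x) using only subtraction and multiplication.

    y0 defaults to 3 * e**(0.5 - x) + 0.003 (our reading of the initial value).
    Newton's iteration converges to +1/sqrt(x) only when 0 < y0 < sqrt(3 / x);
    with this default that holds for roughly 1.1 < x < 3.3e5.
    """
    y = 3.0 * math.exp(0.5 - x) + 0.003 if y0 is None else y0
    for _ in range(iters):
        y = 0.5 * y * (3.0 - x * y * y)   # y'_{n+1} = y'_n (3 - x' y'_n^2) / 2
    return y


def sqrt_from_inv_sqrt(x, iters=30):
    return x * inv_sqrt_newton(x, iters)   # sqrt(x') = x' * (1 / sqrt(x'))
```

For example, `inv_sqrt_newton(4.0)` approaches 0.5 and `sqrt_from_inv_sqrt(2.0)` approaches √2, whereas `inv_sqrt_newton(1.0)` starts above √3 and is driven to −1.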

For division $a/b$ in the ciphertext domain, the main challenge is computing the reciprocal $1/b$. However, the reciprocal required by the division in line 9 of Algorithm 1 is the reciprocal of the norm computed in line 8 (a square root). Therefore, the reciprocal-square-root result $y'_n$ obtained above can be used directly as this reciprocal: simply multiplying it by $p_k$ completes line 9 in the ciphertext domain. At this point, Algorithm 1 can be performed entirely in the ciphertext domain; the ciphertext-domain Arnoldi algorithm is shown in FIG. 10. For other dimension reduction algorithms, such as the Lanczos method, the operations that need to be performed securely are the same as in the secure Arnoldi method, so the secure Lanczos algorithm is not described here.
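To make the reuse concrete in the secret-shared setting, the following Python sketch simulates both computing terminals in one process: each value is split into two additive shares, every multiplication consumes a Beaver tuple (u, v, z = uv) from a simulated trusted dealer, and the Newton iteration runs on shares. A single reciprocal square root then yields both the norm (line 8) and the normalized vector (line 9). Assumptions: shares are plain floats rather than fixed-point ring elements, the dealer is simulated, and the initial value y'₀ is a public constant chosen so that 0 < y'₀ < √(3/x'), because this sketch has no exponential on shares. All function names are hypothetical.

```python
import random

rng = random.Random(7)   # stands in for the randomness of parties and dealer


def share(x):
    """Split x into two additive shares (floats here; real systems use a ring)."""
    r = rng.uniform(-10.0, 10.0)
    return (x - r, r)


def reveal(a):
    return a[0] + a[1]


def add(a, b):
    return (a[0] + b[0], a[1] + b[1])


def scale(a, k):        # multiply by a public constant: purely local
    return (a[0] * k, a[1] * k)


def add_const(a, k):    # add a public constant: only party 0 adds it
    return (a[0] + k, a[1])


def mul(a, b):
    """Beaver multiplication using a dealer tuple (u, v, z = u * v)."""
    u, v = rng.uniform(-10.0, 10.0), rng.uniform(-10.0, 10.0)
    U, V, Z = share(u), share(v), share(u * v)
    e = reveal((a[0] - U[0], a[1] - U[1]))   # opened: e = a - u
    f = reveal((b[0] - V[0], b[1] - V[1]))   # opened: f = b - v
    return tuple(i * e * f + e * V[i] + f * U[i] + Z[i] for i in (0, 1))


def secure_inv_sqrt(x, y0, iters=30):
    y = (y0, 0.0)                            # public initial value as a sharing
    for _ in range(iters):
        t = mul(x, mul(y, y))                # x' * y'^2
        y = scale(mul(y, add_const(scale(t, -1.0), 3.0)), 0.5)
    return y


def secure_normalize(w, y0):
    """Shares of (||w||, w / ||w||) from a single reciprocal square root."""
    x = (0.0, 0.0)
    for wi in w:
        x = add(x, mul(wi, wi))              # <x'> = <w . w>
    y = secure_inv_sqrt(x, y0)               # <1 / ||w||>
    norm = mul(x, y)                         # line 8: ||w|| = x' * y'
    return norm, [mul(wi, y) for wi in w]    # line 9: w * y' instead of w / ||w||
```

For a shared vector (3, 4) and y'₀ = 0.1, the revealed outputs approach the norm 5 and the unit vector (0.6, 0.8).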

The QR algorithm consists mainly of matrix multiplications. For the multiplication of two M × M matrices, the direct method in the secret-shared domain multiplies element by element, which requires M³ scalar multiplications and 2M³ elements of communication between the two servers. In this embodiment, vectorized multiplication tuples are used to implement matrix multiplication more efficiently, saving communication and computation. That is, the multiplication tuple needed to multiply two matrices can be vectorized as $Z = UV$, where the random matrices $U$ and $V$ mask the two input matrices during secure multiplication. Specifically, consider two N × N ciphertext matrices $\langle A\rangle$ and $\langle B\rangle$ whose ciphertext product $\langle C\rangle = \langle AB\rangle$ is to be computed. Existing secret sharing protocols assign one multiplication tuple to each scalar multiplication inside the matrix product rather than to the matrix as a whole (i.e., each tuple $(u, v, z = uv)$ consists of single elements). Therefore, 3N³ elements must be prepared in advance (multiplying two N × N matrices requires N³ scalar multiplications, each consuming one tuple of three elements), which is inefficient and unnecessary. The multiplication tuple vectorization adopted in this embodiment does not generate independent scalar tuples; instead, it randomly generates a matrix multiplication tuple $(U, V, Z = UV)$, after which ciphertext matrix multiplication can be performed directly. The two cloud servers $P_i$, $i \in \{0, 1\}$, use their own secret shares to compute $\langle E\rangle_i = \langle A\rangle_i - \langle U\rangle_i$ and $\langle F\rangle_i = \langle B\rangle_i - \langle V\rangle_i$. Each party $P_i$ then sends $\langle E\rangle_i$ and $\langle F\rangle_i$ to the other, so that both obtain $E$ and $F$ in plaintext. Finally, the product share held by $P_i$ is

$$\langle C\rangle_i = i \cdot E F + E\,\langle V\rangle_i + \langle U\rangle_i\,F + \langle Z\rangle_i .$$

In this process, each party only needs to send 2N² elements online, namely its shares $\langle E\rangle_i$ and $\langle F\rangle_i$, and the multiplication tuple to be prepared in advance shrinks to $(U, V, Z)$, i.e., 3N² elements.
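The vectorized tuple can be illustrated with a small Python simulation of both servers in one process. A single matrix tuple (U, V, Z = UV) masks the whole product; each party opens its shares of E = A − U and F = B − V, and the product share is computed as ⟨C⟩_i = i·EF + E⟨V⟩_i + ⟨U⟩_i F + ⟨Z⟩_i, with E kept to the left of ⟨V⟩_i and ⟨U⟩_i to the left of F because matrix multiplication does not commute. Assumptions: float shares instead of fixed-point ring elements, a simulated trusted dealer, and hypothetical function names.

```python
import random

rng = random.Random(3)


def rand_matrix(n):
    return [[rng.uniform(-10.0, 10.0) for _ in range(n)] for _ in range(n)]


def madd(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def msub(X, Y):
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def mmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def share(X):
    R = rand_matrix(len(X))
    return (msub(X, R), R)


def reveal(S):
    return madd(S[0], S[1])


def matrix_tuple(n):
    """Dealer: one vectorized tuple (U, V, Z = U V), i.e. 3 n^2 elements."""
    U, V = rand_matrix(n), rand_matrix(n)
    return share(U), share(V), share(mmul(U, V))


def secure_matmul(A, B, tup):
    """Shares of C = A B; returns (shares, elements each party sends online)."""
    U, V, Z = tup
    n = len(A[0])
    E = reveal((msub(A[0], U[0]), msub(A[1], U[1])))   # <E>_i = <A>_i - <U>_i
    F = reveal((msub(B[0], V[0]), msub(B[1], V[1])))   # <F>_i = <B>_i - <V>_i
    C = []
    for i in (0, 1):
        Ci = madd(madd(mmul(E, V[i]), mmul(U[i], F)), Z[i])
        if i == 0:
            Ci = madd(Ci, mmul(E, F))                     # the i * E * F term
        C.append(Ci)
    return (C[0], C[1]), 2 * n * n                        # <E>_i and <F>_i
```

For 2 × 2 inputs each party sends 8 elements online, and the dealer supplies 12 elements instead of the 24 that eight scalar tuples would need.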

The pseudocode of the secure QR algorithm in this embodiment is shown in FIG. 12. Its input is the secret-shared small matrix produced by the secure dimension reduction, and its outputs are the secret-shared matrices $T_K$ and $S$. Recalling the plaintext eigendecomposition described above, the top-k (largest k) diagonal elements of $T_K$ give the top-k eigenvalues of the original matrix A, while the columns of $S$ are eigenvectors of the small matrix, which can be converted into eigenvectors of the original matrix A by the formula $V = P\,S_k$, where $S_k$ collects the columns of $S$ belonging to the top-k eigenvalues and $P$ is the output matrix of the secure dimension reduction algorithm.

Further, the inventors have also found that in each Givens rotation $G_i^{\mathsf T} H(i)$ or $H(i)\,G_i$, only rows i and i+1 (for the left multiplication) or columns i and i+1 (for the right multiplication) of $H(i)$ are updated. Thus, to save overhead, the Givens rotation matrix $G_i$ can be simplified from the M × M form of formula (1) to the 2 × 2 matrix $g_i$ formed by its entries in rows and columns i and i+1. FIG. 13 shows one QR decomposition of a 4 × 4 matrix completed with the simplified Givens rotation matrices; compared with FIG. 11, a significant number of multiplications are saved. Similarly, in the computation on line 15 of Algorithm 7, $G_i$ can also be simplified to $g_i$. After simplification, the Givens rotation matrix is applied by traversal, as shown in FIG. 13: the Givens rotation matrix $g_1$ first multiplies the upper-left 2 × 2 block of the H matrix, H[1:2, 1:2], and then the next groups of elements in the same rows, i.e., H[1:2, 2:3] and H[1:2, 3:4]. The Givens rotation matrix is then updated to obtain $g_2$, and the next rows are processed, i.e., H[2:3, 1:2], H[2:3, 2:3] and H[2:3, 3:4], and so on (see the shaded portion of FIG. 13).

It should be noted that the 4 × 4 H matrix is only an example; in practical applications H may be of any dimension and can be updated in the same way by multiplication with 2 × 2 Givens matrices.
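The saving can be checked with a plaintext Python sketch that applies one left rotation in two ways: as a full M × M product G_iᵀH(i), and as a 2 × 2 rotation applied to rows i and i+1 only. It walks the two rows column by column rather than in the exact block order of FIG. 13, which gives the same result. The block convention [[c, −s], [s, c]], the 0-based indices and the function names are assumptions for illustration; each function returns its multiplication count alongside the result.

```python
import math


def givens_block(H, i):
    """2 x 2 block g_i of G_i (0-based i), chosen to zero H[i+1][i]."""
    a, b = H[i][i], H[i + 1][i]
    r = math.hypot(a, b)
    c, s = (1.0, 0.0) if r == 0.0 else (a / r, b / r)
    return [[c, -s], [s, c]]


def rotate_full(H, g, i):
    """Baseline: embed g_i in an M x M identity and compute G_i^T H."""
    M = len(H)
    G = [[float(p == q) for q in range(M)] for p in range(M)]
    G[i][i], G[i][i + 1] = g[0][0], g[0][1]
    G[i + 1][i], G[i + 1][i + 1] = g[1][0], g[1][1]
    out = [[sum(G[t][p] * H[t][q] for t in range(M)) for q in range(M)]
           for p in range(M)]
    return out, M ** 3                       # M multiplications per output entry


def rotate_rows(H, g, i):
    """Simplified: only rows i and i+1 change, so apply g_i^T to them alone."""
    out = [row[:] for row in H]
    for q in range(len(H)):
        hi, hj = H[i][q], H[i + 1][q]
        out[i][q] = g[0][0] * hi + g[1][0] * hj      # row 0 of g_i^T
        out[i + 1][q] = g[0][1] * hi + g[1][1] * hj  # row 1 of g_i^T
    return out, 4 * len(H)                   # four multiplications per column
```

For M = 4 the full product costs 64 multiplications and the row update 16; in general the cost per rotation drops from M³ to 4M.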

After the above simplification, the simplified Givens rotation matrix $g_i$ must be multiplied several times with elements from two rows of the large matrix H, i.e., it is reused across multiple multiplications. Coupled multiplication tuples can therefore be used to save communication between the two computing terminals. For example, when one ciphertext matrix needs to be multiplied by k different ciphertext matrices, only one random matrix is needed to mask it, rather than k different random matrices. Therefore, in the present invention, when the same multiplier takes part in multiple multiplications, only one random matrix is used to mask it. Finally, since $g_i^{\mathsf T}$ is the transpose of $g_i$, the computing terminals CS₁ and CS₂ can transpose their own secret shares $\langle g_i\rangle_1$ and $\langle g_i\rangle_2$ locally, further saving communication overhead.
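A Python sketch of coupled tuples, again simulating both computing terminals in one process with float shares and a hypothetical trusted dealer: the shared multiplier X is masked by a single random matrix U, and its opened difference E = X − U is reused for every product, while each other operand Y_j gets its own V_j with Z_j = UV_j. The sketch also shows that transposing a sharing is a purely local operation. The names X, Y_j and all function names are hypothetical, and this is not claimed to be the tuple layout of FIG. 12.

```python
import random

rng = random.Random(11)


def rand_matrix(n):
    return [[rng.uniform(-10.0, 10.0) for _ in range(n)] for _ in range(n)]


def madd(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def msub(X, Y):
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def mmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def mT(X):
    return [list(col) for col in zip(*X)]


def share(X):
    R = rand_matrix(len(X))
    return (msub(X, R), R)


def reveal(S):
    return madd(S[0], S[1])


def coupled_multiply(X, Ys):
    """Shares of X Y_j for every j, masking the shared multiplier X only once.

    Hypothetical dealer tuple: one U for X, one V_j per Y_j, and Z_j = U V_j.
    Returns the product shares and the elements each party sends online.
    """
    n = len(X[0])
    U = rand_matrix(n)
    U_sh = share(U)
    E = reveal((msub(X[0], U_sh[0]), msub(X[1], U_sh[1])))   # opened once
    sent = n * n
    out = []
    for Y in Ys:
        V = rand_matrix(n)
        V_sh, Z_sh = share(V), share(mmul(U, V))
        F = reveal((msub(Y[0], V_sh[0]), msub(Y[1], V_sh[1])))  # one per Y_j
        sent += n * n
        C = [madd(madd(mmul(E, V_sh[i]), mmul(U_sh[i], F)), Z_sh[i]) for i in (0, 1)]
        C[0] = madd(C[0], mmul(E, F))                          # the i * E * F term
        out.append((C[0], C[1]))
    return out, sent
```

With a 2 × 2 multiplier used in three products, each party sends 4 + 3 × 4 = 16 elements online instead of 24 with independent tuples, and the shares of Xᵀ are simply the transposed shares of X.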

After these optimizations, the performance of the secure QR algorithm provided in this embodiment is greatly improved. Specifically, the basic secure QR algorithm requires the computing terminals CS₁ and CS₂ to communicate 6K(M−1)M² elements online (ignoring the communication needed to approximate the square root, which is not optimized; K and M correspond to K and M in Algorithm 7, respectively), whereas the optimized secure QR algorithm only needs to communicate K(M−1)(6M+4) elements online. In experiments, the optimized secure QR algorithm saved up to 97% of online communication and 9.3% of computation time compared with the basic secure QR algorithm.
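The two closed-form counts can be compared directly with a few lines of Python (function names hypothetical). Because both share the factor K(M − 1), the fraction of online traffic saved is 1 − (6M + 4)/(6M²), which depends only on M: about 89% for M = 10 and about 98% for M = 50. These formula-based figures exclude the square-root traffic and are not the experimental measurements reported above.

```python
def online_elements(K, M):
    """Online elements exchanged by CS1 and CS2, per the stated counts
    (square-root approximation traffic excluded)."""
    basic = 6 * K * (M - 1) * M ** 2
    optimized = K * (M - 1) * (6 * M + 4)
    return basic, optimized


def saving(M):
    """Fraction of online traffic saved; K and (M - 1) cancel out."""
    basic, optimized = online_elements(1, M)
    return 1 - optimized / basic
```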

To sum up, this embodiment provides a privacy-preserving distributed graph data feature decomposition method. Graph nodes holding local graph data encrypt their own degree information and send it to a first computing terminal and a second computing terminal, which cooperatively compute in the ciphertext domain to generate first encryption degree distribution information and second encryption degree distribution information. This allows each graph node to determine the target interval to which its degree belongs, select a suitable sensitivity for sampling noise, add false edges with weight 0 to the real graph adjacency matrix, and represent the matrix sparsely in the form of triples. Encrypted feature decomposition is then performed on the adjacency matrix with the false edges added. In this way, the sparsity of the graph data is preserved and the effectiveness of the feature decomposition is guaranteed while node privacy is protected, realizing privacy-preserving distributed graph data feature decomposition.

It should be understood that, although the steps in the flowcharts of the figures of this specification are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict restriction on the order in which these steps are performed, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; nor are these sub-steps or stages necessarily performed sequentially, as they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, databases, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Example two

Based on the above embodiment, the present invention also provides a distributed graph data feature decomposition system for privacy protection, where the system includes a target graph node, a first computing terminal, and a second computing terminal; the target graph node, the first computing terminal and the second computing terminal are used for cooperatively executing relevant steps in the privacy-protected distributed graph data feature decomposition method in the first embodiment.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A privacy preserving distributed graph data feature decomposition method, the method comprising:

and the first computing terminal and the second computing terminal perform feature decomposition on the global graph data according to the first encryption set and the second encryption set corresponding to each node in the global graph.

2. The privacy-preserving distributed graph data feature decomposition method according to claim 1, wherein before the target graph node encrypts the degree of the target graph node based on function secret sharing, the method comprises:

3. The privacy-preserving distributed graph data feature decomposition method according to claim 1, wherein the encrypting, by the target graph node, the degree of the target graph node based on function secret sharing to obtain first encryption degree information and second encryption degree information comprises:

4. The privacy-preserving distributed graph data feature decomposition method according to claim 3, wherein the first and second computing terminals generate first and second encryption degree distribution information of global graph data according to the first and second encryption degree information sent by the plurality of target graph nodes, and the method comprises:

5. The privacy-preserving distributed graph data feature decomposition method according to claim 4, wherein each bit value in the first encryption degree distribution information and the second encryption degree distribution information is 0 or 1; the first computing terminal and the second computing terminal determine the first encryption degree distribution information and the second encryption degree distribution information according to the first encryption degree histogram information and the second encryption degree histogram information, and the method comprises the following steps:

the first computing terminal sequentially adds each piece of first encryption graph node quantity information in the first encryption degree histogram information into a first accumulator according to the size sequence of the corresponding target scale, and the second computing terminal sequentially adds each piece of second encryption graph node quantity information in the second encryption degree histogram information into a second accumulator according to the size sequence of the corresponding target scale;

the first computing terminal updates the value of the first accumulator to the first secret share, adds the next first encryption graph node number information to the first accumulator, and the second computing terminal updates the value of the second accumulator to the second secret share, and adds the next second encryption graph node number information to the second accumulator.

6. The method as claimed in claim 5, wherein the first computing terminal obtains a first encryption comparison result according to the first accumulator, generates a new one-bit value in the first encryption degree distribution information according to the first encryption comparison result, and the second computing terminal obtains a second encryption comparison result according to the second accumulator, and generates a new one-bit value in the second encryption degree distribution information according to the second encryption comparison result, including:

7. The privacy-preserving distributed graph data feature decomposition method according to claim 5, wherein the first computing terminal obtains a first encryption comparison result according to the first accumulator and generates a new one-bit value in the first encryption degree distribution information according to the first encryption comparison result, and the second computing terminal obtains a second encryption comparison result according to the second accumulator and generates a new one-bit value in the second encryption degree distribution information according to the second encryption comparison result, including:

8. The privacy-preserving distributed graph data feature decomposition method according to claim 1, wherein the feature decomposition of the global graph data by the first computing terminal and the second computing terminal according to the first encryption set and the second encryption set corresponding to each node in the global graph comprises:

the second calculation formula is:

$$y'_{n+1} = \frac{1}{2}\,y'_n\left(3 - x'\,{y'_n}^{2}\right)$$

9. The privacy-preserving distributed graph data feature decomposition method according to claim 8, wherein the first computing terminal and the second computing terminal perform a QR algorithm on the dimension-reduced matrix based on additive secret sharing, comprising:

10. A distributed graph data feature decomposition system with privacy protection is characterized by comprising a target graph node, a first computing terminal and a second computing terminal; the target graph node, the first computing terminal, and the second computing terminal cooperate to perform the privacy-preserving distributed graph data feature decomposition method of any one of claims 1-9.