CN114513337A - Privacy protection link prediction method and system based on mail data - Google Patents
Privacy protection link prediction method and system based on mail data
- Publication number
- CN114513337A, CN202210066876.0A
- Authority
- CN
- China
- Prior art keywords
- data
- relationship
- relation
- sensitive
- entities
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0407—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the identity of one or more communicating identities is hidden
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computer Security & Cryptography (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a privacy-preserving link prediction method and system based on mail data. The method comprises: constructing a person-relationship knowledge graph from the mail data; training a generative model with a generative adversarial network so that it learns the distribution of the training data; reconstructing the multi-relational data so as to obfuscate the sensitive and non-sensitive relationship information implied in the data; and completing the relationships between entities with the reconstructed multi-relational data, so that non-sensitive relationships between entities are completed while sensitive relationships are protected. The invention also provides a privacy-preserving link prediction system based on mail data that implements the method. The invention solves the technical problem that existing link prediction techniques cannot protect the social relationships of people in a mail system.
Description
Technical Field
The invention relates to the technical fields of adversarial learning, graph network representation learning, knowledge graphs and link prediction, and in particular to a privacy-preserving link prediction method and system based on mail data.
Background
As one of the applications of the Internet, mail is an important means of communication in modern society. Mail data records the content of human communication, including important information such as communication relationships, communication time and communication frequency. Through simple entity-relationship extraction and data mining, multiple knowledge graphs can be built from a single mail data set. Taking a campus student mail system as an example, a communication relationship graph can be built from the communication relationship view, and an online login behavior graph can be built from the online device login view. In such a graph, nodes correspond to entities and edges correspond to relationships; each triple states that a given relationship exists between two entities.
In recent years, research on knowledge graphs has advanced considerably. However, the incompleteness of a knowledge graph limits its application to some extent. To address this problem, a series of knowledge graph embedding models have been proposed. These models generate embedded representations of entities and relationships that can be used for link prediction, that is, predicting relationships between existing entities. This approach creates a problem: any attacker can use the generated embeddings to perform link prediction and obtain accurate relationships between entities, and some of these relationships may be sensitive information that should not be obtained by others. Therefore, the embeddings cannot be used directly; some processing is needed to achieve privacy protection, in which these relationships are treated as sensitive information.
Existing privacy protection techniques fall mainly into the following categories. The first is differential privacy, which is achieved mainly by adding noise to the original data, the parameters or the results. The common Laplace mechanism and exponential mechanism cause a high utility loss when realizing differential privacy. In view of this, Xu et al. proposed a matrix-factorization-based differentially private network embedding method that introduces enough noise to guarantee privacy but is not suitable for link prediction. Kearns et al. proposed a model that protects some nodes, but it is likewise not applicable to link prediction scenarios. Abir De et al. introduced a ranking algorithm that monotonically transforms the base scores of a non-private link prediction system and then adds noise, which trades off privacy and prediction performance more effectively. Javier et al. proposed a method of adding or deleting items to minimize privacy risks. Privacy protection can be achieved by deleting or adding specific edges, but this may affect the prediction of the remaining non-sensitive relationships; in addition, simply deleting sensitive information is vulnerable to inference attacks. The second category is encryption. Encryption-based schemes achieve privacy protection through advanced cryptographic techniques, classically homomorphic encryption and secure multiparty computation. They achieve privacy protection effectively, but their computational cost is always high. The last category is GAN-based: embeddings are trained with a generative adversarial network. Li Kaiyang et al. proposed a graph adversarial training framework that integrates privacy disentangling and purging mechanisms to avoid inference attacks.
Among these approaches, the adversarial autoencoder (AAE) uses a generative adversarial network (GAN) to perform variational inference, forcing the posterior distribution of the latent code to match a specified prior distribution, so that the supervised separation capability can protect privacy. However, GAN training still has some problems, such as unstable training.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a privacy-preserving link prediction method and system based on mail data. It aims to solve the technical problem that existing link prediction techniques cannot protect the social relationships of people in a mail system, while ensuring the diversity of generated samples; in predicting non-sensitive relationships it offers better privacy protection, and its computational cost is lower than that of encryption techniques.
The purpose of the invention is achieved by the following technical solution:
A privacy-preserving link prediction method based on mail data comprises the following steps:
step one: preprocessing the mail data, mining the implicit relationships in the mails, and constructing a person-relationship knowledge graph based on the mail data;
step two: encoding the entities and implicit relationships in the person-relationship knowledge graph with an energy-based low-dimensional entity embedding model, obtaining an embedding space and embedded data in one-to-one correspondence with the different entities;
step three: training a generative adversarial network on the encoded embedded data to obtain a generative model, and using the model to simulate the embedding space;
step four: using a gradient descent reconstruction method to obfuscate the sensitive and non-sensitive relationships implied in the original data, thereby fine-tuning the distribution structure of the embedding space;
step five: performing inference and prediction of the interpersonal relationships in the mail system based on the data of the fine-tuned embedding space.
Specifically, step one comprises:
S101, for a college student mail system data set, selecting the student communication relationship, which is most closely related to the people involved, and building a communication relationship knowledge graph;
S102, dividing the college student mail system network into an intra-domain communication network and an extra-domain communication network;
S103, defining the communication relationship knowledge graph as (h, l, t) triples, where the communication relationship l is divided into two groups: known relationships $l_o$ and unknown relationships $l_u$ that need to be predicted, with $l_u \in l_o$;
S104, further dividing the known relationships $l_o$ into sensitive relationships in the intra-domain network and non-sensitive relationships in the extra-domain communication network.
Specifically, step two comprises:
S201, generating a real-valued Gaussian distribution and randomly sampling from it to initialize the entities and relationships of the original mail data;
S202, normalizing the entity and relationship vectors in each iteration;
S203, in each iteration, selecting a fixed amount of data as positive samples $S_{batch}$, denoted $(h, l_o, t)$, and for each positive sample replacing its head or tail entity to form a negative sample $S'_{batch}$, denoted $(h', l_o, t')$;
S204, updating the entity and relationship vectors with stochastic gradient descent according to the following loss function:
where $[x]_+$ denotes $\max(0, x)$; $\gamma > 0$ is a margin hyperparameter that sets the required gap between a positive and a negative sample; and $d(x, y)$ is a distance function, $d(x, y) = (x - y)^2$.
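For reference, the description of S204 is consistent with the standard TransE margin-based ranking loss. The form below is a reconstruction under that assumption, using only the symbols defined in S203 and S204; bold letters denote the embedding vectors of the corresponding entities and relationships, and it should not be read as the patent's own typeset formula.

```latex
L = \sum_{(h,\, l_o,\, t) \in S_{batch}} \;
    \sum_{(h',\, l_o,\, t') \in S'_{batch}}
    \left[ \gamma + d(\mathbf{h} + \mathbf{l}_o, \mathbf{t})
                  - d(\mathbf{h}' + \mathbf{l}_o, \mathbf{t}') \right]_+
```

The loss is zero for a pair once the negative triple is at least $\gamma$ farther from satisfying $\mathbf{h} + \mathbf{l}_o \approx \mathbf{t}$ than the positive triple.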
Specifically, training the generative model in step three comprises:
S301, sampling random noise $Z$ from a Gaussian distribution;
S302, using a neural network comprising two fully connected layers and a normalization layer as the generator model G(), trained with a Wasserstein loss plus a link prediction loss, where the link prediction loss is a margin-based ranking loss of the following form:
where the two triples in the loss are a non-sensitive relationship triple and a sensitive relationship triple, respectively; $\gamma > 0$ is a margin hyperparameter, and $d(x, y)$ is the Euclidean distance between two vectors;
the Wasserstein loss is calculated as follows:
where $y_n$ denotes the non-sensitive label and $y_s$ the sensitive label; the loss of the entire generative model is:
$L_G = L_2 + \lambda L_{Dist}$
where $\lambda$ is a hyperparameter that adjusts the weight of the individual loss functions;
S303, using a network of two fully connected layers with LeakyReLU activation layers as the discriminator model D(), in which the second fully connected layer acts as a classifier that distinguishes real input data from fake, trained with the Wasserstein loss; a gradient penalty $L_{GP}$ is used to enforce the Lipschitz constraint, penalizing the discriminator model if the gradient norm deviates from its target value of 1, so that the loss function of the discriminator model is:
S304, alternately training the generator model and the discriminator model.
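As a point of comparison for S303, the widely used Wasserstein discriminator loss with gradient penalty has the form below. This is a sketch of the standard WGAN-GP objective, not the patent's own formula: the penalty weight $\lambda_{GP}$, the interpolation coefficient $\epsilon$ and the interpolated point $\hat{x}$ are assumed symbols, and the patent's version may additionally involve the labels $y_n$ and $y_s$.

```latex
L_D = \mathbb{E}_{z}\big[ D(G(z)) \big] - \mathbb{E}_{x}\big[ D(x) \big] + \lambda_{GP}\, L_{GP},
\qquad
L_{GP} = \mathbb{E}_{\hat{x}}\Big[ \big( \lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1 \big)^2 \Big],
\qquad
\hat{x} = \epsilon\, x + (1 - \epsilon)\, G(z), \quad \epsilon \sim U[0, 1]
```

The penalty term is zero exactly when the gradient norm equals the target value of 1, which matches the condition stated in S303.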
Specifically, step four comprises the following substeps:
S402, dividing the original data set by relation $l_o$ into multiple data sets $X_l$, where $X_l$ denotes the TransE encoding corresponding to relation $l$;
S403, for any data set $X_l$ containing relation $l$, using the trained generator model as the reconstruction neural network and reconstructing the encoding $z$ of the relational data with the following loss function:
where $G(z)$ is the output of the generator model for input $z$ ($z \in Z$); $\alpha > 0$ is a margin hyperparameter that sets the gap between the reconstructed encoding of a sensitive relationship and its normal encoding;
S404, reconstructing the initial embedding with $L$ steps of gradient descent, the reconstruction being computed by the following formula:
S405, randomly initializing $R$ values of $z$ so that different local minima are sampled, which improves the robustness of the reconstruction model, where $z^*$ is found by minimizing the following expression:
Finally, the reconstructed data is used as the final relation embedding for subsequent prediction of the relationships between people.
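One plausible reading of S404 and S405, assuming a plain mean-squared reconstruction error and a step size $\eta$ (both assumptions), is sketched below. The margin term governed by $\alpha$ for sensitive relationships is deliberately left out because its exact form is not recoverable from the text; $x \in X_l$ is the embedding being reconstructed and $z^{(i)}_0$ is the $i$-th random initialization.

```latex
z^{(i)}_{k+1} = z^{(i)}_{k} - \eta \, \nabla_z \lVert G(z) - x \rVert_2^2 \,\Big|_{z = z^{(i)}_{k}},
\qquad k = 0, \dots, L-1, \quad i = 1, \dots, R

z^{*} = \arg\min_{z \,\in\, \{ z^{(1)}_{L}, \dots, z^{(R)}_{L} \}} \lVert G(z) - x \rVert_2^2
```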
A privacy-preserving link prediction system based on mail data, implementing the above privacy-preserving link prediction method based on mail data, comprises:
a data preprocessing module, for constructing a knowledge graph from the original mail data and forming a rigorous mathematical definition and objective;
an entity-relationship low-dimensional embedding module, for learning low-dimensional embeddings of the entities and relationships in the knowledge graph;
a generator training module, comprising a generator G and a discriminator D, whose input data are the real embedded data and random noise Z sampled from a Gaussian distribution;
a data reconstruction module, for reconstructing the embedded data X produced by the entity-relationship low-dimensional embedding module to obtain new low-dimensional embeddings $G(z^*)$ of the entities and relationships;
a link prediction module, for predicting the relationships between people in the mail network from the low-dimensional embeddings $G(z^*)$.
The invention has the following beneficial effect:
1. By completing the relationships between entities with the reconstructed multi-relational data, the invention completes the non-sensitive relationships between entities while protecting the sensitive ones, and solves the technical problem that existing link prediction techniques cannot protect the social relationships of people in a mail system.
Drawings
FIG. 1 is a flow chart of the method of the invention;
FIG. 2 is a framework diagram of the model;
FIG. 3 is a schematic diagram of the neural network architecture used in the generative adversarial network;
FIG. 4 is a functional block diagram of the system of the invention.
Detailed Description
Specific embodiments are described in detail below so that the technical features, objects and advantages of the invention can be understood more clearly. The embodiments described are some, but not all, of the embodiments of the invention and are not to be construed as limiting its scope. All other embodiments that a person skilled in the art can obtain from these embodiments without inventive effort fall within the scope of protection of the invention.
Embodiment one:
In this embodiment, as shown in FIG. 1, a privacy-preserving link prediction method based on mail data comprises: preprocessing the mail data, mining the implicit relationships in the mails, and constructing a person-relationship knowledge graph based on the mail data; encoding the entities and implicit relationships in the person-relationship knowledge graph with an energy-based low-dimensional entity embedding model, obtaining an embedding space and embedded data in one-to-one correspondence with the different entities; training a generative adversarial network on the encoded embedded data to obtain a generative model, and using the model to simulate the embedding space; using a gradient descent reconstruction method to obfuscate the sensitive and non-sensitive relationships implied in the original data, thereby fine-tuning the distribution structure of the embedding space; and performing inference and prediction of the interpersonal relationships in the mail system based on the data of the fine-tuned embedding space.
In this embodiment, as shown in FIG. 2, a framework for a privacy-preserving link prediction model based on mail data is designed, and privacy-preserving link prediction is performed with the model through the following steps:
step (1), preprocess the mail data and extract relationships from the original data to construct a knowledge graph;
step (2), encode the entities and relationships in the multi-relational data with TransE, so that the resulting representation performs well on the downstream link prediction tasks (for both sensitive and non-sensitive relationships);
step (3), train a generative model with good generation capability using a generative adversarial network;
step (4), reconstruct the encoded representation of the data using the generative model together with the sensitive and non-sensitive relationship data;
step (5), predict the relationships between people in the mail network using the new encoded representation.
In the data set preprocessing, taking the data set of a college student mail system as an example, the specific implementation steps are as follows:
a) From different perspectives, different knowledge graphs can be built for the mail system, such as an online login behavior graph or a graph of commonly used mailbox time periods. The student communication relationship, which is most closely related to the people involved, is selected and a communication relationship knowledge graph is built;
b) The student mail system network is divided into two parts: communication among students, the intra-domain communication network, for example communication between kenou, fellows and classmates; and communication between students and people outside the student body, the extra-domain communication network, for example communication between students and instructors or between students and teachers;
c) The knowledge graph is defined as (h, l, t) triples, where the relationship l is divided into two groups: the currently known relationships $l_o$ (Buddha, fellow, instructor, lovers, ...) and the unknown relationships $l_u$ that need to be predicted (Buddha, fellow, mentor, ...), where $l_u \in l_o$;
d) The known relationships $l_o$ are further divided into sensitive relationships in the intra-domain network (Buddha, fellow, lovers, etc.) and non-sensitive relationships in the extra-domain communication network (student-instructor relationships, student-professor relationships, etc.). Privacy-preserving link prediction uses the known relationship triples $(h, l_o, t)$ to predict the unknown relationship triples $(h, l_u, t)$; if the relationship to be predicted is sensitive, the probability of predicting it should be made as small as possible, and vice versa;
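The partition described in steps c) and d) can be sketched as follows. This is a minimal illustration only: the relation names and the choice of which relations count as sensitive are hypothetical placeholders, not values taken from the patent.

```python
# Hypothetical relation names, for illustration only.
SENSITIVE_RELATIONS = {"lovers", "classmates"}       # intra-domain (student to student)
NON_SENSITIVE_RELATIONS = {"instructor", "mentor"}   # extra-domain (student to staff)


def partition_triples(triples):
    """Split known (h, l, t) triples into sensitive and non-sensitive groups."""
    sensitive, non_sensitive = [], []
    for h, l, t in triples:
        if l in SENSITIVE_RELATIONS:
            sensitive.append((h, l, t))
        elif l in NON_SENSITIVE_RELATIONS:
            non_sensitive.append((h, l, t))
        else:
            raise ValueError(f"unknown relation: {l}")
    return sensitive, non_sensitive


def group_by_relation(triples):
    """Group triples by relation name, preserving input order within each group."""
    groups = {}
    for h, l, t in triples:
        groups.setdefault(l, []).append((h, l, t))
    return groups
```

Grouping by relation is included because the later reconstruction step operates on one relation-specific data set at a time.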
In encoding the entities and relationships of the original data with TransE, the specific implementation steps are as follows:
a) Generate a real-valued Gaussian distribution and randomly sample from it to initialize the entities and relationships of the original data;
b) Normalize the entity and relationship vectors in each iteration;
c) In each iteration, select a fixed amount of data as positive samples $S_{batch}$, denoted $(h, l_o, t)$, and for each positive sample replace its head or tail entity to form a negative sample $S'_{batch}$, denoted $(h', l_o, t')$;
d) Update the entity and relationship vectors with stochastic gradient descent according to the following loss function:
where $[x]_+$ denotes $\max(0, x)$ and $\gamma > 0$ is a margin hyperparameter that sets the required gap between a positive and a negative sample: the larger $\gamma$ is, the larger the enforced gap between the two samples and the stricter the correction applied to the encoding vectors. $d(x, y)$ is a distance function, usually chosen as the $l_2$ norm, that is:
$d(x, y) = (x - y)^2$ (2)
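The per-pair loss term and the normalization in step b) can be sketched in plain Python as below. This assumes the standard TransE form $[\gamma + d(h + l, t) - d(h' + l, t')]_+$ with the squared distance of equation (2); the function names and the default margin are illustrative, not from the patent.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm (step b); a zero vector is returned unchanged."""
    norm = math.sqrt(sum(a * a for a in v))
    return list(v) if norm == 0.0 else [a / norm for a in v]


def sq_dist(x, y):
    """Squared Euclidean distance d(x, y) = sum_i (x_i - y_i)^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def translate(h, l):
    """TransE translation h + l."""
    return [a + b for a, b in zip(h, l)]


def margin_loss(pos, neg, gamma=1.0):
    """Hinge term [gamma + d(h + l, t) - d(h' + l, t')]_+ for one sample pair.

    pos and neg are (h, l, t) tuples of equal-length embedding vectors.
    """
    h, l, t = pos
    h_neg, l_neg, t_neg = neg
    d_pos = sq_dist(translate(h, l), t)
    d_neg = sq_dist(translate(h_neg, l_neg), t_neg)
    return max(0.0, gamma + d_pos - d_neg)
```

A negative sample that is already farther away than the margin contributes nothing, so gradient updates concentrate on negatives that are still too close to the positive triple.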
In training a generative model with good generation capability using a generative adversarial network, the invention adopts the following algorithm:
The specific implementation steps of the process are as follows:
a) Sample random noise $Z$ from a Gaussian distribution;
b) As shown in FIG. 3, a neural network comprising two fully connected layers and one normalization layer is used as the generator model G(). To avoid mode collapse and increase diversity, a Wasserstein loss plus a link prediction loss is adopted, where the link prediction loss is a margin-based ranking loss of the following form:
where the two triples in the loss are a non-sensitive relationship triple and a sensitive relationship triple, respectively; $\gamma > 0$ is a margin hyperparameter, and $d(x, y)$ is the Euclidean distance between two vectors;
The Wasserstein loss is calculated as follows:
where $y_n$ and $y_s$ denote the non-sensitive label and the sensitive label, respectively, so the loss of the entire generative model is:
$L_G = L_2 + \lambda L_{Dist}$ (5)
where $\lambda$ is a hyperparameter that adjusts the weight of the individual loss functions;
c) A network of two fully connected layers with LeakyReLU activation layers is used as the discriminator model D(); the second fully connected layer acts as a classifier that distinguishes real input data from fake data, and the Wasserstein loss is used. To stabilize training and eliminate mode collapse, a gradient penalty $L_{GP}$ is also adopted to enforce the Lipschitz constraint: the model is penalized if the gradient norm deviates from its target value of 1, so the loss function of the discriminator model is:
d) The generator model and the discriminator model are trained alternately;
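To make the gradient penalty in step c) concrete, the sketch below evaluates a Wasserstein critic loss with penalty for a linear critic $D(x) = w \cdot x + b$. The linear critic is a simplifying assumption (the patent uses two fully connected layers): for a linear function the input gradient is $w$ everywhere, so the penalty can be computed without automatic differentiation or interpolated samples. The weight `lambda_gp = 10.0` is likewise an assumed default, not a value from the patent.

```python
import math


def critic(w, b, x):
    """Linear critic D(x) = w . x + b (illustrative stand-in for the discriminator)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def critic_loss(w, b, real_batch, fake_batch, lambda_gp=10.0):
    """Wasserstein critic loss plus gradient penalty for the linear critic.

    loss = mean D(fake) - mean D(real) + lambda_gp * (||grad_x D|| - 1)^2,
    where grad_x D = w at every input point because D is linear.
    """
    mean_real = sum(critic(w, b, x) for x in real_batch) / len(real_batch)
    mean_fake = sum(critic(w, b, x) for x in fake_batch) / len(fake_batch)
    grad_norm = math.sqrt(sum(wi * wi for wi in w))
    penalty = (grad_norm - 1.0) ** 2
    return mean_fake - mean_real + lambda_gp * penalty
```

With a multi-layer critic the gradient varies from point to point, which is why practical implementations evaluate the penalty at points interpolated between real and generated samples.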
In reconstructing the encoded representation of the data using the generative model together with the sensitive and non-sensitive relationship data, the invention adopts the following algorithm:
The specific steps of the process are:
b) The original data set is divided by relation $l_o$ into multiple data sets $X_l$, for example $X_{\text{lovers}}$, $X_{\text{teachers and students}}$ and $X_{\text{Buddha's friend}}$, where $X_l$ denotes the TransE encoding corresponding to relation $l$;
c) For any data set $X_l$ containing relation $l$, the trained generator model is used as the reconstruction neural network, and the encoding $z$ of the relational data is reconstructed with the following loss function:
where $G(z)$ is the output of the generator model for input $z$ ($z \in Z$), and $\alpha > 0$ is a margin hyperparameter that sets the gap between the reconstructed encoding of a sensitive relationship and its normal encoding: the larger $\alpha$ is, the larger the enforced gap between the two encodings and the stricter the correction applied to the encoding vectors;
d) The initial embedding is reconstructed with $L$ steps of gradient descent; the reconstruction is computed by the following formula:
e) Because the mean squared error is non-convex, $R$ values of $z$ are randomly initialized so that different local minima are sampled, which improves the robustness of the reconstruction model; $z^*$ is found by minimizing the following expression:
Finally, the reconstructed data is used as the final relation embedding for subsequent prediction of the relationships between people.
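The search in steps d) and e) can be sketched as below: $R$ random starting points, $L$ gradient steps each, and the candidate with the lowest reconstruction error is kept. Several things here are assumptions made to keep the sketch self-contained: the generator is a fixed linear map $G(z) = Az$ so its gradient is analytic, the error is a plain squared error without the $\alpha$ margin term, and the step size `lr` and the seed are illustrative.

```python
import random


def matvec(A, z):
    """Linear stand-in for the generator: G(z) = A z."""
    return [sum(a * zi for a, zi in zip(row, z)) for row in A]


def recon_loss(A, z, x):
    """Squared reconstruction error ||G(z) - x||^2."""
    return sum((g - xi) ** 2 for g, xi in zip(matvec(A, z), x))


def recon_grad(A, z, x):
    """Gradient of the squared error with respect to z: 2 A^T (A z - x)."""
    r = [g - xi for g, xi in zip(matvec(A, z), x)]
    return [2.0 * sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(z))]


def reconstruct(A, x, R=5, L=100, lr=0.1, seed=0):
    """Run L gradient steps from each of R random starts; return the best (z, loss)."""
    rng = random.Random(seed)
    best_z, best_loss = None, float("inf")
    for _ in range(R):
        z = [rng.gauss(0.0, 1.0) for _ in range(len(A[0]))]
        for _ in range(L):
            g = recon_grad(A, z, x)
            z = [zi - lr * gi for zi, gi in zip(z, g)]
        loss = recon_loss(A, z, x)
        if loss < best_loss:
            best_z, best_loss = z, loss
    return best_z, best_loss
```

With a linear generator the problem is convex, so every restart reaches the same minimum; the restarts only matter for a real, non-convex generator, where different starting points can end in different local minima.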
The solution in this embodiment adopts WGAN to address the problems of conventional GAN training, such as unstable training, and largely resolves mode collapse, thereby ensuring the diversity of generated samples. In predicting non-sensitive relationships, the solution outperforms differential privacy, and its computational cost is lower than that of encryption techniques.
Embodiment two:
In this embodiment, a privacy-preserving link prediction system based on mail data is built using the method of embodiment one. As shown in FIG. 4, the system comprises the following modules:
A data preprocessing module: constructs a knowledge graph from the original mail data and forms a rigorous mathematical definition and objective;
An entity-relationship low-dimensional embedding module: given a set S of triples of the form (h, l, t), each containing two entities h, t ∈ E (the set of entities) and a relationship l ∈ L (the set of relationships), this module learns low-dimensional embeddings of the entities and relationships that perform well on the downstream link prediction task. The TransE model is selected for this module because of its strong performance.
A generator training module: shown in the right part of FIG. 2, this module comprises a generator G and a discriminator D; its input data are the real embedded data and random noise Z sampled from a Gaussian distribution. Through adversarial training, the generator learns to generate data with the same distribution as the real embedded data.
A data reconstruction module: shown in the left part of FIG. 2. The data produced by the entity-relationship low-dimensional embedding module is called embedded data and is denoted X. For any entity or relationship e, with e ∈ {h, l, t}, the mapping between it and its embedded data can therefore be represented by a unique pair {e → X}. Given a pre-trained generator G and the entity or relationship X to be predicted, $z^*$ is first found so as to minimize the reconstruction loss; $G(z^*)$ is then used as the reconstructed embedding for link prediction. Since equation 1 is a highly non-convex minimization problem, $R$ different random initializations of $z$ are used, each followed by $L$ steps of gradient descent, to approximate this process. After adversarial training, these initializations are fed into the generator, and $L$ steps of gradient descent are used to estimate the projection of the real data set onto the generator's embedding space.
A link prediction module: through the data reconstruction module, new low-dimensional embeddings $G(z^*)$ of the entities and relationships are obtained. These embeddings can be used to predict the relationships between people in the mail network.
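How such embeddings might be used for prediction can be sketched as follows, assuming the usual TransE scoring rule in which a smaller $d(h + l, t)$ means a more plausible triple. The function names, relation names and vectors are hypothetical; the patent does not spell out the scoring procedure.

```python
def score(h, l, t):
    """TransE distance d(h + l, t); a smaller value means a more plausible triple."""
    return sum((hi + li - ti) ** 2 for hi, li, ti in zip(h, l, t))


def rank_relations(h, t, relation_embeddings):
    """Return relation names ordered from most to least plausible for the pair (h, t).

    relation_embeddings maps a relation name to its embedding vector.
    """
    return sorted(
        relation_embeddings,
        key=lambda name: score(h, relation_embeddings[name], t),
    )
```

If the reconstruction has worked as intended, sensitive relations should end up far down this ranking for the reconstructed embeddings, while non-sensitive relations remain predictable.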
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are given by way of illustration of the principles of the present invention, but that various changes and modifications may be made without departing from the spirit and scope of the invention, and such changes and modifications are within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.
Claims (6)
1. A privacy-preserving link prediction method based on mail data, characterized by comprising the following steps:
step one: preprocessing the mail data, mining the implicit relationships in the mails, and constructing a person-relationship knowledge graph based on the mail data;
step two: encoding the entities and implicit relationships in the person-relationship knowledge graph with an energy-based low-dimensional entity embedding model, obtaining an embedding space and embedded data in one-to-one correspondence with the different entities;
step three: training a generative adversarial network on the encoded embedded data to obtain a generative model, and using the model to simulate the embedding space;
step four: using a gradient descent reconstruction method to obfuscate the sensitive and non-sensitive relationships implied in the original data, thereby fine-tuning the distribution structure of the embedding space;
step five: performing inference and prediction of the interpersonal relationships in the mail system based on the data of the fine-tuned embedding space.
2. The method according to claim 1, wherein step one specifically comprises:
S101, for a college student mail system data set, selecting the student communication relationship, which is most closely related to the people involved, and building a communication relationship knowledge graph;
S102, dividing the college student mail system network into an intra-domain communication network and an extra-domain communication network;
S103, defining the communication relationship knowledge graph as (h, l, t) triples, where the communication relationship l is divided into two groups: known relationships $l_o$ and unknown relationships $l_u$ that need to be predicted, with $l_u \in l_o$;
3. The method according to claim 1, wherein step two specifically comprises:
S201, generating a real-valued Gaussian distribution and randomly sampling from it to initialize the entities and relationships of the original mail data;
S202, normalizing the entity and relationship vectors in each iteration;
S203, in each iteration, selecting a fixed amount of data as positive samples $S_{batch}$, denoted $(h, l_o, t)$, and for each positive sample replacing its head or tail entity to form a negative sample $S'_{batch}$, denoted $(h', l_o, t')$;
S204, updating the entity and relationship vectors with stochastic gradient descent according to the following loss function:
where $[x]_+$ denotes $\max(0, x)$; $\gamma > 0$ is a margin hyperparameter that sets the required gap between a positive and a negative sample; and $d(x, y)$ is a distance function, $d(x, y) = (x - y)^2$.
4. The method for predicting privacy-preserving links based on mail data as claimed in claim 1, wherein the training in step three obtains the generative model specifically comprising:
s301, sampling a random noise Z from Gaussian distribution;
s302, using a neural network comprising two fully-connected layers and a normalization layer as a generator model G (), and adopting Wasserstein loss and link prediction loss, wherein the link prediction loss is expressed as ranking loss based on margin and is represented as follows:
wherein the content of the first and second substances,in the case of a non-sensitive relationship triplet,a sensitive relationship triplet; gamma ray>0 is a boundary hyperparameter, d (x, y) represents the euclidean distance between the two vectors;
the Wasserstein loss is calculated as follows:
wherein $y_n$ denotes the non-sensitive label and $y_s$ denotes the sensitive label; the loss of the entire generator model is as follows:
$L_G = L_2 + \lambda L_{Dist}$
wherein $\lambda$ is a hyperparameter that adjusts the weight of the individual loss terms;
S303, using a network of two fully connected layers with LeakyReLU activation layers as the discriminator model D(·), the second fully connected layer serving as a classifier that distinguishes whether the input data is real or generated, and adopting the Wasserstein loss; a gradient penalty $L_{GP}$ is used to enforce the Lipschitz constraint, so that the discriminator model is penalized if its gradient norm deviates from the target norm value of 1; the loss function of the discriminator model is therefore given by:
and S304, alternately training the generator model and the discriminator model.
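The discriminator loss referenced in S303 is not reproduced in this text. For orientation only, the widely used WGAN with gradient penalty formulation has the shape below, and the description in S303 (a penalty when the gradient norm deviates from 1) is consistent with it. The weight $\lambda_{GP}$, the interpolated point $\hat{x}$, and the mixing coefficient $\epsilon$ are assumptions borrowed from that common formulation, not symbols confirmed by the claims.

```latex
L_D = \mathbb{E}_{z \sim Z}\big[ D(G(z)) \big]
      - \mathbb{E}_{x \sim X}\big[ D(x) \big]
      + \lambda_{GP}\, L_{GP},
\qquad
L_{GP} = \mathbb{E}_{\hat{x}}\Big[ \big( \lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1 \big)^2 \Big],
\qquad
\hat{x} = \epsilon\, x + (1 - \epsilon)\, G(z), \quad \epsilon \sim U[0, 1]
```

Under this reading, the alternating training of S304 would minimize $L_D$ with respect to the discriminator parameters and $L_G$ with respect to the generator parameters in turn.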
5. The method for predicting privacy-preserving links based on mail data as claimed in claim 1, wherein the fourth step specifically comprises the following sub-steps:
S402, dividing the original data set according to the relation $l_o$ into a plurality of groups of data $X_l$, where $X_l$ denotes the TransE encoding corresponding to the relation l;
S403, for any group of data $X_l$ containing the relation l, using the trained generator model as the reconstruction neural network and reconstructing the encoding Z of the relation data using the following loss function:
wherein $G(z)$ is the output of the generator model for input $z$ ($z \in Z$); $\alpha > 0$ is a margin hyperparameter that acts as a gap correction between the reconstructed encoding of sensitive relations and the normal encoding;
S404, reconstructing the initial embedding by applying L steps of the gradient descent algorithm, the reconstruction process being calculated according to the following formula:
S405, randomly initializing R instances of z so that different local minima are reached, improving the robustness of the reconstruction model, wherein $z^*$ is found by minimizing the following expression:
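The procedure in S403 to S405 (L gradient descent steps on z, repeated from R random initializations, keeping the best result as $z^*$) can be sketched as follows. This is a minimal illustration under stated assumptions: the generator is replaced by a hypothetical linear map `W`, the objective is the plain squared error $\lVert G(z) - x \rVert^2$ without the margin term $\alpha$ (whose exact formula is not reproduced in this text), and the step size, step count, and all names are illustrative.

```python
import random

def generator(W, z):
    """Stand-in linear generator G(z) = W z (hypothetical; the claims use a trained network)."""
    return [sum(w * zj for w, zj in zip(row, z)) for row in W]

def recon_loss(W, z, x):
    """Squared error ||G(z) - x||^2 between the generated and target encodings."""
    return sum((g - xi) ** 2 for g, xi in zip(generator(W, z), x))

def recon_grad(W, z, x):
    """Gradient of the squared error with respect to z: 2 W^T (G(z) - x)."""
    residual = [g - xi for g, xi in zip(generator(W, z), x)]
    return [2.0 * sum(W[i][j] * residual[i] for i in range(len(W)))
            for j in range(len(z))]

def reconstruct(W, x, steps_L=200, restarts_R=5, lr=0.05, seed=0):
    """Run L gradient steps from each of R random starts and keep the best z."""
    rng = random.Random(seed)
    best_z, best_loss = None, float("inf")
    for _ in range(restarts_R):
        z = [rng.gauss(0.0, 1.0) for _ in range(len(W[0]))]  # random initial z
        for _ in range(steps_L):
            grad = recon_grad(W, z, x)
            z = [zj - lr * gj for zj, gj in zip(z, grad)]
        loss = recon_loss(W, z, x)
        if loss < best_loss:
            best_z, best_loss = z, loss
    return best_z, best_loss

W = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # maps 2-d noise to a 3-d "embedding"
x = generator(W, [0.5, -1.0])             # a target the generator can reproduce
z_star, final_loss = reconstruct(W, x)
print(z_star, final_loss)
```

With a linear generator the objective is convex, so every restart reaches the same minimum; the restarts only matter for a nonlinear generator such as the one in the claims, where different starting points can settle in different local minima.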
6. A privacy-preserving link prediction system based on mail data, which is realized by the privacy-preserving link prediction method based on mail data of any one of claims 1 to 5, characterized by comprising:
a data preprocessing module for constructing a knowledge graph from the original mail data and forming a strict mathematical definition and objective;
an entity-relationship low-dimensional embedding module for learning low-dimensional embeddings of the entities and relationships in the knowledge graph;
a generator training module comprising a generator G and a discriminator D, whose input data are the real embedded data and randomly sampled noise Z obeying a Gaussian distribution;
a data reconstruction module for reconstructing the embedded data X produced by the entity-relationship low-dimensional embedding module to obtain new low-dimensional entity and relationship embeddings $G(z^*)$; and
a link prediction module for predicting the relationships between people in the mail network according to the low-dimensional embeddings $G(z^*)$.
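The claims do not spell out how the link prediction module scores candidate links. One common approach with TransE-style embeddings, shown here purely as an assumption, is to rank candidate tail entities by the distance between h + l and t. The entity names, the relation name `emails`, the two-dimensional vectors, and the function names are hypothetical.

```python
def score(h, l, t):
    """TransE-style score: squared distance between h + l and t (lower is more plausible)."""
    return sum((a + b - c) ** 2 for a, b, c in zip(h, l, t))

def rank_tails(ent, rel, head, relation, top_k=3):
    """Rank every other entity as a candidate tail for (head, relation, ?)."""
    candidates = [(score(ent[head], rel[relation], ent[t]), t)
                  for t in ent if t != head]
    return [t for _, t in sorted(candidates)[:top_k]]

# Hypothetical reconstructed embeddings standing in for G(z*).
ent = {"alice": [0.0, 0.0], "bob": [1.0, 0.0], "carol": [0.0, 3.0]}
rel = {"emails": [1.0, 0.0]}

print(rank_tails(ent, rel, "alice", "emails"))  # ['bob', 'carol']
```

Here `alice + emails` lands exactly on `bob`, so `bob` ranks first with distance 0, while `carol` is at squared distance 10.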
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210066876.0A CN114513337B (en) | 2022-01-20 | 2022-01-20 | Privacy protection link prediction method and system based on mail data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114513337A (en) | 2022-05-17 |
CN114513337B (en) | 2023-04-07 |
Family
ID=81550105
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210066876.0A Active CN114513337B (en) | 2022-01-20 | 2022-01-20 | Privacy protection link prediction method and system based on mail data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114513337B (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108647334A (en) * | 2018-05-11 | 2018-10-12 | 电子科技大学 | A kind of video social networks homology analysis method under spark platforms |
CN110147450A (en) * | 2019-05-06 | 2019-08-20 | 北京科技大学 | A kind of the knowledge complementing method and device of knowledge mapping |
EP3557505A1 (en) * | 2018-04-20 | 2019-10-23 | Facebook, Inc. | Contextual auto-completion for assistant systems |
WO2019231481A1 (en) * | 2018-05-29 | 2019-12-05 | Visa International Service Association | Privacy-preserving machine learning in the three-server model |
CN111046187A (en) * | 2019-11-13 | 2020-04-21 | 山东财经大学 | Sample knowledge graph relation learning method and system based on confrontation type attention mechanism |
US10671752B1 (en) * | 2019-11-20 | 2020-06-02 | Capital One Services, Llc | Computer-based methods and systems for managing private data of users |
CN111639359A (en) * | 2020-04-22 | 2020-09-08 | 中国科学院计算技术研究所 | Method and system for detecting and early warning privacy risks of social network pictures |
CN111859454A (en) * | 2020-07-28 | 2020-10-30 | 桂林慧谷人工智能产业技术研究院 | Privacy protection method for defending link prediction based on graph neural network |
CN112182245A (en) * | 2020-09-28 | 2021-01-05 | 中国科学院计算技术研究所 | Knowledge graph embedded model training method and system and electronic equipment |
CN113190688A (en) * | 2021-05-08 | 2021-07-30 | 中国人民解放军国防科技大学 | Complex network link prediction method and system based on logical reasoning and graph convolution |
CN113220897A (en) * | 2021-04-29 | 2021-08-06 | 天津大学 | Knowledge graph embedding model based on entity-relation association graph |
CN113282818A (en) * | 2021-01-29 | 2021-08-20 | 中国人民解放军国防科技大学 | Method, device and medium for mining network character relationship based on BilSTM |
CN113360286A (en) * | 2021-06-21 | 2021-09-07 | 中国人民解放军国防科技大学 | Link prediction method based on knowledge graph embedding |
CN113361658A (en) * | 2021-07-15 | 2021-09-07 | 支付宝(杭州)信息技术有限公司 | Method, device and equipment for training graph model based on privacy protection |
Non-Patent Citations (3)
Title |
---|
H. A. DEYLAMI AND M. ASADPOUR: "Link prediction in social networks using hierarchical community detection", 2015 7TH CONFERENCE ON INFORMATION AND KNOWLEDGE TECHNOLOGY (IKT) * |
Y. WANG et al.: "Efficient Privacy Preserving Matchmaking for Mobile Social Networking against Malicious Users", 2012 IEEE 11TH INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS * |
ZHANG Zhao et al.: "Adversarial negative sample generation for knowledge representation learning", Journal of Computer Applications * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115238827A (en) * | 2022-09-16 | 2022-10-25 | 支付宝(杭州)信息技术有限公司 | Privacy-protecting sample detection system training method and device |
CN115238827B (en) * | 2022-09-16 | 2022-11-25 | 支付宝(杭州)信息技术有限公司 | Privacy-protecting sample detection system training method and device |
CN117290888A (en) * | 2023-11-23 | 2023-12-26 | 江苏风云科技服务有限公司 | Information desensitization method for big data, storage medium and server |
CN117290888B (en) * | 2023-11-23 | 2024-02-09 | 江苏风云科技服务有限公司 | Information desensitization method for big data, storage medium and server |
Also Published As
Publication number | Publication date |
---|---|
CN114513337B (en) | 2023-04-07 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |