WO2018076916A1 - Data publishing method and apparatus, and terminal - Google Patents

Data publishing method and apparatus, and terminal

Info

Publication number
WO2018076916A1
Authority
WO
WIPO (PCT)
Prior art keywords
network structure
bayesian network
attribute
attributes
actual
Prior art date
Application number
PCT/CN2017/099042
Other languages
English (en)
French (fr)
Inventor
王德政
苏森
申山宏
程祥
牛家浩
唐朋
杨健宇
Original Assignee
ZTE Corporation (中兴通讯股份有限公司)
Priority date
Filing date
Publication date
Application filed by ZTE Corporation (中兴通讯股份有限公司)
Publication of WO2018076916A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules, to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Definitions

  • the present application relates to, but is not limited to, the field of data security, and in particular, to a data publishing method and apparatus, and a terminal.
  • Privacy-preserving data publishing is designed to protect sensitive information of users during the process of publishing data.
  • the proposed differential privacy protection model provides a feasible solution to the problem of data publishing that satisfies privacy protection. Unlike traditional anonymity-based privacy protection models (such as k-anonymity and l-diversity), the differential privacy protection model provides a strict and quantifiable means of privacy protection, and the protection it provides does not depend on the background knowledge possessed by the attacker.
  • the PrivBayes method solves the problem of data publishing that satisfies differential privacy: it first constructs a Bayesian network from the raw data, adds noise to the constructed network so that the differential privacy requirement is met, and then uses the noisy Bayesian network to generate and release new data.
  • the data publishing method in a single scenario cannot be directly applied to a multi-party scenario.
  • a distributed data generation algorithm that satisfies differential privacy (such as the DistDiffGen algorithm) solves the problem of data publishing between two parties, but cannot be applied to data publishing problems that satisfy differential privacy in a multi-party scenario.
  • the collaborative search log generation algorithm (such as CELS algorithm) solves the problem of multi-party search log publishing, but it cannot solve the data publishing problem with multiple attributes in multi-party scenarios. In addition, the privacy protection of this method is low. Based on the above analysis, multi-party data distribution that satisfies differential privacy protection in a big data environment cannot be realized at present.
  • the embodiments of the present application provide a data publishing method and apparatus, and a terminal, which can improve the security of multi-party data publishing in a big data environment.
  • a data distribution method includes: updating an initial Bayesian network structure corresponding to an attribute set of data to obtain an updated actual Bayesian network structure;
  • learning parameters in the actual Bayesian network structure to obtain a target Bayesian network structure;
  • using the target Bayesian network structure to publish data corresponding to all the attributes in the attribute set.
  • updating the initial Bayesian network structure corresponding to the attribute set of the data to obtain the updated actual Bayesian network structure may include: acquiring first mutual information of any two attributes in the attribute set; and serially updating the initial Bayesian network structure using the first mutual information to obtain the updated actual Bayesian network structure.
  • acquiring the first mutual information of any two attributes in the attribute set may include: dividing the attribute set into multiple views, where each view includes some of the attributes in the attribute set; using the optimal multi-party Laplace mechanism to merge the multiple marginal distributions corresponding to each view into the actual marginal distribution of that view, where the actual marginal distribution carries Laplace noise; and using the actual marginal distribution of each view to calculate the first mutual information of any two attributes in that view.
  • dividing the attribute set into the plurality of views may include dividing the attribute set into a plurality of views by using a non-overlapping attribute partitioning method, wherein the attribute pairs included in any two views do not overlap.
  • using the optimal multi-party Laplace mechanism to merge the plurality of marginal distributions corresponding to each view into the actual marginal distribution of that view may include: obtaining the marginal distribution of each view calculated by each of the plurality of objects based on the data it possesses, where Laplace noise is added to the marginal distribution; and merging the multiple marginal distributions of the multiple objects into the actual marginal distribution of each view, with the minimum noise among the multiple Laplace noises carried by the multiple marginal distributions used as the Laplace noise of the actual marginal distribution.
  • the method may further include: acquiring an initial Bayesian network structure including the parent-child relationships of all attributes in the attribute set, where the parent-child relationships are determined by multiple objects in a specified manner.
  • the specified manner may indicate that the parent-child relationship is determined as follows: the first of the plurality of objects divides the attribute set into a first set and a second set, where the first set holds attributes whose parent nodes have been determined (the initial state of the first set is empty) and the second set holds attributes whose parent nodes have not yet been determined; the first object selects an attribute from the second set and saves it to the first set; the i-th object of the plurality of objects determines a parent node for a first preset number of attributes in the second set in a preset manner and migrates the attributes whose parent nodes have been determined from the second set to the first set, where i is a positive integer less than k and k is the number of objects; and the k-th object of the plurality of objects determines a parent node for a second preset number of attributes in the second set in the preset manner and migrates the attributes whose parent nodes have been determined from the second set to the first set.
  • the preset manner may include: acquiring second mutual information of each first attribute in the first set with a second attribute, where the second attribute is an attribute selected from the second set; and selecting target mutual information from the plurality of second mutual information using the exponential mechanism, with the first attribute corresponding to the target mutual information used as the parent node of the second attribute.
  • serially updating the initial Bayesian network structure using the first mutual information to obtain the updated actual Bayesian network structure may include: updating the initial Bayesian network structure to obtain an updated first Bayesian network structure; updating the (j-1)-th Bayesian network structure to obtain an updated j-th Bayesian network structure, where j is a positive integer greater than 1 and less than k; and updating the (k-1)-th Bayesian network structure to obtain the actual Bayesian network structure.
  • updating the initial Bayesian network structure to obtain the updated first Bayesian network structure may include: constructing a first boundary of the initial Bayesian network structure from the first mutual information using the association-strength-aware boundary construction method; acquiring the attributes within the first boundary counted by the first of the plurality of objects and the first marginal distribution of the parent nodes of those attributes, where the first marginal distribution carries Laplace noise; and selecting a parent node for each attribute in the first boundary using the exponential mechanism to obtain the updated first Bayesian network structure.
  • updating the (j-1)-th Bayesian network structure to obtain the updated j-th Bayesian network structure may include: constructing the j-th boundary of the (j-1)-th Bayesian network structure from the first mutual information using the association-strength-aware boundary construction method; acquiring the attributes within the j-th boundary counted by the j-th of the plurality of objects and the j-th marginal distribution of the parent nodes of those attributes, where the j-th marginal distribution carries Laplace noise; and selecting a parent node for each attribute in the j-th boundary using the exponential mechanism to obtain the updated j-th Bayesian network structure.
  • learning parameters in the actual Bayesian network structure may include: acquiring the conditional distribution, determined by each of the plurality of objects, of any attribute in the actual Bayesian network structure given the parent node of that attribute; and using the optimal multi-party Laplace mechanism to merge the obtained multiple conditional distributions into the actual conditional distribution of the attribute given its parent node, where the actual conditional distribution carries Laplace noise.
  • publishing the data corresponding to all the attributes in the attribute set using the target Bayesian network structure may include: using the product of the actual conditional distribution of each attribute given its parent node as the joint distribution of all attributes; and publishing data generated from the joint distribution corresponding to all attributes.
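The publishing step above (treating the product of per-attribute conditional distributions as the joint distribution, then generating data) can be sketched as ancestral sampling over a toy one-degree network. The attribute names and probability tables below are illustrative assumptions, not from the patent:

```python
import random

def sample_record(order, parents, cpds, rng):
    """Draw one synthetic record by ancestral sampling: visit attributes
    in topological order and sample each from its conditional distribution
    given the already-sampled value of its parent."""
    record = {}
    for attr in order:
        parent = parents[attr]
        parent_value = record[parent] if parent is not None else None
        dist = cpds[attr][parent_value]  # {value: probability}
        values = list(dist)
        weights = list(dist.values())
        record[attr] = rng.choices(values, weights=weights, k=1)[0]
    return record

# Toy 1-degree network: "age" is a root, "income" has parent "age".
parents = {"age": None, "income": "age"}
cpds = {
    "age": {None: {"young": 0.6, "old": 0.4}},
    "income": {"young": {"low": 0.8, "high": 0.2},
               "old": {"low": 0.3, "high": 0.7}},
}
rng = random.Random(0)
synthetic = [sample_record(["age", "income"], parents, cpds, rng)
             for _ in range(5)]
```

Because each attribute is sampled only from its conditional distribution given its parent, the generated records follow the product-form joint distribution without ever touching the raw data.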
  • a data distribution apparatus comprising: an update unit configured to update an initial Bayesian network structure corresponding to an attribute set of the data to obtain an updated actual Bayesian network structure; a learning unit configured to learn parameters in the actual Bayesian network structure to obtain a target Bayesian network structure; and a publishing unit configured to use the target Bayesian network structure to publish data corresponding to all attributes in the attribute set.
  • the updating unit may include: a first obtaining module configured to acquire first mutual information of any two attributes in the attribute set; and an updating module configured to use the first mutual information to the initial Bayesian network structure Perform a serial update to get the updated actual Bayesian network structure.
  • the first obtaining module may include: a dividing submodule configured to divide the attribute set into a plurality of views, where each view includes some of the attributes in the attribute set; a merging submodule configured to use the optimal multi-party Laplace mechanism to merge the multiple marginal distributions corresponding to each view into the actual marginal distribution of that view, where the actual marginal distribution carries Laplace noise; and a computation submodule configured to use the actual marginal distribution of each view to calculate the first mutual information of any two attributes in that view.
  • the dividing sub-module may be configured to divide the attribute set into a plurality of views using the non-overlapping attribute partitioning method, where the attribute pairs included in any two views do not overlap.
  • the merging sub-module may be configured to: obtain a marginal distribution of each view calculated based on data owned by each of the plurality of objects, wherein Laplace noise is added to the marginal distribution; Multiple marginal distributions of multiple objects are merged into the actual marginal distribution of each view, and the minimum noise of multiple Laplace noises carried by multiple marginal distributions is taken as the Laplace noise of the actual marginal distribution.
  • the updating unit may further include: a second obtaining module configured to acquire an initial Bayesian network structure including the parent-child relationships of all attributes in the attribute set, where the parent-child relationships are determined by the plurality of objects in a specified manner.
  • the update module may include: a first update submodule configured to update the initial Bayesian network structure to obtain an updated first Bayesian network structure; a second update submodule configured to update the (j-1)-th Bayesian network structure to obtain an updated j-th Bayesian network structure, where j is a positive integer greater than 1 and less than k; and a third update submodule configured to update the (k-1)-th Bayesian network structure to obtain the actual Bayesian network structure.
  • the first update submodule may be configured to: construct a first boundary of the initial Bayesian network structure from the first mutual information using the association-strength-aware boundary construction method; acquire the attributes within the first boundary counted by the first of the plurality of objects and the first marginal distribution of the parent nodes of those attributes, where the first marginal distribution carries Laplace noise; and select a parent node for each attribute in the first boundary using the exponential mechanism to obtain the updated first Bayesian network structure.
  • the second update submodule may be configured to: construct the j-th boundary of the (j-1)-th Bayesian network structure from the first mutual information using the association-strength-aware boundary construction method; acquire the attributes within the j-th boundary counted by the j-th of the plurality of objects and the j-th marginal distribution of the parent nodes of those attributes, where the j-th marginal distribution carries Laplace noise; and select a parent node for each attribute in the j-th boundary using the exponential mechanism to obtain the updated j-th Bayesian network structure.
  • the learning unit may include: a third obtaining module configured to acquire the conditional distribution, determined by each of the plurality of objects, of any attribute in the actual Bayesian network structure given the parent node of that attribute;
  • a merging module configured to merge the obtained multiple conditional distributions into the actual conditional distribution of the attribute given its parent node using the optimal multi-party Laplace mechanism, where the actual conditional distribution carries Laplace noise.
  • the issuing unit may include: a processing module configured to use the product of the actual conditional distribution of each attribute given its parent node as the joint distribution of all attributes; and a publishing module configured to publish data generated from the joint distribution corresponding to all attributes.
  • a terminal comprising: a processor; a memory configured to store processor-executable instructions; and a transmission device configured to transmit and receive information under the control of the processor; wherein the processor is configured to: update the initial Bayesian network structure corresponding to the attribute set of the data to obtain the updated actual Bayesian network structure; learn the parameters in the actual Bayesian network structure to obtain the target Bayesian network structure; and use the target Bayesian network structure to publish data corresponding to all the attributes in the attribute set.
  • the processor may be further configured to: obtain the first mutual information of any two attributes in the attribute set; and serially update the initial Bayesian network structure using the first mutual information to obtain the updated actual Bayesian network structure.
  • a storage medium, which may be arranged to store program code for performing: updating an initial Bayesian network structure corresponding to an attribute set of data to obtain the updated actual Bayesian network structure; learning the parameters in the actual Bayesian network structure to obtain the target Bayesian network structure; and using the target Bayesian network structure to publish data corresponding to all the attributes in the attribute set.
  • the initial Bayesian network structure corresponding to the attribute set of the data is updated, and the updated actual Bayesian network structure is obtained; the parameters in the actual Bayesian network structure are learned, and the target Bayesian network structure is obtained.
  • FIG. 1 is a schematic diagram of a computer terminal implementing a data distribution method according to an embodiment of the present application
  • FIG. 2 is a schematic diagram of a data distribution system
  • FIG. 3 is a flowchart of a data distribution method according to an embodiment of the present application.
  • FIG. 4 is an exemplary schematic diagram of a data distribution system in accordance with an embodiment of the present application.
  • FIG. 5 is an exemplary schematic diagram of a data distribution system according to an embodiment of the present application.
  • FIG. 6 is an exemplary schematic diagram of a data distribution system in accordance with an embodiment of the present application.
  • FIG. 7 is an exemplary schematic diagram of a data distribution system in accordance with an embodiment of the present application.
  • FIG. 8 is an exemplary schematic diagram of a data distribution system according to an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a data distribution apparatus according to an embodiment of the present application.
  • the computer terminal may include one or more processors 101 (only one is shown; the processor 101 may include, but is not limited to, a processing device such as a microcontroller unit (MCU) or a field-programmable gate array (FPGA)), a memory 103 for storing data, and a transmission device 105 for communication functions.
  • FIG. 1 is merely illustrative and does not limit the structure of the above electronic device.
  • the memory 103 can be used to store software programs and modules of application software, such as the program instructions or modules corresponding to the data distribution method in the embodiment; the processor 101 performs various functional applications and data processing, that is, implements the above method, by running the software programs and modules stored in the memory 103.
  • Memory 103 can include high speed random access memory and can also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid state memory.
  • memory 103 can further include memory remotely located relative to the processor, which can be connected to the computer terminal over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the processor 101 is configured to: update the initial Bayesian network structure corresponding to the attribute set of the data to obtain the updated actual Bayesian network structure; learn the parameters in the actual Bayesian network structure to obtain the target Bayesian network structure; and use the target Bayesian network structure to publish data corresponding to all attributes in the attribute set.
  • the processor 101 may be further configured to: obtain the first mutual information of any two attributes in the attribute set; and serially update the initial Bayesian network structure using the first mutual information to obtain the updated actual Bayesian network structure.
  • Transmission device 105 is configured to receive or transmit data via a network.
  • a network may include a wireless network provided by a communication provider of a computer terminal.
  • the transmission device 105 includes a Network Interface Controller (NIC) that can be connected to other network devices through a base station to communicate with the Internet.
  • the transmission device 105 can be a Radio Frequency (RF) module for communicating with the Internet wirelessly.
  • Semi-trusted curator: a third party, i.e., an individual or organization that collaborates with one or more data owners to publish data. "Semi-trusted" means that the third party coordinates data publishing with one or more data owners according to the agreed protocol rules of the algorithm, but may use the resources it has to steal users' private information while interacting with the data owners.
  • Marginal distribution: the probability distribution obtained by summing a multivariate probability density function over one or more of its variables, commonly used in statistics, so that the influence of those variables is removed from the result.
  • Bayesian network: a probabilistic graphical model that represents a set of random variables and their conditional probability distributions via a directed acyclic graph.
  • Search frontier: it consists of two parts; one part is a set of candidate attribute-parent pairs (i.e., attribute pairs, expressed as <attribute, parent node>), and the other part is the marginal distributions of the candidate attribute-parent pairs. The frontier can be seen as each data owner's prior knowledge for updating the Bayesian network structure.
  • Conditional distribution: given two related random variables X and Y, the conditional distribution of X is the probability distribution of X when the value of Y is known.
  • the differential privacy protection model has become the standard privacy protection model in the field of data analysis.
  • the differential privacy protection model has a strict mathematical definition and makes no assumptions about the background knowledge owned by the attacker. Given databases D and D', assuming that D and D' differ in one and only one record r, a data analysis algorithm A that satisfies differential privacy produces approximately the same probability distribution of analysis results on D and D'. In this case, no matter how rich the attacker's background knowledge is, the attacker cannot judge whether the record r exists in the database.
  • the similarity of the analysis results is controlled by privacy parameters (ie privacy budget). The smaller the privacy parameter, the higher the privacy protection of the algorithm.
  • the differential privacy protection model protects users' privacy by adding noise during data analysis.
  • Differential privacy protection model: given algorithm A, assume that databases D and D' are arbitrary adjacent databases. If, for any possible output S of algorithm A, the ratio of the probability that A outputs S on database D to the probability that A outputs S on database D' is at most the constant e^ε, then algorithm A is said to satisfy ε-differential privacy; that is, Pr[A(D) ∈ S] ≤ e^ε × Pr[A(D') ∈ S]. From the perspective of probability distributions, the differential privacy protection model limits the impact of any single record on the final analysis result of the algorithm.
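The guarantee above is commonly realized with the Laplace mechanism: adding noise drawn from Laplace(Δf/ε) to a numeric query of L1 sensitivity Δf satisfies ε-differential privacy. A minimal stdlib sketch (the function names are illustrative, not from the patent):

```python
import math
import random

def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) via inverse-CDF sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release true_value + Laplace(sensitivity/epsilon) noise, which
    satisfies epsilon-differential privacy for a numeric query whose
    L1 sensitivity is `sensitivity`."""
    return true_value + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)
# A counting query has sensitivity 1: adding or removing one record
# changes the count by at most 1.
noisy_count = laplace_mechanism(100, sensitivity=1.0, epsilon=0.5, rng=rng)
```

A smaller ε gives a larger noise scale, matching the statement above that a smaller privacy parameter yields stronger privacy protection.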
  • Exponential mechanism: given a database D, an output range Range, an availability (utility) function u(D, r) and its sensitivity Δu, the mechanism outputs an entity object r ∈ Range with probability proportional to exp(ε·u(D, r)/(2Δu)).
  • the data distribution system includes data owners (P1, P2, ..., Pk), each of whom has its own data (i.e., D1, D2, ..., Dk, stored in the data warehouse D); the semi-trusted third party T processes the data in the data warehouse and then publishes the data D' to the data analyst U.
  • the data owners, the semi-trusted third party, and the data analyst can each use the knowledge they possess to attack the data warehouse (e.g., attack 1, attack 2, attack 3), resulting in low security for the current data distribution system.
  • the method of the present application can solve the above problem.
  • a method embodiment of a data distribution method is provided. It should be noted that the steps shown in the flowchart of the drawings may be performed in a computer system as a set of computer-executable instructions, and although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from the one described herein.
  • FIG. 3 is a flowchart of a data publishing method according to an embodiment of the present application. As shown in FIG. 3, the method includes the following steps:
  • Step S301 updating an initial Bayesian network structure corresponding to the attribute set of the data, to obtain an updated actual Bayesian network structure
  • Step S302 learning parameters in the actual Bayesian network structure, and obtaining a target Bayesian network structure
  • Step S303 using the target Bayesian network structure to publish data corresponding to all the attributes in the attribute set.
  • the initial Bayesian network structure corresponding to the attribute set of the data is updated, and the updated actual Bayesian network structure is obtained; the parameters in the actual Bayesian network structure are learned, and the target Bayesian network structure is obtained;
  • the target Bayesian network structure is used to publish data corresponding to all the attributes in the attribute set, thereby realizing the technical effect of improving the security of multi-party data publishing in a big data environment.
  • the above parameters are the parameters of the Bayesian network, such as the conditional distribution of each node in the Bayesian network when its parent node is given.
  • the above steps S301 to S303 may be run on a terminal used by a semi-trusted third party, or on a terminal device in a network consisting of a semi-trusted third party and data owners, and there may be multiple data owners.
  • the data owners initialize an initial Bayesian network structure corresponding to the attribute set and send it to the semi-trusted third party; the semi-trusted third party and the data owners serially update the initial Bayesian network structure using the first mutual information to obtain the updated actual Bayesian network structure; the semi-trusted third party and the data owners learn the parameters in the actual Bayesian network structure in parallel; and the semi-trusted third party uses the actual Bayesian network structure with the learned parameters to publish data corresponding to all the attributes in the attribute set.
  • in step S301, the initial Bayesian network structure corresponding to the attribute set of the data is updated to obtain the updated actual Bayesian network structure, which may include: acquiring the first mutual information of any two attributes in the attribute set; and serially updating the initial Bayesian network structure using the first mutual information to obtain the updated actual Bayesian network structure.
  • obtaining the first mutual information of any two attributes in the attribute set may include: dividing the attribute set into a plurality of views, where each view includes some of the attributes in the attribute set; using the optimal multi-party Laplace mechanism to merge the multiple marginal distributions corresponding to each view into the actual marginal distribution of that view, where the actual marginal distribution carries Laplace noise; and using the actual marginal distribution of each view to calculate the first mutual information of any two attributes in that view.
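Given the marginal distribution of a view, the mutual information of two attributes follows the standard definition I(X;Y) = Σ p(x,y)·log(p(x,y)/(p(x)·p(y))). A sketch assuming the marginal is supplied as a normalized table (in practice a noisy marginal may need post-processing, e.g. clipping negative entries, before this computation):

```python
import math

def mutual_information(joint):
    """I(X;Y) in nats, where `joint` maps (x, y) to P(X=x, Y=y)."""
    px, py = {}, {}
    # Marginalize the joint table to get the per-attribute distributions.
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    mi = 0.0
    for (x, y), p in joint.items():
        if p > 0.0:
            mi += p * math.log(p / (px[x] * py[y]))
    return mi

# Two perfectly correlated binary attributes: I(X;Y) = ln 2.
correlated = {(0, 0): 0.5, (1, 1): 0.5}
# Two independent uniform binary attributes: I(X;Y) = 0.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
```

The stronger the association between two attributes, the larger this value, which is why it serves as the utility for choosing parent nodes.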
  • dividing the attribute set into multiple views may be done using the non-overlapping attribute partitioning method, where the attribute pairs included in any two views do not overlap, yielding a set of views.
  • the semi-trusted third party and the data owners collaborate to calculate the first mutual information of any two attributes in the attribute set of the data: the semi-trusted third party divides the attribute set into multiple views, where each of the multiple views includes some attributes in the attribute set and the attribute pairs included in any two of the views do not overlap; each data owner uses the data it owns to calculate the marginal distribution of each view; the semi-trusted third party and the multiple data owners use the optimal multi-party Laplace mechanism to merge the multiple marginal distributions (for example, by adding them) into the actual marginal distribution of each view, where the multiple marginal distributions are those calculated by the multiple data owners and the actual marginal distribution carries Laplace noise; and the semi-trusted third party uses the actual marginal distribution of each view to calculate the first mutual information of any two attributes in that view.
  • combining the multiple marginal distributions corresponding to each view into the actual marginal distribution of each view using the optimal multi-party Laplace mechanism may include: acquiring the marginal distribution of each view calculated by each of the multiple objects (i.e., data owners), where Laplace noise is added to the marginal distribution; and merging the multiple marginal distributions of the multiple objects into the actual marginal distribution of each view. To meet the differential privacy protection requirement, the data owners and the semi-trusted third party use the optimal multi-party Laplace mechanism to add Laplace noise to the merged marginal distribution, that is, the minimum noise among the multiple Laplace noises carried by the multiple marginal distributions is used as the Laplace noise of the actual marginal distribution.
• A semi-trusted third party and multiple data owners using the optimal multi-party Laplace mechanism to merge multiple marginal distributions into the actual marginal distribution of each view may include: each data owner uses its own data to count the marginal distributions of all the views from the previous step and sends the calculated marginal distributions, with Laplace noise added, to the semi-trusted third party; the semi-trusted third party merges the multiple marginal distributions (for example, by accumulation) into the actual marginal distribution of each view.
• The data owners and the semi-trusted third party use the optimal multi-party Laplace mechanism to add Laplace noise to the combined marginal distribution, i.e., the minimum of the multiple Laplace noises carried by the multiple marginal distributions is taken as the Laplace noise of the actual marginal distribution.
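The net effect described above — the merged marginal carries only the minimum of the parties' Laplace noises — can be sketched as follows. This models only the outcome of the optimal multi-party Laplace mechanism, not the secure protocol that achieves it; all names are illustrative:

```python
import numpy as np

def optimal_multiparty_laplace(party_marginals, epsilon, rng):
    """Merge per-party marginal counts so the result carries only the
    minimum-magnitude Laplace noise among the parties (outcome sketch)."""
    party_marginals = [np.asarray(m, dtype=float) for m in party_marginals]
    # each party draws its own Laplace noise with scale 1/epsilon
    noises = [rng.laplace(scale=1.0 / epsilon, size=m.shape) for m in party_marginals]
    merged = sum(party_marginals)                    # accumulated true marginal
    # the protocol's net effect: keep only the smallest noise vector
    min_noise = min(noises, key=lambda n: np.abs(n).sum())
    return merged + min_noise
```

Keeping only the minimum noise makes the merged statistic as accurate as the least-noisy contribution while each party's local report remains perturbed.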
• Before updating the initial Bayesian network structure corresponding to the attribute set of the data, an initial Bayesian network structure including the parent-child relationships of all attributes in the attribute set may be obtained, where the parent-child relationships are determined by the multiple objects in a specified manner. That is, the multiple data owners determine the parent-child relationships of all attributes in the attribute set based on the exponential mechanism, thereby determining the initial Bayesian network structure including the parent-child relationships of all attributes in the attribute set.
• The above Bayesian network structure initialization means that the data owners jointly select initial parent nodes for all attributes, constructing an initial k-degree Bayesian network structure (where k-degree means that each attribute has at most k parent nodes).
• The specifying manner may be used to indicate that the parent-child relationships are determined as follows: the first of the multiple objects divides the attribute set into a first set and a second set, where the first set is used to save attributes whose parent nodes have been determined (its initial state is empty) and the second set is used to save attributes whose parent nodes have not been determined; the first object selects one attribute from the second set and saves it to the first set; the i-th of the multiple objects determines parent nodes for a first preset number of attributes in the second set according to a preset manner, and migrates the attributes whose parent nodes have been determined from the second set to the first set, where i is a positive integer less than k and k is the number of the multiple objects; and the k-th of the multiple objects determines parent nodes for a second preset number of attributes in the second set according to the preset manner, and migrates the attributes whose parent nodes have been determined from the second set to the first set.
• The foregoing preset manner may be: acquiring the second mutual information of each first attribute in the first set and a second attribute, where the second attribute is an attribute selected from the second set; and selecting target mutual information from the multiple pieces of second mutual information by using the exponential mechanism, and taking the first attribute corresponding to the target mutual information as the parent node of the second attribute.
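The exponential mechanism used above to select the target mutual information can be sketched as follows; the scores (candidate mutual-information values) and the sensitivity are supplied by the caller, and all names are illustrative:

```python
import numpy as np

def exponential_mechanism(scores, epsilon, sensitivity, rng):
    """Select index i with probability proportional to exp(eps * score_i / (2 * sensitivity))."""
    logits = epsilon * np.asarray(scores, dtype=float) / (2.0 * sensitivity)
    logits -= logits.max()                 # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(scores), p=probs))
```

With a large privacy budget epsilon the mechanism almost always returns the highest-scoring candidate; with a small budget the choice is close to uniform.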
• Step S11: the semi-trusted third party designates that the data owners learn parent nodes for the attributes in the order P1, P2, ..., PK, and determines the number of attributes for which each data owner learns parent nodes: each of the first (K−1) data owners learns ⌊d/K⌋ (where ⌊·⌋ denotes rounding down) and PK learns the remaining d − (K−1)⌊d/K⌋, where d is the number of attributes in the attribute set.
• Step S12: the first data owner P1 learns parent nodes for its ⌊d/K⌋ attributes.
• P1 divides the attribute set A into two groups Ah (i.e., the first set) and An (i.e., the second set), where Ah is the set of all attributes whose parent nodes have been selected and An is the set of all attributes whose parent nodes have not yet been selected. The initial state of Ah is empty.
• P1 randomly picks an attribute X1' from An, records its parent node as empty, and moves X1' from An to Ah.
• P1 selects an attribute Xi from An, and selects min{k, |Ah|} attributes from Ah as candidate parent nodes of Xi.
• P1 takes the mutual information between the attribute and its candidate parent nodes as the scoring function, uses the exponential mechanism to select one attribute–parent-node pair (Xi, Πi) from all the candidate attribute–parent pairs, records it as (X2', Π2), where Π2 is the parent of X2', and then moves X2' from An to Ah.
• P1 repeats the above process until parent nodes have been selected for ⌊d/K⌋ attributes.
• P1 sends the sets Ah, An, and the selected attribute–parent-node pairs to P2.
• Step S13: P2 selects parent nodes for ⌊d/K⌋ new attributes in the same manner, and sends the sets Ah, An, and the attribute–parent-node pairs to P3; this continues until parent nodes have been selected for all attributes.
• Step S14: PK sends the initialized Bayesian network structure N0 to the semi-trusted third party.
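Steps S11–S14 can be sketched as one hand-off round per owner. For brevity this sketch scores a candidate parent set by summed pairwise mutual information and picks the best candidate deterministically, whereas the patent samples it with the exponential mechanism; all names are illustrative:

```python
from itertools import combinations

def owner_round(A_h, A_n, pairs, n_pick, k, mi, rng):
    """One owner selects parent nodes for n_pick attributes, then hands off (A_h, A_n, pairs)."""
    if not A_h:                                  # very first attribute: empty parent set
        x0 = A_n.pop(rng.randrange(len(A_n)))
        A_h.append(x0)
        pairs.append((x0, ()))
        n_pick -= 1
    for _ in range(n_pick):
        x = A_n.pop(rng.randrange(len(A_n)))
        size = min(k, len(A_h))
        # candidate parent sets: size-min(k, |A_h|) subsets of A_h
        candidates = list(combinations(A_h, size))
        # the patent samples here with the exponential mechanism; this
        # sketch just takes the highest-scoring candidate
        best = max(candidates, key=lambda c: sum(mi[frozenset((x, p))] for p in c))
        A_h.append(x)
        pairs.append((x, best))
    return A_h, A_n, pairs
```

Each owner runs this on the (A_h, A_n, pairs) triple received from its predecessor and forwards the updated triple to the next owner.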
• Serially updating the initial Bayesian network structure through the first mutual information to obtain the updated actual Bayesian network structure may include: updating the initial Bayesian network structure to obtain the updated first Bayesian network structure; updating the (j−1)-th Bayesian network structure to obtain the updated j-th Bayesian network structure, where j is an integer greater than 1 and less than k; and updating the (k−1)-th Bayesian network structure to obtain the actual Bayesian network structure.
• The semi-trusted third party and the data owners serially updating the initial Bayesian network structure through the first mutual information to obtain the updated actual Bayesian network structure includes: the semi-trusted third party and the first of the multiple data owners update the initial Bayesian network structure to obtain the updated first Bayesian network structure; the semi-trusted third party and the j-th of the multiple data owners update the (j−1)-th Bayesian network structure to obtain the updated j-th Bayesian network structure, where j is a positive integer greater than 1 and less than k; and the semi-trusted third party and the k-th of the multiple data owners update the (k−1)-th Bayesian network structure to obtain the actual Bayesian network structure.
• Updating the initial Bayesian network structure to obtain the updated first Bayesian network structure may include: constructing the first boundary of the initial Bayesian network structure using the first mutual information and the association-strength-aware boundary construction method; acquiring, from the first of the multiple objects, the first marginal distribution of the attributes within the first boundary and their parent nodes, where the first marginal distribution carries Laplace noise; and selecting a parent node for each attribute in the first boundary using the exponential mechanism to obtain the updated first Bayesian network structure.
• That is: the semi-trusted third party uses the first mutual information and the association-strength-aware boundary construction method to construct the first boundary of the initial Bayesian network structure; the first data owner counts the marginal distribution of the attributes within the first boundary and their parent nodes, and sends the first marginal distribution, with Laplace noise added, to the semi-trusted third party; the semi-trusted third party uses the exponential mechanism to select a parent node for each attribute in the first boundary, obtaining the updated first Bayesian network structure.
• Updating the (j−1)-th Bayesian network structure to obtain the updated j-th Bayesian network structure includes: constructing the j-th boundary of the (j−1)-th Bayesian network structure using the first mutual information and the association-strength-aware boundary construction method; acquiring, from the j-th of the multiple objects, the j-th marginal distribution of the attributes within the j-th boundary and their parent nodes, where the j-th marginal distribution carries Laplace noise; and selecting a parent node for each attribute in the j-th boundary using the exponential mechanism to obtain the updated j-th Bayesian network structure.
• That is, the semi-trusted third party and the j-th of the multiple data owners updating the (j−1)-th Bayesian network structure to obtain the updated j-th Bayesian network structure includes: the semi-trusted third party uses the first mutual information and the association-strength-aware boundary construction method to construct the j-th boundary of the (j−1)-th Bayesian network structure; the j-th data owner counts the j-th marginal distribution of the attributes within the j-th boundary and their parent nodes, and sends the j-th marginal distribution, with Laplace noise added, to the semi-trusted third party; the semi-trusted third party uses the exponential mechanism to select a parent node for each attribute in the j-th boundary, thereby obtaining the updated j-th Bayesian network structure.
• The amount of noise added to the statistical information is proportional to the number of candidate attribute–parent pairs (i.e., attribute pairs).
• The boundary can be used to reasonably limit the number of candidate attribute–parent pairs; however, this inevitably causes some loss of information.
• The boundary therefore needs to contain more effective candidate attribute–parent pairs. The stronger an attribute's association with another attribute, the more likely that attribute is to become its parent. Therefore, the association-strength-aware boundary construction method can be used to construct the boundary; the basic idea of this method is to add edges between attributes with strong association strength. The process is as follows:
• Step 1: given the Bayesian network structure and the mutual information between every two attributes, the mutual information between attributes is used to measure the strength of the association between them; the greater the mutual information, the stronger the association.
• Step 2: the attribute pair with the largest mutual information is selected first. If the attribute pair already has an edge in the current Bayesian network structure, another attribute pair is selected; otherwise, step 3 is performed.
• Step 3: if neither of the two attributes needs a parent node added, return to step 2; if only one of the attributes needs a parent node, add an edge between the attribute pair, making the other attribute its parent while avoiding cycles; if both attributes need a parent node, perform the following steps to determine the direction of the edge.
• Step 4: different edge directions affect the dependencies between the attributes, which affects the selection of subsequent edges and thus the construction of the final boundary.
• To determine the direction, the sparsity Sparse(x) and the influence degree Impact(x, y) can be introduced.
• The sparsity Sparse(x) indicates the total number of parent nodes that all ancestors of attribute x still need to add; the parent node is preferentially added to the node with larger sparsity.
• The influence degree Impact(x, y) measures the effect of directing the edge from x.
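Under the simplifying assumption that mutual information alone decides which direction to orient an edge (the patent additionally uses Sparse(x) and Impact(x, y) for this), the greedy association-strength-aware boundary construction might look like the following sketch; all names are illustrative:

```python
def ancestors(parents, x):
    """All ancestors of attribute x under the parent map {child: set of parents}."""
    out, stack = set(), list(parents.get(x, ()))
    while stack:
        p = stack.pop()
        if p not in out:
            out.add(p)
            stack.extend(parents.get(p, ()))
    return out

def build_boundary(mi, parents, needs_parent, k):
    """Greedily add candidate edges between the most strongly associated attribute
    pairs, skipping existing edges and edges that would create a cycle.
    Note: mutates `parents` in place."""
    boundary = []
    for (x, y) in sorted(mi, key=mi.get, reverse=True):   # strongest association first
        for child, parent in ((x, y), (y, x)):
            ps = parents.setdefault(child, set())
            if (child in needs_parent and len(ps) < k
                    and parent not in ps
                    and child not in ancestors(parents, parent)):   # avoid cycles
                ps.add(parent)
                boundary.append((parent, child))
                break
    return boundary
```

The cycle check rejects an edge parent→child whenever child is already an ancestor of parent in the current structure.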
  • step S301 can be implemented by the following steps:
• Step S21: the semi-trusted third party and the first data owner P1 update the initialized network structure N0.
• The semi-trusted third party uses N0 and the previously calculated mutual information between attributes to construct the boundary using the association-strength-aware boundary construction method.
  • P 1 counts the marginal distribution of all attributes and their parent nodes in the boundary and sends them to semi-trusted third parties. In order to meet the differential privacy protection requirements, P 1 needs to add Laplace noise to the statistical marginal distribution.
  • the semi-trusted third party uses the exponential mechanism to select the parent node for each attribute within the boundary, thereby obtaining the Bayesian network structure N 1 .
  • step S22 the semi-trusted third party and the second data owner P 2 update the network.
• The semi-trusted third party uses N1 and the previously calculated mutual information between attributes to construct the boundary using the association-strength-aware boundary construction method.
  • P 2 counts the marginal distribution of all attributes and their parent nodes within the boundary and sends them to semi-trusted third parties, which semi-trusted third parties accumulate them with the statistics of P 1 .
  • P 2 needs to add Laplace noise to the statistical marginal distribution.
• P1, P2, and the semi-trusted third party use the secure function evaluation protocol to remove the Laplace noise generated by P1 from the marginal distribution, leaving only the noise generated by P2.
  • the semi-trusted third party uses the exponential mechanism to select the parent node for each attribute in the boundary to obtain the Bayesian network structure N 2 .
• Step S23: the semi-trusted third party updates the network with the data owners P3, ..., PK in turn until the final Bayesian network structure NK (i.e., the actual Bayesian network structure) is obtained.
• Learning the parameters in the actual Bayesian network structure may include: acquiring the conditional distribution, determined by each of the multiple objects, of each attribute of the actual Bayesian network structure given the parent nodes of that attribute; and using the optimal multi-party Laplace mechanism to combine the acquired multiple conditional distributions into the actual conditional distribution of each attribute given its parent nodes, where the actual conditional distribution carries Laplace noise.
• That is, each data owner counts the marginal distribution of every attribute–parent pair in the Bayesian network structure and sends the statistics to the semi-trusted third party; the semi-trusted third party merges the corresponding marginal distributions of each attribute–parent pair as the marginal distribution of that attribute–parent pair.
  • data owners and semi-trusted third parties use the optimal multi-party Laplace mechanism to add Laplace noise to the combined marginal distribution.
  • the semi-trusted third party uses the product of the actual conditional distribution of each attribute under the given parent node condition as a joint distribution of all attributes; the semi-trusted third party publishes data corresponding to all attributes generated by the joint distribution.
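The final generation step — drawing new records from the joint distribution formed as the product of each attribute's conditional distribution given its parents — can be sketched as follows; the structure, tables, and names are illustrative assumptions:

```python
import numpy as np

def generate_rows(order, parents, cond, n, rng):
    """Sample n records attribute-by-attribute in topological order.

    cond[x] maps a tuple of parent values to a probability vector over x's values.
    """
    rows = []
    for _ in range(n):
        row = {}
        for x in order:
            key = tuple(row[p] for p in parents[x])   # parent values sampled earlier
            pvec = cond[x][key]
            row[x] = int(rng.choice(len(pvec), p=pvec))
        rows.append(row)
    return rows
```

Because each attribute is drawn from its conditional distribution given already-sampled parents, the generated records follow the factorized joint distribution.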
  • the above method can be implemented by a multi-party data distribution device (that is, a PrivSeq algorithm device), which includes four modules: a data pre-processing module, a Bayesian network structure learning module, a Bayesian parameter learning module, and a data generation module.
• The function of each module is as follows:
• The data preprocessing module: the data owner processes the attribute set according to the value of each attribute of the data as follows: first, attributes whose values are continuous (such as height and age, whose values lie in a continuous interval) are discretized into attributes with discrete values; then, attributes whose values are non-binary are converted into attributes whose values are binary.
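The two preprocessing steps — discretizing continuous-valued attributes and re-encoding non-binary discrete attributes as binary ones — might be implemented along these lines; the equal-width binning and LSB-first bit order are assumptions, not specified by the patent:

```python
import numpy as np

def discretize(values, bins):
    """Map continuous values to equal-width bin indices 0..bins-1."""
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), bins + 1)
    # inner edges only; clamp the maximum value into the last bin
    return np.minimum(np.digitize(values, edges[1:-1]), bins - 1)

def binarize(codes, n_levels):
    """Re-encode a discrete attribute with n_levels values as binary attributes (LSB first)."""
    width = max(1, int(n_levels - 1).bit_length())
    return [[(c >> i) & 1 for i in range(width)] for c in codes]
```

For example, heights 150–190 split into 4 bins, or a 4-level attribute split into two binary attributes.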
  • the Bayesian network structure learning module constructs a Bayesian network for the attribute set of data, and has the functions of mutual information calculation of two-two attributes, Bayesian network structure initialization, and serial update Bayesian network structure.
  • the Bayesian parameter learning module calculates the edge distribution of each attribute node in the Bayesian network.
  • the data generation module regenerates the data according to the structure of the Bayesian network and the edge distribution of each attribute node.
  • the configuration of the device is as follows:
• Each data owner is configured with one Class A server, and each data owner's data is stored in its own Class A server; a data preprocessing module, a Bayesian network structure learning module, and a Bayesian parameter learning module are arranged on the Class A server.
  • a Class B server is configured for the semi-trusted third party.
  • the Bayesian network structure learning module, the Bayesian parameter learning module and the data generation module are arranged on the Class B server.
  • the Class B server of the semi-trusted third party and the Class A server of each data owner are connected via the Internet.
• The Class B server of the semi-trusted third party cooperates with the Class A servers to perform data publishing with differential privacy protection according to the PrivSeq algorithm flow (i.e., by running the corresponding algorithm software).
• For example, suppose there are four nodes in the Bayesian network, namely node A, node B, node C, and node D, where A is the root node (that is, it has no parent node), the parent node of B is A, the parent node of C is A, and the parent nodes of D are A and C.
• Then the joint distribution factorizes as P(A, B, C, D) = P(A) × P(B|A) × P(C|A) × P(D|A, C).
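The factorization for this four-node network can be checked numerically; the probability tables below are made-up illustrative numbers, not from the patent:

```python
from itertools import product

# illustrative conditional tables for binary A, B, C, D
pA = {0: 0.6, 1: 0.4}
pB_A = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}          # P(B | A)
pC_A = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}          # P(C | A)
pD_AC = {(a, c): {0: 0.5 + 0.1 * a - 0.2 * c, 1: 0.5 - 0.1 * a + 0.2 * c}
         for a in (0, 1) for c in (0, 1)}                   # P(D | A, C)

def joint(a, b, c, d):
    """P(A, B, C, D) = P(A) * P(B|A) * P(C|A) * P(D|A, C)."""
    return pA[a] * pB_A[a][b] * pC_A[a][c] * pD_AC[(a, c)][d]

total = sum(joint(a, b, c, d) for a, b, c, d in product((0, 1), repeat=4))
```

`total` evaluates to 1.0 (up to floating point), confirming the factorization defines a valid joint distribution.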
• A method for implementing multi-party data publishing satisfying differential privacy is provided, which helps users fully analyze and mine the value in data while protecting user privacy, and provides more basis for business promotion and scientific research.
• The utility is improved to ensure the quality of the overall data service; the serial update mechanism is combined with the association-strength-aware boundary construction method to reasonably limit the amount of information transmitted between the data owners and the semi-trusted third party, thereby reducing communication overhead and the cost of data services in a big-data environment while making use of high-quality data from all parties.
• FIG. 5 is an exemplary schematic diagram of a data distribution system in accordance with an embodiment of the present application. As shown in FIG. 5, the present application is described in detail by taking K hospitals (numbered P1, P2, ..., PK, K ≥ 2) jointly publishing medical data as an example.
• The medical data of the K hospitals exist on their respective physical hosts, and the semi-trusted third party and each hospital are connected via the Internet.
  • the semi-trusted third party coordinates the parties to perform data release work (publishing overall medical data) that satisfies differential privacy protection according to the PrivSeq algorithm flow.
• Step S502: each hospital uses its own data to count the marginal distributions of all the views from the previous step and sends the statistical results to the semi-trusted third party; the semi-trusted third party merges the corresponding marginal distributions of each view as the marginal distribution of that view, and the K hospitals and the semi-trusted third party use the optimal multi-party Laplace mechanism to add Laplace noise to the combined marginal distribution;
  • Step S503 the semi-trusted third party calculates the mutual information of the two attributes in all views by using the marginal distribution containing the noise;
• Step S504: the semi-trusted third party designates that the hospitals learn parent nodes in the order P1, P2, ..., PK, specifies that each attribute has at most k parent nodes, and determines the number of attributes for which each hospital learns parent nodes;
• Step S505: P1 divides the attribute set A into two groups Ah and An, where Ah is the set of all attributes whose parent nodes have been selected and An is the set of all attributes whose parent nodes have not been selected; the initial state of Ah is empty;
• Step S506: P1 randomly selects an attribute X1' from An, records its parent node as empty, and moves X1' from An to Ah;
• Step S507: P1 selects an attribute Xi from An, selects min{k, |Ah|} attributes from Ah as candidate parent nodes, uses the mutual information between the attribute and its candidate parent nodes as the scoring function, and uses the exponential mechanism to select an attribute–parent-node pair, moving the selected attribute from An to Ah;
• Step S508: P1 repeats the process of step S507 until parent nodes have been selected for ⌊d/K⌋ attributes;
• Step S509: P1 sends the sets Ah, An, and the selected attribute–parent-node pairs to P2;
• Step S510: P2 selects parent nodes for ⌊d/K⌋ new attributes in accordance with the processes of steps S507 and S508, and sends the sets Ah, An, and the attribute–parent-node pairs to P3;
• Step S511: P3, ..., PK repeat the process of step S510 until parent nodes have been selected for all the attributes, thereby obtaining the Bayesian network structure N0;
  • Step S512 P K sends the initialized Bayesian network structure N 0 to the semi-trusted third party;
• Step S513: the semi-trusted third party uses N0 and the mutual information between attributes calculated in step S503, and constructs the boundary using the association-strength-aware boundary construction method;
• Step S514: P1 counts the marginal distribution of all attributes and their parent nodes within the boundary and sends it to the semi-trusted third party; to satisfy the differential privacy requirement, P1 adds Laplace noise to the statistical marginal distribution;
  • Step S515 the semi-trusted third party uses the exponential mechanism to select the parent node for each attribute in the boundary range to obtain the Bayesian network structure N 1 ;
• Step S516: the semi-trusted third party uses N1 and the mutual information between attributes calculated in step S503, and constructs the boundary using the association-strength-aware boundary construction method;
• Step S517: P2 counts the marginal distribution of all attributes and their parent nodes within the boundary and sends it to the semi-trusted third party, which accumulates it with the statistics of P1 from step S514; to meet the differential privacy requirement, P2 adds Laplace noise to the statistical marginal distribution.
• P1, P2, and the semi-trusted third party use the secure function evaluation protocol to remove the Laplace noise generated by P1 from the marginal distribution, keeping only the noise generated by P2;
  • Step S518, the semi-trusted third party uses the exponential mechanism to select the parent node for each attribute in the boundary range to obtain the Bayesian network structure N 2 ;
  • Step S519 repeating the process from step S516 to step S518, the semi-trusted third party and the hospital P 3 , . . . P K update the network until the final Bayesian network structure N K is obtained ;
• Step S520: each hospital counts the marginal distribution of all attribute–parent pairs in the Bayesian network structure and sends the statistical result to the semi-trusted third party;
• Step S521: the semi-trusted third party merges the corresponding marginal distributions of each attribute–parent pair as the marginal distribution of that attribute–parent pair, and the hospitals and the semi-trusted third party use the optimal multi-party Laplace mechanism to add Laplace noise to the merged marginal distribution;
• Step S522: the semi-trusted third party uses the product of the conditional distribution of each node given its parent nodes in the noisy Bayesian network as the joint distribution of the data attributes;
  • step S523 the semi-trusted third party uses the joint distribution to generate new data.
  • FIG. 6 is an exemplary schematic diagram of a data distribution system in accordance with an embodiment of the present application.
• As shown in FIG. 6, the present application is described in detail by taking K stores (numbered P1, P2, ..., PK, K ≥ 2) jointly publishing overall purchase records as an example.
• Step S601: the semi-trusted third party divides the attribute set A (including attributes such as the user's name, gender, age, and purchased commodity) using the non-overlapping attribute division method to obtain a set of views, a view being a set containing some of the attributes, such as the view V1(X11, X12, ..., X1i);
• Step S602: each store uses its own data to count the marginal distributions of all the views from the previous step and sends the statistical results to the semi-trusted third party; the semi-trusted third party merges the corresponding marginal distributions of each view as the marginal distribution of that view, and the K stores and the semi-trusted third party use the optimal multi-party Laplace mechanism to add Laplace noise to the combined marginal distribution;
  • Step S603 the semi-trusted third party calculates the mutual information of the two attributes in all views by using the marginal distribution containing the noise;
• Step S604: the semi-trusted third party designates that the stores learn parent nodes in the order P1, P2, ..., PK, specifies that each attribute has at most k parent nodes, and determines the number of attributes for which each store learns parent nodes;
• Step S605: P1 divides the attribute set into two groups Ah and An, where Ah is the set of all attributes whose parent nodes have been selected and An is the set of all attributes whose parent nodes have not been selected; the initial state of Ah is empty;
• Step S606: P1 randomly selects an attribute X1' from An, records its parent node as empty, and moves X1' from An to Ah;
• Step S607: P1 selects an attribute Xi from An, selects min{k, |Ah|} attributes from Ah as candidate parent nodes, uses the mutual information between the attribute and its candidate parent nodes as the scoring function, and uses the exponential mechanism to select an attribute–parent-node pair, moving the selected attribute from An to Ah;
• Step S608: P1 repeats the process of step S607 until parent nodes have been selected for ⌊d/K⌋ attributes;
• Step S609: P1 sends the sets Ah, An, and the selected attribute–parent-node pairs to P2;
• Step S610: P2 selects parent nodes for ⌊d/K⌋ new attributes in accordance with the processes of steps S607 and S608, and sends the sets Ah, An, and the attribute–parent-node pairs to P3;
• Step S611: P3, ..., PK repeat the process of step S610 until parent nodes have been selected for all the attributes, thereby obtaining the Bayesian network structure N0;
  • Step S612 P K sends the initialized Bayesian network structure N 0 to the semi-trusted third party;
• Step S613: the semi-trusted third party uses N0 and the mutual information between attributes calculated in step S603, and constructs the boundary using the association-strength-aware boundary construction method;
• Step S614: P1 counts the marginal distribution of all attributes and their parent nodes within the boundary and sends it to the semi-trusted third party; to satisfy the differential privacy requirement, P1 adds Laplace noise to the statistical marginal distribution;
  • Step S615 the semi-trusted third party uses the exponential mechanism to select the parent node for each attribute in the boundary range to obtain the Bayesian network structure N 1 ;
• Step S616: the semi-trusted third party uses N1 and the mutual information between attributes calculated in step S603, and constructs the boundary using the association-strength-aware boundary construction method;
• Step S617: P2 counts the marginal distribution of all attributes and their parent nodes within the boundary and sends it to the semi-trusted third party, which accumulates it with the statistics of P1 from step S614; to meet the differential privacy requirement, P2 adds Laplace noise to the statistical marginal distribution.
• P1, P2, and the semi-trusted third party use the secure function evaluation protocol to remove the Laplace noise generated by P1 from the marginal distribution, keeping only the noise generated by P2;
  • Step S618, the semi-trusted third party uses the exponential mechanism to select the parent node for each attribute in the boundary range to obtain the Bayesian network structure N 2 ;
  • Step S619 repeating the process from step S616 to step S618, the semi-trusted third party and the store P 3 , . . . P K update the network until the final Bayesian network structure N K is obtained ;
• Step S620: each store counts the marginal distribution of all attribute–parent pairs in the Bayesian network structure and sends the statistical result to the semi-trusted third party;
• Step S621: the semi-trusted third party merges the corresponding marginal distributions of each attribute–parent pair as the marginal distribution of that attribute–parent pair, and the stores and the semi-trusted third party use the optimal multi-party Laplace mechanism to add Laplace noise to the merged marginal distribution;
• Step S622: the semi-trusted third party uses the product of the conditional distribution of each node given its parent nodes in the noisy Bayesian network as the joint distribution of the data attributes;
  • step S623 the semi-trusted third party uses the joint distribution to generate new data.
  • FIG. 7 is an exemplary schematic diagram of a data distribution system in accordance with an embodiment of the present application.
• As shown in FIG. 7, the present application is described in detail by taking K banks (numbered P1, P2, ..., PK, K ≥ 2) jointly publishing overall transaction information as an example.
  • the transaction information data of K banks exist on their respective physical hosts, and the semi-trusted third parties and each bank are connected via the Internet.
  • the semi-trusted third party coordinates the parties to perform data release (overall transaction information) that satisfies differential privacy protection according to the PrivSeq algorithm flow.
• Step S701: the semi-trusted third party divides the attribute set A (including attributes such as name, gender, age, and withdrawal amount) using the non-overlapping attribute division method to obtain a set of views, a view being a set containing some of the attributes, such as the view V1(X11, X12, ..., X1i);
• Step S702: each bank uses its own data to count the marginal distributions of all the views from the previous step and sends the statistical results to the semi-trusted third party; the semi-trusted third party merges the corresponding marginal distributions of each view as the marginal distribution of that view, and the K banks and the semi-trusted third party use the optimal multi-party Laplace mechanism to add Laplace noise to the combined marginal distribution;
  • Step S703: the semi-trusted third party calculates the mutual information of every two attributes in all views using the marginal distributions containing noise;
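The mutual-information calculation in step S703 can be sketched as follows (an illustrative Python sketch, not part of the claimed method; the clipping and renormalization of noisy counts is an assumption about how negative noisy entries would be handled):

```python
import numpy as np

def mutual_information(joint):
    """Mutual information I(X;Y) (in nats) of two attributes, computed
    from their (possibly noisy) joint marginal distribution.

    `joint` is a 2-D array: joint[i, j] = Pr[X = i, Y = j].
    """
    joint = np.clip(joint, 0.0, None)        # noisy entries may go negative
    joint = joint / joint.sum()              # renormalize after clipping
    px = joint.sum(axis=1, keepdims=True)    # marginal of X
    py = joint.sum(axis=0, keepdims=True)    # marginal of Y
    mask = joint > 0
    return float((joint[mask] * np.log(joint[mask] / (px @ py)[mask])).sum())

# Independent attributes -> mutual information of 0
print(round(mutual_information(np.array([[0.25, 0.25], [0.25, 0.25]])), 6))  # 0.0
```

For a perfectly correlated pair, e.g. the joint distribution [[0.5, 0], [0, 0.5]], the same function returns ln 2 ≈ 0.693.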
  • Step S704: the semi-trusted third party specifies that the banks learn parent nodes in the order P 1 , P 2 , ..., P K , specifies that the number of parent nodes of each attribute is at most k, and determines the number of attributes for which each bank selects parent nodes;
  • Step S705: P 1 divides the attribute set A into two sets A h and A n , where A h is the set of all attributes for which a parent node has already been selected and A n is the set of all attributes for which no parent node has been selected; the initial state of A h is empty;
  • Step S706: P 1 randomly selects an attribute X 1 ' from A n , records its parent node as empty, and moves X 1 ' from A n to A h ;
  • Step S707: P 1 selects an attribute X i from A n and selects min{k, |A h |} attributes from A h to form candidate parent sets; P 1 takes the mutual information between the attribute and its candidate parent nodes as the scoring function, uses the exponential mechanism to select one attribute-parent pair (X i , Π i ) from all candidate attribute-parent pairs, records it as (X 2 ', Π 2 ), where Π 2 is the parent of X 2 ', and then moves X 2 ' from A n to A h ;
  • Step S708: P 1 repeats the process of step S707 until it has selected parent nodes for its assigned attributes;
  • Step S709: P 1 sends the sets A h and A n and the selected attribute-parent pairs to P 2 ;
  • Step S710: P 2 selects parent nodes for new attributes according to the processes of steps S707 and S708, and sends the sets A h and A n and the attribute-parent pairs to P 3 ;
  • Step S711: P 3 , ..., P K repeat the process of step S710 until parent nodes have been selected for all attributes, thereby obtaining the Bayesian network structure N 0 ;
  • Step S712: P K sends the initialized Bayesian network structure N 0 to the semi-trusted third party;
  • Step S713: the semi-trusted third party uses N 0 and the inter-attribute mutual information calculated in step S703 to construct the boundary using the correlation-strength-aware boundary construction method;
  • Step S714: P 1 counts the marginal distributions of all attributes and their parent nodes within the boundary and sends them to the semi-trusted third party; to satisfy differential privacy, P 1 needs to add Laplace noise to the statistical marginal distributions;
  • Step S715: the semi-trusted third party uses the exponential mechanism to select a parent node for each attribute within the boundary, obtaining the Bayesian network structure N 1 ;
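The exponential-mechanism selection in step S715 can be sketched as follows (illustrative only; the candidate parent sets, scores, sensitivity and ε values are hypothetical examples, not values prescribed by the application):

```python
import numpy as np

def exponential_mechanism(candidates, scores, epsilon, sensitivity, rng):
    """Pick one candidate with probability proportional to
    exp(epsilon * score / (2 * sensitivity)) -- the exponential mechanism."""
    scores = np.asarray(scores, dtype=float)
    # Subtract the max score for numerical stability; ratios are unchanged.
    weights = np.exp(epsilon * (scores - scores.max()) / (2.0 * sensitivity))
    probs = weights / weights.sum()
    return candidates[rng.choice(len(candidates), p=probs)]

rng = np.random.default_rng(0)
candidates = [("age",), ("gender",), ("age", "gender")]   # candidate parent sets
scores = [0.9, 0.1, 1.2]   # e.g. mutual information with the attribute
parent = exponential_mechanism(candidates, scores, epsilon=1.0, sensitivity=1.0, rng=rng)
print(parent in candidates)  # True
```

With a large ε the mechanism almost always returns the highest-scoring candidate; with a small ε the choice becomes nearly uniform, which is how privacy is traded against utility.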
  • Step S716: the semi-trusted third party uses N 1 and the inter-attribute mutual information calculated in step S703 to construct the boundary using the correlation-strength-aware boundary construction method;
  • Step S717: P 2 counts the marginal distributions of all attributes and their parent nodes within the boundary and sends them to the semi-trusted third party, which accumulates them with the statistics of P 1 from step S714; to satisfy differential privacy, P 2 needs to add Laplace noise to the statistical marginal distributions;
  • Step S718: P 1 , P 2 and the semi-trusted third party use the secure function evaluation protocol to remove the Laplace noise generated by P 1 from the marginal distributions, keeping only the noise generated by P 2 ;
  • Step S719: repeating the processes of step S716 to step S718, the semi-trusted third party and the banks P 3 , ..., P K update the network until the final Bayesian network structure N K is obtained;
  • Step S720: each bank counts the marginal distributions of all attributes and their parent nodes in the Bayesian network structure and sends the statistical results to the semi-trusted third party;
  • Step S721: the semi-trusted third party merges the corresponding marginal distributions of each attribute-parent pair into the marginal distribution of that attribute-parent pair, and the banks and the semi-trusted third party use the optimal multi-party Laplace mechanism to add Laplace noise to the merged marginal distributions;
  • Step S722: the semi-trusted third party takes the product of the noisy conditional distributions of each node in the Bayesian network as the joint distribution of the data attributes;
  • Step S723: the semi-trusted third party uses the joint distribution to generate new data.
  • FIG. 8 is an exemplary schematic diagram of a data distribution system in accordance with an embodiment of the present application.
  • FIG. 8 is described in detail by taking K schools (numbered P 1 , P 2 , ..., P K , K ≥ 2) jointly publishing overall student test scores as an example.
  • The test scores of the K schools reside on their respective physical hosts.
  • The semi-trusted third party and each school are connected via the Internet.
  • The semi-trusted third party coordinates the parties to perform the release of data (the overall student test scores) satisfying differential privacy protection according to the PrivSeq algorithm flow.
  • Step S801: the semi-trusted third party divides the attribute set A (including attributes such as student number, name, gender and grade) using the non-overlapping attribute division method to obtain a set of views, where a view is a set containing some of the attributes, such as the view V 1 (X 11 , X 12 , ..., X 1i );
  • Step S802: each school uses its own data to compute the marginal distributions of all the views obtained in the previous step and sends the statistical results to the semi-trusted third party; the semi-trusted third party merges the corresponding marginal distributions of each view into the marginal distribution of that view, and the K schools and the semi-trusted third party use the optimal multi-party Laplace mechanism to add Laplace noise to the merged marginal distributions;
  • Step S803: the semi-trusted third party calculates the mutual information of every two attributes in all views using the marginal distributions containing noise;
  • Step S804: the semi-trusted third party specifies that the schools learn parent nodes in the order P 1 , P 2 , ..., P K , specifies that the number of parent nodes of each attribute is at most k, and determines the number of attributes for which each school selects parent nodes;
  • Step S805: P 1 divides the attribute set A into two sets A h and A n , where A h is the set of all attributes for which a parent node has already been selected and A n is the set of all attributes for which no parent node has been selected; obviously the initial state of A h is empty;
  • Step S806: P 1 randomly selects an attribute X 1 ' from A n , records its parent node as empty, and moves X 1 ' from A n to A h ;
  • Step S807: P 1 selects an attribute X i from A n and selects min{k, |A h |} attributes from A h to form candidate parent sets; P 1 takes the mutual information between the attribute and its candidate parent nodes as the scoring function, uses the exponential mechanism to select one attribute-parent pair (X i , Π i ) from all candidate attribute-parent pairs, records it as (X 2 ', Π 2 ), where Π 2 is the parent of X 2 ', and then moves X 2 ' from A n to A h ;
  • Step S808: P 1 repeats the process of step S807 until it has selected parent nodes for its assigned attributes;
  • Step S809: P 1 sends the sets A h and A n and the selected attribute-parent pairs to P 2 ;
  • Step S810: P 2 selects parent nodes for new attributes according to the processes of steps S807 and S808, and sends the sets A h and A n and the attribute-parent pairs to P 3 ;
  • Step S811: P 3 , ..., P K repeat the process of step S810 until parent nodes have been selected for all attributes, thereby obtaining the Bayesian network structure N 0 ;
  • Step S812: P K sends the initialized Bayesian network structure N 0 to the semi-trusted third party;
  • Step S813: the semi-trusted third party uses N 0 and the inter-attribute mutual information calculated in step S803 to construct the boundary using the correlation-strength-aware boundary construction method;
  • Step S814: P 1 counts the marginal distributions of all attributes and their parent nodes within the boundary and sends them to the semi-trusted third party; to satisfy differential privacy, P 1 needs to add Laplace noise to the statistical marginal distributions;
  • Step S815: the semi-trusted third party uses the exponential mechanism to select a parent node for each attribute within the boundary, obtaining the Bayesian network structure N 1 ;
  • Step S816: the semi-trusted third party uses N 1 and the inter-attribute mutual information calculated in step S803 to construct the boundary using the correlation-strength-aware boundary construction method;
  • Step S817: P 2 counts the marginal distributions of all attributes and their parent nodes within the boundary and sends them to the semi-trusted third party, which accumulates them with the statistics of P 1 from step S814; to satisfy differential privacy, P 2 needs to add Laplace noise to the statistical marginal distributions;
  • Step S818: P 1 , P 2 and the semi-trusted third party use the secure function evaluation protocol to remove the Laplace noise generated by P 1 from the marginal distributions, keeping only the noise generated by P 2 ;
  • Step S819: repeating the processes of step S816 to step S818, the semi-trusted third party and the schools P 3 , ..., P K update the network until the final Bayesian network structure N K is obtained;
  • Step S820: each school counts the marginal distributions of all attributes and their parent nodes in the Bayesian network structure and sends the statistical results to the semi-trusted third party;
  • Step S821: the semi-trusted third party merges the corresponding marginal distributions of each attribute-parent pair into the marginal distribution of that attribute-parent pair, and the schools and the semi-trusted third party use the optimal multi-party Laplace mechanism to add Laplace noise to the merged marginal distributions;
  • Step S822: the semi-trusted third party takes the product of the noisy conditional distributions of each node in the Bayesian network as the joint distribution of the data attributes;
  • Step S823: the semi-trusted third party uses the joint distribution to generate new data.
  • The differential privacy model from the data privacy domain is used to provide ε-differential privacy protection for each data owner during multi-party joint data publishing, which protects user privacy and provides a more secure data publishing strategy;
  • the serial Bayesian network update mechanism, the non-overlapping attribute partitioning method and the optimal multi-party Laplace mechanism minimize the noise added under the condition that each data owner's data satisfies ε-differential privacy, so that the utility of the published data is improved and the quality of the overall data service is guaranteed; using the serial update mechanism combined with the correlation-strength-aware boundary construction method, the amount of information transmitted between the data owners and the semi-trusted third party is reasonably limited, thereby reducing communication overhead and the cost of data services in a big data environment while providing high-quality data services that comprehensively utilize all parties' data.
  • An embodiment of the present application also provides a data distribution apparatus.
  • The apparatus is used to implement the above embodiments and exemplary implementations; what has already been described is not repeated here.
  • As used below, the term "module" may implement a combination of software and/or hardware of a predetermined function.
  • Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
  • FIG. 9 is a schematic diagram of a data distribution apparatus according to an embodiment of the present application. As shown in FIG. 9, the apparatus may include an update unit 91, a learning unit 92, and a distribution unit 93.
  • the updating unit 91 is configured to update an initial Bayesian network structure corresponding to the attribute set of the data, to obtain an updated actual Bayesian network structure;
  • the learning unit 92 is configured to learn parameters in the actual Bayesian network structure to obtain a target Bayesian network structure
  • a publishing unit 93 is configured to publish data corresponding to all attributes in the attribute set using the target Bayesian network structure.
  • The updating unit updates the initial Bayesian network structure corresponding to the attribute set of the data to obtain the updated actual Bayesian network structure; the learning unit learns the parameters in the actual Bayesian network structure to obtain the target Bayesian network structure; and the publishing unit uses the target Bayesian network structure to publish data corresponding to all the attributes in the attribute set, thereby improving the security of multi-party data publishing in a big data environment and achieving the technical effect of improving the security of data publishing.
  • In an exemplary embodiment, the updating unit 91 may include: a first obtaining module configured to obtain first mutual information of any two attributes in the attribute set; and an updating module configured to serially update the initial Bayesian network structure using the first mutual information to obtain the updated actual Bayesian network structure.
  • In an exemplary embodiment, the first obtaining module may include: a dividing submodule configured to divide the attribute set into a plurality of views, wherein each view includes some of the attributes in the attribute set; a merging submodule configured to use the optimal multi-party Laplace mechanism to merge the plurality of marginal distributions corresponding to each view into the actual marginal distribution of that view, wherein the actual marginal distribution carries Laplace noise; and a computing submodule configured to calculate the first mutual information of any two attributes in each view using the actual marginal distribution of that view.
  • the partitioning sub-module may be configured to divide the attribute set into a plurality of views by using a non-overlapping attribute dividing device, wherein the attribute pairs included in any two views do not overlap.
  • In an exemplary embodiment, the merging submodule may be configured to: obtain the marginal distribution of each view calculated from the data owned by each of the plurality of objects, wherein Laplace noise has been added to the marginal distribution; and merge the plurality of marginal distributions into the actual marginal distribution of each view, taking the minimum noise among the plurality of Laplace noises carried by the plurality of marginal distributions as the Laplace noise of the actual marginal distribution.
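The merging behaviour of the submodule can be roughly sketched as follows (an illustrative sketch that assumes the "minimum noise" rule amounts to applying a single Laplace perturbation to the summed per-party counts, rather than accumulating one perturbation per party; the counts and ε are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

def merge_marginals(party_counts, epsilon, sensitivity=1.0):
    """Sum each party's marginal counts and keep a single Laplace
    perturbation instead of one noise term per party."""
    merged = np.sum(party_counts, axis=0).astype(float)
    merged += rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=merged.shape)
    return merged

counts_p1 = np.array([120, 80])   # e.g. counts of a binary attribute at party 1
counts_p2 = np.array([ 90, 110])
noisy = merge_marginals([counts_p1, counts_p2], epsilon=1.0)
print(noisy.shape)  # (2,)
```

Keeping only one Laplace perturbation for the merged statistic is what limits the total noise in the published marginal while each party's contribution still satisfies differential privacy.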
  • In an exemplary embodiment, the updating unit 91 may further include: a second obtaining module configured to obtain an initial Bayesian network structure including the parent-child node relationships of all attributes in the attribute set, wherein the parent-child node relationships are determined by the plurality of objects in a specified manner.
  • In an exemplary embodiment, the update module may include: a first update submodule configured to update the initial Bayesian network structure to obtain an updated first Bayesian network structure; a second update submodule configured to update the (j-1)-th Bayesian network structure to obtain an updated j-th Bayesian network structure, where j is a positive integer greater than 1 and less than k; and a third update submodule configured to update the (k-1)-th Bayesian network structure to obtain the actual Bayesian network structure.
  • In an exemplary embodiment, the updating unit 91 in the above embodiment may be further configured to control a plurality of data owners to determine the parent-child node relationships of all attributes in the attribute set based on the exponential mechanism, and to determine an initial Bayesian network structure including the parent-child node relationships of all attributes in the attribute set.
  • In an exemplary embodiment, the first update submodule may be configured to: use the first mutual information and the correlation-strength-aware boundary construction device to construct a first boundary of the initial Bayesian network structure; obtain the first marginal distributions, counted by the first object among the plurality of objects, of the attributes within the first boundary and the parent nodes of those attributes, wherein the first marginal distributions carry Laplace noise; and use the exponential mechanism to select a parent node for each attribute within the first boundary to obtain the updated first Bayesian network structure.
  • In an exemplary embodiment, the second update submodule may be configured to: use the first mutual information and the correlation-strength-aware boundary construction device to construct a j-th boundary of the (j-1)-th Bayesian network structure; obtain the j-th marginal distributions, counted by the j-th object among the plurality of objects, of the attributes within the j-th boundary and the parent nodes of those attributes, wherein the j-th marginal distributions carry Laplace noise; and use the exponential mechanism to select a parent node for each attribute within the j-th boundary to obtain the updated j-th Bayesian network structure.
  • In an exemplary embodiment, the updating unit 91 may implement the above functions according to the following steps:
  • Step S21: the semi-trusted third party and the first data owner P 1 update the initialized network structure N 0 . The semi-trusted third party uses N 0 and the previously calculated inter-attribute mutual information to construct the boundary using the correlation-strength-aware boundary construction method. P 1 counts the marginal distributions of all attributes and their parent nodes within the boundary and sends them to the semi-trusted third party; to meet the differential privacy protection requirement, P 1 needs to add Laplace noise to the statistical marginal distributions. The semi-trusted third party uses the exponential mechanism to select a parent node for each attribute within the boundary, thereby obtaining the Bayesian network structure N 1 .
  • Step S22: the semi-trusted third party and the second data owner P 2 update the network. The semi-trusted third party uses N 1 and the calculated inter-attribute mutual information to construct the boundary using the correlation-strength-aware boundary construction method. P 2 counts the marginal distributions of all attributes and their parent nodes within the boundary and sends them to the semi-trusted third party, which accumulates them with the statistics of P 1 . P 2 needs to add Laplace noise to the statistical marginal distributions. P 1 , P 2 and the semi-trusted third party use the secure function evaluation protocol to remove the Laplace noise generated by P 1 from the marginal distributions, leaving only the noise generated by P 2 . The semi-trusted third party uses the exponential mechanism to select a parent node for each attribute within the boundary to obtain the Bayesian network structure N 2 .
  • Step S23: the semi-trusted third party then updates the network with the data owners P 3 , ..., P K until the final Bayesian network structure N K (i.e., the actual Bayesian network structure) is obtained.
  • In an exemplary embodiment, the learning unit 92 may include: a third obtaining module configured to obtain the conditional distribution, determined by each of the plurality of objects, of any attribute in the actual Bayesian network structure and the parent node of that attribute; and a merging module configured to use the optimal multi-party Laplace mechanism to merge the obtained plurality of conditional distributions into the actual conditional distribution of the attribute and its parent node, wherein the actual conditional distribution carries Laplace noise.
  • In an exemplary embodiment, the learning unit 92 may implement the above functions as follows:
  • Step S31: the semi-trusted third party and the first data owner P 1 update the initialized network structure N 0 . The semi-trusted third party uses N 0 and the previously calculated inter-attribute mutual information to construct the boundary using the correlation-strength-aware boundary construction method. P 1 counts the marginal distributions of all attributes and their parent nodes within the boundary and sends them to the semi-trusted third party; to meet the differential privacy protection requirement, P 1 needs to add Laplace noise to the statistical marginal distributions. The semi-trusted third party uses the exponential mechanism to select a parent node for each attribute within the boundary, thereby obtaining the Bayesian network structure N 1 .
  • Step S32: the semi-trusted third party and the second data owner P 2 update the network. The semi-trusted third party uses N 1 and the calculated inter-attribute mutual information to construct the boundary using the correlation-strength-aware boundary construction method. P 2 counts the marginal distributions of all attributes and their parent nodes within the boundary and sends them to the semi-trusted third party, which accumulates them with the statistics of P 1 . P 2 needs to add Laplace noise to the statistical marginal distributions. P 1 , P 2 and the semi-trusted third party use the secure function evaluation protocol to remove the Laplace noise generated by P 1 from the marginal distributions, leaving only the noise generated by P 2 . The semi-trusted third party uses the exponential mechanism to select a parent node for each attribute within the boundary to obtain the Bayesian network structure N 2 .
  • Step S33: the semi-trusted third party then updates the network with the data owners P 3 , ..., P K until the final Bayesian network structure N K (i.e., the actual Bayesian network structure) is obtained.
  • In an exemplary embodiment, the issuing unit 93 may include: a processing module configured to take the product of the actual conditional distributions of each attribute given its parent nodes as the joint distribution of all attributes; and a publishing module configured to publish the data corresponding to all attributes generated from the joint distribution.
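The product-of-conditionals construction used by the processing module can be sketched as follows (illustrative only; the two-attribute network and its probabilities are made-up examples):

```python
import numpy as np

# Toy network: gender has no parent, grade has gender as its parent.
p_gender = np.array([0.5, 0.5])
p_grade_given_gender = np.array([[0.7, 0.3],
                                 [0.4, 0.6]])

def joint_probability(gender, grade):
    """Pr[gender, grade] as the product of each attribute's conditional
    distribution given its parents in the network."""
    return p_gender[gender] * p_grade_given_gender[gender, grade]

# The products over all value combinations form a valid joint distribution.
total = sum(joint_probability(g, s) for g in range(2) for s in range(2))
print(round(total, 6))  # 1.0
```

In a larger network the same factorization applies: the joint probability of a full record is the product, over all attributes, of each attribute's conditional probability given its parent values.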
  • The embodiment of the present application provides an apparatus for implementing multi-party data publishing satisfying differential privacy, which can help users fully analyze and mine the value in data while protecting user privacy, and provides more basis for business promotion and scientific research.
  • The utility of the published data is improved to guarantee the quality of the overall data service; the serial update mechanism combined with the correlation-strength-aware boundary construction method reasonably limits the amount of information transmitted between the data owners and the semi-trusted third party, thereby reducing communication overhead and the cost of data services in a big data environment while making high-quality use of all parties' data.
  • The above modules may be implemented by software or hardware.
  • The foregoing may be implemented by, but is not limited to: all of the modules being located in the same processor, or the modules being located in different processors.
  • the embodiment of the present application also provides a storage medium.
  • The storage medium may be configured to store program code for performing the following steps: S1, updating an initial Bayesian network structure corresponding to the attribute set of the data to obtain an updated actual Bayesian network structure; S2, learning the parameters in the actual Bayesian network structure to obtain the target Bayesian network structure; and S3, using the target Bayesian network structure to publish data corresponding to all the attributes in the attribute set.
  • The storage medium may be further configured to store program code for performing the following steps: S4, obtaining first mutual information of any two attributes in the attribute set; and S5, serially updating the initial Bayesian network structure using the first mutual information to obtain the updated actual Bayesian network structure.
  • The foregoing storage medium may include, but is not limited to: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other media that can store program code.
  • According to the program code stored in the storage medium, the processor may perform: updating an initial Bayesian network structure corresponding to the attribute set of the data to obtain an updated actual Bayesian network structure; learning the parameters in the actual Bayesian network structure to obtain the target Bayesian network structure; and using the target Bayesian network structure to publish data corresponding to all the attributes in the attribute set.
  • According to the program code stored in the storage medium, the processor may also perform: obtaining first mutual information of any two attributes in the attribute set; and serially updating the initial Bayesian network structure using the first mutual information to obtain the updated actual Bayesian network structure.
  • Such software may be distributed on a computer readable medium, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
  • Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information, such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cartridges, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • Communication media typically embody computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
  • The embodiments of the present application provide a data publishing method, apparatus and terminal, which improve security when multi-party data is published in a big data environment.


Abstract

A data publishing method, comprising: updating an initial Bayesian network structure corresponding to an attribute set of data to obtain an updated actual Bayesian network structure; learning parameters in the actual Bayesian network structure to obtain a target Bayesian network structure; and using the target Bayesian network structure to publish data corresponding to all attributes in the attribute set.

Description

Data publishing method and apparatus, and terminal. Technical Field
The present application relates to, but is not limited to, the field of data security, and in particular to a data publishing method and apparatus, and a terminal.
Background
Privacy-preserving data publishing aims to protect users' sensitive information in the course of publishing data. The proposal of the differential privacy model provides a feasible solution to the problem of privacy-preserving data publishing. Unlike traditional anonymity-based privacy models (such as k-anonymity and l-diversity), the differential privacy model provides a rigorous and quantifiable means of privacy protection, and the strength of the privacy protection it provides does not depend on the background knowledge an attacker possesses.
Currently, in the single-party scenario, the PrivBayes (Bayesian) method solves the problem of data publishing satisfying differential privacy. It first uses the original data to build a Bayesian network. To meet the privacy protection requirement, noise is added to the constructed Bayesian network so that it meets the differential privacy requirement; the noisy Bayesian network is then used to generate and publish new data. However, data publishing methods for the single-party scenario cannot be directly applied to multi-party scenarios. In the multi-party scenario, distributed data generation algorithms satisfying differential privacy (such as the DistDiffGen algorithm) solve the two-party data publishing problem, but are not applicable to data publishing satisfying differential privacy among more than two parties. Collaborative search-log generation algorithms (such as the CELS algorithm) solve the multi-party search-log publishing problem, but cannot solve the problem of publishing data with multiple attributes in a multi-party scenario, and the strength of their privacy protection is low. Based on the above analysis, multi-party data publishing satisfying differential privacy protection in a big data environment cannot yet be achieved.
Summary of the Invention
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the protection scope of the claims.
Embodiments of the present application provide a data publishing method and apparatus, and a terminal, which can improve the security of multi-party data publishing in a big data environment.
According to one aspect of the embodiments of the present application, a data publishing method is provided, the method comprising: updating an initial Bayesian network structure corresponding to an attribute set of data to obtain an updated actual Bayesian network structure; learning parameters in the actual Bayesian network structure to obtain a target Bayesian network structure; and using the target Bayesian network structure to publish data corresponding to all attributes in the attribute set.
In an exemplary embodiment, updating the initial Bayesian network structure corresponding to the attribute set of the data to obtain the updated actual Bayesian network structure may comprise: obtaining first mutual information of any two attributes in the attribute set; and serially updating the initial Bayesian network structure using the first mutual information to obtain the updated actual Bayesian network structure.
In an exemplary embodiment, obtaining the first mutual information of any two attributes in the attribute set may comprise: dividing the attribute set into a plurality of views, wherein each view includes some of the attributes in the attribute set; using the optimal multi-party Laplace mechanism to merge the plurality of marginal distributions corresponding to each view into the actual marginal distribution of that view, wherein the actual marginal distribution carries Laplace noise; and using the actual marginal distribution of each view to calculate the first mutual information of any two attributes in that view.
In an exemplary embodiment, dividing the attribute set into a plurality of views may comprise: dividing the attribute set into a plurality of views using a non-overlapping attribute division method, wherein the attribute pairs included in any two views do not overlap.
In an exemplary embodiment, using the optimal multi-party Laplace mechanism to merge the plurality of marginal distributions corresponding to each view into the actual marginal distribution of that view may comprise: obtaining the marginal distribution of each view calculated from the data owned by each of a plurality of objects, wherein Laplace noise has been added to the marginal distribution; and merging the plurality of marginal distributions of the plurality of objects into the actual marginal distribution of each view, taking the minimum noise among the plurality of Laplace noises carried by the plurality of marginal distributions as the Laplace noise of the actual marginal distribution.
In an exemplary embodiment, before updating the initial Bayesian network structure corresponding to the attribute set of the data, the method may further comprise: obtaining an initial Bayesian network structure including the parent-child node relationships of all attributes in the attribute set, wherein the parent-child node relationships are determined by a plurality of objects in a specified manner.
In an exemplary embodiment, the specified manner may indicate that the parent-child node relationships are determined as follows: a first object among the plurality of objects divides the attribute set into a first set and a second set, wherein the first set stores attributes whose parent nodes have been determined, the initial state of the first set is empty, and the second set stores attributes whose parent nodes have not been determined; the first object selects one attribute from the second set and stores it into the first set; the i-th object among the plurality of objects determines parent nodes for a first preset number of attributes in the second set in a preset manner, and migrates the attributes whose parent nodes have been determined from the second set to the first set, wherein i is a positive integer less than k and k is the number of the plurality of objects; and the k-th object among the plurality of objects determines parent nodes for a second preset number of attributes in the second set in the preset manner, and migrates the attributes whose parent nodes have been determined from the second set to the first set.
In an exemplary embodiment, the preset manner may comprise: obtaining second mutual information between each first attribute in the first set and a second attribute, wherein the second attribute is an attribute selected from the second set; and using the exponential mechanism to select target mutual information from the plurality of pieces of second mutual information, taking the first attribute corresponding to the target mutual information as the parent node of the second attribute.
In an exemplary embodiment, serially updating the initial Bayesian network structure using the first mutual information to obtain the updated actual Bayesian network structure may comprise: updating the initial Bayesian network structure to obtain an updated first Bayesian network structure; updating the (j-1)-th Bayesian network structure to obtain an updated j-th Bayesian network structure, wherein j is a positive integer greater than 1 and less than k; and updating the (k-1)-th Bayesian network structure to obtain the actual Bayesian network structure.
In an exemplary embodiment, updating the initial Bayesian network structure to obtain the updated first Bayesian network structure may comprise: using the first mutual information and the correlation-strength-aware boundary construction method to construct a first boundary of the initial Bayesian network structure; obtaining the first marginal distributions, counted by the first object among the plurality of objects, of the attributes within the first boundary and the parent nodes of those attributes, wherein the first marginal distributions carry Laplace noise; and using the exponential mechanism to select a parent node for each attribute within the first boundary to obtain the updated first Bayesian network structure.
In an exemplary embodiment, updating the (j-1)-th Bayesian network structure to obtain the updated j-th Bayesian network structure may comprise: using the first mutual information and the correlation-strength-aware boundary construction method to construct a j-th boundary of the (j-1)-th Bayesian network structure; obtaining the j-th marginal distributions, counted by the j-th object among the plurality of objects, of the attributes within the j-th boundary and the parent nodes of those attributes, wherein the j-th marginal distributions carry Laplace noise; and using the exponential mechanism to select a parent node for each attribute within the j-th boundary to obtain the updated j-th Bayesian network structure.
In an exemplary embodiment, learning the parameters in the actual Bayesian network structure may comprise: obtaining the conditional distribution, determined by each of the plurality of objects, of any attribute in the actual Bayesian network structure and the parent node of that attribute; and using the optimal multi-party Laplace mechanism to merge the obtained plurality of conditional distributions into the actual conditional distribution of the attribute and its parent node, wherein the actual conditional distribution carries Laplace noise.
In an exemplary embodiment, using the target Bayesian network structure to publish data corresponding to all attributes in the attribute set may comprise: taking the product of the actual conditional distributions of each attribute given its parent nodes as the joint distribution of all attributes; and publishing the data corresponding to all attributes generated from the joint distribution.
According to another aspect of the embodiments of the present application, a data publishing apparatus is provided, the apparatus comprising: an updating unit configured to update an initial Bayesian network structure corresponding to an attribute set of data to obtain an updated actual Bayesian network structure; a learning unit configured to learn parameters in the actual Bayesian network structure to obtain a target Bayesian network structure; and a publishing unit configured to use the target Bayesian network structure to publish data corresponding to all attributes in the attribute set.
In an exemplary embodiment, the updating unit may include: a first obtaining module configured to obtain first mutual information of any two attributes in the attribute set; and an updating module configured to serially update the initial Bayesian network structure using the first mutual information to obtain the updated actual Bayesian network structure.
In an exemplary embodiment, the first obtaining module may include: a dividing submodule configured to divide the attribute set into a plurality of views, wherein each view includes some of the attributes in the attribute set; a merging submodule configured to use the optimal multi-party Laplace mechanism to merge the plurality of marginal distributions corresponding to each view into the actual marginal distribution of that view, wherein the actual marginal distribution carries Laplace noise; and a computing submodule configured to calculate the first mutual information of any two attributes in each view using the actual marginal distribution of that view.
In an exemplary embodiment, the dividing submodule may be configured to divide the attribute set into a plurality of views using a non-overlapping attribute dividing device, wherein the attribute pairs included in any two views do not overlap.
In an exemplary embodiment, the merging submodule may be configured to: obtain the marginal distribution of each view calculated from the data owned by each of the plurality of objects, wherein Laplace noise has been added to the marginal distribution; and merge the plurality of marginal distributions of the plurality of objects into the actual marginal distribution of each view, taking the minimum noise among the plurality of Laplace noises carried by the plurality of marginal distributions as the Laplace noise of the actual marginal distribution.
In an exemplary embodiment, the updating unit may further include: a second obtaining module configured to obtain an initial Bayesian network structure including the parent-child node relationships of all attributes in the attribute set, wherein the parent-child node relationships are determined by the plurality of objects in a specified manner.
In an exemplary embodiment, the updating module may include: a first update submodule configured to update the initial Bayesian network structure to obtain an updated first Bayesian network structure; a second update submodule configured to update the (j-1)-th Bayesian network structure to obtain an updated j-th Bayesian network structure, wherein j is a positive integer greater than 1 and less than k; and a third update submodule configured to update the (k-1)-th Bayesian network structure to obtain the actual Bayesian network structure.
In an exemplary embodiment, the first update submodule may be configured to: use the first mutual information and the correlation-strength-aware boundary construction device to construct a first boundary of the initial Bayesian network structure; obtain the first marginal distributions, counted by the first object among the plurality of objects, of the attributes within the first boundary and the parent nodes of those attributes, wherein the first marginal distributions carry Laplace noise; and use the exponential mechanism to select a parent node for each attribute within the first boundary to obtain the updated first Bayesian network structure.
In an exemplary embodiment, the second update submodule may be configured to: use the first mutual information and the correlation-strength-aware boundary construction device to construct a j-th boundary of the (j-1)-th Bayesian network structure; obtain the j-th marginal distributions, counted by the j-th object among the plurality of objects, of the attributes within the j-th boundary and the parent nodes of those attributes, wherein the j-th marginal distributions carry Laplace noise; and use the exponential mechanism to select a parent node for each attribute within the j-th boundary to obtain the updated j-th Bayesian network structure.
In an exemplary embodiment, the learning unit may include: a third obtaining module configured to obtain the conditional distribution, determined by each of the plurality of objects, of any attribute in the actual Bayesian network structure and the parent node of that attribute; and a merging module configured to use the optimal multi-party Laplace mechanism to merge the obtained plurality of conditional distributions into the actual conditional distribution of the attribute and its parent node, wherein the actual conditional distribution carries Laplace noise.
In an exemplary embodiment, the publishing unit may include: a processing module configured to take the product of the actual conditional distributions of each attribute given its parent nodes as the joint distribution of all attributes; and a publishing module configured to publish the data corresponding to all attributes generated from the joint distribution.
According to another embodiment of the present application, a terminal is provided, comprising: a processor; a memory configured to store processor-executable instructions; and a transmission device configured to send and receive information under the control of the processor; wherein the processor is configured to perform the following operations: updating an initial Bayesian network structure corresponding to the attribute set of the data to obtain an updated actual Bayesian network structure; learning parameters in the actual Bayesian network structure to obtain a target Bayesian network structure; and using the target Bayesian network structure to publish data corresponding to all attributes in the attribute set.
In an exemplary embodiment, the processor may further be configured to perform the following operations: obtaining first mutual information of any two attributes in the attribute set; and serially updating the initial Bayesian network structure using the first mutual information to obtain the updated actual Bayesian network structure.
According to another embodiment of the present application, a storage medium is provided. The storage medium may be configured to store program code for performing the following steps: updating an initial Bayesian network structure corresponding to the attribute set of the data to obtain an updated actual Bayesian network structure; learning parameters in the actual Bayesian network structure to obtain a target Bayesian network structure; and using the target Bayesian network structure to publish data corresponding to all attributes in the attribute set.
In the embodiments of the present application, the initial Bayesian network structure corresponding to the attribute set of the data is updated to obtain the updated actual Bayesian network structure; the parameters in the actual Bayesian network structure are learned to obtain the target Bayesian network structure; and the target Bayesian network structure is used to publish data corresponding to all attributes in the attribute set, thereby improving the security of multi-party data publishing in a big data environment and achieving the technical effect of improving the security of data publishing.
Other aspects will become apparent upon reading and understanding the drawings and the detailed description.
附图概述
图1是实施根据本申请实施例提供的数据发布方法的计算机终端的示意图;
图2是一种数据发布系统的示意图;
图3是根据本申请实施例的数据发布方法的流程图;
图4是根据本申请实施例的数据发布系统的示例性示意图;
图5是根据本申请实施例的数据发布系统的示例性示意图;
图6是根据本申请实施例的数据发布系统的示例性示意图;
图7是根据本申请实施例的数据发布系统的示例性示意图;
图8是根据本申请实施例的数据发布系统的示例性示意图;
图9是根据本申请实施例的数据发布装置的示意图。
详述
下文中将参考附图并结合实施例来详细说明本申请。需要说明的是,在不冲突的情况下,本申请中的实施例及实施例中的特征可以相互组合。
需要说明的是,本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。
本申请实施例所提供的方法实施例可以在移动终端、计算机终端或者类似的运算装置(即终端)中执行。以运行在计算机终端上为例,如图1所示,计算机终端可以包括一个或多个(图中仅示出一个)处理器101(处理器101可以包括但不限于微处理器(MCU,Microcontroller Unit)或可编程逻辑器件(FPGA,Field Programmable Gate Array)等的处理装置)、用于存储数据的存储器103、以及用于通信功能的传输装置105。本领域普通技术人员可以理解,图1所示的结构仅为示意,其并不对上述电子装置的结构造成限定。
存储器103可用于存储应用软件的软件程序以及模块,如本实施例中的数据发布方法对应的程序指令或模块,处理器101通过运行存储在存储器103内的软件程序以及模块,从而执行各种功能应用以及数据处理,即实现上述的方法。存储器103可包括高速随机存储器,还可包括非易失性存储器,如一个或者多个磁性存储装置、闪存、或者其他非易失性固态存储器。在一些实例中,存储器103可进一步包括相对于处理器远程设置的存储器,这些远程存储器可以通过网络连接至计算机终端。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。
例如,上述处理器101配置为执行以下操作:更新与数据的属性集合对应的初始贝叶斯网络结构,得到更新后的实际贝叶斯网络结构;学习实际贝叶斯网络结构中的参数,得到目标贝叶斯网络结构;利用目标贝叶斯网络结构发布对应于属性集合中所有属性的数据。
在示例性实施方式中,处理器101还可以配置为执行以下操作:获取属性集合中任意两个属性的第一互信息;通过第一互信息对初始贝叶斯网络结构进行串行更新,得到更新后的实际贝叶斯网络结构。
传输装置105配置为经由一个网络接收或者发送数据。上述网络的实例可包括计算机终端的通信供应商提供的无线网络。在一个实例中,传输装置105包括一个网络适配器(Network Interface Controller,NIC),其可通过基站与其他网络设备相连从而可与互联网进行通信。在一个实例中,传输装置105可以为射频(Radio Frequency,RF)模块,其用于通过无线方式与互联网进行通信。
首先,在对本申请实施例进行描述的过程中出现的部分名词或术语适用于如下解释:
本地数据集:每个数据拥有者各自拥有属于自己的数据集。
半可信第三方(semi-trusted curator):第三方指协同一个或多个数据拥有者进行数据发布的个人或机构,半可信指第三方会严格遵守算法的相关协议规则协调一个或多个数据拥有者进行数据发布工作,但它可能在与数据拥有者交互信息的过程中,利用自己掌握的资源窃取数据中用户的隐私信息。
边缘分布(Marginal Distribution):也称边际分布,指统计学中常用的一种概率分布:对多变量的联合概率分布针对其中某个(或某些)变量求和,从而在结果中消去该变量的影响,所得到的即为其余变量的概率分布。
例如:假设三个变量x1,x2,x3的联合概率分布为P(x1,x2,x3),则关于其中一个变量x1的边缘分布为P(x1)=∑_{x2,x3}P(x1,x2,x3);关于其中两个变量x2,x3的边缘分布为P(x2,x3)=∑_{x1}P(x1,x2,x3)。
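上述按变量求和得到边缘分布的计算,可以用如下Python片段示意(仅为示例性草图,其中的联合分布取值均为示例假设,并非本申请的数据):

```python
from itertools import product

# 示例:三个二值变量 x1, x2, x3 的均匀联合分布 P(x1, x2, x3)
joint = {(x1, x2, x3): 1 / 8 for x1, x2, x3 in product([0, 1], repeat=3)}

# 关于 x1 的边缘分布:对 x2、x3 求和
p_x1 = {}
for (x1, x2, x3), p in joint.items():
    p_x1[x1] = p_x1.get(x1, 0.0) + p

# 关于 (x2, x3) 的边缘分布:对 x1 求和
p_x2x3 = {}
for (x1, x2, x3), p in joint.items():
    p_x2x3[(x2, x3)] = p_x2x3.get((x2, x3), 0.0) + p

print(p_x1)            # {0: 0.5, 1: 0.5}
print(p_x2x3[(0, 1)])  # 0.25
```

对哪个变量求和,结果中就消去了哪个变量的影响,与上式一致。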
贝叶斯网络(Bayesian network):是一种概率图模型,借由有向无环图(directed acyclic graph)表示一组随机变量及该组随机变量的条件概率分布(conditional probability distributions)。
边界(search frontier):它包含两部分,一部分是一组候选属性-父节点对(即属性对,表示为:<属性,父节点>)构成的集合,另一部分是由这些候选属性-父节点对的边缘分布构成,边界可以被看做是每个数据拥有者更新贝叶斯网络结构的先验知识。
条件分布(Conditional Distribution):已知两个相关的随机变量X′和Y,随机变量Y在条件{X′=x}下的条件概率分布是指当已知X′的取值为某个特定值x之时,Y的概率分布。
差分隐私保护模型:差分隐私保护模型已成为数据分析领域标准的隐私保护模型,差分隐私保护模型具有严格的数学定义,并且不对攻击者所拥有的背景知识进行任何假设。给定数据库D和D’,假设D和D’相差一条且仅一条记录r,那么,对于满足差分隐私保护的数据分析算法A,其在数据库D和D’中的分析结果将具有近似相同的概率分布。在这种情况下,无论攻击者拥有如何丰富的背景知识,都无法判断记录r是否存在于数据库中。分析结果的相似性是通过隐私参数(即隐私预算)来控制的。隐私参数越小,说明算法的隐私保护强度越高。差分隐私保护模型是通过在数据分析的过程中加入噪音来保护用户的隐私。因此,如何在满足差分隐私保护的条件下,减少数据分析过程中加入的噪音量是相关研究中面临的主要挑战。对于任意两个数据库,假如它们相差一条且仅一条记录,则称这两个数据库为相邻数据库。差分隐私保护模型的定义如下。
差分隐私保护模型:给定算法A,假设数据库D和D’为任意相邻数据库。对于算法A的任意可能输出结果S,如果算法A在数据库D中输出S的概率与算法A在数据库D’中输出S的概率的比值不大于常数值e^ε,则称算法A满足ε-差分隐私保护,即Pr[A(D)∈S]≤e^ε×Pr[A(D')∈S]。从概率分布的角度来看,差分隐私保护模型使得任何记录对于算法最终分析结果的影响都是有限的。
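作为示例性说明,差分隐私中常用的拉普拉斯机制可以草绘如下(假设查询为敏感度为1的计数查询;函数名与数值均为示例假设,并非本申请方法的限定实现):

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """从尺度为 scale 的拉普拉斯分布中采样(逆变换采样)。"""
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def laplace_mechanism(true_count: float, sensitivity: float, epsilon: float) -> float:
    """对敏感度为 sensitivity 的查询结果加入 Lap(sensitivity/epsilon) 噪音,
    使输出满足 epsilon-差分隐私。"""
    return true_count + laplace_noise(sensitivity / epsilon)

random.seed(0)
# 示例:真实计数为 100,敏感度为 1,隐私预算 epsilon=0.5
noisy_count = laplace_mechanism(100.0, sensitivity=1.0, epsilon=0.5)
```

隐私预算epsilon越小,噪音尺度越大,与"隐私参数越小、隐私保护强度越高"一致。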
指数机制:给定数据库D,输出为一实体对象r∈Range,u(D,r)为可用性函数,Δu为函数u(D,r)的敏感度,若算法A以正比于exp(ε·u(D,r)/(2Δu))的概率从Range中选择输出r,则算法A满足差分隐私保护。
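上述指数机制的选择过程可以用如下Python片段示意(候选集合与可用性得分均为示例假设):

```python
import math
import random

def exponential_mechanism(candidates, score, epsilon, sensitivity):
    """以正比于 exp(epsilon * score(r) / (2 * sensitivity)) 的概率选取候选输出 r。"""
    weights = [math.exp(epsilon * score(r) / (2 * sensitivity)) for r in candidates]
    total = sum(weights)
    u = random.random() * total
    acc = 0.0
    for r, w in zip(candidates, weights):
        acc += w
        if u <= acc:
            return r
    return candidates[-1]

random.seed(0)
# 示例:候选输出及其可用性得分(得分越高越可能被选中)
scores = {"a": 1.0, "b": 3.0, "c": 0.5}
picked = exponential_mechanism(list(scores), scores.get, epsilon=1.0, sensitivity=1.0)
```

得分高的候选以指数级更高的概率被选中,但任何候选都有非零概率,从而保护隐私。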
如图2所示,数据发布系统包括数据拥有者(P1、P2,…,Pk),每个数据拥有者都有各自的数据(即保存在数据仓库D中的D1、D2、…,Dk),半可信第三方T将数据仓库中的数据处理之后发布数据D’给数据分析者U,在目前的发布系统中,数据拥有者、半可信第三方以及数据分析者均可能利用其掌握的技能对数据仓库发起攻击(如攻击1、攻击2、攻击3),从而造成了当前的数据发布系统的安全性较低。而利用本申请的方法恰好可以解决上述问题。
根据本申请实施例,提供了一种数据发布方法的方法实施例,需要说明的是,在附图的流程图示出的步骤可以在诸如一组计算机可执行指令的计算机系统中执行,并且,虽然在流程图中示出了逻辑顺序,但是在某些情况下,可以以不同于此处的顺序执行所示出或描述的步骤。
图3是根据本申请实施例的数据发布方法的流程图,如图3所示,该方法包括如下步骤:
步骤S301,更新与数据的属性集合对应的初始贝叶斯网络结构,得到更新后的实际贝叶斯网络结构;
步骤S302,学习实际贝叶斯网络结构中的参数,得到目标贝叶斯网络结构;
步骤S303,利用目标贝叶斯网络结构发布对应于属性集合中所有属性的数据。
通过上述实施例,更新与数据的属性集合对应的初始贝叶斯网络结构,得到更新后的实际贝叶斯网络结构;学习实际贝叶斯网络结构中的参数,得到目标贝叶斯网络结构;利用目标贝叶斯网络结构发布对应于属性集合中所有属性的数据,从而提高了在大数据环境下实现多方数据发布时的安全性,实现了提高数据发布的安全性的技术效果。
上述的参数即贝叶斯网络的参数,例如贝叶斯网络中每个节点在其父节点被给定的情况下的条件分布。
示例性地,上述步骤S301至S303可以在半可信第三方所使用的终端上运行,或者在由半可信第三方和数据拥有者组成的网络中的终端设备上运行,数据拥有者的数量可以为多个。
例如,数据拥有者初始化对应于属性集合的初始贝叶斯网络结构并发送给半可信第三方;半可信第三方和数据拥有者通过第一互信息串行更新初始贝叶斯网络结构,得到更新后的实际贝叶斯网络结构;半可信第三方和数据拥有者并行学习实际贝叶斯网络结构中的参数;半可信第三方利用学习到参数后的实际贝叶斯网络结构发布对应于属性集合中所有属性的数据。
在步骤S301中,更新与数据的属性集合对应的初始贝叶斯网络结构,得到更新后的实际贝叶斯网络结构,可以包括:获取属性集合中任意两个属性的第一互信息;通过第一互信息对初始贝叶斯网络结构进行串行更新,得到更新后的实际贝叶斯网络结构。
示例性地,获取属性集合中任意两个属性的第一互信息可以包括:将属性集合划分为多个视图,其中,每个视图包括属性集合中的部分属性;利用最优多方拉普拉斯机制,将对应于每个视图的多个边际分布合并为每个视图的实际边际分布,其中,实际边际分布中携带有拉普拉斯噪音;利用每个视图的实际边际分布计算每个视图中任意两个属性的第一互信息。
需要说明的是,将属性集合划分为多个视图时,采用无重叠属性划分方法将属性集合划分为多个视图,其中,任意两个视图所包括的属性对不重叠,在得到的一组视图(即多个视图)中,视图为包含部分属性的集合,如视图V1=(X11,X12,...,X1i)。
半可信第三方和数据拥有者协同计算数据的属性集合中任意两个属性的第一互信息,半可信第三方将属性集合划分为多个视图,其中,多个视图中每个视图包括属性集合中的部分属性,多个视图中任意两个视图所包括的属性对不重叠;多个数据拥有者中的每个数据拥有者利用自己拥有的数据计算每个视图的边际分布;半可信第三方和多个数据拥有者利用最优多方拉普拉斯机制将多个边际分布合并(如将多个边际分布加在一起)为每个视图的实际边际分布,其中,多个边际分布为多个数据拥有者分别计算得到的边际分布,实际边际分布中携带有拉普拉斯噪音;半可信第三方利用每个视图的实际边际分布计算每个视图中任意两个属性的第一互信息。
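由某一视图的两两属性边际分布计算互信息的过程,可以用如下Python片段示意(分布取值为示例假设):

```python
import math

def mutual_information(joint):
    """由两个属性的联合(边际)分布 joint[(x, y)] = P(x, y) 计算互信息 I(X;Y)。"""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    mi = 0.0
    for (x, y), p in joint.items():
        if p > 0:
            mi += p * math.log(p / (px[x] * py[y]))
    return mi

# 两个属性相互独立时互信息为 0
indep = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
# 两个属性完全相关时互信息为 ln2
dep = {(0, 0): 0.5, (1, 1): 0.5, (0, 1): 0.0, (1, 0): 0.0}
print(mutual_information(indep))  # 0.0
print(mutual_information(dep))    # ≈0.6931,即 ln2
```

互信息越大,两个属性的关联强度越强,后续结构学习中也越有可能成为父子节点。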
利用最优多方拉普拉斯机制,将对应于每个视图的多个边际分布合并为每个视图的实际边际分布,可以包括:获取基于多个对象(即数据拥有者)中每个对象拥有的数据计算得到的每个视图的边际分布,其中,边际分布中添加有拉普拉斯噪音;将多个对象的多个边际分布合并为每个视图的实际边际分布,为了满足差分隐私保护要求,数据拥有者和半可信第三方利用最优多方Laplace机制(即最优多方拉普拉斯机制)为合并的边际分布添加Laplace噪音,即将多个边际分布携带的多个拉普拉斯噪音中的最小噪音作为实际边际分布的拉普拉斯噪音。
示例性地,半可信第三方和多个数据拥有者利用最优多方拉普拉斯机制,将多个边际分布合并为每个视图的实际边际分布可以包括:每个数据拥有者利用自己拥有的数据统计上一步中所有视图的边际分布,每个数据拥有者将计算得到的边际分布发送给半可信第三方,其中,边际分布中添加有拉普拉斯噪音;半可信第三方将多个边际分布合并(如以累加的形式合并)为每个视图的实际边际分布,为了满足差分隐私保护要求,数据拥有者和半可信第三方利用最优多方Laplace机制(即最优多方拉普拉斯机制)为合并的边际分布添加Laplace噪音,即将多个边际分布携带的多个拉普拉斯噪音中的最小噪音作为实际边际分布的拉普拉斯噪音。
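上述"累加各方边际分布、合并结果中仅保留一份拉普拉斯噪音"的效果,可以用如下Python片段示意(这里仅模拟合并后的最终效果,未实现安全多方协议;数据与参数均为示例假设):

```python
import math
import random

def laplace_noise(scale):
    """从尺度为 scale 的拉普拉斯分布中采样。"""
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def merge_marginals(local_counts, epsilon, sensitivity=1.0):
    """将多方对同一视图的本地边际计数累加,并只为合并结果保留一份
    Lap(sensitivity/epsilon) 噪音,示意最优多方拉普拉斯机制的最终效果。"""
    merged = {}
    for counts in local_counts:
        for key, c in counts.items():
            merged[key] = merged.get(key, 0.0) + c
    scale = sensitivity / epsilon
    return {key: c + laplace_noise(scale) for key, c in merged.items()}

random.seed(0)
# 示例:三个数据拥有者对同一视图 (性别, 疾病) 的本地计数
parties = [{("男", "是"): 10, ("女", "否"): 5},
           {("男", "是"): 7, ("女", "否"): 9},
           {("男", "是"): 3, ("女", "否"): 4}]
noisy_marginal = merge_marginals(parties, epsilon=1.0)
```

与各方各自加噪后直接累加相比,合并结果中只携带一份噪音,边际分布的数据效用更高。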
在更新与数据的属性集合对应的初始贝叶斯网络结构之前,可获取包括属性集合中所有属性的父子节点关系的初始贝叶斯网络结构,其中,父子节点关系由多个对象基于指定方式确定。即多个数据拥有者基于指数机制确定属性集合中所有属性的父子节点关系,并确定包括属性集合中所有属性的父子节点关系的初始贝叶斯网络结构。
上述的贝叶斯网络结构初始化是指数据拥有者共同为所有属性选择初始的父节点,构造初始的k度贝叶斯网络结构(其中,k度表示每个属性的父节点个数至多为k)。
示例性地,指定方式可以用于指示按照如下方式确定父子节点关系:多个对象中的第一对象将属性集合划分为第一集合和第二集合,其中,第一集合用于保存已经确定父节点的属性,第一集合的初始状态为空,第二集合用于保存未确定父节点的属性;第一对象从第二集合中选取一个属性保存至第一集合;多个对象中的第i对象按照预设方式为第二集合中第一预设数量的属性确定父节点,并将确定了父节点的属性从第二集合迁移至第一集合,其中,i为小于k的正整数,k为多个对象的数量;多个对象中的第k对象按照预设方式为第二集合中第二预设数量的属性确定父节点,并将确定了父节点的属性从第二集合迁移至第一集合。
上述的预设方式可以指:获取第一集合中每个第一属性与第二属性的第二互信息,其中,第二属性为从第二集合中选取的属性;使用指数机制从多个第二互信息中选取出目标互信息,将与目标互信息对应的第一属性作为第二属性的父节点。
示例性地,上述实施例可通过如下步骤实现:
步骤S11,半可信第三方指定数据拥有者按照P1,P2,...,PK的顺序为属性学习父节点,并确定每个数据拥有者所需学习的属性的个数,前(K-1)个数据拥有者每人学习⌊d/K⌋个(符号⌊·⌋表示向下取整),第PK个学习d-(K-1)⌊d/K⌋个。其中,d为属性集合中属性的数量。
步骤S12,第一个数据拥有者P1为⌊d/K⌋个属性学习父节点。
P1将属性集A分成两组Ah(即第一集合)和An(即第二集合),Ah是由所有已经选定父节点的属性构成的集合,An是由所有未选定父节点的属性构成的集合。其中,Ah初始状态为空。
P1从An中随机选取一个属性X1',将其父节点记为空,并将X1'从An移至Ah
P1从An中选取一个属性Xi,从Ah中选取min{k,|Ah|}个属性组成Πi,Xi和Πi构成一组候选属性-父节点对。P1以属性和候选父节点间互信息为评分函数,利用指数机制从所有的候选属性-父节点对中选取一组属性-父节点对(Xi,Πi)并记为(X2',Π2),Π2为X2'的父节点,然后将X2'从An移至Ah
P1重复上述过程,直至为⌊d/K⌋个属性选定父节点。
P1将集合Ah、An和⌊d/K⌋组属性-父节点对发送给P2
步骤S13,P2为⌊d/K⌋个新的属性选定父节点,并将集合Ah、An和2⌊d/K⌋组属性-父节点对发送给P3
步骤S14,PK将初始化的贝叶斯网络结构N0发送给半可信第三方。
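步骤S11至步骤S14中"以互信息为评分函数、利用指数机制选取属性-父节点对"的核心逻辑,可以用如下Python片段示意(候选父节点组合及其互信息取值均为示例假设):

```python
import math
import random

def pick_parent_pair(candidates, mi_score, epsilon, sensitivity=1.0):
    """candidates 为候选 (属性, 父节点集合) 列表,mi_score 给出其互信息得分;
    以互信息为评分函数,按指数机制采样一组属性-父节点对。"""
    weights = [math.exp(epsilon * mi_score[c] / (2 * sensitivity)) for c in candidates]
    total = sum(weights)
    u = random.random() * total
    acc = 0.0
    for c, w in zip(candidates, weights):
        acc += w
        if u <= acc:
            return c
    return candidates[-1]

random.seed(0)
# 示例:An 中属性"疾病"的候选父节点组合及其互信息得分(数值为示例假设)
candidates = [("疾病", ("年龄",)), ("疾病", ("性别",)), ("疾病", ("年龄", "性别"))]
mi = {candidates[0]: 0.4, candidates[1]: 0.1, candidates[2]: 0.6}
chosen = pick_parent_pair(candidates, mi, epsilon=1.0)
```

互信息越大的候选属性-父节点对被选中的概率越高,同时选取过程本身满足差分隐私。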
在步骤S301中,通过第一互信息对初始贝叶斯网络结构进行串行更新,得到更新后的实际贝叶斯网络结构,可以包括:对初始贝叶斯网络结构进行更新,得到更新后的第一贝叶斯网络结构;对第j-1贝叶斯网络结构进行更新,得到更新后的第j贝叶斯网络结构,其中,j为大于1且小于k的正整数;对第k-1贝叶斯网络结构进行更新,得到实际贝叶斯网络结构。
半可信第三方和数据拥有者通过第一互信息串行更新初始贝叶斯网络结构,得到更新后的实际贝叶斯网络结构包括:半可信第三方与多个数据拥有者中的第一数据拥有者对初始贝叶斯网络结构进行更新,得到更新后的第一贝叶斯网络结构;半可信第三方与多个数据拥有者中的第j数据拥有者对第j-1贝叶斯网络结构进行更新,得到更新后的第j贝叶斯网络结构,其中,j为大于1且小于k的正整数;半可信第三方与多个数据拥有者中的第k数据拥有者对第k-1贝叶斯网络结构进行更新,得到实际贝叶斯网络结构。
示例性地,对初始贝叶斯网络结构进行更新,得到更新后的第一贝叶斯网络结构可以包括:利用第一互信息,采用关联强度感知的边界构造方法构建初始贝叶斯网络结构的第一边界;获取多个对象中的第一对象统计的第一边界内属性及该属性的父节点的第一边际分布,其中,第一边际分布中携带有拉普拉斯噪音;利用指数机制为第一边界内的每个属性选取父节点,得到更新后的第一贝叶斯网络结构。
半可信第三方利用第一互信息,采用关联强度感知的边界构造方法构建初始贝叶斯网络结构的第一边界;第一数据拥有者统计第一边界内属性及该属性的父节点的第一边际分布,并将加入有拉普拉斯噪音的第一边际分布发送给半可信第三方;半可信第三方利用指数机制为第一边界内的每个属性选取父节点,得到更新后的第一贝叶斯网络结构。
示例性地,对第j-1贝叶斯网络结构进行更新,得到更新后的第j贝叶斯网络结构包括:利用第一互信息,采用关联强度感知的边界构造方法构建第j-1贝叶斯网络结构的第j边界;获取多个对象中第j对象统计的第j边界内属性及该属性的父节点的第j边际分布,其中,第j边际分布中携带有拉普拉斯噪音;利用指数机制为第j边界内的每个属性选取父节点,得到更新后的第j贝叶斯网络结构。
半可信第三方与多个数据拥有者中的第j数据拥有者对第j-1贝叶斯网络结构进行更新,得到更新后的第j贝叶斯网络结构包括:半可信第三方利用第一互信息,采用关联强度感知的边界构造方法构建第j-1贝叶斯网络结构的第j边界;第j数据拥有者统计第j边界内属性及该属性的父节点的第j边际分布,并将加入有拉普拉斯噪音的第j边际分布发送给半可信第三方;半可信第三方利用指数机制为第j边界内的每个属性选取父节点,从而得到更新后的第j贝叶斯网络结构。
在贝叶斯网络结构学习过程中,统计信息中加入的噪音量与候选属性-父节点对(即属性对)的数量成正比。为了减少噪音加入,提高数据效用,可利用边界合理限制候选属性-父节点对的数量。然而,这样必然会造成一定的信息损失。为了减少这种信息损失,边界内需包含更多有效的候选属性-父节点对,与某一属性关联强度越强的属性越有可能成为其父节点,因此,可利用关联强度感知的边界构造方法进行边界构造,该方法的基本思想是在关联强度较强的属性间添加边,过程如下:
步骤1,给定贝叶斯网络结构和两两属性间互信息大小,其中,属性间互信息大小用来度量属性间关联强度,互信息越大,关联强度越强。
步骤2,优先选取互信息最大的属性对,如果该属性对在当前贝叶斯网络结构中存在边,则重新选取属性对;否则,执行步骤3。
步骤3,如果该属性对对应的两个属性均不需添加父节点,则返回步骤2;如果只有其中一个属性需要添加父节点,则在属性对之间添加边,并令另一个属性作为该属性的父节点,同时避免出现环;如果两个属性均需添加父节点,则执行以下步骤来确定边的方向。
步骤4,若边的方向不同,则会影响属性间的依赖关系,从而影响后面边的选取,进而影响最终边界的构造。选取边的方向时,应尽量使得最终的边界包含更多有效的候选属性-父节点对。为了判断边的方向对最终边界的影响,可引入稀疏度Sparse(x)和影响度Impact(x,y)。其中,稀疏度Sparse(x)表示该属性x的所有祖先节点还需添加的父节点总数,优先为稀疏度大的节点添加父节点;影响度Impact(x,y)表示确定边的方向为x指向y后将不能被添加到网络结构中的边的数量,优先选定影响度小的方向。本文中,当Sparse(x)·Impact(x,y)≤Sparse(y)·Impact(y,x)时,选定方向为x指向y。
执行步骤2至步骤4,直至为所有属性选取一定的父节点,则边界构造完成。
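步骤4中依据稀疏度与影响度确定边方向的判定规则,可以用如下Python片段示意(各数值均为示例假设):

```python
def edge_direction(sparse_x, impact_xy, sparse_y, impact_yx):
    """按 Sparse(x)*Impact(x,y) <= Sparse(y)*Impact(y,x) 的规则确定边方向:
    成立时返回 "x->y",否则返回 "y->x"。"""
    if sparse_x * impact_xy <= sparse_y * impact_yx:
        return "x->y"
    return "y->x"

# 示例:Sparse(x)=2, Impact(x,y)=1;Sparse(y)=3, Impact(y,x)=2。
# 由于 2*1 <= 3*2,选定方向为 x 指向 y
print(edge_direction(2, 1, 3, 2))  # x->y
```

即在稀疏度与影响度的乘积更小的一侧作为边的起点,优先保留更多可添加的边。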
示例性地,步骤S301可以通过如下步骤实现:
步骤S21,半可信第三方与第一个数据拥有者P1对初始化网络结构N0进行更新。
半可信第三方利用N0和之前计算出的属性间互信息,采用关联强度感知的边界构造方法构建边界。
P1统计边界内所有属性及其父节点的边际分布并发送给半可信第三方,为了满足差分隐私保护要求,P1需在统计的边际分布中加入Laplace噪音。
半可信第三方利用指数机制在边界范围内为每个属性选取父节点,从而得到贝叶斯网络结构N1
步骤S22,半可信第三方与第二个数据拥有者P2对网络进行更新。
半可信第三方利用N1和计算出的属性间互信息,采用关联强度感知的边界构造方法构建边界。
P2统计边界内所有属性及其父节点的边际分布并发送给半可信第三方,半可信第三方将其与P1的统计结果累加。为了满足差分隐私保护要求,P2需在统计的边际分布中加入Laplace噪音。为了提高边际分布的数据效用,P1、P2和半可信第三方利用安全功能评估协议去除边际分布中P1生成的Laplace噪音,只保留P2生成的噪音。
半可信第三方利用指数机制在边界范围内为每个属性选取父节点得到贝叶斯网络结构N2
步骤S23,半可信第三方与数据拥有者P3,...PK对网络进行更新直至得到最终的贝叶斯网络结构NK(即实际贝叶斯网络结构)。
在步骤S302中,学习实际贝叶斯网络结构中的参数可以包括:获取多个对象中每个对象确定的实际贝叶斯网络结构中任一属性和任一属性的父节点的条件分布;利用最优多方拉普拉斯机制将获取到的多个条件分布合并为任一属性和任一属性的父节点的实际条件分布,其中,实际条件分布中携带有拉普拉斯噪音。
多个数据拥有者获取实际贝叶斯网络结构中任一属性和任一属性的父节点的条件分布;多个数据拥有者和半可信第三方利用最优多方拉普拉斯机制将多个条件分布合并为任一属性和任一属性的父节点的实际条件分布,其中,多个条件分布为多个数据拥有者分别获取的任一属性和任一属性的父节点的条件分布,实际条件分布中携带有拉普拉斯噪音。
数据拥有者统计贝叶斯网络结构中所有属性-父节点的边际分布,并将统计结果发送给半可信第三方;半可信第三方将每个属性-父节点相应的边际分布合并作为该属性-父节点对的边际分布。为了满足差分隐私保护要求,数据拥有者和半可信第三方利用最优多方Laplace机制为合并的边际分布添加Laplace噪音。
在步骤S303中,利用目标贝叶斯网络结构发布对应于属性集合中所有属性的数据可以包括:将每个属性在给定父节点条件下的实际条件分布的乘积作为所有属性的联合分布;发布由联合分布生成的对应于所有属性的数据。
半可信第三方将每个属性在给定父节点条件下的实际条件分布的乘积作为所有属性的联合分布;半可信第三方发布由联合分布生成的对应于所有属性的数据。
上述的方法可以通过多方数据发布的装置(也即PrivSeq算法装置)实现,该装置包括四个模块:数据预处理模块、贝叶斯网络结构学习模块、贝叶斯参数学习模块和数据生成模块。每个模块的功能如下:
数据预处理模块,数据拥有者根据数据的每个属性的取值,对属性集进行如下处理:先将取值为连续值的属性(如身高、年龄等取值范围为连续区间的属性)进行离散化处理,转化成取值为离散值的属性,再将取值为非二进制数据的属性,转换成取值为二进制数据的属性。
贝叶斯网络结构学习模块,为数据的属性集构建贝叶斯网络,具有两两属性的互信息计算、贝叶斯网络结构初始化、串行更新贝叶斯网络结构等功能。
贝叶斯参数学习模块,计算贝叶斯网络中每个属性节点的边缘分布。
数据生成模块,根据贝叶斯网络的结构和每个属性节点的边缘分布,重新生成数据。
在多方数据发布过程中,该装置的配置说明如下:
如图4所示,假设K个数据拥有者联合进行数据发布,则为每个数据拥有者配置一台A类服务器,每个数据拥有者的数据存储于各自的A类服务器上,A类服务器上布置了数据预处理模块、贝叶斯网络结构学习模块和贝叶斯参数学习模块。同时,为半可信第三方配置一台B类服务器,B类服务器上布置了贝叶斯网络结构学习模块、贝叶斯参数学习模块和数据生成模块。半可信第三方的B类服务器和每个数据拥有者的A类服务器通过互联网连接。半可信第三方根据PrivSeq算法流程(即运行相应的算法软件)通过B类服务器协调各方的A类服务器进行满足差分隐私保护的数据发布工作。
例如,贝叶斯网络中存在四个节点,分别为节点A、节点B、节点C以及节点D,其中,A为根节点(即不存在父节点),B的父节点为A,C的父节点为A,D的父节点为A和C。那么属性A、B、C、D的联合分布为:P(A,B,C,D)=P(A)*P(B|A)*P(C|A)*P(D|A,C)。
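上例中由联合分布P(A,B,C,D)=P(A)*P(B|A)*P(C|A)*P(D|A,C)生成数据的过程,可以用按拓扑序的祖先采样示意如下(各条件分布的取值均为示例假设):

```python
import random

def sample_record(p_a, p_b_given_a, p_c_given_a, p_d_given_ac):
    """按拓扑序 A -> (B, C) -> D 采样一条记录,各属性取值为 0/1。"""
    def bernoulli(p):
        return 1 if random.random() < p else 0

    a = bernoulli(p_a)                     # 根节点 A 按边缘分布采样
    b = bernoulli(p_b_given_a[a])          # B 依赖父节点 A
    c = bernoulli(p_c_given_a[a])          # C 依赖父节点 A
    d = bernoulli(p_d_given_ac[(a, c)])    # D 依赖父节点 A 和 C
    return {"A": a, "B": b, "C": c, "D": d}

random.seed(0)
# 示例条件分布:P(A=1)、P(B=1|A)、P(C=1|A)、P(D=1|A,C)
record = sample_record(
    p_a=0.5,
    p_b_given_a={0: 0.2, 1: 0.8},
    p_c_given_a={0: 0.3, 1: 0.7},
    p_d_given_ac={(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.6, (1, 1): 0.9},
)
```

对每条待发布记录重复上述采样,即可由学习到的贝叶斯网络生成新的数据集。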
在上述实施例中,提供了一种实现满足差分隐私的多方数据发布的方法,能够在保护用户隐私的前提下帮助用户充分分析和挖掘数据中的价值,为业务推广和科学研究提供更多依据。运用数据隐私领域领先的差分隐私模型在多方数据联合发布过程为每个数据拥有者的数据提供ε-差分隐私保护,可以保障用户的隐私,提供更安全的数据发布策略;采用串行的贝叶斯网络更新机制,并结合无重叠属性划分方法和最优多方Laplace机制,从而在每个数据拥有者的数据满足ε-差分隐私的条件下,最大程度地减少噪音的加入,使得发布的数据的效用得到提升,保证整体数据服务的质量;采用串行更新机制并结合关联强度感知的边界构造方法,对数据拥有者和半可信第三方之间传递的信息量进行合理的限制,从而在综合利用各方数据提供高质量服务的同时,减少通信开销,降低大数据环境下数据服务的成本。
下面结合附图及实施例对本申请进行详细说明。
图5是根据本申请实施例的数据发布系统的示例性示意图。如图5所示,以K个医院(编号为P1、P2、…、PK,K≥2)联合发布医疗数据为例对本申请进行详细描述。
K个医院的医疗数据分别存在于各自的物理主机上,半可信第三方和每个医院通过互联网连接。半可信第三方根据PrivSeq算法流程协调各方进行满足差分隐私保护的数据发布工作(发布整体医疗数据)。
步骤S501,半可信第三方采用无重叠属性划分方法对属性集A(如包含姓名、性别、年龄、疾病等属性)进行划分,得到一组视图,视图为包含部分属性的集合,如视图V1=(X11,X12,...,X1i);
步骤S502,每个医院利用自己拥有的数据统计上一步中所有视图的边际分布,并将统计结果发送给半可信第三方,半可信第三方将每个视图相应的边际分布合并作为该视图的边际分布,K个医院和半可信第三方利用最优多方Laplace机制为合并的边际分布添加Laplace噪音;
步骤S503,半可信第三方利用含有噪音的边际分布,计算所有视图中两两属性的互信息;
步骤S504,半可信第三方指定医院按照P1,P2,...,PK的顺序为属性学习父节点,规定每个属性的父节点个数至多为k,并确定每个医院所需学习的属性的个数,前(K-1)个医院分别学习⌊d/K⌋个,第PK个学习d-(K-1)⌊d/K⌋个;
步骤S505,P1将属性集A分成两组Ah和An,Ah是由所有已经选定父节点的属性构成的集合,An是由所有未选定父节点的属性构成的集合,其中,Ah初始状态为空;
步骤S506,P1从An中随机选取一个属性X1',将其父节点记为空,并将X1'从An移至Ah
步骤S507,P1从An中选取一个属性Xi,从Ah中选取min{k,|Ah|}个属性组成Πi,Xi和Πi构成一组候选属性-父节点对,P1以属性和候选父节点间互信息为评分函数,利用指数机制从所有的候选属性-父节点对中选取一组属性-父节点对(Xi,Πi)并记为(X2',Π2),Π2为X2'的父节点,然后将X2'从An移至Ah
步骤S508,P1重复步骤S507过程,直至为⌊d/K⌋个属性选定父节点;
步骤S509,P1将集合Ah,An和⌊d/K⌋组属性-父节点对发送给P2
步骤S510,P2按照步骤S507和步骤S508过程为⌊d/K⌋个新的属性选定父节点并将集合Ah,An和2⌊d/K⌋组属性-父节点对发送给P3
步骤S511,P3,...,PK重复步骤S510过程直至为所有属性选定父节点,从而得到贝叶斯网络结构N0
步骤S512,PK将初始化的贝叶斯网络结构N0发送给半可信第三方;
步骤S513,半可信第三方利用N0和步骤S503中计算出的属性间互信息,采用关联强度感知的边界构造方法构建边界;
步骤S514,P1统计边界内所有属性及其父节点的边际分布并发送给半可信第三方,为了满足差分隐私保护要求,P1需在统计的边际分布中加入Laplace噪音;
步骤S515,半可信第三方利用指数机制在边界范围内为每个属性选取父节点从而得到贝叶斯网络结构N1
步骤S516,半可信第三方利用N1和步骤S503中计算出的属性间互信息,采用关联强度感知的边界构造方法构建边界;
步骤S517,P2统计边界内所有属性及其父节点的边际分布并发送给半可信第三方,半可信第三方将其与步骤S514中P1的统计结果累加,为了满足差分隐私保护要求,P2需在统计的边际分布中加入Laplace噪音,为了提高边际分布的数据效用,P1、P2和半可信第三方利用安全功能评估协议去除边际分布中P1生成的Laplace噪音,只保留P2生成的噪音;
步骤S518,半可信第三方利用指数机制在边界范围内为每个属性选取父节点得到贝叶斯网络结构N2
步骤S519,重复步骤S516至步骤S518过程,半可信第三方与医院P3,...PK对网络进行更新直至得到最终的贝叶斯网络结构NK
步骤S520,每个医院统计贝叶斯网络结构中所有属性-父节点的边际分布,并将统计结果发送给半可信第三方;
步骤S521,半可信第三方将每个属性-父节点相应的边际分布合并作为该属性-父节点对的边际分布,医院和半可信第三方利用最优多方Laplace机制为合并的边际分布添加Laplace噪音;
步骤S522,半可信第三方将含有噪音的贝叶斯网络中每个节点在给定父节点的条件分布的乘积作为数据属性的联合分布;
步骤S523,半可信第三方利用该联合分布生成新的数据。
图6是根据本申请实施例的数据发布系统的示例性示意图。如图6所示,以K个商店(编号为P1、P2、…、PK,K≥2)联合发布整体购买记录为例对本申请进行详细描述。
K个商店的购买记录分别存在于各自的物理主机上,半可信第三方和每个商店通过互联网连接,半可信第三方根据PrivSeq算法流程协调各方进行满足差分隐私保护的数据(整体购买记录)发布工作。
步骤S601,半可信第三方采用无重叠属性划分方法对属性集A(如包含用户的姓名,性别,年龄,购买商品等属性)进行划分,得到一组视图,视图为包含部分属性的集合,如视图V1=(X11,X12,...,X1i);
步骤S602,每个商店利用自己拥有的数据统计上一步中所有视图的边际分布,并将统计结果发送给半可信第三方,半可信第三方将每个视图相应的边际分布合并作为该视图的边际分布,K个商店和半可信第三方利用最优多方Laplace机制为合并的边际分布添加Laplace噪音;
步骤S603,半可信第三方利用含有噪音的边际分布,计算所有视图中两两属性的互信息;
步骤S604,半可信第三方指定商店按照P1,P2,...,PK的顺序为属性学习父节点,规定每个属性的父节点个数至多为k,并确定每个商店所需学习的属性的个数,前(K-1)个商店分别学习⌊d/K⌋个,第PK个学习d-(K-1)⌊d/K⌋个;
步骤S605,P1将属性集A分成两组Ah和An,Ah是由所有已经选定父节点的属性构成的集合,An是由所有未选定父节点的属性构成的集合。其中,Ah初始状态为空;
步骤S606,P1从An中随机选取一个属性X1',将其父节点记为空,并将X1'从An移至Ah
步骤S607,P1从An中选取一个属性Xi,从Ah中选取min{k,|Ah|}个属性组成Πi,Xi和Πi构成一组候选属性-父节点对,P1以属性和候选父节点间互信息为评分函数,利用指数机制从所有的候选属性-父节点对中选取一组属性-父节点对(Xi,Πi)并记为(X2',Π2),Π2为X2'的父节点。然后将X2'从An移至Ah
步骤S608,P1重复步骤S607过程,直至为⌊d/K⌋个属性选定父节点;
步骤S609,P1将集合Ah,An和⌊d/K⌋组属性-父节点对发送给P2
步骤S610,P2按照步骤S607和步骤S608过程为⌊d/K⌋个新的属性选定父节点并将集合Ah,An和2⌊d/K⌋组属性-父节点对发送给P3
步骤S611,P3,...,PK重复步骤S610过程直至为所有属性选定父节点,从而得到贝叶斯网络结构N0
步骤S612,PK将初始化的贝叶斯网络结构N0发送给半可信第三方;
步骤S613,半可信第三方利用N0和步骤S603中计算出的属性间互信息,采用关联强度感知的边界构造方法构建边界;
步骤S614,P1统计边界内所有属性及其父节点的边际分布并发送给半可信第三方。为了满足差分隐私保护要求,P1需在统计的边际分布中加入Laplace噪音;
步骤S615,半可信第三方利用指数机制在边界范围内为每个属性选取父节点从而得到贝叶斯网络结构N1
步骤S616,半可信第三方利用N1和步骤S603中计算出的属性间互信息,采用关联强度感知的边界构造方法构建边界;
步骤S617,P2统计边界内所有属性及其父节点的边际分布并发送给半可信第三方,半可信第三方将其与步骤S614中P1的统计结果累加,为了满足差分隐私保护要求,P2需在统计的边际分布中加入Laplace噪音,为了提高边际分布的数据效用,P1,P2和半可信第三方利用安全功能评估协议去除边际分布中P1生成的Laplace噪音,只保留P2生成的噪音;
步骤S618,半可信第三方利用指数机制在边界范围内为每个属性选取父节点得到贝叶斯网络结构N2
步骤S619,重复步骤S616至步骤S618过程,半可信第三方与商店P3,...PK对网络进行更新直至得到最终的贝叶斯网络结构NK
步骤S620,每个商店统计贝叶斯网络结构中所有属性-父节点的边际分布,并将统计结果发送给半可信第三方;
步骤S621,半可信第三方将每个属性-父节点相应的边际分布合并作为该属性-父节点对的边际分布,商店和半可信第三方利用最优多方Laplace机制为合并的边际分布添加Laplace噪音;
步骤S622,半可信第三方将含有噪音的贝叶斯网络中每个节点在给定父节点的条件分布的乘积作为数据属性的联合分布;
步骤S623,半可信第三方利用该联合分布生成新的数据。
图7是根据本申请实施例的数据发布系统的示例性示意图。如图7所示,以K个银行(编号为P1、P2、…、PK,K≥2)联合发布整体交易信息为例对本申请进行详细描述。
K个银行的交易信息数据分别存在于各自的物理主机上,半可信第三方和每个银行通过互联网连接。半可信第三方根据PrivSeq算法流程协调各方进行满足差分隐私保护的数据(整体交易信息)发布工作。
步骤S701,半可信第三方采用无重叠属性划分方法对属性集A(如包含姓名,性别,年龄,取款金额等属性)进行划分,得到一组视图,视图为包含部分属性的集合,如视图V1=(X11,X12,...,X1i);
步骤S702,每个银行利用自己拥有的数据统计上一步中所有视图的边际分布,并将统计结果发送给半可信第三方,半可信第三方将每个视图相应的边际分布合并作为该视图的边际分布,K个银行和半可信第三方利用最优多方Laplace机制为合并的边际分布添加Laplace噪音;
步骤S703,半可信第三方利用含有噪音的边际分布,计算所有视图中两两属性的互信息;
步骤S704,半可信第三方指定银行按照P1,P2,...,PK的顺序为属性学习父节点,规定每个属性的父节点个数至多为k,并确定每个银行所需学习的属性的个数,前(K-1)个银行分别学习⌊d/K⌋个,第PK个学习d-(K-1)⌊d/K⌋个;
步骤S705,P1将属性集A分成两组Ah和An,Ah是由所有已经选定父节点的属性构成的集合,An是由所有未选定父节点的属性构成的集合,其中,Ah初始状态为空;
步骤S706,P1从An中随机选取一个属性X1',将其父节点记为空,并将X1'从An移至Ah
步骤S707,P1从An中选取一个属性Xi,从Ah中选取min{k,|Ah|}个属性组成Πi,Xi和Πi构成一组候选属性-父节点对,P1以属性和候选父节点间互信息为评分函数,利用指数机制从所有的候选属性-父节点对中选取一组属性-父节点对(Xi,Πi)并记为(X2',Π2),Π2为X2'的父节点,然后将X2'从An移至Ah
步骤S708,P1重复步骤S707过程,直至为⌊d/K⌋个属性选定父节点;
步骤S709,P1将集合Ah、An和⌊d/K⌋组属性-父节点对发送给P2
步骤S710,P2按照步骤S707和步骤S708过程为⌊d/K⌋个新的属性选定父节点并将集合Ah,An和2⌊d/K⌋组属性-父节点对发送给P3
步骤S711,P3,...,PK重复步骤S710过程直至为所有属性选定父节点,从而得到贝叶斯网络结构N0
步骤S712,PK将初始化的贝叶斯网络结构N0发送给半可信第三方;
步骤S713,半可信第三方利用N0和步骤S703中计算出的属性间互信息,采用关联强度感知的边界构造方法构建边界;
步骤S714,P1统计边界内所有属性及其父节点的边际分布并发送给半可信第三方。为了满足差分隐私保护要求,P1需在统计的边际分布中加入Laplace噪音;
步骤S715,半可信第三方利用指数机制在边界范围内为每个属性选取父节点从而得到贝叶斯网络结构N1
步骤S716,半可信第三方利用N1和步骤S703中计算出的属性间互信息,采用关联强度感知的边界构造方法构建边界;
步骤S717,P2统计边界内所有属性及其父节点的边际分布并发送给半可信第三方,半可信第三方将其与步骤S714中P1的统计结果累加,为了满足差分隐私保护要求,P2需在统计的边际分布中加入Laplace噪音,为了提高边际分布的数据效用,P1、P2和半可信第三方利用安全功能评估协议去除边际分布中P1生成的Laplace噪音,只保留P2生成的噪音;
步骤S718,半可信第三方利用指数机制在边界范围内为每个属性选取父节点得到贝叶斯网络结构N2
步骤S719,重复步骤S716至步骤S718过程,半可信第三方与银行P3,...PK对网络进行更新直至得到最终的贝叶斯网络结构NK
步骤S720,每个银行统计贝叶斯网络结构中所有属性-父节点的边际分布,并将统计结果发送给半可信第三方;
步骤S721,半可信第三方将每个属性-父节点相应的边际分布合并作为该属性-父节点对的边际分布,银行和半可信第三方利用最优多方Laplace机制为合并的边际分布添加Laplace噪音;
步骤S722,半可信第三方将含有噪音的贝叶斯网络中每个节点在给定父节点的条件分布的乘积作为数据属性的联合分布;
步骤S723,半可信第三方利用该联合分布生成新的数据。
图8是根据本申请实施例的数据发布系统的示例性示意图。如图8所示,以K个学校(编号为P1、P2、…、PK,K≥2)联合发布整体学生考试成绩为例对本申请进行详细描述。
K个学校的考试成绩分别存在于各自的物理主机上,半可信第三方和每个学校通过互联网连接,半可信第三方根据PrivSeq算法流程协调各方进行满足差分隐私保护的数据(整体学生考试成绩)发布工作。
步骤S801,半可信第三方采用无重叠属性划分方法对属性集A(如包含学号、姓名、性别、成绩等属性)进行划分,得到一组视图,视图为包含部分属性的集合,如视图V1=(X11,X12,...,X1i);
步骤S802,每个学校利用自己拥有的数据统计上一步中所有视图的边际分布,并将统计结果发送给半可信第三方,半可信第三方将每个视图相应的边际分布合并作为该视图的边际分布,K个学校和半可信第三方利用最优多方Laplace机制为合并的边际分布添加Laplace噪音;
步骤S803,半可信第三方利用含有噪音的边际分布,计算所有视图中两两属性的互信息;
步骤S804,半可信第三方指定学校按照P1,P2,...,PK的顺序为属性学习父节点,规定每个属性的父节点个数至多为k,并确定每个学校所需学习的属性的个数,前(K-1)个学校分别学习⌊d/K⌋个,第PK个学习d-(K-1)⌊d/K⌋个;
步骤S805,P1将属性集A分成两组Ah和An,Ah是由所有已经选定父节点的属性构成的集合,An是由所有未选定父节点的属性构成的集合。显然Ah初始状态为空;
步骤S806,P1从An中随机选取一个属性X1',将其父节点记为空,并将X1'从An移至Ah
步骤S807,P1从An中选取一个属性Xi,从Ah中选取min{k,|Ah|}个属性组成Πi,Xi和Πi构成一组候选属性-父节点对。P1以属性和候选父节点间互信息为评分函数,利用指数机制从所有的候选属性-父节点对中选取一组属性-父节点对(Xi,Πi)并记为(X2',Π2),Π2为X2'的父节点。然后将X2'从An移至Ah
步骤S808,P1重复步骤S807过程,直至为⌊d/K⌋个属性选定父节点;
步骤S809,P1将集合Ah,An和⌊d/K⌋组属性-父节点对发送给P2
步骤S810,P2按照步骤S807和步骤S808过程为⌊d/K⌋个新的属性选定父节点并将集合Ah,An和2⌊d/K⌋组属性-父节点对发送给P3
步骤S811,P3,...,PK重复步骤S810过程直至为所有属性选定父节点,从而得到贝叶斯网络结构N0
步骤S812,PK将初始化的贝叶斯网络结构N0发送给半可信第三方;
步骤S813,半可信第三方利用N0和步骤S803中计算出的属性间互信息, 采用关联强度感知的边界构造方法构建边界;
步骤S814,P1统计边界内所有属性及其父节点的边际分布并发送给半可信第三方。为了满足差分隐私保护要求,P1需在统计的边际分布中加入Laplace噪音;
步骤S815,半可信第三方利用指数机制在边界范围内为每个属性选取父节点从而得到贝叶斯网络结构N1
步骤S816,半可信第三方利用N1和步骤S803中计算出的属性间互信息,采用关联强度感知的边界构造方法构建边界;
步骤S817,P2统计边界内所有属性及其父节点的边际分布并发送给半可信第三方,半可信第三方将其与步骤S814中P1的统计结果累加,为了满足差分隐私保护要求,P2需在统计的边际分布中加入Laplace噪音,为了提高边际分布的数据效用,P1,P2和半可信第三方利用安全功能评估协议去除边际分布中P1生成的Laplace噪音,只保留P2生成的噪音;
步骤S818,半可信第三方利用指数机制在边界范围内为每个属性选取父节点得到贝叶斯网络结构N2
步骤S819,重复步骤S816至步骤S818过程,半可信第三方与学校P3,...PK对网络进行更新直至得到最终的贝叶斯网络结构NK
步骤S820,每个学校统计贝叶斯网络结构中所有属性-父节点的边际分布,并将统计结果发送给半可信第三方;
步骤S821,半可信第三方将每个属性-父节点相应的边际分布合并作为该属性-父节点对的边际分布,学校和半可信第三方利用最优多方Laplace机制为合并的边际分布添加Laplace噪音;
步骤S822,半可信第三方将含有噪音的贝叶斯网络中每个节点在给定父节点的条件分布的乘积作为数据属性的联合分布;
步骤S823,半可信第三方利用该联合分布生成新的数据。
在上述实施例中,运用数据隐私领域领先的差分隐私模型在多方数据联合发布过程为每个数据拥有者的数据提供ε-差分隐私保护,可以保障用户的隐私,提供更安全的数据发布策略;采用串行的贝叶斯网络更新机制,并结合无重叠属性划分方法和最优多方Laplace机制,从而在每个数据拥有者的数据满足ε-差分隐私的条件下,最大程度地减少噪音的加入,使得发布的数据的效用得到提升,保证整体数据服务的质量;采用串行更新机制并结合关联强度感知的边界构造方法,对数据拥有者和半可信第三方之间传递的信息量进行合理的限制,从而在综合利用各方数据提供高质量服务的同时,减少通信开销,降低大数据环境下数据服务的成本。
本申请实施例中还提供了一种数据发布装置。该装置用于实现上述实施例及示例性实施方式,已经进行过说明的不再赘述。如以下所使用的,术语“模块”可以实现预定功能的软件、硬件或者软件和硬件的组合。尽管以下实施例所描述的装置较佳地以软件来实现,但是硬件,或者软件和硬件的组合的实现也是可能并被构想的。
图9是根据本申请实施例的数据发布装置的示意图。如图9所示,该装置可以包括:更新单元91、学习单元92以及发布单元93。
更新单元91,配置为更新与数据的属性集合对应的初始贝叶斯网络结构,得到更新后的实际贝叶斯网络结构;
学习单元92,配置为学习实际贝叶斯网络结构中的参数,得到目标贝叶斯网络结构;
发布单元93,配置为利用目标贝叶斯网络结构发布对应于属性集合中所有属性的数据。
通过上述实施例,更新单元更新与数据的属性集合对应的初始贝叶斯网络结构,得到更新后的实际贝叶斯网络结构;学习单元学习实际贝叶斯网络结构中的参数,得到目标贝叶斯网络结构;发布单元利用目标贝叶斯网络结构发布对应于属性集合中所有属性的数据,从而提高了在大数据环境下实现多方数据发布时的安全性,实现了提高数据发布的安全性的技术效果。
示例性地,更新单元91可以包括:第一获取模块,配置为获取属性集合中任意两个属性的第一互信息;更新模块,配置为通过第一互信息对初始贝叶斯网络结构进行串行更新,得到更新后的实际贝叶斯网络结构。
示例性地,第一获取模块可以包括:划分子模块,配置为将属性集合划分为多个视图,其中,每个视图包括属性集合中的部分属性;合并子模块,配置为利用最优多方拉普拉斯机制,将对应于每个视图的多个边际分布合并为每个视图的实际边际分布,其中,实际边际分布中携带有拉普拉斯噪音;计算子模块,配置为利用每个视图的实际边际分布计算每个视图中任意两个属性的第一互信息。
示例性地,划分子模块可以配置为采用无重叠属性划分装置将属性集合划分为多个视图,其中,任意两个视图所包括的属性对不重叠。在得到的一组视图(即多个视图)中,视图为包含部分属性的集合,如视图V1=(X11,X12,...,X1i)。
示例性地,合并子模块可以配置为:获取基于多个对象中每个对象拥有的数据计算得到的每个视图的边际分布,其中,边际分布中添加有拉普拉斯噪音;将多个对象的多个边际分布合并为每个视图的实际边际分布,并将多个边际分布携带的多个拉普拉斯噪音中的最小噪音作为实际边际分布的拉普拉斯噪音。
示例性地,更新单元91还可以包括:第二获取模块,配置为获取包括属性集合中所有属性的父子节点关系的初始贝叶斯网络结构,其中,父子节点关系由多个对象基于指定方式确定。
示例性地,更新模块可以包括:第一更新子模块,配置为对初始贝叶斯网络结构进行更新,得到更新后的第一贝叶斯网络结构;第二更新子模块,配置为对第j-1贝叶斯网络结构进行更新,得到更新后的第j贝叶斯网络结构,其中,j为大于1且小于k的正整数;第三更新子模块,配置为对第k-1贝叶斯网络结构进行更新,得到实际贝叶斯网络结构。
上述实施例中的更新单元91还可以配置为控制多个数据拥有者基于指数机制确定属性集合中所有属性的父子节点关系,并确定包括属性集合中所有属性的父子节点关系的初始贝叶斯网络结构。
示例性地,第一更新子模块可以配置为:利用第一互信息,采用关联强度感知的边界构造装置构建初始贝叶斯网络结构的第一边界;获取多个对象中的第一对象统计的第一边界内属性及该属性的父节点的第一边际分布,其中,第一边际分布中携带有拉普拉斯噪音;利用指数机制为第一边界内的每个属性选取父节点,得到更新后的第一贝叶斯网络结构。
示例性地,第二更新子模块可以配置为:利用第一互信息,采用关联强度感知的边界构造装置构建第j-1贝叶斯网络结构的第j边界;获取多个对象中第j对象统计的第j边界内属性及该属性的父节点的第j边际分布,其中,第j边际分布中携带有拉普拉斯噪音;利用指数机制为第j边界内的每个属性选取父节点,得到更新后的第j贝叶斯网络结构。
示例性地,更新单元91可以按照如下步骤实现上述功能:
步骤S21,半可信第三方与第一个数据拥有者P1对初始化网络结构N0进行更新。
半可信第三方利用N0和之前计算出的属性间互信息,采用关联强度感知的边界构造方法构建边界。
P1统计边界内所有属性及其父节点的边际分布并发送给半可信第三方,为了满足差分隐私保护要求,P1需在统计的边际分布中加入Laplace噪音。
半可信第三方利用指数机制在边界范围内为每个属性选取父节点,从而得到贝叶斯网络结构N1
步骤S22,半可信第三方与第二个数据拥有者P2对网络进行更新。
半可信第三方利用N1和计算出的属性间互信息,采用关联强度感知的边界构造方法构建边界。
P2统计边界内所有属性及其父节点的边际分布并发送给半可信第三方,半可信第三方将其与P1的统计结果累加。为了满足差分隐私保护要求,P2需在统计的边际分布中加入Laplace噪音。为了提高边际分布的数据效用,P1、P2和半可信第三方利用安全功能评估协议去除边际分布中P1生成的Laplace噪音,只保留P2生成的噪音。
半可信第三方利用指数机制在边界范围内为每个属性选取父节点得到贝叶斯网络结构N2
步骤S23,半可信第三方与数据拥有者P3,...PK对网络进行更新直至得到最终的贝叶斯网络结构NK(即实际贝叶斯网络结构)。
示例性地,学习单元92可以包括:第三获取模块,配置为获取多个对象中每个对象确定的实际贝叶斯网络结构中任一属性和任一属性的父节点的条件分布;合并模块,配置为利用最优多方拉普拉斯机制将获取到的多个条件分布合并为任一属性和任一属性的父节点的实际条件分布,其中,实际条件分布中携带有拉普拉斯噪音。
示例性地,学习单元92可以按照如下步骤实现上述功能:
步骤S31,半可信第三方与第一个数据拥有者P1对初始化网络结构N0进行更新。
半可信第三方利用N0和之前计算出的属性间互信息,采用关联强度感知的边界构造方法构建边界。
P1统计边界内所有属性及其父节点的边际分布并发送给半可信第三方,为了满足差分隐私保护要求,P1需在统计的边际分布中加入Laplace噪音。
半可信第三方利用指数机制在边界范围内为每个属性选取父节点,从而得到贝叶斯网络结构N1
步骤S32,半可信第三方与第二个数据拥有者P2对网络进行更新。
半可信第三方利用N1和计算出的属性间互信息,采用关联强度感知的边界构造方法构建边界。
P2统计边界内所有属性及其父节点的边际分布并发送给半可信第三方,半可信第三方将其与P1的统计结果累加。为了满足差分隐私保护要求,P2需在统计的边际分布中加入Laplace噪音。为了提高边际分布的数据效用,P1、P2和半可信第三方利用安全功能评估协议去除边际分布中P1生成的Laplace噪音,只保留P2生成的噪音。
半可信第三方利用指数机制在边界范围内为每个属性选取父节点得到贝叶斯网络结构N2
步骤S33,半可信第三方与数据拥有者P3,...PK对网络进行更新直至得到最终的贝叶斯网络结构NK(即实际贝叶斯网络结构)。
示例性地,发布单元93可以包括:处理模块,配置为将每个属性在给定父节点条件下的实际条件分布的乘积作为所有属性的联合分布;发布模块,配置为发布由联合分布生成的对应于所有属性的数据。
在上述实施例中,提供了一种实现满足差分隐私的多方数据发布的装置,能够在保护用户隐私的前提下帮助用户充分分析和挖掘数据中的价值,为业务推广和科学研究提供更多依据。运用数据隐私领域领先的差分隐私模型在多方数据联合发布过程为每个数据拥有者的数据提供ε-差分隐私保护,可以保障用户的隐私,提供更安全的数据发布策略;采用串行的贝叶斯网络更新机制,并结合无重叠属性划分方法和最优多方Laplace机制,从而在每个数据拥有者的数据满足ε-差分隐私的条件下,最大程度地减少噪音的加入,使得发布的数据的效用得到提升,保证整体数据服务的质量;采用串行更新机制并结合关联强度感知的边界构造方法,对数据拥有者和半可信第三方之间传递的信息量进行合理的限制,从而在综合利用各方数据提供高质量服务的同时,减少通信开销,降低大数据环境下数据服务的成本。
需要说明的是,上述模块是可以通过软件或硬件来实现的,对于后者,可以通过以下方式实现,但不限于此:由同一处理器实现;或者,由不同的处理器实现。
本申请实施例还提供了一种存储介质。在本实施例中,上述存储介质可以被设置为存储用于执行以下步骤的程序代码:S1,更新与数据的属性集合对应的初始贝叶斯网络结构,得到更新后的实际贝叶斯网络结构;S2,学习实际贝叶斯网络结构中的参数,得到目标贝叶斯网络结构;S3,利用目标贝叶斯网络结构发布对应于属性集合中所有属性的数据。
示例性地,存储介质还被设置为存储用于执行以下步骤的程序代码:S4,获取属性集合中任意两个属性的第一互信息;S5,通过第一互信息对初始贝叶斯网络结构进行串行更新,得到更新后的实际贝叶斯网络结构。
在本实施例中,上述存储介质可以包括但不限于:U盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、移动硬盘、磁碟或者光盘等各种可以存储程序代码的介质。
在本实施例中,处理器可以根据存储介质中已存储的程序代码执行:更新与数据的属性集合对应的初始贝叶斯网络结构,得到更新后的实际贝叶斯网络结构;学习实际贝叶斯网络结构中的参数,得到目标贝叶斯网络结构;利用目标贝叶斯网络结构发布对应于属性集合中所有属性的数据。
在本实施例中,处理器可以根据存储介质中已存储的程序代码执行:获取属性集合中任意两个属性的第一互信息;通过第一互信息对初始贝叶斯网络结构进行串行更新,得到更新后的实际贝叶斯网络结构。本实施例中的示例可以参考上述实施例及示例性实施方式中的描述,本实施例在此不再赘述。
本领域普通技术人员可以理解,上文中所公开方法中的全部或某些步骤、系统、装置中的功能模块/单元可以被实施为软件、固件、硬件及其适当的组合。在某些情况下,可以以不同于此处的顺序执行所示出或描述的步骤。在硬件实施方式中,在以上描述中提及的功能模块/单元之间的划分不一定对应于物理组件的划分;例如,一个物理组件可以具有多个功能,或者一个功能或步骤可以由若干物理组件合作执行。某些组件或所有组件可以被实施为由处理器,如数字信号处理器或微处理器执行的软件,或者被实施为硬件,或者被实施为集成电路,如专用集成电路。这样的软件可以分布在计算机可读介质上,计算机可读介质可以包括计算机存储介质(或非暂时性介质)和通信介质(或暂时性介质)。如本领域普通技术人员公知的,术语计算机存储介质包括在用于存储信息(诸如计算机可读指令、数据结构、程序模块或其他数据)的任何方法或技术中实施的易失性和非易失性、可移除和不可移除介质。计算机存储介质包括但不限于RAM、ROM、EEPROM、闪存或其他存储器技术、CD-ROM、数字多功能盘(DVD)或其他光盘存储、磁盒、磁带、磁盘存储或其他磁存储装置、或者可以用于存储期望的信息并且可以被计算机访问的任何其他的介质。此外,本领域普通技术人员公知的是,通信介质通常包含计算机可读指令、数据结构、程序模块或者诸如载波或其他传输机制之类的调制数据信号中的其他数据,并且可包括任何信息递送介质。
以上所述仅为本申请的示例性实施例而已,并不用于限制本申请,对于本领域的技术人员来说,本申请可以有各种更改和变化。凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。
工业实用性
本申请实施例提供一种数据发布方法和装置及终端,提高了在大数据环境下多方数据发布时的安全性。

Claims (27)

  1. 一种数据发布方法,包括:
    更新与数据的属性集合对应的初始贝叶斯网络结构,得到更新后的实际贝叶斯网络结构(S301);
    学习所述实际贝叶斯网络结构中的参数,得到目标贝叶斯网络结构(S302);
    利用所述目标贝叶斯网络结构发布对应于所述属性集合中所有属性的数据(S303)。
  2. 根据权利要求1所述的方法,其中,所述更新与数据的属性集合对应的初始贝叶斯网络结构,得到更新后的实际贝叶斯网络结构,包括:
    获取所述属性集合中任意两个属性的第一互信息;
    通过所述第一互信息对所述初始贝叶斯网络结构进行串行更新,得到更新后的所述实际贝叶斯网络结构。
  3. 根据权利要求2所述的方法,其中,所述获取所述属性集合中任意两个属性的第一互信息包括:
    将所述属性集合划分为多个视图,其中,每个所述视图包括所述属性集合中的部分属性;
    利用最优多方拉普拉斯机制,将对应于每个所述视图的多个边际分布合并为每个所述视图的实际边际分布,其中,所述实际边际分布中携带有拉普拉斯噪音;
    利用每个所述视图的实际边际分布,计算每个所述视图中任意两个属性的第一互信息。
  4. 根据权利要求3所述的方法,其中,所述将所述属性集合划分为多个视图包括:
    采用无重叠属性划分方法将所述属性集合划分为多个所述视图,其中,任意两个所述视图所包括的属性对不重叠,所述属性对包括所述属性集合中的两个属性。
  5. 根据权利要求3或4所述的方法,其中,所述利用最优多方拉普拉斯机制,将对应于每个所述视图的多个边际分布合并为每个所述视图的实际边际分布,包括:
    获取基于多个对象中每个所述对象拥有的数据计算得到的每个所述视图的边际分布,其中,所述边际分布中添加有拉普拉斯噪音;
    将多个所述对象的多个所述边际分布合并为每个所述视图的实际边际分布,并将多个所述边际分布携带的多个拉普拉斯噪音中的最小噪音作为所述实际边际分布的拉普拉斯噪音。
  6. 根据权利要求1所述的方法,在更新与数据的属性集合对应的初始贝叶斯网络结构之前,所述方法还包括:
    获取包括所述属性集合中所有属性的父子节点关系的所述初始贝叶斯网络结构,其中,所述父子节点关系由多个对象基于指定方式确定。
  7. 根据权利要求6所述的方法,其中,所述指定方式用于指示按照如下方式确定所述父子节点关系:
    多个所述对象中的第一对象将所述属性集合划分为第一集合和第二集合,其中,所述第一集合用于保存已经确定父节点的属性,所述第一集合的初始状态为空,所述第二集合用于保存未确定父节点的属性;
    所述第一对象从所述第二集合中选取一个属性保存至所述第一集合;
    多个所述对象中的第i对象按照预设方式为所述第二集合中第一预设数量的属性确定父节点,并将确定了父节点的属性从所述第二集合迁移至所述第一集合,其中,i为小于k的正整数,k为多个所述对象的数量;
    多个所述对象中的第k对象按照所述预设方式为所述第二集合中第二预设数量的属性确定父节点,并将确定了父节点的属性从所述第二集合迁移至所述第一集合。
  8. 根据权利要求7所述的方法,其中,所述预设方式包括:
    获取所述第一集合中每个第一属性与第二属性的第二互信息,其中,所述第二属性为从所述第二集合中选取的属性;
    使用指数机制从多个所述第二互信息中选取出目标互信息,将与所述目标互信息对应的第一属性作为所述第二属性的父节点。
  9. 根据权利要求2所述的方法,其中,所述通过所述第一互信息对所述初始贝叶斯网络结构进行串行更新,得到更新后的实际贝叶斯网络结构,包括:
    对所述初始贝叶斯网络结构进行更新,得到更新后的第一贝叶斯网络结构;
    对第j-1贝叶斯网络结构进行更新,得到更新后的第j贝叶斯网络结构,其中,j为大于1且小于k的正整数;
    对第k-1贝叶斯网络结构进行更新,得到所述实际贝叶斯网络结构。
  10. 根据权利要求9所述的方法,其中,所述对所述初始贝叶斯网络结构进行更新,得到更新后的第一贝叶斯网络结构包括:
    利用所述第一互信息,采用关联强度感知的边界构造方法构建所述初始贝叶斯网络结构的第一边界;
    获取多个对象中的第一对象统计的所述第一边界内属性及该属性的父节点的第一边际分布,其中,所述第一边际分布中携带有拉普拉斯噪音;
    利用指数机制为所述第一边界内的每个属性选取父节点,得到更新后的所述第一贝叶斯网络结构。
  11. 根据权利要求9所述的方法,其中,所述对第j-1贝叶斯网络结构进行更新,得到更新后的第j贝叶斯网络结构包括:
    利用所述第一互信息,采用关联强度感知的边界构造方法构建第j-1贝叶斯网络结构的第j边界;
    获取多个对象中第j对象统计的所述第j边界内属性及该属性的父节点的第j边际分布,其中,所述第j边际分布中携带有拉普拉斯噪音;
    利用指数机制为所述第j边界内的每个属性选取父节点,得到更新后的所述第j贝叶斯网络结构。
  12. 根据权利要求1所述的方法,其中,所述学习所述实际贝叶斯网络结构中的参数包括:
    获取多个对象中每个所述对象确定的所述实际贝叶斯网络结构中任一属性和所述任一属性的父节点的条件分布;
    利用最优多方拉普拉斯机制将获取到的多个所述条件分布合并为所述任一属性和所述任一属性的父节点的实际条件分布,其中,所述实际条件分布中携带有拉普拉斯噪音。
  13. 根据权利要求1所述的方法,其中,所述利用所述目标贝叶斯网络结构发布对应于所述属性集合中所有属性的数据包括:
    将每个所述属性在给定父节点条件下的实际条件分布的乘积作为所有所述属性的联合分布;
    发布由所述联合分布生成的对应于所有所述属性的数据。
  14. 一种数据发布装置,包括:
    更新单元(91),配置为更新与数据的属性集合对应的初始贝叶斯网络结构,得到更新后的实际贝叶斯网络结构;
    学习单元(92),配置为学习所述实际贝叶斯网络结构中的参数,得到目标贝叶斯网络结构;
    发布单元(93),配置为利用所述目标贝叶斯网络结构发布对应于所述属性集合中所有属性的数据。
  15. 根据权利要求14所述的装置,其中,所述更新单元包括:
    第一获取模块,配置为获取所述属性集合中任意两个属性的第一互信息;
    更新模块,配置为通过所述第一互信息对所述初始贝叶斯网络结构进行串行更新,得到更新后的所述实际贝叶斯网络结构。
  16. 根据权利要求15所述的装置,其中,所述第一获取模块包括:
    划分子模块,配置为将所述属性集合划分为多个视图,其中,每个所述视图包括所述属性集合中的部分属性;
    合并子模块,配置为利用最优多方拉普拉斯机制将对应于每个所述视图的多个边际分布合并为每个所述视图的实际边际分布,其中,所述实际边际分布中携带有拉普拉斯噪音;
    计算子模块,配置为利用每个所述视图的实际边际分布计算每个所述视图中任意两个属性的第一互信息。
  17. 根据权利要求16所述的装置,其中,所述划分子模块配置为采用无重叠属性划分装置将所述属性集合划分为多个所述视图,其中,任意两个所述视图所包括的属性对不重叠,所述属性对包括所述属性集合中的两个属性。
  18. 根据权利要求16或17所述的装置,其中,所述合并子模块配置为:
    获取基于多个对象中每个所述对象拥有的数据计算得到的每个所述视图的边际分布,其中,所述边际分布中添加有拉普拉斯噪音;
    将多个所述对象的多个所述边际分布合并为每个所述视图的实际边际分布,并将多个所述边际分布携带的多个拉普拉斯噪音中的最小噪音作为所述实际边际分布的拉普拉斯噪音。
  19. 根据权利要求14所述的装置,其中,所述更新单元还包括:
    第二获取模块,配置为获取包括所述属性集合中所有属性的父子节点关系的所述初始贝叶斯网络结构,其中,所述父子节点关系由多个对象基于指定方式确定。
  20. 根据权利要求15所述的装置,其中,所述更新模块包括:
    第一更新子模块,配置为对所述初始贝叶斯网络结构进行更新,得到更新后的第一贝叶斯网络结构;
    第二更新子模块,配置为对第j-1贝叶斯网络结构进行更新,得到更新后的第j贝叶斯网络结构,其中,j为大于1且小于k的正整数;
    第三更新子模块,配置为对第k-1贝叶斯网络结构进行更新,得到所述实际贝叶斯网络结构。
  21. 根据权利要求20所述的装置,其中,所述第一更新子模块配置为:
    利用所述第一互信息,采用关联强度感知的边界构造装置构建所述初始贝叶斯网络结构的第一边界;
    获取多个对象中的第一对象统计的所述第一边界内属性及该属性的父节点的第一边际分布,其中,所述第一边际分布中携带有拉普拉斯噪音;
    利用指数机制为所述第一边界内的每个属性选取父节点,得到更新后的所述第一贝叶斯网络结构。
  22. 根据权利要求20所述的装置,其中,所述第二更新子模块配置为:
    利用所述第一互信息,采用关联强度感知的边界构造装置构建第j-1贝叶斯网络结构的第j边界;
    获取多个对象中第j对象统计的所述第j边界内属性及该属性的父节点的第j边际分布,其中,所述第j边际分布中携带有拉普拉斯噪音;
    利用指数机制为所述第j边界内的每个属性选取父节点,得到更新后的所述第j贝叶斯网络结构。
  23. 根据权利要求14所述的装置,其中,所述学习单元包括:
    第三获取模块,配置为获取多个对象中每个所述对象确定的所述实际贝叶斯网络结构中任一属性和所述任一属性的父节点的条件分布;
    合并模块,配置为利用最优多方拉普拉斯机制,将获取到的多个所述条件分布合并为所述任一属性和所述任一属性的父节点的实际条件分布,其中,所述实际条件分布中携带有拉普拉斯噪音。
  24. 根据权利要求14所述的装置,其中,所述发布单元包括:
    处理模块,配置为将每个所述属性在给定父节点条件下的实际条件分布的乘积作为所有所述属性的联合分布;
    发布模块,配置为发布由所述联合分布生成的对应于所有所述属性的数据。
  25. 一种终端,包括:
    处理器(101);
    配置为存储所述处理器可执行指令的存储器(103);
    配置为根据所述处理器的控制进行信息收发通信的传输装置(105);
    其中,所述处理器(101)配置为执行以下操作:更新与数据的属性集合对应的初始贝叶斯网络结构,得到更新后的实际贝叶斯网络结构;学习所述实际贝叶斯网络结构中的参数,得到目标贝叶斯网络结构;利用所述目标贝叶斯网络结构发布对应于所述属性集合中所有属性的数据。
  26. 根据权利要求25所述的终端,其中,所述处理器(101)还配置为执行以下操作:获取所述属性集合中任意两个属性的第一互信息;通过所述第一互信息对所述初始贝叶斯网络结构进行串行更新,得到更新后的所述实际贝叶斯网络结构。
  27. 一种存储介质,存储有数据发布程序,所述数据发布程序被处理器执行时实现如权利要求1至13中任一项所述的数据发布方法的步骤。
PCT/CN2017/099042 2016-10-27 2017-08-25 数据发布方法和装置及终端 WO2018076916A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610957969.7A CN108009437B (zh) 2016-10-27 2016-10-27 数据发布方法和装置及终端
CN201610957969.7 2016-10-27

Publications (1)

Publication Number Publication Date
WO2018076916A1 true WO2018076916A1 (zh) 2018-05-03

Family

ID=62024310

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/099042 WO2018076916A1 (zh) 2016-10-27 2017-08-25 数据发布方法和装置及终端

Country Status (2)

Country Link
CN (1) CN108009437B (zh)
WO (1) WO2018076916A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111144888A (zh) * 2019-12-24 2020-05-12 安徽大学 一种差分隐私保护的移动群智感知任务分配方法
CN115329898A (zh) * 2022-10-10 2022-11-11 国网浙江省电力有限公司杭州供电公司 基于差分隐私策略的分布式机器学习方法及系统

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108959956B (zh) * 2018-06-07 2021-06-22 广西师范大学 基于贝叶斯网络的差分隐私数据发布方法
CN110610098B (zh) * 2018-06-14 2023-05-30 中兴通讯股份有限公司 数据集生成方法及装置
CN113111383B (zh) * 2021-04-21 2022-05-20 山东大学 一种垂直分割数据的个性化差分隐私保护方法及系统
CN116702214B (zh) * 2023-08-02 2023-11-07 山东省计算中心(国家超级计算济南中心) 基于相干邻近度与贝叶斯网络的隐私数据发布方法及系统

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011016281A2 (ja) * 2009-08-06 2011-02-10 株式会社シーエーシー ベイジアンネットワーク構造学習のための情報処理装置及びプログラム
CN104950808A (zh) * 2015-07-20 2015-09-30 攀枝花学院 基于加强朴素贝叶斯网络的机床热误差补偿方法
CN105006119A (zh) * 2015-06-30 2015-10-28 中国寰球工程公司 一种基于贝叶斯网络的报警系统优化方法
CN105046559A (zh) * 2015-09-10 2015-11-11 河海大学 一种基于贝叶斯网络和互信息的客户信用评分方法
CN105512247A (zh) * 2015-11-30 2016-04-20 上海交通大学 基于一致性特征的非交互式差分隐私发布模型的优化方法
CN105608388A (zh) * 2015-09-24 2016-05-25 武汉大学 一种基于相关性去除的差分隐私数据发布方法及系统

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104869126B (zh) * 2015-06-19 2018-02-09 中国人民解放军61599部队计算所 一种网络入侵异常检测方法


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111144888A (zh) * 2019-12-24 2020-05-12 安徽大学 一种差分隐私保护的移动群智感知任务分配方法
CN111144888B (zh) * 2019-12-24 2022-08-02 安徽大学 一种差分隐私保护的移动群智感知任务分配方法
CN115329898A (zh) * 2022-10-10 2022-11-11 国网浙江省电力有限公司杭州供电公司 基于差分隐私策略的分布式机器学习方法及系统

Also Published As

Publication number Publication date
CN108009437B (zh) 2022-11-22
CN108009437A (zh) 2018-05-08

Similar Documents

Publication Publication Date Title
WO2018076916A1 (zh) 数据发布方法和装置及终端
Gai et al. Optimal resource allocation using reinforcement learning for IoT content-centric services
Tsai et al. Bat algorithm inspired algorithm for solving numerical optimization problems
Li et al. Federated learning with soft clustering
Nie et al. Existence and global stability of equilibrium point for delayed competitive neural networks with discontinuous activation functions
Sommer et al. Comparison of graph node distances on clustering tasks
CN111400504A (zh) Method and apparatus for identifying key persons of an enterprise
Jiang et al. Extracting elite pairwise constraints for clustering
Zeyu et al. ECAPM: an enhanced coverage algorithm in wireless sensor network based on probability model
Wu et al. An ensemble of random decision trees with local differential privacy in edge computing
Wang et al. Efficient multi-modal hypergraph learning for social image classification with complex label correlations
Han et al. A clique-based discrete bat algorithm for influence maximization in identifying top-k influential nodes of social networks
Ružička et al. Fast and computationally efficient generative adversarial network algorithm for unmanned aerial vehicle–based network coverage optimization
Janssen et al. Nonuniform distribution of nodes in the spatial preferential attachment model
Wu et al. Efficient range-free localization using elliptical distance correction in heterogeneous wireless sensor networks
CN114817552A (zh) Exercise association relationship processing method, apparatus, device, and storage medium
Zhong et al. Simplifying node classification on heterophilous graphs with compatible label propagation
Jiang et al. Computational aspects of optional Pólya tree
Zhou et al. Asymptotical stability of stochastic neural networks with multiple time-varying delays
Yates et al. Assessing the effectiveness of k-shortest path sets in problems of network interdiction
Ouyang et al. Bayesian Multi‐net Classifier for classification of remote sensing data
Xu et al. Dm-KDE: dynamical kernel density estimation by sequences of KDE estimators with fixed number of components over data streams
Liu et al. Data fusion in wireless sensor networks
Jacroux et al. On the E-optimality of blocked main effects plans in blocks of different sizes
Mena et al. A heuristic in A* for inference in nonlinear Probabilistic Classifier Chains

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 17865954

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry into the European phase

Ref document number: 17865954

Country of ref document: EP

Kind code of ref document: A1