CN109241774B - Differential privacy space decomposition method and system - Google Patents

Differential privacy space decomposition method and system

Publication number
CN109241774B
Authority
CN
China
Prior art keywords
data set
domain
node
tree
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811090763.4A
Other languages
Chinese (zh)
Other versions
CN109241774A (en)
Inventor
周可
李春花
李晓翠
汪洋涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201811090763.4A priority Critical patent/CN109241774B/en
Publication of CN109241774A publication Critical patent/CN109241774A/en
Application granted granted Critical
Publication of CN109241774B publication Critical patent/CN109241774B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/04Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Traffic Control Systems (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a differential privacy space decomposition method comprising the following steps: acquiring a d-dimensional point data set; generating, from the d-dimensional point data set, a complete domain omega for constructing a beta tree and a point count cnt in the complete domain omega; reading all Laplace noise data from a file containing Laplace noise; creating the beta tree of the d-dimensional point data set by using the obtained complete domain omega and the point count cnt in the complete domain omega; and adding, to the point count of each leaf node of the created beta tree, the product of one Laplace noise value and a noise coefficient, with a different Laplace noise value selected for each leaf node. The method addresses the technical problems of existing differential privacy space decomposition methods based on the Laplace distribution: the privacy of intermediate nodes is easily exposed, the regional decomposition is inaccurate, the noise cost is high, and the depth of the spatial decomposition privacy tree is difficult to determine accurately.

Description

Differential privacy space decomposition method and system
Technical Field
The invention belongs to the technical field of privacy protection, and particularly relates to a differential privacy space decomposition method and system.
Background
Differential privacy is a new privacy protection framework that adds appropriate noise to query or analysis results to achieve privacy protection.
In the differential privacy protection process, for security reasons, the entire domain must be recursively decomposed into subdomains to generate a hierarchical privacy tree, and Laplace noise must be added to the point count of each node in the privacy tree; this is called differential privacy spatial decomposition.
However, existing differential privacy space decomposition methods based on the Laplace distribution have some non-negligible technical problems. First, the Laplace distribution is symmetric about the origin, and in the spatial decomposition privacy tree the point count of each intermediate node equals the sum of the point counts of all its child nodes; when the Laplace noise of all child nodes sums to zero and cancels out, the privacy of the intermediate node is exposed. Second, existing methods add noise to all nodes of the spatial decomposition privacy tree, and the noise accumulated from the root node to the leaf nodes causes inaccurate regional decomposition and a higher noise cost. Third, existing methods have difficulty determining the depth of the spatial decomposition privacy tree accurately: an excessive depth increases the noise added to the privacy tree, while an insufficient depth yields too few decomposed subdomains, so the query or analysis result becomes inaccurate.
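The first problem can be written out as a short worked equation. The symbols below are notation introduced here for illustration only and do not appear in the patent: c_j is the true point count of child node j, η_j is the Laplace noise added to it, and c_parent is the true point count of the intermediate node with n children.

```latex
\tilde{c}_j = c_j + \eta_j, \qquad
\sum_{j=1}^{n} \tilde{c}_j
  = \sum_{j=1}^{n} c_j + \sum_{j=1}^{n} \eta_j
  = c_{\text{parent}} + \sum_{j=1}^{n} \eta_j .
```

If the noise terms sum to zero, the published child counts add up to exactly c_parent, which is the exposure the text describes.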
Disclosure of Invention
Aiming at the defects or improvement requirements of the prior art, the invention provides a differential privacy space decomposition method and a differential privacy space decomposition system, and aims to solve the technical problems that the privacy of an intermediate node is easy to expose, the noise cost is high due to inaccurate regional decomposition, and the depth of a spatial decomposition privacy tree is difficult to accurately determine in the existing differential privacy space decomposition method based on Laplace distribution.
To achieve the above object, according to one aspect of the present invention, there is provided a differential privacy space decomposition method, comprising the steps of:
(1) acquiring a d-dimensional point data set, generating a complete domain omega for constructing a beta tree and a point count cnt in the complete domain omega according to the d-dimensional point data set, and reading all Laplace noise data from a file containing Laplace noise; wherein d is a natural number greater than or equal to 2;
(2) creating a beta tree of a d-dimensional point data set by using the complete domain omega obtained in the step (1) and the point count cnt in the complete domain omega;
(3) adding, to the point count of each leaf node of the created beta tree, the product of one Laplace noise value and a noise coefficient, wherein a different Laplace noise value is selected for each leaf node.
Preferably, step (1) comprises in particular the following sub-steps:
(1-1) determining extreme points of a plurality of geographical coordinates from a preset area;
(1-2) acquiring a d-dimensional point data set, and selecting from it the d-dimensional points whose geographic coordinates lie within the area formed by the extreme points determined in step (1-1) to form a new data set D, wherein the total number of selected d-dimensional points is the point count cnt;
(1-3) constructing a complete domain omega according to the extreme values of the geographic coordinates in the new data set D.
Preferably, step (1) comprises in particular the following sub-steps:
(1-1) determining extreme points of a plurality of geographical coordinates from a preset area;
(1-2) constructing a complete domain according to the extreme points of the plurality of geographic coordinates.
Preferably, step (2) comprises in particular the following sub-steps:
(2-1) creating a root node, setting a range of a domain of the root node to a range of a full domain Ω, and marking the root node as visited;
(2-2) creating n child nodes according to the created root node, and marking all the n child nodes as not-visited, wherein n represents the fan-out number of the beta tree, and is a natural number greater than or equal to 2;
(2-3) evenly allocating the domain of the root node to the n child nodes; for each child node, if the size of the domain allocated to it is larger than a domain threshold θ and the number of d-dimensional points in the data set D falling into its domain is larger than a point count threshold, creating n lower-layer child nodes from that child node and then applying the same judging and creating process to each lower-layer child node; if a child node fails either condition, that is, the size of its allocated domain is not larger than the domain threshold θ or the number of d-dimensional points in the data set D falling into its domain is not larger than the point count threshold, marking the child node as visited; and finally generating the beta tree.
Preferably, step (2) comprises in particular the following sub-steps:
(2-1) creating a root node, setting a range of a domain of the root node to a range of a full domain Ω, and marking the root node as visited;
(2-2) creating n child nodes according to the created root node, and marking all the n child nodes as not-visited, wherein n represents the fan-out number of the beta tree, and is a natural number greater than or equal to 2;
(2-3) evenly allocating the domain of the root node to the n child nodes; for each child node, if the size of the domain allocated to it is larger than a domain threshold θ and the sum of the number of d-dimensional points in the data set D falling into its domain and any one Laplace noise value from the Laplace noise data is larger than a point count threshold, creating n lower-layer child nodes from that child node and then applying the same judging and creating process to each of the n lower-layer child nodes; if a child node fails either condition, that is, the size of its allocated domain is not larger than the domain threshold θ or the sum described above is not larger than the point count threshold, marking the child node as visited; and finally generating the beta tree.
Preferably, the point count threshold is between 5 and 20, and the domain threshold θ is equal to 2^(-18) = 0.000003814697266.
Preferably, the noise coefficient is calculated by the formula (k + n + 1)/(k + n), where k ∈ [1, n] and k = n - i + 1, wherein i is equal to the sequence number of the parent node to which the leaf node belongs, minus 1.
According to another aspect of the present invention, there is provided a differential privacy space decomposition system comprising:
the device comprises a first module, a second module and a third module, wherein the first module is used for acquiring a d-dimensional point data set, generating a complete domain omega for constructing a beta tree and a point count cnt in the complete domain omega according to the d-dimensional point data set, and reading all Laplace noise data from a file containing Laplace noise; wherein d is a natural number greater than or equal to 2;
a second module, configured to create a β -tree of a d-dimensional point data set by using the complete domain Ω obtained in the first module and the point count cnt in the complete domain Ω;
and a third module for adding a product obtained by multiplying any one of the laplacian noise data by the noise coefficient to the point count of each leaf node of the created beta tree, wherein different laplacian noise data are selected to be used for different leaf nodes.
According to a further aspect of the present invention, a computer-readable storage medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the differential privacy space decomposition method described above.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
1. When Laplace noise is added to a node, the Laplace noise data is multiplied by a noise coefficient that depends on the fan-out number of the beta tree and on the node sequence number, and the product is added to the point count of the node; the noise of the child nodes of an intermediate node can then no longer sum to 0, so the privacy of the intermediate node is not exposed.
2. Because the invention only adds the irremovable Laplacian noise to the leaf nodes, and does not add the noise to the intermediate nodes, compared with the traditional differential privacy space decomposition method, the invention adds less noise, thereby reducing the total noise cost.
3. According to the invention, whether the node is divided is determined according to whether the size of the domain allocated to the child node is larger than the domain threshold value theta and whether the number of d-dimensional points in the child node is larger than the point counting threshold value, so that the construction of the privacy tree integrates the fineness of the complete domain decomposition and the number of the added noise, and a more balanced domain decomposition effect is obtained.
Drawings
FIG. 1 is a flow chart of a differential privacy space decomposition method of the present invention.
FIG. 2 is a schematic diagram of a computer implementing the differential privacy space decomposition system of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
As shown in fig. 1, according to a first embodiment of the present invention, there is provided a differential privacy space decomposition method, including the steps of:
(1) acquiring a d-dimensional Point data set (Point dataset), generating a complete domain omega for constructing a beta tree and a Point count cnt in the complete domain omega according to the d-dimensional Point data set, and reading all Laplace noise data from a file containing Laplace noise; wherein d is a natural number greater than or equal to 2;
specifically, the point data set obtained in this step may be obtained from a Distributed File System (DFS), or directly obtained from a database;
in the present embodiment, a point data set is a location-based point data set, and data in the point data set corresponds to coordinates of a certain location.
The laplacian distribution position parameter μ of the laplacian noise data is 0, and the laplacian distribution scale parameter λ is 0.1.
The method specifically comprises the following substeps:
(1-1) determining extreme points of a plurality of geographical coordinates from a preset area;
in the invention, the preset area is specifically an administrative geographical division capable of uniformly calculating the traffic road condition.
For example, if the preset area is the Beijing city area and d is 2, the extreme points of the geographic coordinates are the point at maximum latitude 41.2 degrees and maximum longitude 117.2 degrees, and the point at minimum latitude 39.2 degrees and minimum longitude 115.2 degrees;
similarly, if d is 3, the extreme points of the plurality of geographic coordinates are the point at which the maximum latitude, the maximum longitude and the maximum altitude are located, and the point at which the minimum latitude, the minimum longitude and the minimum altitude are located, respectively.
(1-2) acquiring a d-dimensional point data set, and selecting from it the d-dimensional points whose geographic coordinates lie within the area formed by the extreme points determined in step (1-1) to form a new data set D, wherein the total number of selected d-dimensional points is the point count cnt;
specifically, if d is 2, the area formed by the plurality of extreme points is a rectangular area formed by four points (maximum latitude, maximum longitude), (maximum latitude, minimum longitude), (minimum latitude, maximum longitude), and (minimum latitude, minimum longitude).
If d is 3, the area formed by the extreme points is a hexahedral area enclosed by the 8 points (maximum latitude, maximum longitude, maximum altitude), (maximum latitude, maximum longitude, minimum altitude), …, and (minimum latitude, minimum longitude, minimum altitude).
(1-3) constructing a complete domain omega according to the extreme value of the geographic coordinate in the formed new data set D;
specifically, if d is 2, the area enclosed by the maximum latitude, the maximum longitude, the minimum latitude and the minimum longitude in the new data set D is taken as the complete domain Ω; if d is 3, the area enclosed by the maximum latitude, the maximum longitude, the maximum altitude, the minimum latitude, the minimum longitude and the minimum altitude in the new data set D is taken as the complete domain Ω.
When all the geographic coordinates in the d-dimensional point data set lie within the area formed by the extreme points determined in step (1-1), the complete domain is constructed directly from those extreme points.
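Step (1) can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the function names, the tuple layout (latitude, longitude) and the one-value-per-line noise file format are all assumptions made here. Only the Beijing extreme points come from the text.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]   # one d-dimensional point, e.g. (latitude, longitude)
Box = Tuple[Point, Point]   # (per-dimension minima, per-dimension maxima)


def select_points(points: Sequence[Point], lo: Point, hi: Point) -> List[Point]:
    """Sub-step (1-2): keep the points inside the box spanned by the extreme points."""
    return [p for p in points if all(l <= x <= h for x, l, h in zip(p, lo, hi))]


def complete_domain(selected: Sequence[Point]) -> Box:
    """Sub-step (1-3): bounding box of the selected points, used as the complete domain."""
    dims = range(len(selected[0]))
    lo = tuple(min(p[i] for p in selected) for i in dims)
    hi = tuple(max(p[i] for p in selected) for i in dims)
    return lo, hi


def read_noise(path: str) -> List[float]:
    """Read all Laplace noise values; one float per line is an assumed file format."""
    with open(path) as fh:
        return [float(line) for line in fh if line.strip()]


# Extreme points from the Beijing example (d = 2), as (latitude, longitude).
lo, hi = (39.2, 115.2), (41.2, 117.2)
# Hypothetical raw points; the last two lie outside the preset area.
raw = [(39.9, 116.4), (40.5, 116.0), (42.0, 116.0), (39.5, 118.0)]

D = select_points(raw, lo, hi)   # new data set D
cnt = len(D)                     # point count cnt
omega = complete_domain(D)       # complete domain
```

Here the complete domain is tighter than the preset area because it is built from the extreme values of D itself, as sub-step (1-3) describes.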
(2) Creating a beta tree of a d-dimensional point data set by using the complete domain omega obtained in the step (1) and the point count cnt in the complete domain omega;
the method specifically comprises the following substeps:
(2-1) creating a root node, setting a range of a domain (domain) of the root node to a range of a full domain Ω, and marking the root node as visited;
(2-2) creating n child nodes according to the created root node, and marking all the n child nodes as not-visited, wherein n represents the fan-out number of the beta tree, and is a natural number greater than or equal to 2;
(2-3) the domain of the root node is evenly allocated to the n child nodes. For each child node, if the size of its allocated domain is larger than the domain threshold θ and the number of d-dimensional points in the data set D falling into its domain (namely the point count of the child node) is larger than the point count threshold, n lower-layer child nodes are created from that child node, and the same judging and creating process is then applied to each of those n lower-layer child nodes. If a child node fails either condition, that is, the size of its allocated domain is not larger than the domain threshold θ or its point count is not larger than the point count threshold, the child node is marked as visited, and the beta tree is finally obtained.
Specifically, evenly allocating the domain of the root node to the n child nodes means first dividing the domain of the root node into n domains of equal or similar size and then assigning these domains to the n child nodes of the root node, one each.
In this step, the point count threshold is between 5 and 20, preferably 10, and the domain threshold θ is equal to 2^(-18) = 0.000003814697266.
As an alternative, the condition in this step that the number of d-dimensional points in the data set D falling into the domain of the child node is larger than the point count threshold may be replaced by the condition that the sum of that number and any one Laplace noise value from the Laplace noise data is larger than the point count threshold.
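The recursive construction in step (2) might look like the following sketch. The text does not say how a domain's size is measured or how it is cut into n equal parts, so two assumptions are made here and marked in the code: size is taken as the product of the side lengths, and each split cuts the longest side into n equal slabs. The names Node, expand, build_tree and leaf_counts are hypothetical. The defaults for the two thresholds follow the preferred values in the text.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    lo: Point                                   # per-dimension minima of the domain
    hi: Point                                   # per-dimension maxima of the domain
    count: int                                  # points of D inside the domain
    children: List["Node"] = field(default_factory=list)


def size(lo: Point, hi: Point) -> float:
    """Domain size; ASSUMED to be the product of the side lengths."""
    result = 1.0
    for l, h in zip(lo, hi):
        result *= h - l
    return result


def expand(node: Node, points: Sequence[Point], n: int, theta: float, min_count: int) -> None:
    """Create n children of `node` and recurse into those passing both thresholds."""
    # ASSUMED split rule: cut the longest side into n equal slabs.
    axis = max(range(len(node.lo)), key=lambda i: node.hi[i] - node.lo[i])
    step = (node.hi[axis] - node.lo[axis]) / n
    buckets: List[List[Point]] = [[] for _ in range(n)]
    for p in points:
        j = int((p[axis] - node.lo[axis]) / step) if step > 0 else 0
        buckets[min(j, n - 1)].append(p)        # points on the upper edge go to the last slab
    for j, bucket in enumerate(buckets):
        lo = node.lo[:axis] + (node.lo[axis] + j * step,) + node.lo[axis + 1:]
        hi = node.hi[:axis] + (node.lo[axis] + (j + 1) * step,) + node.hi[axis + 1:]
        child = Node(lo, hi, len(bucket))
        node.children.append(child)
        if size(lo, hi) > theta and child.count > min_count:
            expand(child, bucket, n, theta, min_count)


def build_tree(points: Sequence[Point], lo: Point, hi: Point, n: int,
               theta: float = 2 ** -18, min_count: int = 10) -> Node:
    """Sub-steps (2-1) to (2-3): the root is always split, children only conditionally."""
    root = Node(lo, hi, len(points))
    expand(root, points, n, theta, min_count)
    return root


def leaf_counts(node: Node) -> List[int]:
    """Point counts of the leaves, left to right."""
    if not node.children:
        return [node.count]
    return [c for child in node.children for c in leaf_counts(child)]


# Hypothetical data: 12 points on the left of a 4 x 2 domain, 3 on the right.
pts = [(k / 10, 1.0) for k in range(12)] + [(3.0, 0.5), (3.1, 0.5), (3.2, 0.5)]
tree = build_tree(pts, (0.0, 0.0), (4.0, 2.0), n=2)
```

The "visited" marks of the text are implicit in this version: a node counts as visited once `expand` has declined to recurse into it. Because every split divides the size by n, the size test guarantees that the recursion stops.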
(3) Adding a product obtained by multiplying any one of laplacian noise data by a noise coefficient to the point count of each leaf node of the created beta tree, wherein different laplacian noise data are selected to be used for different leaf nodes;
note that a leaf node refers to a node at the bottom of the tree that has no children (i.e., degree 0).
In this step, the noise coefficient is calculated by the formula (k + n + 1)/(k + n);
the noise coefficient is used to prevent the removal of the Laplace noise, where k ∈ [1, n] and k = n - i + 1, wherein i is equal to the sequence number of the parent node to which the leaf node belongs, minus 1.
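Step (3) can be illustrated as follows. The coefficient formula and the scale parameter 0.1 are taken from the text; everything else is an assumption made for this sketch. In particular, the index i is passed in directly because the text does not fully specify how node sequence numbers are assigned, noise values are consumed in order so that each leaf gets a different one, and `laplace_samples` is only a stand-in for the noise file that the method reads.

```python
import random
from typing import List, Sequence


def noise_coefficient(i: int, n: int) -> float:
    """Return (k + n + 1) / (k + n) with k = n - i + 1, as given in the text."""
    k = n - i + 1
    if not 1 <= k <= n:
        raise ValueError("k must lie in [1, n]")
    return (k + n + 1) / (k + n)


def laplace_samples(count: int, scale: float = 0.1, seed: int = 0) -> List[float]:
    """Stand-in for the noise file: Laplace(0, scale) samples.

    The difference of two independent exponential variables with mean `scale`
    follows a Laplace distribution with location 0 and that scale.
    """
    rng = random.Random(seed)
    return [rng.expovariate(1 / scale) - rng.expovariate(1 / scale) for _ in range(count)]


def noisy_leaf_counts(counts: Sequence[float], indices: Sequence[int],
                      noise: Sequence[float], n: int) -> List[float]:
    """Add coefficient * noise to each leaf count, one distinct noise value per leaf."""
    if len(noise) < len(counts):
        raise ValueError("need at least one noise value per leaf")
    return [c + noise_coefficient(i, n) * z for c, i, z in zip(counts, indices, noise)]


# Hypothetical leaf counts and indices i for a tree with fan-out n = 4.
published = noisy_leaf_counts([10, 2, 3], [1, 2, 4], laplace_samples(3), n=4)
```

For n = 4 the coefficient ranges from 9/8 (i = 1) to 6/5 (i = 4), so every noise value is scaled by a slightly different factor greater than 1.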
According to a second embodiment of the present invention, there is provided a differential privacy space decomposition system, comprising:
the device comprises a first module, a second module and a third module, wherein the first module is used for acquiring a d-dimensional point data set, generating a complete domain omega for constructing a beta tree and a point count cnt in the complete domain omega according to the d-dimensional point data set, and reading all Laplace noise data from a file containing Laplace noise; wherein d is a natural number greater than or equal to 2;
a second module, configured to create a β -tree of a d-dimensional point data set by using the complete domain Ω obtained in the first module and the point count cnt in the complete domain Ω;
and a third module for adding a product obtained by multiplying any one of the laplacian noise data by the noise coefficient to the point count of each leaf node of the created beta tree, wherein different laplacian noise data are selected to be used for different leaf nodes.
In a specific application of the present invention, a server-side program extracts a d-dimensional point data set from a data set in a database or a distributed storage system (i.e., DFS), generates a complete domain Ω for constructing a β -tree and a point count cnt in the complete domain Ω according to the d-dimensional point data set in a distributed computing cluster (e.g., a cluster of a SPARK framework), and reads all laplacian noise data from a file including laplacian noise in a server-side file system; wherein d is a natural number greater than or equal to 2; creating a beta tree by using a complete domain omega obtained in a distributed computing framework, a point counting cnt in the complete domain omega and a d-dimensional point data set; the server-side application adds a product obtained by multiplying any one of laplacian noise data by a noise coefficient to the point count of each leaf node of the created beta tree, wherein different laplacian noise data are selected to be used for different leaf nodes.
Finally, the server-side program distributes the maximum and minimum longitude and latitude data and the point count data of the leaf nodes of the beta tree to the mobile phone client, which uses the transmitted data to build a graphical display of traffic road condition data.
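The data sent to the client could be packaged as in the sketch below. The patent does not describe a wire format, so JSON and every field name here (`leaves`, `min_lat`, `max_lon`, `count` and so on) are hypothetical choices for illustration.

```python
import json
from typing import Sequence, Tuple

# One leaf: ((min_lat, min_lon), (max_lat, max_lon), noisy point count).
Leaf = Tuple[Tuple[float, float], Tuple[float, float], float]


def leaf_payload(leaves: Sequence[Leaf]) -> str:
    """Serialize the leaf bounds and noisy counts for the client (hypothetical format)."""
    records = [
        {
            "min_lat": lo[0],
            "min_lon": lo[1],
            "max_lat": hi[0],
            "max_lon": hi[1],
            "count": count,
        }
        for lo, hi, count in leaves
    ]
    return json.dumps({"leaves": records})


payload = leaf_payload([((39.2, 115.2), (40.2, 116.2), 10.5)])
```

Only leaf-level bounds and noisy counts are included, which matches the text: the client never receives the raw points or the counts of intermediate nodes.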
According to a third embodiment of the present invention, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the differential privacy space decomposition method described above.
FIG. 2 is a schematic diagram of a computer, which is a server, capable of implementing the differential privacy space decomposition system of the present invention.
The server includes a processor 100, which processor 100 may be implemented by one or more microprocessors or controllers.
Processor 100 is communicatively coupled to a main memory 300 via bus 200, where main memory 300 includes volatile memory 301 and non-volatile memory 302 (not shown).
The volatile Memory 301 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS Dynamic Random Access Memory (RDRAM), and/or other types of ram devices. The non-volatile memory 302 may be implemented by flash memory and/or any other type of memory device. Access to the volatile memory 301 and the non-volatile memory 302 is achieved by a memory controller.
The server also includes an interface circuit 400. The interface circuit 400 may be implemented by any type of interface standard, such as an ethernet interface, a USB interface, and/or a PCI express interface.
One or more input devices 500 are connected to the interface circuit 400, the input devices 500 enabling a user to input data or instructions into the processor 100. The input device 500 may be implemented by, for example, a keyboard, a mouse, a touch screen, a Pad, a trackball (Trackpad), and/or a voice recognition system.
One or more output devices 600 may also be connected to interface circuit 400. Output device 600 may be implemented, for example, by a display device (e.g., a liquid crystal display, a CRT display, a printer, and/or speakers). The interface circuit 400 may thus include a graphics driver card.
The interface circuit 400 may be connected to an external client 3000 via a network 2000.
The server also includes one or more mass storage devices 700 for storing software and data. The mass storage device 700 may be a hard disk, an optical disk, or the like.
Instructions involved in the methods of the present invention may be stored in mass storage device 700, in volatile memory 301, in non-volatile memory 302, and/or in removable storage media (not shown).
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (7)

1. A differential privacy space decomposition method is characterized by comprising the following steps:
(1) acquiring a d-dimensional point data set, generating a complete domain for constructing a beta tree and a point count in the complete domain according to the d-dimensional point data set, and reading all Laplace noise data from a file containing Laplace noise; wherein d is a natural number greater than or equal to 2; the point data set is obtained from a distributed file system or directly from a database, the point data set is a point data set based on a position, and the data in the point data set corresponds to the coordinate of a certain position;
the step (1) specifically comprises the following substeps:
(1-1) determining extreme points of a plurality of geographical coordinates from a preset area; the preset area is an administrative geographical division capable of uniformly calculating traffic road conditions;
(1-2) acquiring a D-dimensional point data set, selecting D-dimensional points with geographic coordinates positioned in the area formed by the extreme points selected in the step (1-1) from the D-dimensional point data set to form a new data set D, wherein the total number of the selected D-dimensional points is a point count;
(1-3) constructing a complete domain according to extreme points of the geographical coordinates in the formed new data set D;
(2) creating a beta tree of a d-dimensional point data set by using the complete domain obtained in the step (1) and the point count in the complete domain; the step (2) specifically comprises the following substeps:
(2-1) creating a root node, setting a range of a domain of the root node as a range of a full domain, and marking the root node as visited;
(2-2) creating n child nodes according to the created root node, and marking all the n child nodes as not-visited, wherein n represents the fan-out number of the beta tree, and is a natural number greater than or equal to 2;
(2-3) evenly allocating the domain of the root node to n child nodes, for each child node, if the size of the domain allocated to the child node is larger than a domain threshold value theta and the number of D-dimensional points falling into the domain of the child node in the data set D is larger than a point count threshold value, continuing to create n child nodes of the lower layer according to the child node, then continuing the above-mentioned processes of judging and creating a child node of the lower layer for each child node of the lower layer, and if the child node does not meet the condition that the size of the domain allocated to the child node is larger than the domain threshold value theta or the number of D-dimensional points falling into the domain of the child node in the data set D is larger than the point count threshold value, marking the child node as visited, and finally generating a beta tree;
(3) adding a product obtained by multiplying any one of laplacian noise data by a noise coefficient to the point count of each leaf node of the created beta tree, wherein different laplacian noise data are selected to be used for different leaf nodes;
and finally, the server-side program distributes the maximum and minimum longitude and latitude data and the point count data of the leaf nodes of the beta tree to the mobile phone client for constructing the graphic display of the traffic road condition data by using the transmitted data on the mobile phone client.
2. A differential privacy space decomposition method is characterized by comprising the following steps:
(1) acquiring a d-dimensional point data set, generating a complete domain for constructing a beta tree and a point count in the complete domain according to the d-dimensional point data set, and reading all Laplace noise data from a file containing Laplace noise; wherein d is a natural number greater than or equal to 2; the point data set is obtained from a distributed file system or directly from a database, the point data set is a point data set based on a position, and the data in the point data set corresponds to the coordinate of a certain position;
the step (1) specifically comprises the following substeps:
(1-1) determining extreme points of a plurality of geographical coordinates from a preset area; the preset area is an administrative geographical division over which traffic road conditions can be calculated uniformly;
(1-2) acquiring a d-dimensional point data set and selecting from it the d-dimensional points whose geographic coordinates lie within the area formed by the extreme points determined in step (1-1), so as to form a new data set D, the total number of selected d-dimensional points being the point count;
(1-3) constructing a complete domain according to extreme points of the geographical coordinates in the formed new data set D;
(2) creating a beta tree of a d-dimensional point data set by using the complete domain obtained in the step (1) and the point count in the complete domain; the step (2) specifically comprises the following substeps:
(2-1) creating a root node, setting the range of the domain of the root node to the range of the complete domain, and marking the root node as visited;
(2-2) creating n child nodes according to the created root node, and marking all the n child nodes as not-visited, wherein n represents the fan-out number of the beta tree, and is a natural number greater than or equal to 2;
(2-3) evenly allocating the domain of the root node to the n child nodes; for each child node, if the size of the domain allocated to the child node is larger than a domain threshold value theta and the sum of the number of d-dimensional points in the data set D falling into the domain of the child node and one Laplace noise value selected from the Laplace noise data is larger than a point count threshold value, creating n lower-layer child nodes from the child node and then repeating the above judging and creating process for each of the n lower-layer child nodes; if the child node fails either condition, that is, the size of the domain allocated to it is not larger than the domain threshold value theta or the said sum is not larger than the point count threshold value, marking the child node as visited; and finally generating a beta tree;
(3) adding, to the point count of each leaf node of the created beta tree, the product of one value selected from the Laplace noise data and a noise coefficient, wherein a different Laplace noise value is selected for each leaf node;
and finally, a server-side program distributes the maximum and minimum longitude and latitude data and the point count data of the leaf nodes of the beta tree to a mobile phone client, where the transmitted data are used to construct a graphic display of traffic road condition data.
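Claim 2 differs from claim 1 only in the split test of step (2-3), which compares a noisy count with the point count threshold. The Python sketch below illustrates that test under stated assumptions: the pre-generated list stands in for the file containing Laplace noise, the noise scale of 2.0 is an arbitrary illustrative value (the claims do not fix it), and the names `laplace_pool`, `should_split` and `min_points` are hypothetical.

```python
import random


def laplace_pool(size, scale, seed=None):
    """Pre-generate Laplace(0, scale) values, standing in for the noise file.
    The difference of two i.i.d. exponentials with mean `scale` is Laplace."""
    rng = random.Random(seed)
    return [rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
            for _ in range(size)]


def should_split(size, exact_count, noise, theta, min_points):
    """Step (2-3) of claim 2: compare the noisy count with the threshold."""
    return size > theta and exact_count + noise > min_points


# Illustrative values only: one pooled noise value is drawn per decision.
pool = laplace_pool(size=1000, scale=2.0, seed=0)
decision = should_split(size=0.5, exact_count=12, noise=random.choice(pool),
                        theta=2 ** -18, min_points=10)
```

Testing the noisy count rather than the exact count means the shape of the tree itself does not depend deterministically on the exact number of points in any one domain.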
3. The differential privacy space decomposition method according to claim 1 or 2, wherein the point count threshold is between 5 and 20, and the domain threshold θ is equal to 2^(-18) ≈ 0.000003814697266.
4. The differential privacy space decomposition method according to claim 1 or 2, wherein the noise coefficient is calculated by the formula (k + n + 1)/(k + n), where k ∈ [1, n] and k = n - i + 1, i being equal to the sequence number of the parent node to which the leaf node belongs minus 1.
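As a worked illustration of the formula in claim 4, the following Python sketch computes the noise coefficient and applies it to a leaf count as described in step (3). The function names are hypothetical, and the index i is taken as an input because the claim does not spell out how parent nodes are numbered.

```python
def noise_coefficient(n, i):
    """Return (k + n + 1) / (k + n) with k = n - i + 1, as in claim 4."""
    k = n - i + 1
    if not 1 <= k <= n:
        raise ValueError("k must lie in [1, n]")
    return (k + n + 1) / (k + n)


def noisy_leaf_count(count, laplace_value, n, i):
    """Step (3): leaf count plus the Laplace value scaled by the coefficient."""
    return count + noise_coefficient(n, i) * laplace_value
```

Since the coefficient equals 1 + 1/(k + n), with n ≥ 2 and k ≥ 1 it is always greater than 1 and at most 4/3, so it only mildly amplifies the selected Laplace value.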
5. A differential privacy space decomposition system, comprising:
a first module for acquiring a d-dimensional point data set, generating, according to the d-dimensional point data set, a complete domain for constructing a beta tree and a point count in the complete domain, and reading all Laplace noise data from a file containing Laplace noise; wherein d is a natural number greater than or equal to 2; the first module obtains the point data set from a distributed file system or directly from a database, the point data set is location-based, and each data item in it corresponds to the coordinates of a location;
the first module specifically includes:
the first submodule is used for determining extreme points of a plurality of geographic coordinates from a preset area; the preset area is an administrative geographical division over which traffic road conditions can be calculated uniformly;
the second submodule is used for obtaining a d-dimensional point data set and selecting from it the d-dimensional points whose geographic coordinates lie within the area formed by the extreme points determined by the first submodule, so as to form a new data set D, the total number of selected d-dimensional points being the point count;
the third sub-module is used for constructing a complete domain according to extreme points of the geographic coordinates in the formed new data set D;
the second module is used for establishing a beta tree of the d-dimensional point data set by utilizing the complete domain obtained in the first module and the point count in the complete domain; the second module specifically includes:
the first submodule is used for creating a root node, setting the range of the domain of the root node to the range of the complete domain, and marking the root node as visited;
the second submodule is used for creating n child nodes according to the created root node and marking all the n child nodes as not-visited, wherein n represents the fan-out number of the beta tree and is a natural number which is greater than or equal to 2;
a third sub-module, configured to evenly allocate the domain of the root node to the n child nodes; for each child node, if the size of the domain allocated to the child node is greater than a domain threshold θ and the number of d-dimensional points in the data set D falling into the domain of the child node is greater than a point count threshold, to create n lower-layer child nodes from the child node and then repeat the above judging and creating process for each of the n lower-layer child nodes; if the child node fails either condition, that is, the size of the domain allocated to it is not greater than the domain threshold θ or the number of d-dimensional points in the data set D falling into its domain is not greater than the point count threshold, to mark the child node as visited; and finally to generate a beta tree;
a third module for adding, to the point count of each leaf node of the created beta tree, the product of one value selected from the Laplace noise data and a noise coefficient, wherein a different Laplace noise value is selected for each leaf node;
and finally, a server-side program distributes the maximum and minimum longitude and latitude data and the point count data of the leaf nodes of the beta tree to a mobile phone client, where the transmitted data are used to construct a graphic display of traffic road condition data.
6. A differential privacy space decomposition system, comprising:
a first module for acquiring a d-dimensional point data set, generating, according to the d-dimensional point data set, a complete domain for constructing a beta tree and a point count in the complete domain, and reading all Laplace noise data from a file containing Laplace noise; wherein d is a natural number greater than or equal to 2; the first module obtains the point data set from a distributed file system or directly from a database, the point data set is location-based, and each data item in it corresponds to the coordinates of a location;
the first module specifically includes:
the first submodule is used for determining extreme points of a plurality of geographic coordinates from a preset area; the preset area is an administrative geographical division over which traffic road conditions can be calculated uniformly;
the second submodule is used for obtaining a d-dimensional point data set and selecting from it the d-dimensional points whose geographic coordinates lie within the area formed by the extreme points determined by the first submodule, so as to form a new data set D, the total number of selected d-dimensional points being the point count;
the third sub-module is used for constructing a complete domain according to extreme points of the geographic coordinates in the formed new data set D;
the second module is used for establishing a beta tree of the d-dimensional point data set by utilizing the complete domain obtained in the first module and the point count in the complete domain; the second module specifically includes:
the first submodule is used for creating a root node, setting the range of the domain of the root node to the range of the complete domain, and marking the root node as visited;
the second submodule is used for creating n child nodes according to the created root node and marking all the n child nodes as not-visited, wherein n represents the fan-out number of the beta tree and is a natural number which is greater than or equal to 2;
a third sub-module, configured to evenly allocate the domain of the root node to the n child nodes; for each child node, if the size of the domain allocated to the child node is larger than a domain threshold θ and the sum of the number of d-dimensional points in the data set D falling into the domain of the child node and one Laplace noise value selected from the Laplace noise data is larger than a point count threshold, to create n lower-layer child nodes from the child node and then repeat the above judging and creating process for each of the n lower-layer child nodes; if the child node fails either condition, that is, the size of the domain allocated to it is not larger than the domain threshold θ or the said sum is not larger than the point count threshold, to mark the child node as visited; and finally to generate a beta tree;
a third module for adding, to the point count of each leaf node of the created beta tree, the product of one value selected from the Laplace noise data and a noise coefficient, wherein a different Laplace noise value is selected for each leaf node;
and finally, a server-side program distributes the maximum and minimum longitude and latitude data and the point count data of the leaf nodes of the beta tree to a mobile phone client, where the transmitted data are used to construct a graphic display of traffic road condition data.
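The final distribution step recited in claims 1, 2, 5 and 6 sends, for each leaf node, its longitude and latitude bounds together with its noisy point count. One possible shape for that payload is sketched below in Python. The claims do not specify a wire format, so the use of JSON and the field names `min_lon`, `max_lon`, `min_lat`, `max_lat` and `count` are assumptions made purely for illustration, as is the function name `leaf_payload`.

```python
import json


def leaf_payload(leaf_boxes):
    """Serialize leaves for the client.

    leaf_boxes: iterable of ((min_lon, max_lon), (min_lat, max_lat), noisy_count).
    Only bounds and noisy counts are included; no individual point is sent.
    """
    return json.dumps([
        {"min_lon": lon[0], "max_lon": lon[1],
         "min_lat": lat[0], "max_lat": lat[1],
         "count": count}
        for lon, lat, count in leaf_boxes
    ])


# Hypothetical leaf data, for illustration only.
payload = leaf_payload([
    ((114.0, 114.5), (30.0, 30.5), 12.25),
    ((114.5, 115.0), (30.0, 30.5), 3.5),
])
```

A client receiving such a payload would have enough information to draw one rectangle per leaf and shade it by its noisy count.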
7. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the differential privacy spatial decomposition method according to claim 1 or 2.
CN201811090763.4A 2018-09-19 2018-09-19 Differential privacy space decomposition method and system Active CN109241774B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811090763.4A CN109241774B (en) 2018-09-19 2018-09-19 Differential privacy space decomposition method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811090763.4A CN109241774B (en) 2018-09-19 2018-09-19 Differential privacy space decomposition method and system

Publications (2)

Publication Number Publication Date
CN109241774A (en) 2019-01-18
CN109241774B (en) 2020-11-10

Family

ID=65059651

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811090763.4A Active CN109241774B (en) 2018-09-19 2018-09-19 Differential privacy space decomposition method and system

Country Status (1)

Country Link
CN (1) CN109241774B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113094497B (en) * 2021-06-07 2021-09-14 华中科技大学 Electronic health record recommendation method and shared edge computing platform

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104573560A (en) * 2015-01-27 2015-04-29 上海交通大学 Differential private data publishing method based on wavelet transformation

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090182797A1 (en) * 2008-01-10 2009-07-16 Microsoft Corporation Consistent contingency table release
JP6835559B2 (en) * 2016-12-09 2021-02-24 国立大学法人電気通信大学 Privacy protection data provision system
CN107360551B (en) * 2017-07-12 2018-07-24 安徽大学 Location privacy protection method based on difference privacy in vehicular ad hoc network
CN107526975A (en) * 2017-08-10 2017-12-29 中国人民大学 A kind of method based on difference secret protection decision tree

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
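Research on Personalized Differential Privacy Data Publishing Methods for Social Networks (社交网络个性化差分隐私数据发布方法的研究); Sun Yuqing; China Master's Theses Full-text Database, Information Science and Technology; 2018-06-15; main text, page 17, paragraphs 2-3, and page 18, paragraphs 1-2 *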

Also Published As

Publication number Publication date
CN109241774A (en) 2019-01-18

Similar Documents

Publication Publication Date Title
Hesselbarth et al. landscapemetrics: an open‐source R tool to calculate landscape metrics
Gao Downscaling global spatial population projections from 1/8-degree to 1-km grid cells
US10667082B2 (en) Method and apparatus for determining index grids of geo-fence
Crosweller et al. Global database on large magnitude explosive volcanic eruptions (LaMEVE)
Jiang et al. Spatial distribution of city tweets and their densities
Pereira et al. Urban centrality: a simple index
Leydesdorff Visualization of the citation impact environments of scientific journals: An online mapping exercise
CN109726587B (en) Spatial data partitioning method based on differential privacy
Błaszczak-Bąk et al. Application of the Msplit method for filtering airborne laser scanning data-sets to estimate digital terrain models
CN105824840A (en) Method and apparatus for region tag management
CN111161887A (en) Population migration big data-based epidemic area return population scale prediction method
US20200341965A1 (en) Data Tokenization System Maintaining Data Integrity
US20140370920A1 (en) Systems and methods for generating and employing an index associating geographic locations with geographic objects
CN108268614A (en) A kind of distribution management method of forest reserves spatial data
CN109241774B (en) Differential privacy space decomposition method and system
JP7292368B2 (en) A non-transitory computer-readable storage medium storing a method for identifying a device using attributes and location signatures from the device, a server of uniquely generated identifiers for the method, and a sequence of instructions for the method
CN104850623B (en) Multi-dimensional data analysis model dynamic expansion method and system
CN110209749A (en) A kind of geographical information query method and device based on HBase
CN107704685B (en) Mesh division method and device
DE112022003126T5 (en) ACCESSING TOPOLOGICAL MAPPING OF CORES
CN115423889A (en) Image processing method and device, electronic equipment and storage medium
Huck et al. Visualizing patterns in spatially ambiguous point data
CN109582718B (en) Data processing method, device and storage medium
Strumiłło-Rembowska et al. Data generation of vector maps using a hybrid method of analysis and selection of geodata necessary to optimize the process of spatial planning
Lloyd et al. Surface models and the spatial structure of population variables: Exploring smoothing effects using Northern Ireland grid square data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant