WO2020248150A1 - Method and system for answering multi-dimensional analytical queries under local differential privacy - Google Patents

Method and system for answering multi-dimensional analytical queries under local differential privacy

Info

Publication number
WO2020248150A1
Authority
WO
WIPO (PCT)
Prior art keywords
sub
intervals
user data
range
data
Application number
PCT/CN2019/090837
Other languages
French (fr)
Inventor
Bolin Ding
Jingren Zhou
Tianhao WANG
Original Assignee
Alibaba Group Holding Limited
Application filed by Alibaba Group Holding Limited
Priority to CN201980096293.9A (CN113811868A)
Priority to PCT/CN2019/090837 (WO2020248150A1)
Publication of WO2020248150A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G06F16/24535 Query rewriting; Transformation of sub-queries or views
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution

Definitions

  • a well-studied DP model is in the centralized setting, where the trusted data collector obtains exact data from users and injects noise in the analytical process to guarantee DP.
  • users prefer not to have their private data leave their devices in an unprotected form, and thus, the centralized setting of DP is no longer applicable.
  • Local differential privacy (LDP) model.
  • Each user’s private data is encoded by a randomized algorithm before being sent to the data collector. LDP guarantees that the likelihood of any specific output of the algorithm varies little with input, i.e., the private data. In this way, users do not need to trust the data collector.
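  • As a hedged illustration of the LDP guarantee described above, the following minimal sketch assumes binary randomized response as the randomized algorithm. The source does not name a specific mechanism, so the mechanism and the names `rr_probabilities`, `encode_bit`, and `estimate_count` are assumptions made for illustration only.

```python
import math
import random

def rr_probabilities(epsilon):
    """Return (p, q): probability of reporting 1 when the true bit is 1 / when it is 0."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return p, 1.0 - p

def encode_bit(true_bit, epsilon, rng=random):
    """Client side (assumed mechanism): keep the bit with probability p, flip it otherwise."""
    p, _ = rr_probabilities(epsilon)
    return true_bit if rng.random() < p else 1 - true_bit

def estimate_count(reports, epsilon):
    """Server side: unbiased estimate of how many users hold a true bit of 1."""
    p, q = rr_probabilities(epsilon)
    n = len(reports)
    return (sum(reports) - n * q) / (p - q)

# The ratio p / q equals exp(epsilon), so any output is at most exp(epsilon)
# times more likely under one input than under the other.
p, q = rr_probabilities(1.0)
print(p / q)
```

  • A smaller epsilon pushes p and q toward 1/2, which makes individual reports less informative and the server-side estimate noisier.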
  • FIG. 1 illustrates an example network environment for answering MDA queries under LDP in accordance with an embodiment of the present disclosure.
  • FIG. 2 illustrates an example client-server diagram for answering MDA queries under LDP in accordance with an embodiment of the present disclosure.
  • FIG. 3 illustrates an example process for answering MDA queries under LDP in accordance with an embodiment of the present disclosure.
  • FIG. 4 illustrates an example diagram for determining a plurality of sub-intervals associated with the at least first range in accordance with an embodiment of the present disclosure.
  • FIG. 5 illustrates an example process for determining a plurality of sub-intervals associated with the at least first range in accordance with an embodiment of the present disclosure.
  • FIG. 6 illustrates another example process for answering MDA queries under LDP in accordance with an embodiment of the present disclosure.
  • FIG. 7 illustrates another example diagram for determining a plurality of first sub-intervals associated with the first range and a plurality of second sub-intervals associated with the second range in accordance with an embodiment of the present disclosure.
  • FIG. 8 illustrates another example process for answering MDA queries under LDP in accordance with an embodiment of the present disclosure.
  • FIG. 9 illustrates an example process for determining a plurality of first sub-intervals associated with the first range in accordance with an embodiment of the present disclosure.
  • FIG. 10 illustrates another example process for determining a plurality of second sub-intervals associated with the second range in accordance with an embodiment of the present disclosure.
  • FIG. 11 illustrates another example process for answering MDA queries under LDP in accordance with an embodiment of the present disclosure.
  • FIG. 12 illustrates an example system for implementing the processes for answering MDA queries under LDP in accordance with an embodiment of the present disclosure.
  • Systems and methods discussed herein are directed to improving the performance of answering a class of MDA queries, and more specifically to improving accuracy and scalability of answering a large class of MDA queries with tight error bounds, while the users’ sensitive data is collected under LDP.
  • the present disclosure relies on a hierarchical decomposition of the ordinal dimensions into sub-intervals, reducing the worst-case error from $O(m^d)$ in the marginal-based solution to $\log^{O(d)} m$, given that other terms, such as those dependent on data size and privacy budget, remain the same.
  • FIG. 1 illustrates an example network environment 100 for answering MDA queries under LDP in accordance with an embodiment of the present disclosure.
  • the example network environment 100 may include users 102, servers 104-1 and 104-2, user devices 106, network 108, database 110, and cloud storage 112.
  • the users 102 may be the administrators of the servers 104-1 and 104-2 or the analysts of service providers.
  • the users 102 may connect to the servers 104-1 and 104-2 via different types of terminal devices including but not limited to desktop computers, laptop computers, a mobile device, a built-in device in a motor vehicle, a wearable device, or a virtual reality (VR) device, and the like.
  • An MDA query typed by the users 102 may be processed by the servers 104-1 or 104-2.
  • the queries may be further forwarded to the user devices 106 via the network 108.
  • the user devices 106 may include any types of terminal devices such as a mobile phone, a tablet, a laptop computer, a desktop computer, a wearable device, a VR device, or a built-in device in a motor vehicle, and the like.
  • the user devices 106 may implement one or more internet services provided by the service providers.
  • User data associated with the internet services may be stored in the user devices 106.
  • User data may be periodically transmitted to the servers 104-1 and 104-2 to be stored in the database 110 and/or the cloud storage 112.
  • user data may be transmitted to the servers 104-1 and 104-2 upon request, for example, in response to an MDA query.
  • the servers 104-1 and 104-2 may implement a plurality of internet services provided by the service providers and manage user data associated with the services.
  • the network 108 may be a single network or a combination of different networks.
  • the network 108 may be a local area network (LAN), a wide area network (WAN), a public network, a private network, a proprietary network, a Public Switched Telephone Network (PSTN), the Internet, a wireless network, a virtual network, or any combination thereof.
  • the network 108 may also include various network access points, e.g., wired or wireless access points such as base stations or Internet exchange points, through which a data source may connect to the network 108 in order to transmit information via the network 108.
  • the services may be implemented in one or more servers individually or collaboratively.
  • the servers 104-1 and 104-2 may be individually connected to the network 108 or interconnected to form a server bank. It should be understood by those of ordinary skill in the art that the above example network environment with numerically noted elements is merely for illustration purposes and is not intended to limit the present disclosure.
  • FIG. 2 illustrates an example client-server diagram 200 for answering MDA queries under LDP in accordance with an embodiment of the present disclosure.
  • When an MDA query 202 is received at an estimation processor 204 on the server side (e.g., the server 104-1 or 104-2 in FIG. 1), it may be transmitted to the user devices (e.g., the user devices 106 in FIG. 1) to fetch the user data.
  • each user may generate multi-dimensional data while using the service.
  • Some dimensions (also referred to as measure attributes) about the service usage are naturally known to the service provider and are non-sensitive, e.g., active time and purchase amount (for billing purposes).
  • Some other dimensions are sensitive, e.g., income, age and location, and users prefer to have them collected by the service provider in a privacy-preserving way.
  • An example of the multi-dimensional user data is shown in Table 1.
  • the user data shown in Table 1 has seven dimensions, of which age, salary, state, and operating system (OS) are sensitive dimensions.
  • Table 1 A relational table with sensitive dimensions
  • the MDA query 202 may be an analytical query that aggregates measure attributes under constraints on sensitive dimensions of user data.
  • the MDA query 202 may be generated by the service provider to analyze how the service performs. While all the dimensions may not be released to the public and the analytics may be conducted internally by the service provider, the service provider may have to guarantee that the sensitive dimensions are handled properly by providing an LDP collection algorithm that is implemented on each user device.
  • sensitive data 206 from the user may be individually encoded using an LDP algorithm at each of the user devices, and the LDP encoded sensitive data 208 may be transmitted to the server side.
  • a fact table 210 may be generated at the server side upon receiving the LDP encoded sensitive data 208.
  • the fact table 210 may be a combination of the LDP encoded sensitive data 208 collected from the users and non-sensitive data (i.e., public or known to the server).
  • the LDP algorithm employs a degree of randomness as part of its logic to protect the user data privacy.
  • LDP may use uniformly random bits as an auxiliary input to guide the algorithm behavior.
  • the LDP encoded sensitive data may be embedded with random noise when transmitted to the server to protect the user data privacy.
  • the estimation processor 204 may perform an algorithm to estimate the answers to the MDA query with bounded errors.
  • FIG. 3 illustrates an example process 300 for answering MDA queries under LDP in accordance with an embodiment of the present disclosure.
  • the server may receive a query for an analytical result of user data in at least a first range over a first data dimension, such as the MDA query as described with reference to FIG. 2.
  • the block 302 may be performed by the estimation processor 204 of FIG. 2.
  • the query may be directed to obtain an analytical result of user data by aggregating data values in at least a first range over a first data dimension under constraints on sensitive dimensions.
  • the first dimension is a sensitive dimension.
  • the server may receive a query for an average purchase made by users between ages 30 and 40, with reference to Table 1.
  • the server may receive a query for an average purchase made by users between ages 30 and 40 and in the state of New York, with reference to Table 1.
  • the server may determine a plurality of sub-intervals associated with the at least first range.
  • the block 304 may be performed by the estimation processor 204 of FIG. 2.
  • the at least first range over the first dimension defined in the query may be partitioned into a plurality of sub-intervals. Details of the block 304 will be described with reference to FIG. 4 and FIG. 5.
  • the server may decompose the query into a plurality of sub-queries, each of the plurality of sub-queries corresponding to one of the plurality of first sub-intervals.
  • the block 306 may be performed by the estimation processor 204 of FIG. 2.
  • the server may rewrite the initial query into a plurality of sub-queries and transmit the plurality of sub-queries to each user at the client side.
  • Each of the plurality of sub-queries may be directed to obtain an analytical value of the user data in one of the plurality of sub-intervals over the first dimension.
  • the server may fetch, from the user devices, first user data in the plurality of sub-intervals over the first dimension.
  • the block 308 may be performed by the estimation processor 204 of FIG. 2.
  • each user’s private data may be encoded by a randomized algorithm before being sent to the data collector, i.e., the server. If the initial query for an analytical result of user data in the first range over the first data dimension is received at a user device, each individual data item within the first range may be visited and encoded to be sent to the server. The accumulated noise level received at the server due to the encoding of each individual data item may be high.
  • each user may respond to the server by returning an analytical value associated with each of the plurality of sub-intervals.
  • the user device may encode the analytical value associated with the sub-interval to be sent to the server.
  • the accumulated noise level received at the server due to encoding may be reduced.
  • the server may compute the analytical result of user data in at least the first range over the first data dimension in accordance with the received user data in the plurality of sub-intervals.
  • the block 310 may be performed by the estimation processor 204 of FIG. 2.
  • the answer to each of the plurality of sub-queries may be estimated using a weighted frequency estimator and the answer to the initial query is estimated by summing up the answers to all of the plurality of sub-queries.
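  • The estimate-then-sum step above can be sketched as follows. The source does not give the formula of its weighted frequency estimator here, so this sketch swaps in an assumed one: each user reports, via binary randomized response, whether the user's sensitive value falls in a sub-interval, and the server weights the debiased report by that user's non-sensitive measure. All function names are hypothetical, and the sketch ignores how the privacy budget would be allocated across sub-intervals.

```python
import math

def debias(report, p, q):
    """Map a randomized-response report (0 or 1) to an unbiased estimate of the true bit."""
    return (report - q) / (p - q)

def weighted_estimate(reports, measures, epsilon):
    """Estimate the sum of measures over users whose sensitive value lies in one sub-interval.

    reports[i]  : user i's noisy membership bit for the sub-interval (assumed encoding)
    measures[i] : user i's non-sensitive measure, known to the server
    """
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    q = 1.0 - p
    return sum(m * debias(y, p, q) for y, m in zip(reports, measures))

def answer_range_query(reports_by_interval, measures, epsilon):
    """Answer the initial query by summing the sub-query estimates, one per sub-interval."""
    return sum(
        weighted_estimate(reports, measures, epsilon)
        for reports in reports_by_interval.values()
    )
```

  • Because each debiased report has expectation equal to the true membership bit, every sub-query estimate is unbiased under this assumed encoding, and so is their sum.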
  • FIG. 4 illustrates an example diagram 400 for determining a plurality of sub-intervals associated with the at least first range in accordance with an embodiment of the present disclosure.
  • the example diagram as illustrated is for a one-dimensional query, where the one-dimension has eight distinct values in the order of 1, 2, ..., 8 and a binary hierarchy of intervals with four levels is constructed.
  • a hierarchical collection of intervals with a fanout b, which can be viewed as a perfect b-way tree, may be constructed.
  • each node may correspond to an interval, and have b children (except leaves) corresponding to b equal sized sub-intervals.
  • Dummy values may be added in D if $m \neq b^h$, where h indicates the number of levels.
  • Level 0 in the hierarchy may correspond to the root $\{[z_1, z_m]\}$, and may be recursively partitioned into b equal-sized sub-intervals until reaching the leaves, i.e., intervals with unit length $\{[z_1, z_1], \ldots, [z_m, z_m]\}$.
  • $I_D = \{L_0, \ldots, L_h\}$ may be set as the whole hierarchy
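  • The hierarchy construction described above can be sketched as follows. This is a minimal illustration, assuming values are identified by their 1-based rank and that the domain is padded with dummy values up to the next power of b; the function name `build_hierarchy` is hypothetical.

```python
def build_hierarchy(m, b=2):
    """Return levels L_0..L_h; each level is a list of inclusive (lo, hi) rank intervals."""
    # Pad the domain with dummy values up to the next power of b.
    h, size = 0, 1
    while size < m:
        size *= b
        h += 1
    levels = []
    for j in range(h + 1):
        width = size // (b ** j)  # length of every interval in level j
        levels.append([(k * width + 1, (k + 1) * width) for k in range(b ** j)])
    return levels

# Eight distinct values and fanout 2, as in the example of FIG. 4.
levels = build_hierarchy(8, b=2)
for j, level in enumerate(levels):
    print(f"Level {j}: {level}")
```

  • For eight values and fanout 2 this yields four levels, from the root interval covering all values down to eight unit-length leaves.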
  • FIG. 5 illustrates an example process 500 for determining a plurality of sub-intervals associated with the at least first range in accordance with an embodiment of the present disclosure.
  • the blocks described herein may be performed by the estimation processor 204 of FIG. 2.
  • the server may pre-generate a binary hierarchy of intervals having a plurality of levels with respect to the first data dimension.
  • the binary hierarchy of intervals may be constructed with respect to a first data dimension.
  • the binary hierarchy of intervals may include four levels, where Level 0 is the root level covering all eight distinct values in the first data dimension.
  • Level 0 may be further partitioned into two equal sized sub-intervals 402 and 404 in Level 1.
  • Each of the sub-intervals 402 and 404 in Level 1 may be further partitioned into two equal sized sub-intervals 406, 408, 410, and 412 in Level 2.
  • the server may determine whether one interval in each of the plurality of levels of the binary hierarchy is within the at least first range.
  • the at least first range is denoted as [2, 3, 4, 5, 6, 7], and the sub-interval 402 covers the distinct values 1, 2, 3, and 4 of the first data dimension; thus, the server may determine that the sub-interval 402 is not within the at least first range.
  • the sub-interval 408 covers the distinct values of the first data dimension 3 and 4 and may be determined within the at least first range.
  • the server may determine that the sub-intervals 408 and 410 in Level 2 and the sub-intervals 414, 416, 418, 420, 422, and 424 in Level 3 are within the at least first range of the first data dimension.
  • the server may determine whether the one interval is selected in an upper level of the binary hierarchy. Referring to the example diagram of FIG. 4, as the distinct values covered by each of the sub-intervals 416, 418, 420, and 422 in Level 3 are included in the sub-intervals 408 and 410, the server may determine that the sub-intervals 416, 418, 420, and 422 in Level 3 are already selected in the upper level, i.e., Level 2 of the binary hierarchy.
  • the server may select the one interval as one of the plurality of sub-intervals in response to the one interval being within the at least first range and not being selected in an upper level of the binary hierarchy.
  • the server may determine that the at least first range [2, 3, 4, 5, 6, 7] with respect to the first data dimension indicated in the query includes a plurality of sub-intervals: 408 with a value range of [3, 4], 410 with a value range of [5, 6], 414 with a value range of [2], and 424 with a value range of [7].
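  • The selection rule above (take an interval if it lies within the range and no ancestor was already taken) can be sketched as a top-down recursion. This is a minimal illustration that assumes the domain size is a power of the fanout b; the function name `decompose` is hypothetical.

```python
def decompose(lo, hi, node_lo=1, node_hi=8, b=2):
    """Return the maximal hierarchy intervals that together cover exactly [lo, hi]."""
    if hi < node_lo or node_hi < lo:
        return []  # node is disjoint from the range
    if lo <= node_lo and node_hi <= hi:
        return [(node_lo, node_hi)]  # node is within the range: select it, skip its children
    width = (node_hi - node_lo + 1) // b
    result = []
    for k in range(b):
        child_lo = node_lo + k * width
        result += decompose(lo, hi, child_lo, child_lo + width - 1, b)
    return result

print(decompose(2, 7))
```

  • For the range [2, 7] over eight values, the recursion returns the intervals [2, 2], [3, 4], [5, 6], and [7, 7], matching the sub-intervals 414, 408, 410, and 424 of FIG. 4.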
  • FIG. 6 illustrates another example process 600 for answering MDA queries under LDP in accordance with an embodiment of the present disclosure.
  • the example process 600 as illustrated in FIG. 6 includes the blocks of operations 302, 304, 306 and 308 that are similar to those described with reference to FIG. 3. Therefore, the blocks of operations 302, 304, 306 and 308 are not described in detail herein.
  • the server may estimate an analytical value of the first user data in each of the plurality of sub-intervals.
  • the block 602 may be performed by the estimation processor 204 of FIG. 2.
  • the at least first range over a first data dimension indicated in the query is determined to include a plurality of sub-intervals, and the query for an analytical result of user data in the entire first range is decomposed into a plurality of sub-queries, each of the plurality of sub-queries corresponding to one of the plurality of sub-intervals.
  • Each user receives the decomposed plurality of sub-queries for analytical value of first user data in the plurality of sub-intervals, instead of the initial query, i.e., the query for an analytical result of user data in the entire first range.
  • the user device transmits the analytical value of the first user data with respect to each of the plurality of sub-intervals encoded using an LDP algorithm.
  • the server estimates the analytical value of the received user data in each of the plurality of sub-intervals using a weighted frequency estimator.
  • the server may obtain the analytical result of user data in at least the first range over the first data dimension by summing the estimated analytical values of the received user data in the plurality of sub-intervals.
  • the server may estimate the answer to query q by summing up the estimated answers for the plurality of sub-queries, each estimated using the weighted frequency estimator as shown in Equation (3) :
  • the query q is received to obtain a summation of user data in a first range [2, 7] over the first dimension $D_1$.
  • the query q may be denoted as q: SELECT SUM(M_1) FROM T WHERE D_1 ∈ [2, 7].
  • the first range [2, 7] may be decomposed into four sub-intervals 408, 410, 414 and 424.
  • the query q may be rewritten as the sum of four sub-queries shown below:
  • the server may answer the initial query q by assembling estimates for the four sub-queries.
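  • The rewriting of q into four sub-queries can be illustrated as follows. The toy table `T`, its values, and the helper names are hypothetical, and the sub-queries are answered exactly here; under LDP each answer would instead be a noisy estimate. The sketch only shows that the four sub-interval sums add up to the answer of the original range query.

```python
# Hypothetical toy fact table T: one (D_1 value, M_1 measure) pair per user.
T = [(1, 5.0), (2, 3.0), (3, 7.0), (4, 1.0), (5, 4.0), (6, 2.0), (7, 6.0), (8, 9.0)]

# Sub-intervals 414, 408, 410, and 424 obtained for the range [2, 7].
SUB_INTERVALS = [(2, 2), (3, 4), (5, 6), (7, 7)]

def sub_query_sql(lo, hi):
    """Render one sub-query as SQL text, using BETWEEN for the interval constraint."""
    return f"SELECT SUM(M_1) FROM T WHERE D_1 BETWEEN {lo} AND {hi}"

def exact_sum(table, lo, hi):
    """Exact (non-private) answer to SUM(M_1) with D_1 in [lo, hi]."""
    return sum(m for d, m in table if lo <= d <= hi)

for lo, hi in SUB_INTERVALS:
    print(sub_query_sql(lo, hi))

sub_answers = [exact_sum(T, lo, hi) for lo, hi in SUB_INTERVALS]
total = sum(sub_answers)
print(total == exact_sum(T, 2, 7))
```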
  • FIG. 7 illustrates another example diagram 700 for determining a plurality of first sub-intervals associated with the first range and a plurality of second sub-intervals associated with the second range in accordance with an embodiment of the present disclosure.
  • FIG. 7 illustrates an example directed to a 2-dimensional MDA query. The query is directed to obtain the analytical result of user data in a first range over the first data dimension 702 and a second range over the second data dimension 704.
  • the first range [2, 3, 4, 5, 6, 7] may be determined to include four sub-intervals 408, 410, 414, and 424, similar to those described with reference to FIG. 4.
  • the second range [3, 4, 5, 6, 7, 8] may be determined to include two sub-intervals: 714 with a value range of [3, 4] and 710 with a value range of [5, 6, 7, 8].
  • a 2-dimensional hierarchy of intervals may be constructed in accordance with the first binary hierarchy of intervals and the second binary hierarchy of intervals.
  • the four sub-intervals in the first data dimension and the two sub-intervals in the second data dimension may be taken to generate eight combinations of sub-intervals 706.
  • Each of the sub-intervals 706 may correspond to a unique combination of one of the four sub-intervals in the first data dimension and one of the two sub-intervals in the second data dimension.
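  • The eight combinations can be enumerated with a Cartesian product. This is a minimal sketch using the value ranges from the example above; the variable names are illustrative only.

```python
from itertools import product

# Sub-intervals of the first range [2, 7] and of the second range [3, 8].
first = [(2, 2), (3, 4), (5, 6), (7, 7)]
second = [(3, 4), (5, 8)]

# Each 2-dimensional sub-interval pairs one first-dimension interval
# with one second-dimension interval.
boxes = list(product(first, second))
for (lo1, hi1), (lo2, hi2) in boxes:
    print(f"D_1 in [{lo1}, {hi1}] and D_2 in [{lo2}, {hi2}]")

# The boxes are disjoint, so their cell counts add up to the size of the query region.
cells = sum((hi1 - lo1 + 1) * (hi2 - lo2 + 1) for (lo1, hi1), (lo2, hi2) in boxes)
print(len(boxes), cells)
```

  • The query region spans six values in each dimension, so the eight disjoint boxes cover its 36 cells exactly once.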
  • FIG. 8 illustrates another example process 800 for answering MDA queries under LDP in accordance with an embodiment of the present disclosure.
  • the server may receive a query for an analytical result of user data in a first range over a first data dimension and a second range over a second data dimension.
  • the block 802 may be performed by the estimation processor 204 of FIG. 2.
  • the query may be directed to obtain an analytical result of user data by aggregating data values in the first range over the first data dimension and the second range over the second data dimension under constraints on sensitive dimensions of the user data.
  • the first data dimension and the second data dimension may be sensitive dimensions.
  • the server may receive a query for an average purchase made by users between ages 30 and 40 and with a salary between 50K and 70K, with reference to Table 1.
  • the server may determine a plurality of first sub-intervals associated with the first range.
  • the block 804 may be performed by the estimation processor 204 of FIG. 2.
  • the determination of a plurality of first sub-intervals associated with the first range is similar to that described with reference to FIG. 4 and FIG. 5, and therefore, is not described in detail herein.
  • the server may determine a plurality of second sub-intervals associated with the second range.
  • the block 806 may be performed by the estimation processor 204 of FIG. 2.
  • the block 806 may be performed by the query decomposition module 1210 of FIG. 12.
  • the second range [3, 4, 5, 6, 7, 8] over the second data dimension may be partitioned into two sub-intervals: 714 with a range of [3, 4] and 710 with a range of [5, 6, 7, 8].
  • the plurality of second sub-intervals associated with the second range over the second data dimension may be determined using a binary hierarchy similar to that described with respect to the partitioning of the first range over the first dimension.
  • first range over the first data dimension and the second range over the second dimension may be partitioned using different techniques. Further, the operations of block 804 and block 806 may be performed in parallel or in a sequential order.
  • the server may decompose the query into a plurality of sub-queries, each of the plurality of sub-queries corresponding to one of the plurality of first sub-intervals and one of the plurality of second sub-intervals.
  • the block 808 may be performed by the estimation processor 204 of FIG. 2.
  • a first dimension $D_1$ having m distinct values, in the order of $z_1, z_2, \ldots, z_m$, may be constructed to a binary hierarchy with multiple levels, each level including equal-sized sub-intervals.
  • the second dimension $D_2$ may be similarly constructed to a binary hierarchy with multiple levels, each level including equal-sized sub-intervals.
  • a two-dimensional hierarchy can be constructed using Equation (4):
  • Each may denote a 2-dimensional level. There may be a total of $(h+1)^2$ 2-dimensional levels in a 2-dimensional hierarchy. Each pair may denote a 2-dimensional sub-interval. Referring to FIG. 7, the first dimension 702 having eight distinct values [1, 2, 3, 4, 5, 6, 7, 8] may be constructed to a first binary hierarchy with four levels, and the second dimension 704 having eight distinct values [1, 2, 3, 4, 5, 6, 7, 8] may be constructed to a second binary hierarchy with four levels.
  • a 2-dimensional hierarchy constructed in accordance with the first binary hierarchy and the second binary hierarchy may include sixteen levels, among which, eight levels include the combinations of the plurality of first sub-intervals (i.e., 408, 410, 414 and 424) and the plurality of second sub-intervals (i.e., 710 and 714).
  • the server may further decompose the query into eight sub-queries, each corresponding to one of the eight combinations of the plurality of first sub-intervals (i.e., 408, 410, 414 and 424) and the plurality of second sub-intervals (i.e., 710 and 714).
  • the measures of the first dimension and the second dimension described above are for illustration purposes and are not intended to limit the present disclosure.
  • the measures of the first dimension and the second dimension can be set differently in accordance with the actual data set being collected.
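  • The 2-dimensional hierarchy can be sketched as the pairing of every level of the first hierarchy with every level of the second. This is an assumed construction for illustration, since the equation itself is not reproduced in this text; the names `build_levels` and `build_levels_2d` are hypothetical, and the sketch assumes the domain size is a power of the fanout.

```python
from itertools import product

def build_levels(m, b=2):
    """1-D hierarchy: level j holds b**j equal-sized (lo, hi) intervals; m must be a power of b."""
    levels, width = [], m
    while width >= 1:
        levels.append([(lo, lo + width - 1) for lo in range(1, m + 1, width)])
        if width == 1:
            break
        width //= b
    return levels

def build_levels_2d(levels_1, levels_2):
    """Map each level pair (j1, j2) to the cross product of the intervals in those two levels."""
    return {
        (j1, j2): list(product(l1, l2))
        for (j1, l1), (j2, l2) in product(enumerate(levels_1), enumerate(levels_2))
    }

hier = build_levels_2d(build_levels(8), build_levels(8))
print(len(hier))          # number of 2-dimensional levels
print(len(hier[(2, 1)]))  # number of 2-dimensional sub-intervals in level (2, 1)
```

  • With four levels per dimension (h = 3), the sketch produces $(h+1)^2 = 16$ level pairs, consistent with the sixteen levels stated above.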
  • the server may fetch, from user devices, first user data in the plurality of first sub-intervals and the plurality of the second sub-intervals.
  • the block 810 may be performed by the estimation processor 204 of FIG. 2.
  • the block 810 may be performed by the data fetching module 1212 of FIG. 12.
  • the users at the client side may receive the decomposed plurality of sub-queries corresponding to the plurality of first sub-intervals and the plurality of second sub-intervals instead of receiving the initial query with respect to the entire first range over the first data dimension and the entire second range over the second data dimension.
  • the users may transmit the analytical value of the first user data corresponding to a combination of one of the plurality of first sub-intervals and one of the plurality of second sub-intervals, under the constraint of the LDP encoding algorithm.
  • the server may compute the analytical result of the user data in the first range over the first data dimension and the second range over the second data dimension in accordance with the first user data in the plurality of first sub-intervals and the plurality of the second sub-intervals.
  • the block 812 may be performed by the estimation processor 204 of FIG. 2. Similar to the operation described in the block 310 with reference to FIG. 3, the server may estimate the answer to each of the plurality of sub-queries using a weighted frequency estimator and estimate the answer to the initial query by summing up the answers to all of the plurality of sub-queries.
  • FIG. 9 illustrates an example process 900 for determining a plurality of first sub-intervals associated with the first range in accordance with an embodiment of the present disclosure.
  • the blocks described herein may be performed by the estimation processor 204 of FIG. 2.
  • the server may pre-generate a first binary hierarchy of intervals with respect to the first data dimension.
  • the operation of block 902 is similar to that described with reference to FIG. 4 and FIG. 5, and therefore, is not described in detail herein.
  • the server may determine whether one interval in each level of the first binary hierarchy is within the first range.
  • the operation of block 904 is similar to that described with reference to FIG. 4 and FIG. 5, and therefore, is not described in detail herein.
  • the server may determine whether the interval is selected in an upper level of the first binary hierarchy.
  • the operation of block 906 is similar to that described with reference to FIG. 4 and FIG. 5, and therefore, is not described in detail herein.
  • the server may select the interval as one of the plurality of sub-intervals in response to the interval being within the first range and not being selected in an upper level of the first binary hierarchy.
  • the operation of block 908 is similar to that described with reference to FIG. 4 and FIG. 5, and therefore, is not described in detail herein.
  • FIG. 10 illustrates another example process 1000 for determining a plurality of second sub-intervals associated with the second range in accordance with an embodiment of the present disclosure.
  • the blocks described herein may be performed by the estimation processor 204 of FIG. 2.
  • the server may pre-generate a second binary hierarchy of intervals with respect to the second data dimension.
  • a second binary hierarchy of intervals with respect to the second data dimension 704 includes four levels of intervals.
  • Level 0 of the second binary hierarchy includes eight distinct values [1, 2, 3, 4, 5, 6, 7, 8] and may be partitioned into two equal sized sub-intervals 708 and 710 in Level 1.
  • Level 1 may be further partitioned into four equal sized sub-intervals 712, 714, 716 and 718 in Level 2.
  • Level 2 may be further partitioned into eight equal-sized sub-intervals in Level 3, each having unit length.
  • the server may determine whether one interval in each level of the second binary hierarchy is within the second range.
  • the second range is denoted as [3, 4, 5, 6, 7, 8]
  • the server may determine that the sub-interval 708 in Level 1 with a range of [1, 2, 3, 4] is beyond the second range while the sub-interval 710 in Level 1 with a range of [5, 6, 7, 8] is within the second range.
  • the sub-intervals 714, 716 and 718 in Level 2 may be determined within the second range while the sub-interval 712 in Level 2 may be determined beyond the second range.
  • the sub-intervals 720 and 722 in Level 3 may be determined to be beyond the second range while the remaining sub-intervals in Level 3 may be determined to be within the second range.
  • the server may determine whether the one interval is selected in an upper level of the second binary hierarchy. With reference to FIG. 7, the server may further determine that the sub-intervals 710 in Level 1 and 714 in Level 2 are not selected in an upper level of the second binary hierarchy. Other sub-intervals, such as 716 and 718 in Level 2 and the sub-intervals in Level 3 other than 720 and 722, although within the second range, may be determined to have been selected in an upper level of the second binary hierarchy.
  • the server may select the one interval as one of the plurality of sub-intervals in response to the one interval being within the second range and not being selected in an upper level of the second binary hierarchy.
  • the server may determine that the second range can be partitioned into sub-intervals 710 and 714.
  • FIG. 11 illustrates another example process 1100 for answering MDA queries under LDP in accordance with an embodiment of the present disclosure.
  • the example process as illustrated in FIG. 11 includes the blocks of operations 802, 804, 806, 808 and 810 that are similar to those described with reference to FIG. 8. Therefore, the blocks of operations 802, 804, 806, 808 and 810 are not described in detail herein.
  • the blocks 1102 and 1104 may be performed by the estimation processor 204 of FIG. 2.
  • the server may estimate an analytical value of the first user data in each of the plurality of first sub-intervals and each of the plurality of second sub-intervals.
  • the users may receive a plurality of sub-queries directed to each combination of the plurality of first sub-intervals and the plurality of second sub-intervals.
  • the user device transmits an analytical value of the first user data corresponding to each combination of the plurality of first sub-intervals and the plurality of second sub-intervals under the constraint of the LDP encoding.
  • the LDP encoded analytical value of the first user data corresponding to each combination of the plurality of first sub-intervals and the plurality of second sub-intervals is embedded with noise, and thus, the server may estimate the analytical value as the answer to each sub-query.
  • the techniques by which the server estimates the analytical value in answering the 2-dimensional MDA query may be similar to those described with respect to answering the 1-dimensional MDA query. It should be understood by those of ordinary skill in the art that the examples described above are for illustration purposes and are not intended to limit the present disclosure.
  • the techniques that the server estimates analytical value in answering the 2-dimensional MDA query can use different algorithms or models from those described with respect to answering the 1-dimensional MDA query.
  • the server may estimate the analytical result of user data in the first range over the first data dimension and the second range over the second data dimension by summing the estimated analytical values of the first user data in the plurality of first sub-intervals and the plurality of second sub-intervals.
  • once the server estimates the analytical value of the first user data corresponding to each combination of the plurality of first sub-intervals and the plurality of second sub-intervals from each user, the analytical result of user data in the first range over the first data dimension and the second range over the second data dimension, as initially requested, may be obtained by summing all the analytical values associated with the sub-interval combinations.
  • FIG. 12 illustrates an example system 1200 for implementing the processes for answering MDA queries under LDP in accordance with an embodiment of the present disclosure.
  • the techniques and mechanisms described herein may be implemented by multiple instances of the system 1200 as well as by any other computing device, system, and/or environment.
  • the system 1200 shown in FIG. 12 is only one example of a system and is not intended to suggest any limitation as to the scope of use or functionality of any computing device utilized to perform the processes and/or procedures described above.
  • Other well-known computing devices, systems, environments and/or configurations that may be suitable for use with the embodiments include, but are not limited to, personal computers, server computers (e.g., server 104-1 or 104-2 in FIG.
  • FPGAs field programmable gate arrays
  • ASICs application specific integrated circuits
  • the system 1200 may include one or more processors 1202 and system memory 1204 communicatively coupled to the processor(s) 1202.
  • the processor(s) 1202 may execute one or more modules and/or processes to cause the processor(s) 1202 to perform a variety of functions.
  • the processor(s) 1202 may include a central processing unit (CPU), a graphics processing unit (GPU), both CPU and GPU, or other processing units or components known in the art. Additionally, each of the processor(s) 1202 may possess its own local memory, which also may store program modules, program data, and/or one or more operating systems.
  • system memory 1204 may be volatile, such as RAM, non-volatile, such as ROM, flash memory, miniature hard drive, memory card, and the like, or some combination thereof.
  • the system memory 1204 may include one or more computer-executable modules (modules) 1206 that are executable by the processor (s) 1202.
  • the modules 1206 may include, but are not limited to, a receiving module 1208, a query decomposition module 1210, a data fetching module 1212, and an analyzing module 1214.
  • the receiving module 1208 may be configured to receive a query for an analytical result of user data in at least a first range over a first dimension as described with reference to FIG. 3.
  • the query decomposition module 1210 may be configured to determine a plurality of sub-intervals associated with the at least first range and decompose the query into a plurality of sub-queries, each of the plurality of sub-queries corresponding to one of the plurality of first sub-intervals as described with reference to FIG. 3.
  • the data fetching module 1212 may be configured to fetch, from user devices, first user data in the plurality of sub-intervals as described with reference to FIG. 3.
  • the analyzing module 1214 may be configured to compute the analytic result of user data in at least the first range over the first data dimension in accordance with the first user data in the plurality of sub-intervals as described with reference to FIG. 3.
  • the receiving module 1208 may be further configured to receive a query for an analytic result of user data in a first range over a first data dimension and a second range over a second data dimension as described with reference to FIG. 8.
  • the receiving module 1208 may perform a step in block 302 described above with reference to FIG. 3.
  • the receiving module 1208 may further perform a step in block 802 described above with reference to FIG. 8.
  • the query decomposition module 1210 may be further configured to determine a plurality of first sub-intervals associated with the first range, determine a plurality of second sub-intervals associated with the second range, and decompose the query into a plurality of sub-queries, each of the plurality of sub-queries corresponding to one of the plurality of first sub-intervals and one of the plurality of second sub-intervals as described with reference to FIG. 8.
  • the query decomposition module 1210 may be further configured to pre-generate a binary hierarchy of intervals having a plurality of levels with respect to the first data dimension; determine whether one interval in each of the plurality of levels of the binary hierarchy is within the at least first range; determine whether the one interval is selected in an upper level of the binary hierarchy; and select the one interval as one of the plurality of sub-intervals in response to the one interval being within the at least first range and not being selected in an upper level of the binary hierarchy as described with reference to FIG. 5.
  • the query decomposition module 1210 may be further configured to pre-generate a binary hierarchy of intervals having a plurality of levels with respect to the first data dimension; determine whether one interval in each of the plurality of levels of the binary hierarchy is within the first range; determine whether the one interval is selected in an upper level of the binary hierarchy; and select the one interval as one of the plurality of first sub-intervals in response to the one interval being within the first range and not being selected in an upper level of the binary hierarchy as described with reference to FIG. 9.
  • the query decomposition module 1210 may be further configured to pre-generate a binary hierarchy of intervals having a plurality of levels with respect to the second data dimension; determine whether one interval in each of the plurality of levels of the binary hierarchy is within the second range; determine whether the one interval is selected in an upper level of the binary hierarchy; and select the one interval as one of the plurality of second sub-intervals in response to the one interval being within the second range and not being selected in an upper level of the binary hierarchy as described with reference to FIG. 10.
  • the data fetching module 1212 may be configured further to fetch, from user devices, first user data in the plurality of first sub-intervals and the plurality of the second sub-intervals as described with reference to FIG. 8.
  • the data fetching module 1212 may perform a step in block 308 described above with reference to FIG. 3.
  • the data fetching module 1212 may perform a step in block 810 described above with reference to FIG. 8.
  • the analyzing module 1214 may be further configured to compute an analytical value of the first user data in each of the plurality of sub-intervals and obtain the analytic result of user data in at least the first range over the first data dimension by summing the estimated analytical values of the first user data in the plurality of sub-intervals as described with reference to FIG. 6.
  • the analyzing module 1214 may be further configured to compute the analytic result of the user data in the first range over the first data dimension and the second range over the second data dimension in accordance with the first user data in the plurality of first sub-intervals and the plurality of the second sub-intervals as described with reference to FIG. 8.
  • the analyzing module 1214 may be further configured to compute an analytical value of the first user data in each of the plurality of first sub-intervals and each of the plurality of the second sub-intervals; and obtain the analytic result of the user data in the first range over the first data dimension and the second range over the second data dimension by summing the estimated analytical values of the first user data in the plurality of first sub-intervals and the plurality of the second sub-intervals as described with reference to FIG. 11.
  • the analyzing module 1214 may perform a step in block 310 described above with reference to FIG. 3.
  • the analyzing module 1214 may perform steps in blocks 602 and 604 described above with reference to FIG. 6.
  • the analyzing module 1214 may perform a step in block 812 described above with reference to FIG. 8.
  • the analyzing module 1214 may perform steps in blocks 1102 and 1104 described above with reference to FIG. 11.
  • the system 1200 may additionally include an input/output (I/O) interface 1216 for receiving data associated with the process described above, such as a query for an analytical result of user data from user 102 as illustrated in FIG. 1, and for outputting the processed data, such as transmitting the plurality of decomposed sub-queries to the user devices 106 as illustrated in FIG. 1.
  • the system 1200 may also include a communication module 1218 allowing the system 1200 to communicate with other devices (e.g., the user devices 106, the database 110, the cloud storage 112 as illustrated in FIG. 1) over a network (e.g., the network 108 as illustrated in FIG. 1).
  • the network may include the Internet, wired media such as a wired network or direct-wired connections, and wireless media such as acoustic, radio frequency (RF), infrared, and other wireless media.
  • Computer-readable instructions include routines, applications, application modules, program modules, programs, components, data structures, algorithms, and the like.
  • Computer-readable instructions can be implemented on various system configurations, including single-processor or multiprocessor systems, minicomputers, mainframe computers, personal computers, hand-held computing devices, microprocessor-based or programmable consumer electronics, combinations thereof, and the like.
  • the computer-readable storage media may include volatile memory (such as random-access memory (RAM)) and/or non-volatile memory (such as read-only memory (ROM), flash memory, etc.).
  • the computer-readable storage media may also include additional removable storage and/or non-removable storage including, but not limited to, flash memory, magnetic storage, optical storage, and/or tape storage that may provide non-volatile storage of computer-readable instructions, data structures, program modules, and the like.
  • a non-transient computer-readable storage medium is an example of computer-readable media.
  • Computer-readable media includes at least two types of computer-readable media, namely computer-readable storage media and communications media.
  • Computer-readable storage media includes volatile and non-volatile, removable and non-removable media implemented in any process or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data.
  • Computer-readable storage media includes, but is not limited to, phase change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disk read-only memory (CD-ROM), digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information for access by a computing device.
  • communication media may embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or other transmission mechanism. As defined herein, computer-readable storage media do not include communication media.
  • the computer-readable instructions stored on one or more non-transitory computer-readable storage media that, when executed by one or more processors, may perform operations described above with reference to FIGs. 1-12.
  • computer-readable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular abstract data types.
  • the order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations can be combined in any order and/or in parallel to implement the processes.
  • a method comprising: receiving a query for an analytical result of user data in at least a first range over a first data dimension; determining a plurality of sub-intervals associated with the at least first range; decomposing the query into a plurality of sub-queries, each of the plurality of sub-queries corresponding to one of the plurality of first sub-intervals; fetching, from user devices, first user data in the plurality of sub-intervals; and computing the analytical result of user data in at least the first range over the first data dimension in accordance with the first user data in the plurality of sub-intervals.
  • determining a plurality of sub-intervals associated with the at least first range comprises: pre-generating a binary hierarchy of intervals having a plurality of levels with respect to the first data dimension; determining whether one interval in each of the plurality of levels of the binary hierarchy is within the first range; determining whether the one interval is selected in an upper level of the binary hierarchy; and selecting the one interval as one of the plurality of sub-intervals in response to the one interval being within the first range and not being selected in an upper level of the binary hierarchy.
  • pre-generating a binary hierarchy of intervals having a plurality of levels with respect to the first data dimension further comprises: sorting values associated with the first data dimension; and recursively partitioning the values into two equal sized intervals until the interval is a unit length.
  • computing the analytical result of the user data in at least the first range over the first data dimension in accordance with the first user data in the plurality of sub-intervals further comprises: estimating an analytical value of the first user data in each of the plurality of sub-intervals; and obtaining the analytical result of user data in at least the first range over the first data dimension by summing the estimated analytical values of the first user data in the plurality of sub-intervals.
  • a method comprising: receiving a query for an analytical result of user data in a first range over a first data dimension and a second range over a second data dimension; determining a plurality of first sub-intervals associated with the first range; determining a plurality of second sub-intervals associated with the second range; decomposing the query into a plurality of sub-queries, each of the plurality of sub-queries corresponding to one of the plurality of first sub-intervals and one of the plurality of second sub-intervals; fetching, from user devices, first user data in the plurality of first sub-intervals and the plurality of the second sub-intervals; and computing the analytical result of the user data in the first range over the first data dimension and the second range over the second data dimension in accordance with the first user data in the plurality of first sub-intervals and the plurality of the second sub-intervals.
  • determining a plurality of first sub-intervals associated with the first range comprises: pre-generating a first binary hierarchy of intervals having a plurality of levels with respect to the first data dimension; determining whether one interval in each of the plurality of levels of the first binary hierarchy is within the first range; determining whether the one interval is selected in an upper level of the first binary hierarchy; and selecting the one interval as one of the plurality of first sub-intervals in response to the one interval being within the first range and not being selected in an upper level of the first binary hierarchy.
  • determining a plurality of second sub-intervals associated with the second range comprises: pre-generating a second binary hierarchy of intervals having a plurality of levels with respect to the second data dimension; determining whether one interval in each of the plurality of levels of the second binary hierarchy is within the second range; determining whether the one interval is selected in an upper level of the second binary hierarchy; and selecting the one interval as one of the plurality of second sub-intervals in response to the one interval being within the second range and not being selected in an upper level of the second binary hierarchy.
  • computing the analytical result of the user data in the first range over the first data dimension and the second range over the second data dimension in accordance with the first user data in the plurality of first sub-intervals and the plurality of the second sub-intervals further comprises: estimating an analytical value of the first user data in each of the plurality of first sub-intervals and each of the plurality of the second sub-intervals; and obtaining the analytical result of the user data in the first range over the first data dimension and the second range over the second data dimension by summing the estimated analytical values of the first user data in the plurality of first sub-intervals and the plurality of second sub-intervals.
  • a system comprising: one or more processors, and memory communicatively coupled to the one or more processors, the memory storing computer-executable modules executable by the one or more processors that, when executed by the one or more processors, perform associated operations, the computer-executable modules including: a receiving module configured to receive a query for an analytical result of user data in at least a first range over a first data dimension; a query processing module configured to: determine a plurality of sub-intervals associated with the at least first range; and decompose the query into a plurality of sub-queries, each of the plurality of sub-queries corresponding to one of the plurality of first sub-intervals; a data fetching module configured to fetch, from user devices, first user data in the plurality of sub-intervals; and an analyzing module configured to compute the analytical result of user data in at least the first range over the first data dimension in accordance with the first user data in the plurality of sub-intervals.
  • the query processing module is further configured to: pre-generate a binary hierarchy of intervals having a plurality of levels with respect to the first data dimension; determine whether one interval in each of the plurality of levels of the binary hierarchy is within the first range; determine whether the one interval is selected in an upper level of the binary hierarchy; and select the one interval as one of the plurality of sub-intervals in response to the one interval being within the first range and not being selected in an upper level of the binary hierarchy.
  • the query processing module is further configured to: sort values associated with the first data dimension; and recursively partition the values into two equal sized intervals until the interval is a unit length.
  • the analyzing module is configured to: estimate an analytical value of the first user data in each of the plurality of sub-intervals; and obtain the analytical result of user data in at least the first range over the first data dimension by summing the estimated analytical values of the first user data in the plurality of sub-intervals.
  • a system comprising: one or more processors, and memory communicatively coupled to the one or more processors, the memory storing computer-executable modules executable by the one or more processors that, when executed by the one or more processors, perform associated operations, the computer-executable modules including: a receiving module configured to receive a query for an analytical result of user data in a first range over a first data dimension and a second range over a second data dimension; a query processing module configured to: determine a plurality of first sub-intervals associated with the first range; determine a plurality of second sub-intervals associated with the second range; and decompose the query into a plurality of sub-queries, each of the plurality of sub-queries corresponding to one of the plurality of first sub-intervals and one of the plurality of second sub-intervals; a data fetching module configured to fetch, from user devices, first user data in the plurality of first sub-intervals and the plurality of the second sub-intervals; and
  • the query processing module is further configured to: pre-generate a first binary hierarchy of intervals having a plurality of levels with respect to the first data dimension; determine whether one interval in each of the plurality of levels of the first binary hierarchy is within the first range; determine whether the one interval is selected in an upper level of the first binary hierarchy; and select the one interval as one of the plurality of first sub-intervals in response to the one interval being within the first range and not being selected in an upper level of the first binary hierarchy.
  • the query processing module is further configured to: pre-generate a second binary hierarchy of intervals having a plurality of levels with respect to the second data dimension; determine whether one interval in each of the plurality of levels of the second binary hierarchy is within the second range; determine whether the one interval is selected in an upper level of the second binary hierarchy; and select the one interval as one of the plurality of second sub-intervals in response to the one interval being within the second range and not being selected in an upper level of the second binary hierarchy.
  • the analyzing module is configured to: estimate an analytical value of the first user data in each of the plurality of first sub-intervals and each of the plurality of the second sub-intervals; and obtain the analytical result of the user data in the first range over the first data dimension and the second range over the second data dimension by summing the estimated analytical values of the first user data in the plurality of first sub-intervals and the plurality of the second sub-intervals.
  • a computer-readable storage medium storing computer-readable instructions executable by one or more processors of a video compression system, that when executed by the one or more processors, cause the one or more processors to perform operations comprising: receiving a query for an analytical result of user data in at least a first range over a first data dimension; determining a plurality of sub-intervals associated with the at least first range; decomposing the query into a plurality of sub-queries, each of the plurality of sub-queries corresponding to one of the plurality of first sub-intervals; fetching, from user devices, first user data in the plurality of sub-intervals; and computing the analytical result of user data in at least the first range over the first data dimension in accordance with the first user data in the plurality of sub-intervals.
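
The level-by-level selection rule described above for the second range [3, 4, 5, 6, 7, 8] (select an interval when it lies within the range and no interval containing it was selected at an upper level), together with the summation over sub-interval combinations, can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names, the first range (2, 5), and the toy records are hypothetical, the domain size is assumed to be a power of two, and an exact count stands in for the server's LDP-based estimate.

```python
from itertools import product

def build_hierarchy(lo, hi):
    """Levels of a binary hierarchy; level 0 is [lo, hi], each level halves the one above.
    Assumes hi - lo + 1 is a power of two (an assumption of this sketch)."""
    levels = [[(lo, hi)]]
    while levels[-1][0][0] < levels[-1][0][1]:
        nxt = []
        for a, b in levels[-1]:
            mid = (a + b) // 2
            nxt += [(a, mid), (mid + 1, b)]
        levels.append(nxt)
    return levels

def select_sub_intervals(levels, q_lo, q_hi):
    """Scan levels top-down; keep an interval if it is within the query range
    and is not contained in an interval already selected at an upper level."""
    selected = []
    for level in levels:
        for a, b in level:
            within = q_lo <= a and b <= q_hi
            covered = any(s <= a and b <= e for s, e in selected)
            if within and not covered:
                selected.append((a, b))
    return selected

def count_in_rectangle(records, r1, r2):
    """Hypothetical stand-in for the noisy per-sub-query estimate: an exact count."""
    return sum(1 for x, y in records
               if r1[0] <= x <= r1[1] and r2[0] <= y <= r2[1])

def answer_2d(records, range1, range2, domain=(1, 8)):
    """Decompose both ranges, form one sub-query per combination, and sum the values."""
    levels = build_hierarchy(*domain)
    first = select_sub_intervals(levels, *range1)
    second = select_sub_intervals(levels, *range2)
    sub_queries = list(product(first, second))
    total = sum(count_in_rectangle(records, r1, r2) for r1, r2 in sub_queries)
    return total, sub_queries

# Toy data: one record per cell of an 8 x 8 grid (hypothetical).
records = [(x, y) for x in range(1, 9) for y in range(1, 9)]
total, sub_queries = answer_2d(records, (2, 5), (3, 8))
```

With the second range set to (3, 8), `select_sub_intervals` returns the intervals (5, 8) and (3, 4), which correspond to the sub-intervals 710 and 714 in the discussion above. Because the selected intervals are disjoint and cover each range exactly, summing the per-combination values reproduces the count over the full two-dimensional range when the per-combination values are exact.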

Abstract

Systems and methods for answering a multi-dimensional analytical query under local differential privacy including receiving a query for an analytical result of user data in at least a first range over a first data dimension; determining a plurality of sub-intervals associated with the at least first range; decomposing the query into a plurality of sub-queries, each of the plurality of sub-queries corresponding to one of the plurality of first sub-intervals; fetching, from user devices, first user data in the plurality of sub-intervals; and computing the analytical result of user data in at least the first range over the first data dimension in accordance with the first user data in the plurality of sub-intervals.

Description

METHOD AND SYSTEM FOR ANSWERING MULTI-DIMENSIONAL ANALYTICAL QUERIES UNDER LOCAL DIFFERENTIAL PRIVACY
BACKGROUND
Large volumes of users’ data about their profiles and activities are collected by enterprises to make informed business decisions. To meet users’ expectation of their privacy, applications and services must provide rigorous privacy guarantees on how their data is collected and analyzed. Differential privacy (DP) has emerged as the de facto standard for privacy guarantees, and is being used by a number of high tech companies.
A well-studied DP model is the centralized setting, where a trusted data collector obtains exact data from users and injects noise in the analytical process to guarantee DP. In the absence of such a trusted party, users prefer not to have their private data leave their devices in an unprotected form, and thus, the centralized setting of DP is no longer applicable. In such scenarios, one can adopt the local differential privacy (LDP) model. Each user’s private data is encoded by a randomized algorithm before being sent to the data collector. LDP guarantees that the likelihood of any specific output of the algorithm varies little with the input, i.e., the private data. In this way, users do not need to trust the data collector.
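To make the LDP guarantee concrete, the following minimal sketch uses binary randomized response, a classic LDP mechanism chosen here only for illustration; the disclosure does not state at this point which encoder is used. The function names and the privacy parameter `epsilon` are assumptions of this sketch. Each user reports the true bit with probability e^epsilon / (e^epsilon + 1), so the probability of any given report changes by at most a factor of e^epsilon between the two possible inputs, and the collector debiases the aggregate.

```python
import math
import random
from typing import List

def randomized_response(bit: int, epsilon: float, rng: random.Random) -> int:
    """Report the true bit with probability e^eps / (e^eps + 1); otherwise flip it."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if rng.random() < p_truth else 1 - bit

def estimate_fraction(reports: List[int], epsilon: float) -> float:
    """Unbiased estimate of the fraction of users whose true bit is 1."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    # E[observed] = f * p + (1 - f) * (1 - p), solved for f:
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)

# Hypothetical population: 30% of 20,000 users hold the sensitive bit.
rng = random.Random(7)
truth = [1] * 6000 + [0] * 14000
reports = [randomized_response(b, 1.0, rng) for b in truth]
estimate = estimate_fraction(reports, 1.0)
```

The collector never sees a true bit, yet the estimate concentrates around the true fraction as the number of users grows; smaller values of `epsilon` give stronger privacy at the cost of a noisier estimate.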
While each user’s sensitive data is collected under LDP, answering a class of multi-dimensional analytical (MDA) queries is still challenging.
BRIEF DESCRIPTION OF THE DRAWINGS
The detailed description is set forth with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items or features.
FIG. 1 illustrates an example network environment for answering MDA queries under LDP in accordance with an embodiment of the present disclosure.
FIG. 2 illustrates an example client-server diagram for answering MDA queries under LDP in accordance with an embodiment of the present disclosure.
FIG. 3 illustrates an example process for answering MDA queries under LDP in accordance with an embodiment of the present disclosure.
FIG. 4 illustrates an example diagram for determining a plurality of sub-intervals associated with the at least first range in accordance with an embodiment of the present disclosure.
FIG. 5 illustrates an example process for determining a plurality of sub-intervals associated with the at least first range in accordance with an embodiment of the present disclosure.
FIG. 6 illustrates another example process for answering MDA queries under LDP in accordance with an embodiment of the present disclosure.
FIG. 7 illustrates another example diagram for determining a plurality of first sub-intervals associated with the first range and a plurality of second sub-intervals associated with the second range in accordance with an embodiment of the present disclosure.
FIG. 8 illustrates another example process for answering MDA queries under LDP in accordance with an embodiment of the present disclosure.
FIG. 9 illustrates an example process for determining a plurality of first sub-intervals associated with the first range in accordance with an embodiment of the present disclosure.
FIG. 10 illustrates another example process for determining a plurality of second sub-intervals associated with the second range in accordance with an embodiment of the present disclosure.
FIG. 11 illustrates another example process for answering MDA queries under LDP in accordance with an embodiment of the present disclosure.
FIG. 12 illustrates an example system for implementing the processes for answering MDA queries under LDP in accordance with an embodiment of the present disclosure.
DETAILED DESCRIPTION
Systems and methods discussed herein are directed to improving the performance of answering a class of MDA queries, and more specifically to improving the accuracy and scalability of answering a large class of MDA queries with tight error bounds while the users’ sensitive data is collected under LDP. The present disclosure relies on a hierarchical decomposition of the ordinal dimensions into sub-intervals, reducing the worst-case error from O(m^d) in the marginal-based solution to log^{O(d)} m, given that other terms, such as those dependent on data size and privacy budget, remain the same.
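The hierarchical decomposition can be illustrated with a short sketch. It assumes that m denotes the number of distinct values in one ordinal dimension (the text above does not define m or d explicitly), and the function name `decompose` is hypothetical. The eight-value domain and the range from 3 to 8 are borrowed from the FIG. 7 discussion. The recursion returns the largest hierarchy intervals that exactly cover a range; for a domain of m values there are never more than 2·log2(m) of them, which is one way to see where a polylogarithmic dependence on m can come from.

```python
import math

def decompose(lo, hi, node_lo=1, node_hi=8):
    """Return the maximal binary-hierarchy intervals that exactly cover [lo, hi]."""
    if hi < node_lo or node_hi < lo:
        return []                        # node is disjoint from the range
    if lo <= node_lo and node_hi <= hi:
        return [(node_lo, node_hi)]      # node lies fully inside the range
    mid = (node_lo + node_hi) // 2       # otherwise split into two halves
    return (decompose(lo, hi, node_lo, mid)
            + decompose(lo, hi, mid + 1, node_hi))

pieces = decompose(3, 8)                 # the range [3, ..., 8] over the domain 1..8

# Largest number of pieces needed by any range over a 16-value domain,
# compared with the 2 * log2(m) upper bound.
m = 16
worst = max(len(decompose(a, b, 1, m))
            for a in range(1, m + 1) for b in range(a, m + 1))
bound = 2 * int(math.log2(m))
```

Here `pieces` evaluates to the two intervals (3, 4) and (5, 8), so a range of six values is answered with two sub-queries instead of six.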
FIG. 1 illustrates an example network environment 100 for answering MDA queries under LDP in accordance with an embodiment of the present disclosure. The example network environment 100 may include users 102, servers 104-1 and 104-2, user devices 106, network 108, database 110, and cloud storage 112. The users 102 may be the administrators of the servers 104-1 and 104-2 or the analysts of service providers. The users 102 may connect to the servers 104-1 and 104-2 via different types of terminal devices including but not limited to a desktop computer, a laptop computer, a mobile device, a built-in device in a motor vehicle, a wearable device, or a virtual reality (VR) device, and the like. An MDA query entered by the users 102 may be processed by the servers 104-1 or 104-2. The queries may be further forwarded to the user devices 106 via the network 108. The user devices 106 may include any type of terminal device such as a mobile phone, a tablet, a laptop computer, a desktop computer, a wearable device, a VR device, or a built-in device in a motor vehicle, and the like. The user devices 106 according to the present disclosure may implement one or more internet services provided by the service providers. User data associated with the internet services may be stored in the user devices 106. User data may be periodically transmitted to the servers 104-1 and 104-2 to be stored in the database 110 and/or the cloud storage 112. In embodiments, user data may be transmitted to the servers 104-1 and 104-2 upon request, for example, in response to an MDA query. At least part of the user data is encoded for privacy before being transmitted to the servers 104-1 and 104-2, for example, using an LDP algorithm. The servers 104-1 and 104-2 may implement a plurality of internet services provided by the service providers and manage user data associated with the services. The network 108 may be a single network or a combination of different networks.
For example, the network 108 may be a local area network (LAN), a wide area network (WAN), a public network, a private network, a proprietary network, a Public Switched Telephone Network (PSTN), the Internet, a wireless network, a virtual network, or any combination thereof. The network 108 may also include various network access points, e.g., wired or wireless access points such as base stations or Internet exchange points, through which a data source may connect to the network 108 in order to transmit information via the network 108. In embodiments, the services may be implemented in one or more servers individually or collaboratively. The servers 104-1 and 104-2 may be individually connected to the network 108 or interconnected to form a server bank. It should be understood by those of ordinary skill in the art that the above example network environment and its numbered elements are merely for illustration purposes and are not intended to limit the present disclosure.
FIG. 2 illustrates an example client-server diagram 200 for answering MDA queries under LDP in accordance with an embodiment of the present disclosure. When an MDA query 202 is received at an estimation processor 204 on the server side (e.g., the server 104-1 or 104-2 in FIG. 1), it may be transmitted to the user devices (e.g., the user devices 106 in FIG. 1) to fetch the user data. For an internet service, each user may generate multi-dimensional data while using the service. Some dimensions (also referred to as measure attributes) about the service usage are naturally known to the service provider and are non-sensitive, e.g., active time and purchase amount (for billing purposes). Some other dimensions are sensitive, e.g., income, age and location, and users prefer to have them collected by the service provider in a privacy-preserving way. An example of the multi-dimensional user data is shown in Table 1. The user data shown in Table 1 has seven dimensions, of which age, salary, state and operating system (OS) are sensitive dimensions.
Figure PCTCN2019090837-appb-000001
Table 1: A relational table with sensitive dimensions
The MDA query 202 may be an analytical query that aggregates measure attributes under constraints on sensitive dimensions of user data. The MDA query 202 may be generated by the service provider to analyze how the service performs. Although the dimensions may not be released to the public and the analytics may be conducted internally by the service provider, the service provider may have to guarantee that the sensitive dimensions are handled properly by providing an LDP collection algorithm that runs on each user device. As illustrated in FIG. 2, in response to the MDA query 202, sensitive data 206 from the user may be individually encoded using an LDP algorithm at each of the user devices, and the LDP encoded sensitive data 208 may be transmitted to the server side. A fact table 210 may be generated at the server side upon receiving the LDP encoded sensitive data 208. The fact table 210 may be a combination of the LDP encoded sensitive data 208 collected from the users and non-sensitive data (i.e., public or known to the server). When encoding the sensitive data on the user devices, the LDP algorithm employs a degree of randomness as part of its logic to protect the user data privacy. As a randomized algorithm, the LDP algorithm may use uniformly random bits as an auxiliary input to guide its behavior. The LDP encoded sensitive data may therefore be embedded with random noise when transmitted to the server. As the encoded sensitive data is embedded with random noise, the estimation processor 204 may perform an algorithm to estimate the answers to the MDA query with bounded errors.
FIG. 3 illustrates an example process 300 for answering MDA queries under LDP in accordance with an embodiment of the present disclosure.
At block 302, the server may receive a query for an analytical result of user data in at least a first range over a first data dimension, such as the MDA query as described with reference to FIG. 2. The block 302 may be performed by the estimation processor 204 of FIG. 2. The query may be directed to obtain an analytical result of user data by aggregating data values in at least a first range over a first data dimension under constraints on sensitive dimensions. According to the present embodiment, the first dimension is a sensitive dimension. For example, the server may receive a query for an average purchase made by users aged between 30 and 40, with reference to Table 1. As another example, the server may receive a query for an average purchase made by users aged between 30 and 40 and located in the state of New York, with reference to Table 1.
At block 304, the server may determine a plurality of sub-intervals associated with the at least first range. The block 304 may be performed by the estimation processor 204 of FIG. 2. To improve the estimation performance of answering the MDA query, the at least first range over the first dimension defined in the query may be partitioned into a plurality of sub-intervals. Details of the block 304 will be described with reference to FIG. 4 and FIG. 5.
At block 306, the server may decompose the query into a plurality of sub-queries, each of the plurality of sub-queries corresponding to one of the plurality of sub-intervals. The block 306 may be performed by the estimation processor 204 of FIG. 2. According to the present embodiment, rather than transmitting the initial query (i.e., the MDA query) directly to each user at the client side, the server may rewrite the initial query into a plurality of sub-queries and transmit the plurality of sub-queries to each user at the client side. Each of the plurality of sub-queries may be directed to obtain an analytical value of the user data in one of the plurality of sub-intervals over the first dimension.
At block 308, the server may fetch, from the user devices, first user data in the plurality of sub-intervals over the first dimension. The block 308 may be performed by the estimation processor 204 of FIG. 2. When using the LDP algorithm at each user device, each user’s private data may be encoded by a randomized algorithm before being sent to the data collector, i.e., the server. If the initial query for an analytical result of user data in the first range over the first data dimension is received at a user device, each individual data item within the first range may be visited and encoded to be sent to the server. The accumulated noise level received at the server due to the encoding of each individual data item may be high. According to the present disclosure, as the initial query is decomposed into a plurality of sub-queries, each user may respond to the server by returning an analytical value associated with each of the plurality of sub-intervals. Rather than encoding each individual data item in a sub-interval, the user device may encode the analytical value associated with the sub-interval to be sent to the server. Thus, the accumulated noise level received at the server due to encoding may be reduced.
At block 310, the server may compute the analytical result of user data in at least the first range over the first data dimension in accordance with the received user data in the plurality of sub-intervals. The block 310 may be performed by the estimation processor 204 of FIG. 2. On the server, for a query q, the answer to each of the plurality of sub-queries may be estimated using a weighted frequency estimator and the answer to the initial query is estimated by summing up the answers to all of the plurality of sub-queries.
It should be understood by those of ordinary skill in the art that the processes described above are intended to be illustrative. In embodiments, a process may be accomplished with one or more additional operations not described, and/or without one or more of the operations discussed. Further, the order in which the operations of the process are illustrated in FIG. 3 and set forth above is not intended to be limiting.
FIG. 4 illustrates an example diagram 400 for determining a plurality of sub-intervals associated with the at least first range in accordance with an embodiment of the present disclosure. The example diagram as illustrated is for a one-dimensional query, where the one-dimension has eight distinct values in the order of 1, 2, …, 8 and a binary hierarchy of intervals with four levels is constructed.
In a more general example, suppose the dimension D has m distinct values, in the order z_1, z_2, …, z_m. A hierarchical collection of intervals with a fanout b, which can be viewed as a perfect b-way tree, may be constructed. According to the hierarchical collection of intervals, each node may correspond to an interval and have b children (except leaves) corresponding to b equal-sized sub-intervals. Dummy values may be added to D if m ≠ b^h, where h indicates the height of the hierarchy, i.e., the levels are numbered 0 to h. Level 0 in the hierarchy may correspond to the root {[z_1, z_m]}, which may be recursively partitioned into b equal-sized sub-intervals until reaching the leaves, i.e., intervals with unit length {[z_1, z_1], …, [z_m, z_m]}. There may be b^l intervals on level l, each including m/b^l values, as shown in Equation (1):
Figure PCTCN2019090837-appb-000002
Further, I_D = {L_0, …, L_h} may be set as the whole hierarchy
Figure PCTCN2019090837-appb-000003
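The construction of the hierarchy described above can be sketched in a few lines. This is a minimal illustration only, not the implementation of the disclosure: the function name `build_hierarchy`, the use of `None` as the dummy value, and the representation of intervals as 1-based inclusive position pairs are all assumptions made for the example.

```python
def build_hierarchy(values, b=2):
    """Return (padded_values, levels), where levels[l] lists the intervals of
    level l as (lo, hi) positions, 1-based and inclusive (hypothetical layout)."""
    m = len(values)
    # Find the smallest h with b**h >= m using integers only.
    h, size = 0, 1
    while size < m:
        size *= b
        h += 1
    # Pad the domain with dummy values (None) so that its size is exactly b**h.
    padded = list(values) + [None] * (size - m)
    levels = []
    for level in range(h + 1):
        width = size // (b ** level)          # number of values per interval
        levels.append([(j * width + 1, (j + 1) * width)
                       for j in range(b ** level)])
    return padded, levels


# Eight distinct values with fanout 2, as in the FIG. 4 example.
padded, levels = build_hierarchy(list(range(1, 9)), b=2)
for level, intervals in enumerate(levels):
    print(level, intervals)
```

With eight values and a fanout of 2, the sketch yields four levels (Level 0 to Level 3) holding 1, 2, 4 and 8 intervals respectively.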
FIG. 5 illustrates an example process 500 for determining a plurality of sub-intervals associated with the at least first range in accordance with an embodiment of the present disclosure. The blocks described herein may be performed by the estimation processor 204 of FIG. 2.
At block 502, the server may pre-generate a binary hierarchy of intervals having a plurality of levels with respect to the first data dimension. Referring to the example diagram of FIG. 4, the binary hierarchy of intervals may be constructed with respect to a first data dimension. The binary hierarchy of intervals may include four levels, where Level 0 is the root level covering all eight distinct values in the first data dimension. Level 0 may be further partitioned into two equal-sized sub-intervals 402 and 404 in Level 1. Each of the sub-intervals 402 and 404 in Level 1 may be further partitioned into two equal-sized sub-intervals, yielding the sub-intervals 406, 408, 410, and 412 in Level 2. Further, each of the sub-intervals 406, 408, 410, and 412 in Level 2 may be partitioned into two equal-sized sub-intervals in Level 3. As each of the sub-intervals in Level 3 has unit length, the partition operation reaches the leaves of the hierarchy. If the number of distinct values in the first data dimension does not satisfy m = b^h, dummy values may be added to the set of the distinct values.
At block 504, the server may determine whether one interval in each of the plurality of levels of the binary hierarchy is within the at least first range. Referring to the example diagram of FIG. 4, the at least first range is denoted as [2, 3, 4, 5, 6, 7] and the sub-interval 402 covers the distinct values 1, 2, 3 and 4 of the first data dimension. Because the value 1 lies outside the range, the server may determine that the sub-interval 402 is not within the at least first range. Further, the sub-interval 408 covers the distinct values 3 and 4 of the first data dimension and may be determined to be within the at least first range. Checking through the entire hierarchy of intervals, the server may determine that the sub-intervals 408 and 410 in Level 2 and the sub-intervals 414, 416, 418, 420, 422, and 424 in Level 3 are within the at least first range of the first data dimension.
At block 506, the server may determine whether the one interval is selected in an upper level of the binary hierarchy. Referring to the example diagram of FIG. 4, as the distinct values covered by each of the sub-intervals 416, 418, 420, and 422 in Level 3 are included in the sub-intervals 408 and 410, the server may determine that the sub-intervals 416, 418, 420, and 422 in Level 3 are already selected in the upper level, i.e., Level 2 of the binary hierarchy.
At block 508, the server may select the one interval as one of the plurality of sub-intervals in response to the one interval being within the at least first range and not being selected in an upper level of the binary hierarchy. Referring to the example diagram of FIG. 4, the server may determine that the at least first range [2, 3, 4, 5, 6, 7] with respect to the first data dimension indicated in the query includes a plurality of sub-intervals: 408 with a value range of [3, 4], 410 with a value range of [5, 6], 414 with a value range of [2], and 424 with a value range of [7].
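The selection rule of blocks 504 to 508 (keep an interval if it lies within the range and no ancestor was already kept) can be sketched as a top-down recursion over the hierarchy. This is an equivalent formulation chosen for brevity, not necessarily how the disclosure implements it; the function name `decompose` and the (lo, hi) tuple representation are assumptions of the example.

```python
def decompose(lo, hi, node_lo, node_hi, b=2):
    """Return the largest hierarchy intervals that exactly cover [lo, hi].

    [node_lo, node_hi] is the current hierarchy node; its size is assumed
    to be a power of the fanout b (dummy values already added).
    """
    if hi < node_lo or node_hi < lo:
        return []                        # node does not overlap the range
    if lo <= node_lo and node_hi <= hi:
        return [(node_lo, node_hi)]      # node within range: select it, stop descending
    width = (node_hi - node_lo + 1) // b
    selected = []
    for j in range(b):                   # otherwise examine the b children
        child_lo = node_lo + j * width
        selected.extend(decompose(lo, hi, child_lo, child_lo + width - 1, b))
    return selected


# Range [2, 7] over the domain [1, 8] of FIG. 4.
print(decompose(2, 7, 1, 8))   # [(2, 2), (3, 4), (5, 6), (7, 7)]
```

Because the recursion stops as soon as a node is fully inside the range, descendants of a selected interval (such as 416 to 422 under 408 and 410) are never selected a second time.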
It should be understood by those of ordinary skill in the art that the processes described above are intended to be illustrative. In embodiments, a process may be accomplished with one or more additional operations not described, and/or without one or more of the operations discussed. Further, the order in which the operations of the process are illustrated in FIG. 5 and set forth above is not intended to be limiting.
FIG. 6 illustrates another example process 600 for answering MDA queries under LDP in accordance with an embodiment of the present disclosure. The example process 600 as illustrated in FIG. 6 includes the blocks of operations 302, 304, 306 and 308 that are similar to those described with reference to FIG. 3. Therefore, the blocks of operations 302, 304, 306 and 308 are not described in detail herein.
At block 602, the server may estimate an analytical value of the first user data in each of the plurality of sub-intervals. The block 602 may be performed by the estimation processor 204 of FIG. 2. As described above, the at least first range over a first data dimension indicated in the query is determined to include a plurality of sub-intervals, and the query for an analytical result of user data in the entire first range is decomposed into a plurality of sub-queries, each of the plurality of sub-queries corresponding to one of the plurality of sub-intervals. Each user receives the decomposed plurality of sub-queries for analytical values of the first user data in the plurality of sub-intervals, instead of the initial query, i.e., the query for an analytical result of user data in the entire first range. In response to each of the plurality of sub-queries, the user device transmits the analytical value of the first user data with respect to each of the plurality of sub-intervals, encoded using an LDP algorithm. Upon receiving the encoded analytical value of the first user data with respect to each of the plurality of sub-intervals from the user, the server estimates the analytical value of the received user data in each of the plurality of sub-intervals using a weighted frequency estimator.
At block 604, the server may obtain the analytical result of user data in at least the first range over the first data dimension by summing the estimated analytical values of the received user data in the plurality of sub-intervals. The block 604 may be performed by the estimation processor 204 of FIG. 2. Taking query q = Q_T(SUM(M), D ∈ [l, r]) as an example, the first range [l, r] may be decomposed into 2(b-1) log_b m (or fewer) disjoint sub-intervals, I_1, …, I_p, in the binary hierarchy I_D. The query q may be decomposed into sub-queries corresponding to those sub-intervals. If every user responds to the server with the values in each of the sub-intervals in I_D, in an LDP way, each sub-query q_i = Q_T(SUM(M), D ∈ I_i) may be estimated (i = 1, 2, …, p) by a weighted frequency oracle on
Figure PCTCN2019090837-appb-000004
as
Figure PCTCN2019090837-appb-000005
The query q may be answered by assembling the estimates for the p ≤ 2(b-1) log_b m sub-queries, as shown in Equation (2):
Figure PCTCN2019090837-appb-000006
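As a quick check of the bound on the number of sub-queries, the FIG. 4 setting (fanout b = 2, m = 8 distinct values, range [2, 7]) can be worked through by hand. The following is a worked instance of the stated bound, not a reconstruction of Equation (2):

```latex
p \;\le\; 2(b-1)\log_b m \;=\; 2\,(2-1)\,\log_2 8 \;=\; 6,
\qquad
[2,7] \;=\; [2,2]\,\cup\,[3,4]\,\cup\,[5,6]\,\cup\,[7,7]
\;\Longrightarrow\; p = 4 \le 6 .
```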
The server may estimate the answer to query q by summing up the estimated answers for the plurality of sub-queries, each estimated using the weighted frequency estimator
Figure PCTCN2019090837-appb-000007
as shown in Equation (3) :
Figure PCTCN2019090837-appb-000008
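To make the idea of a weighted frequency estimator concrete, the sketch below pairs it with generalized randomized response, a standard LDP mechanism. The choice of mechanism is an assumption for illustration: the text here does not state which LDP encoding is used, and the names `grr_perturb` and `weighted_frequency_estimate` are hypothetical. Each user reports the index of the interval (on one hierarchy level) containing the user's value, and the server weights a debiased indicator by the user's non-sensitive measure.

```python
import math
import random


def grr_perturb(true_index, k, epsilon, rng):
    """Client side: report the true interval index with probability p,
    otherwise one of the other k - 1 indices uniformly at random."""
    e = math.exp(epsilon)
    p = e / (e + k - 1)
    if rng.random() < p:
        return true_index
    other = rng.randrange(k - 1)
    return other if other < true_index else other + 1


def weighted_frequency_estimate(reports, weights, target, k, epsilon):
    """Server side: unbiased estimate of the sum of weights (measures) of the
    users whose true index equals `target`. Requires k >= 2."""
    e = math.exp(epsilon)
    p = e / (e + k - 1)      # probability of reporting the true index
    q = 1.0 / (e + k - 1)    # probability of reporting any given other index
    total = 0.0
    for reported, weight in zip(reports, weights):
        indicator = 1.0 if reported == target else 0.0
        # E[(indicator - q) / (p - q)] is 1 for matching users and 0 otherwise.
        total += weight * (indicator - q) / (p - q)
    return total


# Hypothetical data: 6 users, 4 intervals on one level, measure M per user.
rng = random.Random(0)
k, epsilon = 4, 1.0
true_indices = [0, 1, 1, 2, 3, 1]
measures = [10.0, 20.0, 5.0, 7.0, 3.0, 8.0]
reports = [grr_perturb(v, k, epsilon, rng) for v in true_indices]
estimate = weighted_frequency_estimate(reports, measures, target=1, k=k, epsilon=epsilon)
print(estimate)  # noisy estimate of 20 + 5 + 8 = 33
```

With only six users the estimate is very noisy; the estimator is unbiased, and its relative error shrinks as the number of users grows or as the privacy budget epsilon increases.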
Referring to FIG. 4, assume that the first dimension D_1 takes values in [1, 2, 3, 4, 5, 6, 7, 8]. A query q is received to obtain a summation of user data in a first range [2, 7] over the first dimension D_1. The query q may be denoted as q: SELECT SUM(M_1) FROM T WHERE D_1 ∈ [2, 7]. The first range [2, 7] may be decomposed into four sub-intervals 408, 410, 414 and 424. Correspondingly, the query q may be rewritten as the sum of four sub-queries shown below:
q_1: SELECT SUM(M_1) FROM T WHERE D_1 ∈ [2, 2]
q_2: SELECT SUM(M_1) FROM T WHERE D_1 ∈ [3, 4]
q_3: SELECT SUM(M_1) FROM T WHERE D_1 ∈ [5, 6]
q_4: SELECT SUM(M_1) FROM T WHERE D_1 ∈ [7, 7]
Upon receiving all answers from the users with respect to the four sub-queries, the server may answer the initial query q by assembling estimates for the four sub-queries.
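The reason the assembly step is valid is that the four sub-intervals are disjoint and together cover [2, 7], so the SUM over the range equals the sum of the four sub-query SUMs. The sketch below checks this on a toy table with exact (noise-free) answers; the rows and the helper `range_sum` are hypothetical, and under LDP each sub-answer would be an estimate rather than an exact value.

```python
# Hypothetical fact table T: one (D_1 value, M_1 measure) pair per row.
rows = [(1, 5.0), (2, 3.0), (3, 4.0), (4, 1.0),
        (5, 2.0), (6, 6.0), (7, 7.0), (8, 9.0)]


def range_sum(table, lo, hi):
    """Exact answer to SELECT SUM(M_1) FROM T WHERE D_1 in [lo, hi]."""
    return sum(measure for value, measure in table if lo <= value <= hi)


# Sub-intervals for q_1..q_4 from the example above.
sub_intervals = [(2, 2), (3, 4), (5, 6), (7, 7)]
sub_answers = [range_sum(rows, lo, hi) for lo, hi in sub_intervals]
assembled = sum(sub_answers)

print(sub_answers)                       # [3.0, 5.0, 8.0, 7.0]
print(assembled, range_sum(rows, 2, 7))  # 23.0 23.0
```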
FIG. 7 illustrates another example diagram 700 for determining a plurality of first sub-intervals associated with the first range and a plurality of second sub-intervals associated with the second range in accordance with an embodiment of the present disclosure. FIG. 7 illustrates an example directed to a 2-dimensional MDA query. The query is directed to obtain the analytical result of user data in a first range over the first data dimension 702 and a second range over the second data dimension 704. The first range [2, 3, 4, 5, 6, 7] may be determined to include four sub-intervals 408, 410, 414, and 424, similar to those described with reference to FIG. 4. The second range [3, 4, 5, 6, 7, 8] may be determined to include two sub-intervals: 714 with a value range of [3, 4] and 710 with a value range of [5, 6, 7, 8]. A 2-dimensional hierarchy of intervals may be constructed in accordance with the first binary hierarchy of intervals and the second binary hierarchy of intervals. In the 2-dimensional hierarchy of intervals, the four sub-intervals in the first data dimension and the two sub-intervals in the second data dimension may be taken to generate eight combinations of sub-intervals 706. Each of the sub-intervals 706 may correspond to a unique combination of one of the four sub-intervals in the first data dimension and one of the two sub-intervals in the second data dimension.
FIG. 8 illustrates another example process 800 for answering MDA queries under LDP in accordance with an embodiment of the present disclosure.
At block 802, the server may receive a query for an analytical result of user data in a first range over a first data dimension and a second range over a second data dimension. The block 802 may be performed by the estimation processor 204 of FIG. 2. The query may be directed to obtain an analytical result of user data by aggregating data values in the first range over the first data dimension and the second range over the second data dimension under constraints on sensitive dimensions of the user data. According to the present embodiment, the first data dimension and the second data dimension may be sensitive dimensions. For example, the server may receive a query for an average purchase made by users aged between 30 and 40 with a salary between 50K and 70K, with reference to Table 1.
At block 804, the server may determine a plurality of first sub-intervals associated with the first range. The block 804 may be performed by the estimation processor 204 of FIG. 2. The determination of the plurality of first sub-intervals associated with the first range is similar to that described with reference to FIG. 4 and FIG. 5, and therefore is not described in detail herein.
At block 806, the server may determine a plurality of second sub-intervals associated with the second range. The block 806 may be performed by the estimation processor 204 of FIG. 2. In embodiments, the block 806 may be performed by the query decomposition module 1210 of FIG. 12. Referring to FIG. 7, the second range [3, 4, 5, 6, 7, 8] over the second data dimension may be partitioned into two sub-intervals: 714 with a range of [3, 4] and 710 with a range of [5, 6, 7, 8]. According to the present embodiment, the plurality of second sub-intervals associated with the second range over the second data dimension may be determined using a binary hierarchy similar to that described with respect to the partitioning of the first range over the first dimension. It should be understood by those of ordinary skill in the art that the above partitioning of the data range is for illustrative purposes and is not intended to limit the present disclosure. The first range over the first data dimension and the second range over the second dimension may be partitioned using different techniques. Further, the operations of block 804 and block 806 may be performed in parallel or in a sequential order.
At block 808, the server may decompose the query into a plurality of sub-queries, each of the plurality of sub-queries corresponding to one of the plurality of first sub-intervals and one of the plurality of second sub-intervals. The block 808 may be performed by the estimation processor 204 of FIG. 2. As described above with reference to FIG. 4 and FIG. 5, a first dimension D_1 having m distinct values, in the order z_1, z_2, …, z_m, may be organized into a binary hierarchy
Figure PCTCN2019090837-appb-000009
with multiple levels, each level including equal-sized sub-intervals. Taking as an example a second dimension D_2 that also has m distinct values, in the order z_1, z_2, …, z_m, the second dimension D_2 may be similarly organized into a binary hierarchy
Figure PCTCN2019090837-appb-000010
with multiple levels, each level including equal sized sub-intervals. A two-dimensional hierarchy can be constructed using Equation (4) :
Figure PCTCN2019090837-appb-000011
Each
Figure PCTCN2019090837-appb-000012
may denote a 2-dimensional level. There may be a total of (h+1)^2 2-dimensional levels in a 2-dimensional hierarchy. Each pair
Figure PCTCN2019090837-appb-000013
Figure PCTCN2019090837-appb-000014
may denote a 2-dimensional sub-interval. Referring to FIG. 7, the first dimension 702 having eight distinct values [1, 2, 3, 4, 5, 6, 7, 8] may be organized into a first binary hierarchy
Figure PCTCN2019090837-appb-000015
with four levels, and the second dimension 704 having eight distinct values [1, 2, 3, 4, 5, 6, 7, 8] may be organized into a second binary hierarchy
Figure PCTCN2019090837-appb-000016
with four levels. A 2-dimensional hierarchy constructed in accordance with the first binary hierarchy
Figure PCTCN2019090837-appb-000017
and the second binary hierarchy
Figure PCTCN2019090837-appb-000018
may include sixteen 2-dimensional levels, which together contain the eight combinations of the plurality of first sub-intervals (i.e., 408, 410, 414 and 424) and the plurality of second sub-intervals (i.e., 710 and 714). The server may further decompose the query into eight sub-queries, each corresponding to one of the eight combinations of the plurality of first sub-intervals (i.e., 408, 410, 414 and 424) and the plurality of second sub-intervals (i.e., 710 and 714).
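The 2-dimensional decomposition can be sketched as a Cartesian product of the two 1-dimensional decompositions. The sketch below is illustrative only: the helper `decompose` is a hypothetical top-down routine that returns the largest hierarchy intervals covering a range, and intervals are represented as (lo, hi) tuples over the domain [1, 8] of FIG. 7.

```python
from itertools import product


def decompose(lo, hi, node_lo, node_hi, b=2):
    """Largest hierarchy intervals that exactly cover [lo, hi] (hypothetical helper)."""
    if hi < node_lo or node_hi < lo:
        return []                        # no overlap with this node
    if lo <= node_lo and node_hi <= hi:
        return [(node_lo, node_hi)]      # node entirely within the range
    width = (node_hi - node_lo + 1) // b
    selected = []
    for j in range(b):
        child_lo = node_lo + j * width
        selected.extend(decompose(lo, hi, child_lo, child_lo + width - 1, b))
    return selected


first = decompose(2, 7, 1, 8)    # first range over dimension 702
second = decompose(3, 8, 1, 8)   # second range over dimension 704

# Each 2-dimensional sub-interval pairs one first sub-interval with one second sub-interval.
boxes = list(product(first, second))
print(len(boxes))                # 8 sub-queries
for box in boxes:
    print(box)

# With h = 3 there are (h + 1) ** 2 pairs of levels in the 2-dimensional hierarchy.
h = 3
level_pairs = [(l1, l2) for l1 in range(h + 1) for l2 in range(h + 1)]
print(len(level_pairs))          # 16
```

Answering the 2-dimensional query then amounts to estimating one value per box and summing the eight estimates.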
It should be understood by those of ordinary skill in the art that the measures of the first dimension and the second dimension described above are for illustration purposes and are not intended to limit the present disclosure. The measures of the first dimension and the second dimension can be set differently in accordance with the actual data set being collected.
At block 810, the server may fetch, from user devices, first user data in the plurality of first sub-intervals and the plurality of second sub-intervals. The block 810 may be performed by the estimation processor 204 of FIG. 2. In embodiments, the block 810 may be performed by the data fetching module 1212 of FIG. 12. As the query is decomposed into a plurality of sub-queries, the users at the client side may receive the decomposed plurality of sub-queries corresponding to the plurality of first sub-intervals and the plurality of second sub-intervals instead of receiving the initial query with respect to the entire first range over the first data dimension and the entire second range over the second data dimension. In response to each of the plurality of sub-queries, the users may transmit the analytical value of the first user data corresponding to a combination of one of the plurality of first sub-intervals and one of the plurality of second sub-intervals, under the constraint of the LDP encoding algorithm.
At block 812, the server may compute the analytical result of the user data in the first range over the first data dimension and the second range over the second data dimension in accordance with the first user data in the plurality of first sub-intervals and the plurality of the second sub-intervals. The block 812 may be performed by the estimation processor 204 of FIG. 2. Similar to the operation described in the block 310 with reference to FIG. 3, the server may estimate the answer to each of the plurality of sub-queries using a weighted frequency estimator and estimate the answer to the initial query by summing up the answers to all of the plurality of sub-queries.
It should be understood by those of ordinary skill in the art that the processes described above are intended to be illustrative. In embodiments, a process may be accomplished with one or more additional operations not described, and/or without one or more of the operations discussed. Further, the order in which the operations of the process are illustrated in FIG. 8 and set forth above is not intended to be limiting.
FIG. 9 illustrates an example process 900 for determining a plurality of first sub-intervals associated with the first range in accordance with an embodiment of the present disclosure. The blocks described herein may be performed by the estimation processor 204 of FIG. 2.
At block 902, the server may pre-generate a first binary hierarchy of intervals with respect to the first data dimension. The operation of block 902 is similar to that described with reference to FIG. 4 and FIG. 5, and therefore is not described in detail herein.
At block 904, the server may determine whether one interval in each level of the first binary hierarchy is within the first range. The operation of block 904 is similar to that described with reference to FIG. 4 and FIG. 5, and therefore is not described in detail herein.
At block 906, the server may determine whether the interval is selected in an upper level of the first binary hierarchy. The operation of block 906 is similar to that described with reference to FIG. 4 and FIG. 5, and therefore is not described in detail herein.
At block 908, the server may select the interval as one of the plurality of sub-intervals in response to the interval being within the first range and not being selected in an upper level of the first binary hierarchy. The operation of block 908 is similar to that described with reference to FIG. 4 and FIG. 5, and therefore is not described in detail herein.
FIG. 10 illustrates another example process 1000 for determining a plurality of second sub-intervals associated with the second range in accordance with an embodiment of the present disclosure. The blocks described herein may be performed by the estimation processor 204 of FIG. 2.
At block 1002, the server may pre-generate a second binary hierarchy of intervals with respect to the second data dimension. With reference to FIG. 7, a second binary hierarchy of intervals with respect to the second data dimension 704 includes four levels of intervals. Level 0 of the second binary hierarchy includes eight distinct values [1, 2, 3, 4, 5, 6, 7, 8] and may be partitioned into two equal-sized sub-intervals 708 and 710 in Level 1. The sub-intervals in Level 1 may be further partitioned into four equal-sized sub-intervals 712, 714, 716 and 718 in Level 2, which may in turn be partitioned into eight equal-sized sub-intervals in Level 3, each having unit length.
At block 1004, the server may determine whether one interval in each level of the second binary hierarchy is within the second range. With reference to FIG. 7, the second range is denoted as [3, 4, 5, 6, 7, 8]. The server may determine that the sub-interval 708 in Level 1 with a range of [1, 2, 3, 4] is not within the second range, while the sub-interval 710 in Level 1 with a range of [5, 6, 7, 8] is within the second range. Further, the sub-intervals 714, 716 and 718 in Level 2 may be determined to be within the second range, while the sub-interval 712 in Level 2 may be determined not to be within the second range. Even further, the sub-intervals 720 and 722 in Level 3 may be determined not to be within the second range, while the remaining sub-intervals in Level 3 may be determined to be within the second range.
At block 1006, the server may determine whether the one interval is selected in an upper level of the second binary hierarchy. With reference to FIG. 7, the server may further determine that the sub-intervals 710 in Level 1 and 714 in Level 2 are not selected in an upper level of the second binary hierarchy. Other sub-intervals, such as 716 and 718 in Level 2 and the sub-intervals in Level 3 other than 720 and 722, although within the second range, may be determined to have been selected in an upper level of the second binary hierarchy.
At block 1008, the server may select the one interval as one of the plurality of sub-intervals in response to the one interval being within the second range and not being selected in an upper level of the second binary hierarchy. With reference to FIG. 7, the server may determine that the second range can be partitioned into sub-intervals 710 and 714.
It should be understood by those of ordinary skill in the art that the processes described above are intended to be illustrative. In embodiments, a process may be accomplished with one or more additional operations not described, and/or without one or more of the operations discussed. Further, the order in which the operations of the process are illustrated in FIG. 10 and set forth above is not intended to be limiting.
FIG. 11 illustrates another example process 1100 for answering MDA queries under LDP in accordance with an embodiment of the present disclosure. The example process as illustrated in FIG. 11 includes the blocks of operations 802, 804, 806, 808 and 810 that are similar to those described with reference to FIG. 8. Therefore, the blocks of operations 802, 804, 806, 808 and 810 are not described in detail herein.
The blocks 1102 and 1104 may be performed by the estimation processor 204 of FIG. 2.
At block 1102, the server may estimate an analytical value of the first user data in each of the plurality of first sub-intervals and each of the plurality of second sub-intervals. According to the present embodiment, rather than receiving one query directed to the entire first range and the entire second range, the users may receive a plurality of sub-queries, each directed to one combination of the plurality of first sub-intervals and the plurality of second sub-intervals. In response to each sub-query, the user device transmits an analytical value of the first user data corresponding to each combination of the plurality of first sub-intervals and the plurality of second sub-intervals under the constraint of the LDP encoding. At the server side, the LDP encoded analytical value of the first user data corresponding to each combination of the plurality of first sub-intervals and the plurality of second sub-intervals is embedded with noise, and thus the server may estimate the analytical value as the answer to each sub-query. The techniques by which the server estimates the analytical value in answering the 2-dimensional MDA query may be similar to those described with respect to answering the 1-dimensional MDA query. It should be understood by those of ordinary skill in the art that the examples described above are for illustration purposes and are not intended to limit the present disclosure. The techniques by which the server estimates the analytical value in answering the 2-dimensional MDA query can use different algorithms or models from those described with respect to answering the 1-dimensional MDA query.
At block 1104, the server may estimate the analytical result of user data in the first range over the first data dimension and the second range over the second data dimension by summing the estimated analytical values of the first user data in the plurality of first sub-intervals and the plurality of second sub-intervals. After the server estimates, from the data received from each user, the analytical value of the first user data corresponding to each combination of the plurality of first sub-intervals and the plurality of second sub-intervals, the analytical result of user data in the first range over the first data dimension and the second range over the second data dimension, as initially requested, may be obtained by summing the estimated analytical values associated with all of the sub-interval combinations.
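The summation at block 1104 may be sketched as follows. This is a minimal sketch that assumes the per-combination estimates have already been produced at block 1102 and are stored in a dictionary keyed by a pair of inclusive (low, high) intervals; the function name answer_2d_query and this data layout are hypothetical.

```python
from itertools import product
from typing import Dict, Sequence, Tuple

Interval = Tuple[int, int]  # inclusive (low, high) bounds, an assumed encoding


def answer_2d_query(
    first_sub_intervals: Sequence[Interval],
    second_sub_intervals: Sequence[Interval],
    estimated_values: Dict[Tuple[Interval, Interval], float],
) -> float:
    """Sum the estimated analytical values over every combination of a
    first sub-interval and a second sub-interval."""
    return sum(
        estimated_values[(a, b)]
        for a, b in product(first_sub_intervals, second_sub_intervals)
    )
```

For example, with first sub-intervals (2, 3) and (4, 7) and a single second sub-interval (0, 1), the result is the sum of the two estimates stored for those two combinations.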
FIG. 12 illustrates an example system 1200 for implementing the processes for answering MDA queries under LDP in accordance with an embodiment of the present disclosure. The techniques and mechanisms described herein may be implemented by multiple instances of the system 1200 as well as by any other computing device, system, and/or environment. The system 1200 shown in FIG. 12 is only one example of a system and is not intended to suggest any limitation as to the scope of use or functionality of any computing device utilized to perform the processes and/or procedures described above. Other well-known computing devices, systems, environments and/or configurations that may be suitable for use with the embodiments include, but are not limited to, personal computers, server computers (e.g., server 104-1 or 104-2 in FIG. 1), hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, game consoles, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, implementations using field programmable gate arrays (“FPGAs”) and application specific integrated circuits (“ASICs”), and/or the like.
The system 1200 may include one or more processors 1202 and system memory 1204 communicatively coupled to the processor(s) 1202. The processor(s) 1202 may execute one or more modules and/or processes to cause the processor(s) 1202 to perform a variety of functions. In some embodiments, the processor(s) 1202 may include a central processing unit (CPU), a graphics processing unit (GPU), both CPU and GPU, or other processing units or components known in the art. Additionally, each of the processor(s) 1202 may possess its own local memory, which also may store program modules, program data, and/or one or more operating systems.
Depending on the exact configuration and type of the system 1200, the system memory 1204 may be volatile, such as RAM, non-volatile, such as ROM, flash memory, miniature hard drive, memory card, and the like, or some combination thereof. The system memory 1204 may include one or more computer-executable modules (modules) 1206 that are executable by the processor(s) 1202.
The modules 1206 may include, but are not limited to, a receiving module 1208, a query decomposition module 1210, a data fetching module 1212, and an analyzing module 1214. The receiving module 1208 may be configured to receive a query for an analytical result of user data in at least a first range over a first data dimension as described with reference to FIG. 3. The query decomposition module 1210 may be configured to determine a plurality of sub-intervals associated with the at least first range and decompose the query into a plurality of sub-queries, each of the plurality of sub-queries corresponding to one of the plurality of sub-intervals as described with reference to FIG. 3. The data fetching module 1212 may be configured to fetch, from user devices, first user data in the plurality of sub-intervals as described with reference to FIG. 3. The analyzing module 1214 may be configured to compute the analytical result of user data in at least the first range over the first data dimension in accordance with the first user data in the plurality of sub-intervals as described with reference to FIG. 3.
The receiving module 1208 may be further configured to receive a query for an analytic result of user data in a first range over a first data dimension and a second range over a second data dimension as described with reference to FIG. 8.
The receiving module 1208 may perform a step in block 302 described above with reference to FIG. 3. The receiving module 1208 may further perform a step in block 802 described above with reference to FIG. 8.
The query decomposition module 1210 may be further configured to determine a plurality of first sub-intervals associated with the first range, determine a plurality of second sub-intervals associated with the second range, and decompose the query into a plurality of sub-queries, each of the plurality of sub-queries corresponding to one of the plurality of first sub-intervals and one of the plurality of second sub-intervals as described with reference to FIG. 8.
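The decomposition of a 2-dimensional query into sub-queries may be sketched as follows. The sketch assumes that the first and second sub-intervals have already been determined and that each sub-query is simply a pairing of one first sub-interval with one second sub-interval; the SubQuery type and the function decompose_query are hypothetical names, not part of the module 1210 as disclosed.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # inclusive (low, high) bounds, an assumed encoding


@dataclass(frozen=True)
class SubQuery:
    """One sub-query: a first sub-interval paired with a second sub-interval."""
    first_interval: Interval
    second_interval: Interval


def decompose_query(
    first_sub_intervals: Sequence[Interval],
    second_sub_intervals: Sequence[Interval],
) -> List[SubQuery]:
    """Emit one sub-query per combination of first and second sub-intervals."""
    return [
        SubQuery(a, b)
        for a, b in product(first_sub_intervals, second_sub_intervals)
    ]
```

With m first sub-intervals and k second sub-intervals, this sketch yields m × k sub-queries.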
The query decomposition module 1210 may be further configured to pre-generate a binary hierarchy of intervals having a plurality of levels with respect to the first data dimension; determine whether one interval in each of the plurality of levels of the binary hierarchy is within the at least first range; determine whether the one interval is selected in an upper level of the binary hierarchy; and select the one interval as one of the plurality of sub-intervals in response to the one interval being within the at least first range and not being selected in an upper level of the binary hierarchy as described with reference to FIG. 5.
The query decomposition module 1210 may be further configured to pre-generate a binary hierarchy of intervals having a plurality of levels with respect to the first data dimension; determine whether one interval in each of the plurality of levels of the binary hierarchy is within the first range; determine whether the one interval is selected in an upper level of the binary hierarchy; and select the one interval as one of the plurality of first sub-intervals in response to the one interval being within the first range and not being selected in an upper level of the binary hierarchy as described with reference to FIG. 9.
The query decomposition module 1210 may be further configured to pre-generate a binary hierarchy of intervals having a plurality of levels with respect to the second data dimension; determine whether one interval in each of the plurality of levels of the binary hierarchy is within the second range; determine whether the one interval is selected in an upper level of the binary hierarchy; and select the one interval as one of the plurality of second sub-intervals in response to the one interval being within the second range and not being selected in an upper level of the binary hierarchy as described with reference to FIG. 10.
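The pre-generation and selection steps described above may be sketched as follows for a single data dimension. The sketch assumes that the sorted values of the dimension have been mapped to indices 0 to n-1, that n is a power of two, and that intervals and ranges are inclusive; the function names are hypothetical. An interval already covered by an interval selected at an upper level is treated as "selected in an upper level" and skipped.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive (low, high) index bounds


def build_binary_hierarchy(domain_size: int) -> List[List[Interval]]:
    """Level 0 holds the whole domain; each lower level halves every
    interval until the intervals have unit length."""
    if domain_size < 1 or domain_size & (domain_size - 1):
        raise ValueError("domain_size must be a power of two in this sketch")
    levels = []
    width = domain_size
    while width >= 1:
        levels.append(
            [(lo, lo + width - 1) for lo in range(0, domain_size, width)]
        )
        width //= 2
    return levels


def select_sub_intervals(
    levels: List[List[Interval]], query_lo: int, query_hi: int
) -> List[Interval]:
    """Walk the hierarchy from the top level down and keep each interval
    that lies within the range and is not covered by an earlier selection."""
    selected: List[Interval] = []
    for level in levels:
        for lo, hi in level:
            inside = query_lo <= lo and hi <= query_hi
            covered = any(s_lo <= lo and hi <= s_hi for s_lo, s_hi in selected)
            if inside and not covered:
                selected.append((lo, hi))
    return sorted(selected)
```

For a domain of 8 indices and the range [2, 7], the sketch selects the intervals (2, 3) and (4, 7). For a 2-dimensional query, the same routine may be run once per data dimension to obtain the first and second sub-intervals.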
The data fetching module 1212 may be configured further to fetch, from user devices, first user data in the plurality of first sub-intervals and the plurality of the second sub-intervals as described with reference to FIG. 8.
The data fetching module 1212 may perform a step in block 308 described above with reference to FIG. 3. The data fetching module 1212 may perform a step in block 810 described above with reference to FIG. 8.
The analyzing module 1214 may be further configured to compute an analytical value of the first user data in each of the plurality of sub-intervals and obtain the analytic result of user data in at least the first range over the first data dimension by summing the estimated analytical values of the first user data in the plurality of sub-intervals as described with reference to FIG. 6.
The analyzing module 1214 may be further configured to compute the analytic result of the user data in the first range over the first data dimension and the second range over the second data dimension in accordance with the first user data in the plurality of first sub-intervals and the plurality of the second sub-intervals as described with reference to FIG. 8.
The analyzing module 1214 may be further configured to compute an analytical value of the first user data in each of the plurality of first sub-intervals and each of the plurality of the second sub-intervals; and obtain the analytic result of the user data in the first range over the first data dimension and the second range over the second data dimension by summing the estimated analytical values of the first user data in the plurality of first sub-intervals and the plurality of the second sub-intervals as described with reference to FIG. 11.
The analyzing module 1214 may perform a step in block 310 described above with reference to FIG. 3. The analyzing module 1214 may perform steps in blocks 602 and 604 described above with reference to FIG. 6. The analyzing module 1214 may perform a step in block 812 described above with reference to FIG. 8. The analyzing module 1214 may perform steps in blocks 1102 and 1104 described above with reference to FIG. 11.
The system 1200 may additionally include an input/output (I/O) interface 1216 for receiving data associated with the process described above, such as a query for an analytical result of user data from user 102 as illustrated in FIG. 1, and for outputting the processed data, such as transmitting the plurality of decomposed sub-queries to the user devices 106 as illustrated in FIG. 1. The system 1200 may also include a communication module 1218 allowing the system 1200 to communicate with other devices (e.g., the user devices 106, the database 110, the cloud storage 112 as illustrated in FIG. 1) over a network (e.g., the network 108 as illustrated in FIG. 1). The network may include the Internet, wired media such as a wired network or direct-wired connections, and wireless media such as acoustic, radio frequency (RF), infrared, and other wireless media.
Some or all operations of the methods described above can be performed by execution of computer-readable instructions stored on a computer-readable storage medium, as defined below. The term “computer-readable instructions,” as used in the description and claims, includes routines, applications, application modules, program modules, programs, components, data structures, algorithms, and the like. Computer-readable instructions can be implemented on various system configurations, including single-processor or multiprocessor systems, minicomputers, mainframe computers, personal computers, hand-held computing devices, microprocessor-based, programmable consumer electronics, combinations thereof, and the like.
The computer-readable storage media may include volatile memory (such as random-access memory (RAM)) and/or non-volatile memory (such as read-only memory (ROM), flash memory, etc.). The computer-readable storage media may also include additional removable storage and/or non-removable storage including, but not limited to, flash memory, magnetic storage, optical storage, and/or tape storage that may provide non-volatile storage of computer-readable instructions, data structures, program modules, and the like.
A non-transitory computer-readable storage medium is an example of computer-readable media. Computer-readable media includes at least two types of computer-readable media, namely computer-readable storage media and communications media. Computer-readable storage media includes volatile and non-volatile, removable and non-removable media implemented in any process or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer-readable storage media includes, but is not limited to, phase change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disk read-only memory (CD-ROM), digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information for access by a computing device. In contrast, communication media may embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or other transmission mechanism. As defined herein, computer-readable storage media do not include communication media.
The computer-readable instructions, stored on one or more non-transitory computer-readable storage media, may, when executed by one or more processors, perform the operations described above with reference to FIGs. 1-12. Generally, computer-readable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular abstract data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations can be combined in any order and/or in parallel to implement the processes.
EXAMPLE CLAUSES
A. A method comprising: receiving a query for an analytical result of user data in at least a first range over a first data dimension; determining a plurality of sub-intervals associated with the at least first range; decomposing the query into a plurality of sub-queries, each of the plurality of sub-queries corresponding to one of the plurality of first sub-intervals; fetching, from user devices, first user data in the plurality of sub-intervals; and computing the analytical result of user data in at least the first range over the first data dimension in accordance with the first user data in the plurality of sub-intervals.
B. The method as recited in paragraph A, wherein determining a plurality of sub-intervals associated with the at least first range comprises: pre-generating a binary hierarchy of intervals having a plurality of levels with respect to the first data dimension; determining whether one interval in each of the plurality of levels of the binary hierarchy is within the first range; determining whether the one interval is selected in an upper level of the binary hierarchy; and selecting the one interval as one of the plurality of sub-intervals in response to the one interval being within the first range and not being selected in an upper level of the binary hierarchy.
C. The method as recited in paragraph B wherein pre-generating a binary hierarchy of intervals having a plurality of levels with respect to the first data dimension further comprises: sorting values associated with the first data dimension; and recursively partitioning the values into two equal sized intervals until the interval is a unit length.
D. The method as recited in paragraph A, wherein computing the analytical result of the user data in at least the first range over the first data dimension in accordance with the first user data in the plurality of sub-intervals further comprises: estimating an analytical value of the first user data in each of the plurality of sub-intervals; and obtaining the analytical result of user data in at least the first range over the first data dimension by summing the estimated analytical values of the first user data in the plurality of sub-intervals.
E. The method as recited in paragraph A, wherein the first user data in the plurality of sub-intervals is encoded at the user devices using a local differential privacy (LDP) algorithm.
F. A method comprising: receiving a query for an analytical result of user data in a first range over a first data dimension and a second range over a second data dimension; determining a plurality of first sub-intervals associated with the first range; determining a plurality of second sub-intervals associated with the second range; decomposing the query into a plurality of sub-queries, each of the plurality of sub-queries corresponding to one of the plurality of first sub-intervals and one of the plurality of second sub-intervals; fetching, from user devices, first user data in the plurality of first sub-intervals and the plurality of the second sub-intervals; and computing the analytical result of the user data in the first range over the first data dimension and the second range over the second data dimension in accordance with the first user data in the plurality of first sub-intervals and the plurality of the second sub-intervals.
G. The method as recited in paragraph F, wherein determining a plurality of first sub-intervals associated with the first range comprises: pre-generating a first binary hierarchy of intervals having a plurality of levels with respect to the first data dimension; determining whether one interval in each of the plurality of levels of the first binary hierarchy is within the first range; determining whether the one interval is selected in an upper level of the first binary hierarchy; and selecting the one interval as one of the plurality of first sub-intervals in response to the one interval being within the first range and not being selected in an upper level of the first binary hierarchy.
H. The method as recited in paragraph F, wherein determining a plurality of second sub-intervals associated with the second range comprises: pre-generating a second binary hierarchy of intervals having a plurality of levels with respect to the second data dimension; determining whether one interval in each of the plurality of levels of the second binary hierarchy is within the second range; determining whether the one interval is selected in an upper level of the second binary hierarchy; and selecting the one interval as one of the plurality of second sub-intervals in response to the one interval being within the second range and not being selected in an upper level of the second binary hierarchy.
I. The method as recited in paragraph F, wherein computing the analytical result of the user data in the first range over the first data dimension and the second range over the second data dimension in accordance with the first user data in the plurality of first sub-intervals and the plurality of the second sub-intervals further comprises: estimating an analytical value of the first user data in each of the plurality of first sub-intervals and each of the plurality of the second sub-intervals; and obtaining the analytical result of the user data in the first range over the first data dimension and the second range over the second data dimension by summing the estimated analytical values of the first user data in the plurality of first sub-intervals and the plurality of second sub-intervals.
J. The method as recited in paragraph F, wherein the first user data in the plurality of first sub-intervals and the plurality of the second sub-intervals is encoded at the user devices using a local differential privacy (LDP) algorithm.
K. A system comprising: one or more processors, and memory communicatively coupled to the one or more processors, the memory storing computer-executable modules executable by the one or more processors that, when executed by the one or more processors, perform associated operations, the computer-executable modules including: a receiving module configured to receive a query for an analytical result of user data in at least a first range over a first data dimension; a query processing module configured to: determine a plurality of sub-intervals associated with the at least first range; and decompose the query into a plurality of sub-queries, each of the plurality of sub-queries corresponding to one of the plurality of first sub-intervals; a data fetching module configured to fetch, from user devices, first user data in the plurality of sub-intervals; and an analyzing module configured to compute the analytical result of user data in at least the first range over the first data dimension in accordance with the first user data in the plurality of sub-intervals.
L. The system as recited in paragraph K, wherein the query processing module is further configured to: pre-generate a binary hierarchy of intervals having a plurality of levels with respect to the first data dimension; determine whether one interval in each of the plurality of levels of the binary hierarchy is within the first range; determine whether the one interval is selected in an upper level of the binary hierarchy; and select the one interval as one of the plurality of sub-intervals in response to the one interval being within the first range and not being selected in an upper level of the binary hierarchy.
M. The system as recited in paragraph L, wherein to pre-generate a binary hierarchy of intervals having a plurality of levels with respect to the first data dimension, the query processing module is further configured to: sort values associated with the first data dimension; and recursively partition the values into two equal sized intervals until the interval is a unit length.
N. The system as recited in paragraph K, wherein to compute the analytical result of the user data in at least the first range over the first data dimension in accordance with the first user data in the plurality of sub-intervals, the analyzing module is configured to: estimate an analytical value of the first user data in each of the plurality of sub-intervals; and obtain the analytical result of user data in at least the first range over the first data dimension by summing the estimated analytical values of the first user data in the plurality of sub-intervals.
O. The system as recited in paragraph K, wherein the first user data in the plurality of sub-intervals is encoded at the user devices using a local differential privacy (LDP) algorithm.
P. A system comprising: one or more processors, and memory communicatively coupled to the one or more processors, the memory storing computer-executable modules executable by the one or more processors that, when executed by the one or more processors, perform associated operations, the computer-executable modules including: a receiving module configured to receive a query for an analytical result of user data in a first range over a first data dimension and a second range over a second data dimension; a query processing module configured to: determine a plurality of first sub-intervals associated with the first range; determine a plurality of second sub-intervals associated with the second range; and decompose the query into a plurality of sub-queries, each of the plurality of sub-queries corresponding to one of the plurality of first sub-intervals and one of the plurality of second sub-intervals; a data fetching module configured to fetch, from user devices, first user data in the plurality of first sub-intervals and the plurality of the second sub-intervals; and an analyzing module configured to compute the analytical result of the user data in the first range over the first data dimension and the second range over the second data dimension in accordance with the first user data in the plurality of first sub-intervals and the plurality of the second sub-intervals.
Q. The system as recited in paragraph P, wherein to determine a plurality of first sub-intervals associated with the first range, the query processing module is further configured to: pre-generate a first binary hierarchy of intervals having a plurality of levels with respect to the first data dimension; determine whether one interval in each of the plurality of levels of the first binary hierarchy is within the first range; determine whether the one interval is selected in an upper level of the first binary hierarchy; and select the one interval as one of the plurality of first sub-intervals in response to the one interval being within the first range and not being selected in an upper level of the first binary hierarchy.
R. The system as recited in paragraph P, wherein to determine a plurality of second sub-intervals associated with the second range, the query processing module is further configured to: pre-generate a second binary hierarchy of intervals having a plurality of levels with respect to the second data dimension; determine whether one interval in each of the plurality of levels of the second binary hierarchy is within the second range; determine whether the one interval is selected in an upper level of the second binary hierarchy; and select the one interval as one of the plurality of second sub-intervals in response to the one interval being within the second range and not being selected in an upper level of the second binary hierarchy.
S. The system as recited in paragraph P, wherein to compute the analytical result of the user data in the first range over the first data dimension and the second range over the second data dimension in accordance with the first user data in the plurality of first sub-intervals and the plurality of the second sub-intervals, the analyzing module is configured to: estimate an analytical value of the first user data in each of the plurality of first sub-intervals and each of the plurality of the second sub-intervals; and obtain the analytical result of the user data in the first range over the first data dimension and the second range over the second data dimension by summing the estimated analytical values of the first user data in the plurality of first sub-intervals and the plurality of the second sub-intervals.
T. A computer-readable storage medium storing computer-readable instructions executable by one or more processors, that when executed by the one or more processors, cause the one or more processors to perform operations comprising: receiving a query for an analytical result of user data in at least a first range over a first data dimension; determining a plurality of sub-intervals associated with the at least first range; decomposing the query into a plurality of sub-queries, each of the plurality of sub-queries corresponding to one of the plurality of first sub-intervals; fetching, from user devices, first user data in the plurality of sub-intervals; and computing the analytical result of user data in at least the first range over the first data dimension in accordance with the first user data in the plurality of sub-intervals.
CONCLUSION
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as exemplary forms of implementing the claims.

Claims (20)

  1. A method comprising:
    receiving a query for an analytical result of user data in at least a first range over a first data dimension;
    determining a plurality of sub-intervals associated with the at least first range;
    decomposing the query into a plurality of sub-queries, each of the plurality of sub-queries corresponding to one of the plurality of first sub-intervals;
    fetching, from user devices, first user data in the plurality of sub-intervals; and
    computing the analytical result of user data in at least the first range over the first data dimension in accordance with the first user data in the plurality of sub-intervals.
  2. The method of claim 1, wherein determining a plurality of sub-intervals associated with the at least first range comprises:
    pre-generating a binary hierarchy of intervals having a plurality of levels with respect to the first data dimension;
    determining whether one interval in each of the plurality of levels of the binary hierarchy is within the at least first range;
    determining whether the one interval is selected in an upper level of the binary hierarchy; and
    selecting the one interval as one of the plurality of sub-intervals in response to the one interval being within the at least first range and not being selected in an upper level of the binary hierarchy.
  3. The method of claim 2, wherein pre-generating the binary hierarchy of intervals having a plurality of levels with respect to the first data dimension further comprises:
    sorting values associated with the first data dimension; and
    recursively partitioning the values into two equal sized intervals until the interval is a unit length.
  4. The method of claim 1, wherein computing the analytical result of the user data in the at least first range over the first data dimension in accordance with the first user data in the plurality of sub-intervals further comprises:
    estimating an analytical value of the first user data in each of the plurality of sub-intervals; and
    obtaining the analytical result of user data in the at least first range over the first data dimension by summing the estimated analytical values of the first user data in the plurality of sub-intervals.
  5. The method of claim 1, wherein the first user data in the plurality of sub-intervals is encoded at the user devices using a local differential privacy (LDP) algorithm.
  6. A method comprising:
    receiving a query for an analytical result of user data in a first range over a first data dimension and a second range over a second data dimension;
    determining a plurality of first sub-intervals associated with the first range;
    determining a plurality of second sub-intervals associated with the second range;
    decomposing the query into a plurality of sub-queries, each of the plurality of sub-queries corresponding to one of the plurality of first sub-intervals and one of the plurality of second sub-intervals;
    fetching, from user devices, first user data in the plurality of first sub-intervals and the plurality of the second sub-intervals; and
    computing the analytical result of the user data in the first range over the first data dimension and the second range over the second data dimension in accordance with the first user data in the plurality of first sub-intervals and the plurality of the second sub-intervals.
  7. The method of claim 6, wherein determining a plurality of first sub-intervals associated with the first range comprises:
    pre-generating a first binary hierarchy of intervals having a plurality of levels with respect to the first data dimension;
    determining whether one interval in each of the plurality of levels of the first binary hierarchy is within the first range;
    determining whether the one interval is selected in an upper level of the first binary hierarchy; and
    selecting the one interval as one of the plurality of first sub-intervals in response to the one interval being within the first range and not being selected in an upper level of the first binary hierarchy.
  8. The method of claim 6, wherein determining a plurality of second sub-intervals associated with the second range comprises:
    pre-generating a second binary hierarchy of intervals having a plurality of levels with respect to the second data dimension;
    determining whether one interval in each of the plurality of levels of the second binary hierarchy is within the second range;
    determining whether the one interval is selected in an upper level of the second binary hierarchy; and
    selecting the one interval as one of the plurality of second sub-intervals in response to the one interval being within the second range and not being selected in an upper level of the second binary hierarchy.
  9. The method of claim 6, wherein computing the analytical result of the user data in the first range over the first data dimension and the second range over the second data dimension in accordance with the received user data in the plurality of first sub-intervals and the plurality of the second sub-intervals further comprises:
    estimating an analytical value of the first user data in each of the plurality of first sub-intervals and each of the plurality of the second sub-intervals; and
    obtaining the analytical result of the user data in the first range over the first data dimension and the second range over the second data dimension by summing the estimated analytical values of the first user data in the plurality of first sub-intervals and the plurality of second sub-intervals.
  10. The method of claim 6, wherein the first user data in the plurality of first sub-intervals and the plurality of the second sub-intervals is encoded at the user devices using a local differential privacy (LDP) algorithm.
  11. A system comprising:
    one or more processors, and
    memory communicatively coupled to the one or more processors, the memory storing computer-executable modules executable by the one or more processors that, when executed by the one or more processors, perform associated operations, the computer-executable modules including:
    a receiving module configured to receive a query for an analytical result of user data in at least a first range over a first data dimension;
    a query processing module configured to:
      determine a plurality of sub-intervals associated with the at least first range; and
      decompose the query into a plurality of sub-queries, each of the plurality of sub-queries corresponding to one of the plurality of sub-intervals;
    a data fetching module configured to fetch, from user devices, first user data in the plurality of sub-intervals; and
    an analyzing module configured to compute the analytical result of user data in at least the first range over the first data dimension in accordance with the first user data in the plurality of sub-intervals.
  12. The system of claim 11, wherein the query processing module is further configured to:
    pre-generate a binary hierarchy of intervals having a plurality of levels with respect to the first data dimension;
    determine whether one interval in each of the plurality of levels of the binary hierarchy is within the first range;
    determine whether the one interval is selected in an upper level of the binary hierarchy; and
    select the one interval as one of the plurality of sub-intervals in response to the one interval being within the first range and not being selected in an upper level of the binary hierarchy.
  13. The system of claim 12, wherein to pre-generate a binary hierarchy of intervals having a plurality of levels with respect to the first data dimension, the query processing module is further configured to:
    sort values associated with the first data dimension; and
    recursively partition the values into two equal-sized intervals until each interval is of unit length.
  14. The system of claim 11, wherein to compute the analytical result of the user data in at least the first range over the first data dimension in accordance with the first user data in the plurality of sub-intervals, the analyzing module is configured to:
    estimate an analytical value of the first user data in each of the plurality of sub-intervals; and
    obtain the analytical result of user data in at least the first range over the first data dimension by summing the estimated analytical values of the first user data in the plurality of sub-intervals.
  15. The system of claim 11, wherein the first user data in the plurality of sub-intervals is encoded at the user devices using a local differential privacy (LDP) algorithm.
  16. A system comprising:
    one or more processors, and
    memory communicatively coupled to the one or more processors, the memory storing computer-executable modules executable by the one or more processors that, when executed by the one or more processors, perform associated operations, the computer-executable modules including:
    a receiving module configured to receive a query for an analytical result of user data in a first range over a first data dimension and a second range over a second data dimension;
    a query processing module configured to:
      determine a plurality of first sub-intervals associated with the first range;
      determine a plurality of second sub-intervals associated with the second range; and
      decompose the query into a plurality of sub-queries, each of the plurality of sub-queries corresponding to one of the plurality of first sub-intervals and one of the plurality of second sub-intervals;
    a data fetching module configured to fetch, from user devices, first user data in the plurality of first sub-intervals and the plurality of the second sub-intervals; and
    an analyzing module configured to compute the analytical result of the user data in the first range over the first data dimension and the second range over the second data dimension in accordance with the first user data in the plurality of first sub-intervals and the plurality of the second sub-intervals.
  17. The system of claim 16, wherein to determine a plurality of first sub-intervals associated with the first range, the query processing module is further configured to:
    pre-generate a first binary hierarchy of intervals having a plurality of levels with respect to the first data dimension;
    determine whether one interval in each of the plurality of levels of the first binary hierarchy is within the first range;
    determine whether the one interval is selected in an upper level of the first binary hierarchy; and
    select the one interval as one of the plurality of first sub-intervals in response to the one interval being within the first range and not being selected in an upper level of the first binary hierarchy.
  18. The system of claim 16, wherein to determine a plurality of second sub-intervals associated with the second range, the query processing module is further configured to:
    pre-generate a second binary hierarchy of intervals having a plurality of levels with respect to the second data dimension;
    determine whether one interval in each of the plurality of levels of the second binary hierarchy is within the second range;
    determine whether the one interval is selected in an upper level of the second binary hierarchy; and
    select the one interval as one of the plurality of second sub-intervals in response to the one interval being within the second range and not being selected in an upper level of the second binary hierarchy.
  19. The system of claim 16, wherein to compute the analytical result of the user data in the first range over the first data dimension and the second range over the second data dimension in accordance with the first user data in the plurality of first sub-intervals and the plurality of the second sub-intervals, the analyzing module is configured to:
    estimate an analytical value of the first user data in each of the plurality of first sub-intervals and each of the plurality of the second sub-intervals; and
    obtain the analytical result of the user data in the first range over the first data dimension and the second range over the second data dimension by summing the estimated analytical values of the first user data in the plurality of first sub-intervals and the plurality of the second sub-intervals.
  20. A computer-readable storage medium storing computer-readable instructions executable by one or more processors, that when executed by the one or more processors, cause the one or more processors to perform operations comprising:
    receiving a query for an analytical result of user data in at least a first range over a first data dimension;
    determining a plurality of sub-intervals associated with the at least first range;
    decomposing the query into a plurality of sub-queries, each of the plurality of sub-queries corresponding to one of the plurality of sub-intervals;
    fetching, from user devices, first user data in the plurality of sub-intervals; and
    computing the analytical result of user data in at least the first range over the first data dimension in accordance with the first user data in the plurality of sub-intervals.
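
The interval selection recited in claims 7, 8, 12, 13, 17 and 18 and the summation recited in claims 9, 14 and 19 can be illustrated with a minimal sketch. It is not the applicant's implementation. It assumes an integer domain whose size is a power of two, half-open intervals [start, end), and a top-down traversal in which an interval already selected at an upper level is never split further. The names decompose, answer_2d and exact_count are hypothetical. The per-sub-query estimator is passed in as a parameter; the stand-in used here counts plaintext points exactly, whereas in the claimed system each value would be estimated from reports encoded on the user devices under local differential privacy.

```python
from itertools import product


def decompose(lo, hi, start, end):
    """Select the hierarchy intervals that exactly cover the range [lo, hi).

    [start, end) is the current node of the binary hierarchy. A node lying
    fully inside the range is selected and not split further, so none of its
    descendants at lower levels is ever selected as well.
    """
    if hi <= start or end <= lo:      # node is disjoint from the range
        return []
    if lo <= start and end <= hi:     # node is within the range: select it
        return [(start, end)]
    mid = (start + end) // 2          # otherwise split into two equal halves
    return decompose(lo, hi, start, mid) + decompose(lo, hi, mid, end)


def answer_2d(range1, range2, size1, size2, estimate):
    """Sum the per-sub-query estimates over pairs of selected sub-intervals."""
    subs1 = decompose(range1[0], range1[1], 0, size1)
    subs2 = decompose(range2[0], range2[1], 0, size2)
    # One sub-query per (first sub-interval, second sub-interval) pair.
    return sum(estimate(a, b) for a, b in product(subs1, subs2))


# Stand-in estimator (assumption): exact counting over plaintext points.
points = [(1, 2), (3, 3), (5, 0), (6, 7), (2, 5)]


def exact_count(a, b):
    return sum(a[0] <= x < a[1] and b[0] <= y < b[1] for x, y in points)


print(decompose(1, 7, 0, 8))
print(answer_2d((1, 7), (2, 6), 8, 8, exact_count))
```

Because the selected sub-intervals are disjoint and together cover each queried range, summing the per-sub-query values reproduces the value for the whole range when the estimator is exact. With a noisy estimator, the number of summed terms grows only logarithmically with the domain size in each dimension.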
PCT/CN2019/090837 2019-06-12 2019-06-12 Method and system for answering multi-dimensional analytical queries under local differential privacy WO2020248150A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980096293.9A CN113811868A (en) 2019-06-12 2019-06-12 Method and system for responding multidimensional analysis query under local differential privacy
PCT/CN2019/090837 WO2020248150A1 (en) 2019-06-12 2019-06-12 Method and system for answering multi-dimensional analytical queries under local differential privacy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/090837 WO2020248150A1 (en) 2019-06-12 2019-06-12 Method and system for answering multi-dimensional analytical queries under local differential privacy

Publications (1)

Publication Number Publication Date
WO2020248150A1 (en) 2020-12-17

Family

ID=73781150

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/090837 WO2020248150A1 (en) 2019-06-12 2019-06-12 Method and system for answering multi-dimensional analytical queries under local differential privacy

Country Status (2)

Country Link
CN (1) CN113811868A (en)
WO (1) WO2020248150A1 (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1588358A (en) * 2004-08-26 2005-03-02 陈红 Treating method and system for MDX multidimensional data search statement
CN102682118A (en) * 2012-05-15 2012-09-19 北京久其软件股份有限公司 Multidimensional data model access method and device
US20170235779A1 (en) * 2014-09-30 2017-08-17 Hewlett Packard Enterprise Development Lp Processing query of database and data stream
US20180113902A1 (en) * 2016-10-25 2018-04-26 International Business Machines Corporation Query parallelism method
CN108388579A (en) * 2018-01-19 2018-08-10 复旦大学 A kind of range query method based on attribute on multi-dimensional orthogonal region
CN109299436A (en) * 2018-09-17 2019-02-01 北京邮电大学 A kind of ordering of optimization preference method of data capture meeting local difference privacy
US20190057133A1 (en) * 2017-08-15 2019-02-21 Salesforce.Com, Inc. Systems and methods of bounded scans on multi-column keys of a database
CN109726225A (en) * 2019-01-11 2019-05-07 广东工业大学 A kind of storage of distributed stream data and querying method based on Storm

Also Published As

Publication number Publication date
CN113811868A (en) 2021-12-17

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19932562

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19932562

Country of ref document: EP

Kind code of ref document: A1