CN116451278A - Star-connection workload query privacy protection method, system, equipment and medium

Star-connection workload query privacy protection method, system, equipment and medium

Info

Publication number
CN116451278A
Authority
CN
China
Prior art keywords
query
star connection
workload
star
noisy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310725211.0A
Other languages
Chinese (zh)
Inventor
张亮
曹晓光
李娇娇
宋江龙
吴世山
李艾功
赵力文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Shiping Information & Technology Co ltd
Original Assignee
Hangzhou Shiping Information & Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Shiping Information & Technology Co ltd
Priority to CN202310725211.0A
Publication of CN116451278A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A star connection workload query privacy protection method, system, equipment and medium, wherein the privacy protection method comprises the following steps: forming a star connection workload query set from a plurality of star connection queries directed at a data warehouse; partitioning the star connection workload query set into blocks according to query attributes; performing dimension reduction on each block of the partitioned star connection workload query set to form a single-attribute query strategy; adding noise to each attribute query strategy using a differential privacy mechanism; inferring the noisy queries from each noisy attribute query strategy; and merging and aggregating the noisy queries to obtain a noisy star connection workload query set, querying the data warehouse with the noisy star connection workload query set, and sending the obtained query result to an untrusted data analyst. The invention exploits the correlation among query intervals to the greatest extent, reduces the global sensitivity of the join query operation, improves scalability and reduces computational overhead.

Description

Star-connection workload query privacy protection method, system, equipment and medium
Technical Field
The invention belongs to the technical field of privacy protection in data analysis, and particularly relates to a star connection workload query privacy protection method, system, equipment and medium.
Background
In recent years, with the explosive growth of data in the digital economy era, data has become a new key factor of production. The meaning and value of data lie not in the data itself but in the fact that the results of data analysis can become a new driver of enterprise business decisions. An enterprise, acting as a service provider, collects and analyzes data to improve service quality and user experience. A data analyst sends a query request to a trusted server, and the server responds and returns the query result. An untrusted data analyst is therefore very likely to infer users' private information from multiple query results, or even to mine user information buried deep in the data, so there is a risk of privacy disclosure. The main cause of privacy disclosure is that the server directly returns the completely true result for the data analyst's query request. To address this problem, the differential privacy model adds noise to the results, which prevents inference by an untrusted data analyst regardless of the analyst's background knowledge. Although star connection queries are one of the typical applications of relational databases, research on differential privacy for them is relatively scarce.
The main problem in applying differential privacy to star connection workload queries is how to answer the queries while protecting user privacy. Star connection queries are highly sensitive to the data: the effect of adding or deleting one record on a star connection query depends on the number of dimension tables and is $O(N^{n-1})$, where $N$ is the domain size of each dimension and $n$ is the number of dimension tables (reference [1]). The scheme of reference [1] points out that, for an n-way join query, the sensitivity grows exponentially with n, and it therefore reduces the sensitivity by a variant of local sensitivity; however, a star connection query is a join query with a large number of foreign key constraints, so the scheme of reference [1] cannot be applied directly. In a star connection query, one fact table is connected to a plurality of dimension tables, and a workload usually consists of a large number of individual star connection queries. Differential privacy schemes for workload queries under star connection therefore often suffer from high global sensitivity and heavy noise, so the availability of the data is often low.
For workload queries, current differential privacy schemes generally use a matrix mechanism to address excessive global sensitivity: dimension reduction is first performed on the query matrix to form a query strategy, which effectively reduces the correlation among queries and generally has a sensitivity of 1; noise is then added to the answers of the query strategy, and the noisy answers to the original queries are inferred from them (reference [2]). The scheme of reference [2] proposes optimizing a set of linear counting queries under differential privacy with the matrix mechanism. For the excessive global sensitivity of star connection queries, current differential privacy schemes generally compute a data-dependent sensitivity (such as local sensitivity, smooth sensitivity, elastic sensitivity or residual sensitivity) as a substitute that approximates the global sensitivity. These approaches still have drawbacks: local sensitivity depends on the data, so differential privacy is not satisfied when it is used to calibrate the noise; smooth sensitivity, elastic sensitivity and residual sensitivity are all upper bounds on local sensitivity, where elastic sensitivity is cheap to compute but yields a larger value, whereas smooth sensitivity yields the smallest value but is expensive to compute. These substitutes for global sensitivity therefore cannot be applied directly to star connection workload queries, and a differential privacy approach better suited to star connection workload queries may need to be explored.
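For orientation, the matrix mechanism of reference [2] can be summarized by the formula below. The notation is an assumption introduced here for illustration and does not come from the present text: $x$ is the data vector, $W$ is the workload matrix of linear counting queries, $A$ is the strategy matrix with pseudo-inverse $A^{+}$ (chosen so that $W A^{+} A = W$), $k$ is the number of strategy queries, and $\|A\|_1$ is the maximum column $L_1$ norm of $A$.

```latex
\mathcal{M}_A(W, x)
  = W A^{+}\bigl(A x + \mathrm{Lap}(\Delta_A/\varepsilon)^{k}\bigr)
  = W x + W A^{+}\,\mathrm{Lap}(\Delta_A/\varepsilon)^{k},
\qquad \Delta_A = \|A\|_1 .
```

The noise is added to the $k$ strategy answers rather than to the workload answers, so a strategy with small $\Delta_A$ can lower the total error when the workload queries are correlated.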
Meanwhile, a star connection workload query can involve many dimension tables. As the number of dimension tables grows, the sensitivity increases, the noise increases with it, and the availability of the query results drops rapidly. In addition, differential privacy schemes for star connection queries generally join first, then query, and finally add noise, so adding dimension tables leads to high computational overhead. The above schemes consider only how the sensitivity is computed and cannot solve the problems caused by growth in the number of dimension tables.
[1] Wei Dong and Ke Yi. A nearly instance-optimal differentially private mechanism for conjunctive queries. In PODS 2022.
[2] Chao Li, Gerome Miklau, Michael Hay, Andrew McGregor, and Vibhor Rastogi. The matrix mechanism: optimizing linear counting queries under differential privacy. The VLDB Journal, 2015.
Disclosure of Invention
In view of the low availability of query results, poor scalability and heavy computational overhead in the prior art, the invention provides a star connection workload query privacy protection method, system, equipment and medium, which exploit the correlation among query intervals to the greatest extent, reduce the global sensitivity of the join query operation, improve scalability and reduce computational overhead.
In order to achieve the above purpose, the invention adopts the following technical scheme:
In a first aspect, a star connection workload query privacy protection method is provided, including the following steps:
forming a star connection workload query set from a plurality of star connection queries directed at a data warehouse;
partitioning the star connection workload query set into blocks according to query attributes;
performing dimension reduction on each block of the partitioned star connection workload query set to form a single-attribute query strategy;
adding noise to each attribute query strategy using a differential privacy mechanism;
inferring the noisy queries from each noisy attribute query strategy;
and merging and aggregating the noisy queries to obtain a noisy star connection workload query set, querying the data warehouse with the noisy star connection workload query set, and sending the obtained query result to an untrusted data analyst.
As a preferred scheme, the data warehouse $D$ comprises $n$ dimension tables $R_1, \dots, R_n$ and a fact table $F$; each dimension table is connected to the fact table $F$, so the whole presents a star connection mode. $m$ star connection queries $q_1, \dots, q_m$ compose the star connection workload query set $W = \{q_1, \dots, q_m\}$. The screening conditions of a star connection query $q_i$ are directed at the individual dimensions, and $q_i$ is expressed in terms of its query predicates as $q_i = (c_{i1}, \dots, c_{in})$, where any predicate $c_{ij} = [l_{ij}, r_{ij}]$ represents the query interval of $q_i$ on dimension table $R_j$, $l_{ij}$ being the lower limit of the query range and $r_{ij}$ the upper limit.
As a preferred scheme, in the step of partitioning the star connection workload query set according to query attributes, each element $c_{ij}$ of the star connection workload query set $W$ is the query range of the $i$-th star connection query on dimension table $R_j$, and the query set is written $W = (C_1, \dots, C_n)$, where $C_j$ represents the query intervals of the star connection queries on dimension table $R_j$. The query set $W$ is split according to the dimension tables of the data warehouse to obtain $n$ blocks $C_1, \dots, C_n$; the $j$-th block is denoted $C_j = (c_{1j}, \dots, c_{mj})^{T}$, where $c_{ij}$ is the query interval of the $i$-th query on dimension table $R_j$.
As a preferred scheme, in the step of performing dimension reduction on each block of the partitioned star connection workload query set, a query strategy $A_j$ is found such that there is a solution matrix $X_j$ with which all query intervals in block $C_j$ can be represented as linear combinations of the query intervals in $A_j$: $C_j = X_j A_j$, $j = 1, \dots, n$.
As a preferred scheme, in the step of adding noise to each attribute query strategy using the differential privacy mechanism, Laplace noise $\mathrm{Lap}(\Delta_{A_j}/\varepsilon)$ is added to the query strategy $A_j$, where $\Delta_{A_j}$ is the sensitivity of the query strategy $A_j$ and $\varepsilon$ is the privacy budget; the dimension interval of the $i$-th query of the $j$-th block $C_j$ is $[l_{ij}, r_{ij}]$, and the $i$-th noisy query interval in the perturbed query strategy $\tilde{A}_j$ is $[\tilde{l}_{ij}, \tilde{r}_{ij}]$.
As a preferred scheme, in the step of inferring the noisy queries from each noisy attribute query strategy, the noisy query $\tilde{C}_j = X_j \tilde{A}_j$ of the star connection workload query set $W$ on dimension table $R_j$ is computed from the perturbed query strategy $\tilde{A}_j$ and the solution matrix $X_j$.
As a preferred scheme, in the step of merging and aggregating the noisy queries, the noisy queries $\tilde{C}_j$ of the star connection workload query set $W$ on the dimension tables $R_j$ are merged and aggregated to obtain the noisy star connection workload query set $\tilde{W} = (\tilde{C}_1, \dots, \tilde{C}_n)$; the data warehouse $D$ is queried with $\tilde{W}$, and the query result $\tilde{W}(D)$ is sent to an untrusted data analyst.
In a second aspect, a star connection workload query privacy protection system is provided, comprising:
the query set construction module is used for constructing a plurality of star connection queries facing the data warehouse into a star connection workload query set;
the query set blocking module is used for blocking the star-shaped connection workload query set according to the query attribute;
the attribute query strategy acquisition module is used for respectively carrying out dimension reduction on each block of the partitioned star connection workload query set to form a single attribute query strategy;
the attribute query strategy noise adding module is used for respectively adding noise to each attribute query strategy by utilizing a differential privacy mechanism;
the noisy query inference module is used for inferring the noisy queries from each noisy attribute query strategy;
and the data warehouse noisy query module is used for merging and aggregating the noisy queries to obtain a noisy star connection workload query set, querying the data warehouse with the noisy star connection workload query set, and sending the obtained query result to an untrusted data analyst.
In a third aspect, there is provided an electronic device comprising: a memory storing at least one instruction; and a processor that executes the instructions stored in the memory to implement the star connection workload query privacy protection method.
In a fourth aspect, a computer readable storage medium having stored therein at least one instruction for execution by a processor in an electronic device to implement the star connection workload query privacy preserving method is provided.
Compared with the prior art, the invention has at least the following beneficial effects:
The invention partitions the star connection workload query set into blocks according to query attributes. Because intervals under the same query dimension are similar, dimension reduction is designed for the intervals under a single query dimension, exploiting the correlation among query intervals to the greatest extent. To address the excessive global sensitivity of differential privacy schemes for star connection queries, the invention perturbs the query interval of each dimension, which reduces the global sensitivity of the join query operation. Scalability is improved, and because the perturbation can be performed in parallel, the computational overhead is reduced. In addition, the invention can achieve a better privacy protection effect by controlling the degree of perturbation of each dimension.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of star connection of a data warehouse in accordance with an embodiment of the present invention;
FIG. 2 is a schematic diagram of a scenario application of a star connection workload query privacy protection method according to an embodiment of the present invention;
FIG. 3 is a flow chart of a method for protecting query privacy of a star connection workload according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
Suppose a data warehouse $D$ comprises $n$ dimension tables $R_1, \dots, R_n$ and a fact table $F$, and each dimension table is connected to the fact table $F$ through a foreign key constraint, so that the whole presents a star connection mode, as shown in FIG. 1. Star connection queries generally refer to queries performed on a data warehouse in star connection mode and are a query type commonly found in data warehouse applications. The screening conditions of a star connection query are typically directed at the dimension tables, and aggregation is then performed on the fact table. A large number of star connection queries form a star connection workload query set. The data involved in the queries contains private information, and in practical applications this sensitive information must be protected; differential privacy has become a popular data protection scheme because of the strict privacy guarantee it provides. Differential privacy avoids the risk of privacy disclosure by adding noise. A common differential privacy mechanism is the Laplace mechanism, which adds to the query result noise drawn from the Laplace distribution $\mathrm{Lap}(\Delta/\varepsilon)$, where $\varepsilon$ is the privacy budget and the scale of the distribution is positively correlated with the query sensitivity $\Delta$.
The invention forms a workload query set from a large number of star connection queries directed at the data warehouse, partitions the query set into blocks according to query attributes, reduces the dimension of each block to form a single-attribute query strategy, adds noise to each single-attribute query strategy, infers the noisy queries on the original attributes, and finally merges them to obtain the noisy workload query set. FIG. 2 shows the application scenario of the technical scheme: an untrusted data analyst submits a large number of star connection queries, namely a star connection workload query set $W$, to a server, and the server applies differential privacy processing and returns a noisy response to the data analyst, thereby protecting privacy. The main process of the differential privacy scheme is to partition the query set into blocks and reduce their dimension to obtain single-attribute query strategies, add noise to each single-attribute query strategy, infer the noisy queries on the original attributes, and finally merge the queries. The whole process is shown in FIG. 3: when the server receives the star connection workload query set $W$, it first partitions $W$ by column (dimension table) into blocks $C_1, \dots, C_n$, generates a query strategy $A_j$ from each block by dimension reduction, adds noise to each $A_j$ using the differential privacy mechanism, infers the noisy queries $\tilde{C}_j$ of each $C_j$ according to the dimension-reduction formula, and finally merges the $\tilde{C}_j$ into the noisy workload query set $\tilde{W}$, answers it to obtain the final perturbed result, and sends the result to the untrusted data analyst.
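The Laplace mechanism mentioned above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names `laplace_noise` and `laplace_mechanism` are hypothetical, and the sampler relies on the fact that the difference of two independent exponential variables with mean `scale` follows a Laplace distribution with that scale.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a zero-mean Laplace distribution with the given scale."""
    if scale == 0:
        return 0.0
    # Difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_answer: float, sensitivity: float,
                      epsilon: float, rng: random.Random) -> float:
    """Perturb a numeric query answer with Laplace(sensitivity / epsilon) noise."""
    if epsilon <= 0 or sensitivity < 0:
        raise ValueError("epsilon must be > 0 and sensitivity must be >= 0")
    return true_answer + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    # A count query has sensitivity 1; a smaller epsilon gives a noisier answer.
    print(laplace_mechanism(1000.0, sensitivity=1.0, epsilon=0.5, rng=rng))
```

A larger sensitivity or a smaller privacy budget widens the noise distribution, which is why the high sensitivity of star connection queries translates directly into low availability of the results.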
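The star connection workload query privacy protection method of the embodiment of the invention mainly comprises the following five processes:
a) Blocking, i.e. $W = (C_1, \dots, C_n)$, where $n$ is the number of dimensions.
b) Dimension reduction by block, i.e. $C_j = X_j A_j$, where $A_j$ is the query strategy after dimension reduction.
c) Query noise addition by block, i.e. $\tilde{A}_j = A_j + \mathrm{Lap}(\Delta_{A_j}/\varepsilon)$.
d) Block-wise inference of noisy queries, i.e. $\tilde{C}_j = X_j \tilde{A}_j$.
e) Merging and aggregating noisy queries, i.e. $\tilde{W} = (\tilde{C}_1, \dots, \tilde{C}_n)$, answered as $\tilde{W}(D)$.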
Suppose a data warehouse $D$ has $n$ dimension tables $R_1, \dots, R_n$, and the domain size of each dimension table attribute is $N$. The data analyst sends $m$ star connection queries $q_1, \dots, q_m$ to the server, i.e. the star connection workload query set $W = \{q_1, \dots, q_m\}$. Because the screening conditions of a star connection query $q_i$ are directed at the individual dimensions, $q_i$ can be represented by its query predicates as $q_i = (c_{i1}, \dots, c_{in})$, where $c_{ij} = [l_{ij}, r_{ij}]$ represents the query interval of $q_i$ on dimension table $R_j$ ($l_{ij}$ is the lower limit of the query range and $r_{ij}$ the upper limit). The query set $W$ is thus composed of the query predicates of the $m$ star connection queries. For example, the $i$-th star connection query $q_i$ is formed by the dimension intervals of $n$ predicates, $q_i = (c_{i1}, \dots, c_{in})$, and an element of $W$ can be denoted $c_{ij}$, i.e. the query range of the $i$-th star connection query on dimension table $R_j$. If dimension table $R_j$ is among the query dimensions, $c_{ij}$ is the queried dimension interval; if it is not, then in the interval $c_{ij} = [l_{ij}, r_{ij}]$, $l_{ij}$ is the minimum of the attribute domain and $r_{ij}$ is the maximum of the attribute domain. Each row of the workload query set $W$ represents a single star connection query $q_i$, and each column $C_j$ represents all query intervals on one dimension table.
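The row and column layout of the query set described above can be illustrated with a small sketch. The representation is an assumption made for illustration: each query is a dictionary mapping a dimension index to a closed interval, `domains` lists the attribute domain of each dimension table, and `build_query_set` is a hypothetical helper name. A dimension that a query does not filter on receives the full attribute domain, as the text specifies.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # closed interval [l, r] on one dimension attribute


def build_query_set(queries: List[Dict[int, Interval]],
                    domains: List[Interval]) -> List[List[Interval]]:
    """Return W as an m x n matrix where W[i][j] is query i's interval on dimension j."""
    W: List[List[Interval]] = []
    for q in queries:
        row: List[Interval] = []
        for j, (lo, hi) in enumerate(domains):
            # Unfiltered dimensions default to the whole attribute domain.
            l, r = q.get(j, (lo, hi))
            if not (lo <= l <= r <= hi):
                raise ValueError(f"interval {(l, r)} is outside domain {(lo, hi)}")
            row.append((l, r))
        W.append(row)
    return W


if __name__ == "__main__":
    domains = [(1, 100), (1, 12), (0, 50)]          # three dimension tables
    queries = [{0: (10, 20), 1: (1, 6)},            # filters on dimensions 0 and 1
               {0: (10, 20)},                       # filters on dimension 0 only
               {2: (5, 5)}]                         # point query on dimension 2
    for row in build_query_set(queries, domains):
        print(row)
```

Each printed row corresponds to one star connection query and each position in a row to one dimension table, matching the row and column reading of W given above.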
Each of these processes is further described below:
a) Block division
The star connection workload query set $W$ is partitioned into blocks. Each element $c_{ij}$ of $W$ is the query range of the $i$-th star connection query on dimension table $R_j$, and the query set is written $W = (C_1, \dots, C_n)$, where $C_j$ represents the query intervals of all star connection queries in $W$ on dimension table $R_j$. Because the dimension tables of the data warehouse $D$ are mutually independent, $W$ can be split by dimension table into $n$ blocks, i.e. $C_1, \dots, C_n$. The $j$-th block can be expressed as $C_j = (c_{1j}, \dots, c_{mj})^{T}$, where $c_{ij}$ is the query interval of the $i$-th query on dimension table $R_j$.
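Under the matrix representation assumed here (one row per query, one column per dimension table), blocking amounts to taking the columns of W. The sketch below uses a hypothetical helper name `split_into_blocks`.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def split_into_blocks(W: List[List[Interval]]) -> List[List[Interval]]:
    """Split an m x n query set into n blocks; block j holds the m intervals on dimension j."""
    if not W:
        return []
    n = len(W[0])
    if any(len(row) != n for row in W):
        raise ValueError("every query must have one interval per dimension table")
    return [[row[j] for row in W] for j in range(n)]


if __name__ == "__main__":
    W = [[(10, 20), (1, 6)],
         [(10, 20), (1, 12)],
         [(30, 40), (1, 6)]]
    for j, block in enumerate(split_into_blocks(W)):
        print(f"block {j}: {block}")
```

Because the blocks share no data, the later per-block steps can be run independently, which is what allows the perturbation to be parallelized.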
b) Dimension reduction by blocks
Because the dimension tables are independent of each other, differential privacy processing is performed separately on the query predicates $C_j$ of each dimension table. Queries in the star connection workload query set $W$ tend to be related, i.e. the intervals $c_{ij}$ and $c_{i'j}$ in the same block are often correlated, so the amount of added noise can be reduced by dimension reduction. Define a set of queries, the query strategy $A_j$, such that all query intervals in $C_j$ can be represented as linear combinations of the query intervals in $A_j$. That is, a query strategy $A_j$ is found such that there is a solution matrix $X_j$ giving $C_j = X_j A_j$, $j = 1, \dots, n$.
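The text does not state how the query strategy and the solution matrix are found. As one simple instance that satisfies the stated relation, the sketch below takes the distinct intervals of a block as the strategy and a 0/1 selection matrix as the solution matrix, so that each block interval is a (trivial) linear combination of strategy intervals. This deduplication choice and the names `reduce_block` and `apply_solution` are assumptions for illustration, not necessarily the construction intended by the invention.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]


def reduce_block(block: List[Interval]) -> Tuple[List[Interval], List[List[int]]]:
    """Return (strategy, X): distinct intervals of the block and a 0/1 selection matrix."""
    strategy: List[Interval] = []
    index: Dict[Interval, int] = {}
    for interval in block:
        if interval not in index:
            index[interval] = len(strategy)
            strategy.append(interval)
    # Row i of X has a single 1 in the column of the strategy interval that query i uses.
    X = [[1 if index[interval] == k else 0 for k in range(len(strategy))]
         for interval in block]
    return strategy, X


def apply_solution(X: List[List[int]], strategy: List[Interval]) -> List[Interval]:
    """Compute X times the strategy, treating each interval as a row vector (l, r)."""
    return [(sum(x * a[0] for x, a in zip(row, strategy)),
             sum(x * a[1] for x, a in zip(row, strategy)))
            for row in X]


if __name__ == "__main__":
    block = [(10, 20), (10, 20), (30, 40), (10, 20)]
    strategy, X = reduce_block(block)
    print(strategy)                       # two distinct intervals instead of four
    print(apply_solution(X, strategy) == block)
```

With this choice, a block of four correlated queries needs noise on only two strategy intervals, which is the effect the dimension reduction step aims for.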
c) Per block query noise addition
Dimension reduction yields the query strategy $A_j$ corresponding to each block $C_j$. The Laplace mechanism is applied to $A_j$ to satisfy differential privacy, i.e. Laplace noise $\mathrm{Lap}(\Delta_{A_j}/\varepsilon)$ is added to $A_j$, where $\Delta_{A_j}$ is the sensitivity of the query strategy $A_j$ and $\varepsilon$ is the privacy budget. Because the $n$ dimension tables are independent of each other and, for each dimension, one record change affects $N$ intervals, the global sensitivity of every query strategy $A_j$ is $N$. The dimension interval of the $i$-th query of the $j$-th block $C_j$ is $[l_{ij}, r_{ij}]$. The privacy budget is divided evenly into two parts, used for the left and right endpoints respectively, and the $i$-th noisy query interval in the perturbed query strategy $\tilde{A}_j$ is $[\tilde{l}_{ij}, \tilde{r}_{ij}]$, where $\tilde{l}_{ij} = l_{ij} + \mathrm{Lap}\!\left(\frac{\Delta_{A_j}}{\varepsilon/2}\right)$ and $\tilde{r}_{ij} = r_{ij} + \mathrm{Lap}\!\left(\frac{\Delta_{A_j}}{\varepsilon/2}\right)$.
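The endpoint perturbation can be sketched as follows, assuming each endpoint receives half of the privacy budget so that the Laplace scale is the sensitivity divided by epsilon/2. The function names are hypothetical. Reordering the two noisy endpoints so that the lower one comes first is an additional assumption made here to keep the interval well formed; the text does not say how a crossed interval is handled.

```python
import random
from typing import List, Tuple

Interval = Tuple[float, float]


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Zero-mean Laplace sample as the difference of two exponentials with mean `scale`."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_strategy(strategy: List[Interval], sensitivity: float,
                     epsilon: float, rng: random.Random) -> List[Interval]:
    """Add Laplace noise to both endpoints of every strategy interval."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # The budget is split evenly: epsilon / 2 for the left and right endpoint each.
    scale = sensitivity / (epsilon / 2.0)
    noisy: List[Interval] = []
    for l, r in strategy:
        nl = l + laplace_noise(scale, rng)
        nr = r + laplace_noise(scale, rng)
        # Assumption: swap crossed endpoints (post-processing, no extra privacy cost).
        noisy.append((min(nl, nr), max(nl, nr)))
    return noisy


if __name__ == "__main__":
    rng = random.Random(7)
    print(perturb_strategy([(10, 20), (30, 40)], sensitivity=1.0, epsilon=1.0, rng=rng))
```

Noise is drawn once per strategy interval, so queries that share a strategy interval will later share the same noisy interval.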
d) Block-wise inferred noisy queries
Because there is a solution matrix $X_j$ such that $C_j = X_j A_j$, the noisy query predicates $\tilde{C}_j = X_j \tilde{A}_j$ are inferred from the noisy query strategy $\tilde{A}_j$, i.e. the noisy predicates of the star connection workload query set $W$ on dimension table $R_j$ are $\tilde{C}_j$.
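The inference step is a matrix product of the solution matrix with the noisy strategy. The sketch below assumes intervals are treated as row vectors of endpoints and uses a hypothetical helper name `infer_noisy_block`; the example matrix is a 0/1 selection matrix, which is only one possible form of solution matrix.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def infer_noisy_block(X: List[List[float]],
                      noisy_strategy: List[Interval]) -> List[Interval]:
    """Compute the noisy block as X times the noisy strategy, endpoint by endpoint."""
    if any(len(row) != len(noisy_strategy) for row in X):
        raise ValueError("X must have one column per strategy interval")
    return [(sum(x * a[0] for x, a in zip(row, noisy_strategy)),
             sum(x * a[1] for x, a in zip(row, noisy_strategy)))
            for row in X]


if __name__ == "__main__":
    X = [[1, 0], [1, 0], [0, 1]]                  # queries 0 and 1 share strategy interval 0
    noisy_strategy = [(9.5, 21.0), (28.0, 41.5)]  # illustrative noisy endpoints
    print(infer_noisy_block(X, noisy_strategy))
```

No further noise is added in this step, so it is pure post-processing of the perturbed strategy.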
e) Merging aggregate noisy queries
All $\tilde{C}_j$ are merged to obtain the noisy star connection workload query set $\tilde{W} = (\tilde{C}_1, \dots, \tilde{C}_n)$; the data warehouse $D$ is queried with this query set, and the final query result $\tilde{W}(D)$ is sent to an untrusted data analyst.
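Merging and answering can be sketched on a toy in-memory star schema. Everything about the data layout is an assumption for illustration: each dimension table is a dictionary from primary key to a single numeric attribute, each fact row is a tuple of foreign keys (one per dimension table), the aggregate is a count, and `merge_blocks` and `star_count` are hypothetical names.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]


def merge_blocks(noisy_blocks: List[List[Interval]]) -> List[List[Interval]]:
    """Turn n noisy blocks (columns) back into m noisy queries (rows)."""
    return [list(row) for row in zip(*noisy_blocks)]


def star_count(fact_rows: List[Tuple[int, ...]],
               dim_tables: List[Dict[int, float]],
               query: List[Interval]) -> int:
    """Count fact rows whose joined dimension attributes all fall inside the query intervals."""
    count = 0
    for foreign_keys in fact_rows:
        if all(l <= dim_tables[j][fk] <= r
               for j, (fk, (l, r)) in enumerate(zip(foreign_keys, query))):
            count += 1
    return count


if __name__ == "__main__":
    dim_tables = [{1: 15, 2: 35}, {1: 3, 2: 9}]
    fact_rows = [(1, 1), (1, 2), (2, 1), (2, 2)]
    noisy_blocks = [[(9.2, 21.7), (9.2, 21.7)],   # noisy intervals on dimension 0
                    [(0.4, 6.3), (0.0, 12.0)]]    # noisy intervals on dimension 1
    for q in merge_blocks(noisy_blocks):
        print(q, star_count(fact_rows, dim_tables, q))
```

Because the noise was already applied to the predicates, the warehouse itself is queried without any further perturbation of the returned counts.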
The main problems of existing approaches to star connection workload queries include: (1) Low availability of query results. Star connection query operations are highly sensitive to the data, so excessive noise is introduced into the query results; moreover, workload queries are often correlated, which raises the sensitivity further, increases the noise and lowers availability. (2) Poor scalability. Star connection queries typically involve joins with multiple dimension tables, and as the number of dimension tables grows, the time complexity of the sensitivity computation grows exponentially, so current solutions are limited when processing large star connection queries. (3) Heavy computational overhead. Existing differential privacy schemes generally join the dimension tables first and then add noise for protection. Since star connection queries typically involve multiple dimension tables, the computational overhead tends to be very high.
The scheme of reference [1] points out that, for an n-way join query, the sensitivity grows exponentially with n, and it therefore reduces the sensitivity by a variant of local sensitivity; however, a star connection query is a join query with a large number of foreign key constraints, so the scheme of reference [1] cannot be applied directly. Compared with reference [1], the method of the invention provides, for star connection queries, a differential privacy mechanism that adds noise to the query predicates before the data warehouse is queried, thereby reducing the global sensitivity.
The scheme of reference [2] optimizes a set of linear counting queries under differential privacy with a matrix mechanism. Compared with reference [2], the method of the invention likewise uses a matrix mechanism to optimize the star connection workload query set; the differences are that reference [2] targets linear counting queries while the invention targets star connection queries, and that reference [2] adds noise to the query answers while the invention adds noise to the queries themselves.
For star connection workload queries under differential privacy, the invention designs a scheme that first partitions the query set into blocks and reduces their dimension, then adds noise to the queries block by block, and finally restores and merges the queries and answers them. Partitioning and dimension reduction of the workload query exploit the correlation among query intervals to the greatest extent. Perturbing the query interval of each dimension reduces the global sensitivity of the join query operation. Scalability is improved, and because the perturbation can be performed in parallel, the computational overhead is reduced. In addition, the invention can achieve a better privacy protection effect by controlling the degree of perturbation of each dimension.
The embodiment of the invention also provides a star connection workload query privacy protection system, which comprises:
the query set construction module is used for constructing a plurality of star connection queries facing the data warehouse into a star connection workload query set;
the query set blocking module is used for blocking the star-shaped connection workload query set according to the query attribute;
the attribute query strategy acquisition module is used for respectively carrying out dimension reduction on each block of the partitioned star connection workload query set to form a single attribute query strategy;
the attribute query strategy noise adding module is used for respectively adding noise to each attribute query strategy by utilizing a differential privacy mechanism;
the noisy query inference module is used for inferring the noisy queries from each noisy attribute query strategy;
and the data warehouse noisy query module is used for merging and aggregating the noisy queries to obtain a noisy star connection workload query set, querying the data warehouse with the noisy star connection workload query set, and sending the obtained query result to an untrusted data analyst.
The embodiment of the invention also provides electronic equipment, which comprises: a memory storing at least one instruction; and a processor that executes the instructions stored in the memory to implement the star connection workload query privacy protection method.
The embodiment of the invention also provides a computer readable storage medium, wherein at least one instruction is stored in the computer readable storage medium, and the at least one instruction is executed by a processor in electronic equipment to implement the star connection workload query privacy protection method.
The instructions stored in the memory may be partitioned into one or more modules/units, which are stored in a computer-readable storage medium and executed by the processor to perform the star connection workload query privacy preserving method of the present invention. The one or more modules/units may be a series of computer readable instruction segments capable of performing a specified function, which describes the execution of the computer program in a server.
The electronic equipment can be a smart phone, a notebook computer, a palm computer, a cloud server and other computing equipment. The electronic device may include, but is not limited to, a processor, a memory. Those skilled in the art will appreciate that the electronic device may also include more or fewer components, or may combine certain components, or different components, e.g., the electronic device may also include input and output devices, network access devices, buses, etc.
The processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory may be an internal storage unit of the server, such as a hard disk or internal memory of the server. The memory may also be an external storage device of the server, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the server. Further, the memory may include both an internal storage unit and an external storage device of the server. The memory is used to store the computer-readable instructions and other programs and data required by the server. The memory may also be used to temporarily store data that has been output or is to be output.
It should be noted that, because the content of information interaction and execution process between the above module units is based on the same concept as the method embodiment, specific functions and technical effects thereof may be referred to in the method embodiment section, and details thereof are not repeated herein.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods of the above embodiments may be implemented by a computer program instructing related hardware; the computer program may be stored in a computer-readable storage medium and, when executed by a processor, may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include at least: any entity or device capable of carrying computer program code to a photographing device/terminal apparatus, a recording medium, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunications signals, and software distribution media, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (10)

1. A star connection workload query privacy protection method, characterized by comprising the following steps:
forming a star connection workload query set from a plurality of star connection queries directed at the data warehouse;
partitioning the star connection workload query set into blocks according to query attributes;
performing dimension reduction on each block of the partitioned star connection workload query set to form a single-attribute query strategy;
adding noise to each attribute query strategy using a differential privacy mechanism;
deriving the noisy query corresponding to each noise-added attribute query strategy;
and merging and aggregating the noisy queries to obtain a noisy star connection workload query set, querying the data warehouse with the noisy star connection workload query set, and transmitting the obtained query result to an untrusted data analyst.
2. The star connection workload query privacy protection method according to claim 1, wherein the data warehouse comprises a plurality of dimension tables and a fact table, each dimension table being connected to the fact table so that the whole presents a star connection pattern; a plurality of star connection queries compose the star connection workload query set W; the screening conditions of each star connection query cover the various dimensions, and each star connection query is expressed as a tuple of query predicates, one per dimension table, where each predicate represents the query interval of that query on the corresponding dimension table, bounded by a lower limit and an upper limit of the query range.
3. The method according to claim 2, wherein in the step of partitioning the star connection workload query set according to query attributes, each element of the star connection workload query set W is the query range of one star connection query on one dimension table, that is, the query interval that the star connection query involves on that dimension table; the star connection workload query set W is split according to the dimension tables of the data warehouse to obtain one block per dimension table, where each block collects the query intervals of all queries on the corresponding dimension table.
4. The method for protecting privacy of star connection workload queries according to claim 3, wherein in the step of performing dimension reduction on each block of the partitioned star connection workload query set, a query strategy is found for each block such that there exists a solution matrix with which every query interval in the block can be represented as a linear combination of the query intervals in the query strategy.
5. The method for protecting privacy of star connection workload queries according to claim 4, wherein in the step of adding noise to each attribute query strategy using a differential privacy mechanism, Laplace noise determined by the sensitivity of the query strategy and the privacy budget is added to each query strategy, so that each query interval of each block has a corresponding noise-added query interval in the perturbed query strategy.
6. The method of claim 5, wherein in the step of deriving the noisy query corresponding to each noise-added attribute query strategy, the noisy query of the star connection workload query set W on each dimension table is calculated from the perturbed query strategy and the solution matrix.
7. The method of claim 6, wherein in the step of merging and aggregating the noisy queries, the noisy queries of the star connection workload query set W on the individual dimension tables are merged and aggregated to obtain the noisy star connection workload query set, the data warehouse is queried with the noisy star connection workload query set, and the query result is transmitted to an untrusted data analyst.
8. A star connection workload query privacy protection system, comprising:
the query set construction module is used for constructing a plurality of star connection queries facing the data warehouse into a star connection workload query set;
the query set blocking module is used for blocking the star-shaped connection workload query set according to the query attribute;
the attribute query strategy acquisition module is used for respectively carrying out dimension reduction on each block of the partitioned star connection workload query set to form a single attribute query strategy;
the attribute query strategy noise adding module is used for respectively adding noise to each attribute query strategy by utilizing a differential privacy mechanism;
the noisy query derivation module is used for deriving the noisy query corresponding to each noise-added attribute query strategy;
and the data warehouse noisy query module is used for merging and aggregating the noisy queries to obtain a noisy star connection workload query set, querying the data warehouse with the noisy star connection workload query set, and transmitting the obtained query result to an untrusted data analyst.
9. An electronic device, comprising:
a memory storing at least one instruction; and
a processor executing instructions stored in the memory to implement the star connection workload query privacy preserving method of any of claims 1 to 7.
10. A computer-readable storage medium, characterized by: the computer readable storage medium having stored therein at least one instruction for execution by a processor in an electronic device to implement the star connection workload query privacy preserving method of any of claims 1 to 7.
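The relationship between a block, its query strategy and the solution matrix in claims 4 to 6 can be illustrated for a single dimension table. This is a minimal sketch under assumptions that do not come from the claims: intervals are treated as half-open, the strategy is chosen as the disjoint elementary intervals between consecutive endpoints, and Laplace noise is added to the counts answered by the strategy, in the style of the matrix mechanism of Li et al. listed under the non-patent citations, rather than to the intervals themselves. All function names are hypothetical.

```python
import random


def elementary_strategy(block):
    """Strategy: disjoint half-open intervals between consecutive endpoints."""
    points = sorted({p for interval in block for p in interval})
    return list(zip(points[:-1], points[1:]))


def solution_matrix(block, strategy):
    """X[j][k] = 1 when strategy interval k lies inside block interval j."""
    return [[1 if lo <= s_lo and s_hi <= hi else 0 for s_lo, s_hi in strategy]
            for lo, hi in block]


def count(values, lo, hi):
    """Number of attribute values falling in the half-open interval [lo, hi)."""
    return sum(1 for v in values if lo <= v < hi)


def combine(x, strategy_answers):
    """Answer every block query as a linear combination of strategy answers."""
    return [sum(c * a for c, a in zip(row, strategy_answers)) for row in x]


def laplace(scale, rng):
    """Laplace(0, scale) sample: difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


block = [(20, 30), (25, 40)]      # query intervals of one block
values = [21, 26, 27, 35, 50]     # attribute values of one dimension table
epsilon = 1.0                     # privacy budget assumed for this block
rng = random.Random(0)

strategy = elementary_strategy(block)
x = solution_matrix(block, strategy)
true_counts = [count(values, lo, hi) for lo, hi in strategy]
# The strategy intervals are disjoint, so one record changes one count by
# at most 1: the sensitivity is taken to be 1 under that assumption.
noisy_counts = [c + laplace(1.0 / epsilon, rng) for c in true_counts]
noisy_answers = combine(x, noisy_counts)
```

Without noise, `combine(x, true_counts)` reproduces the direct counts of the block's intervals, which is the linear-combination property that claim 4 requires of the strategy and solution matrix.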
CN202310725211.0A 2023-06-19 2023-06-19 Star-connection workload query privacy protection method, system, equipment and medium Pending CN116451278A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310725211.0A CN116451278A (en) 2023-06-19 2023-06-19 Star-connection workload query privacy protection method, system, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310725211.0A CN116451278A (en) 2023-06-19 2023-06-19 Star-connection workload query privacy protection method, system, equipment and medium

Publications (1)

Publication Number Publication Date
CN116451278A true CN116451278A (en) 2023-07-18

Family

ID=87136043

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310725211.0A Pending CN116451278A (en) 2023-06-19 2023-06-19 Star-connection workload query privacy protection method, system, equipment and medium

Country Status (1)

Country Link
CN (1) CN116451278A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117633902A (en) * 2024-01-25 2024-03-01 杭州世平信息科技有限公司 OLAP star-type connection workload query differential privacy protection method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103309958A (en) * 2013-05-28 2013-09-18 中国人民大学 OLAP star connection query optimizing method under CPU and GPU mixing framework
CN105955999A (en) * 2016-04-20 2016-09-21 华中科技大学 Large scale RDF graph Thetajoin query processing method
CN115858579A (en) * 2022-10-28 2023-03-28 杭州世平信息科技有限公司 Data warehouse star connection query method, system and medium based on differential privacy

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHAO LI 等: "The matrix mechanism: optimizing linear counting queries under differential privacy", THE VLDB JOURNAL, pages 1 - 25 *

Similar Documents

Publication Publication Date Title
US10346404B2 (en) Efficient partitioned joins in a database with column-major layout
US10509772B1 (en) Efficient locking of large data collections
US10467433B2 (en) Event processing system
EP2646928A1 (en) Systems and methods for performing a nested join operation
US11775656B2 (en) Secure multi-party information retrieval
US8849837B2 (en) Configurable dynamic matching system
Gionis et al. k-Anonymization revisited
EP3887993A1 (en) Differentially private database permissions system
EP2590088B1 (en) Database queries enriched in rules
CN116451278A (en) Star-connection workload query privacy protection method, system, equipment and medium
US10810458B2 (en) Incremental automatic update of ranked neighbor lists based on k-th nearest neighbors
JP2017215868A (en) Anonymization processor, anonymization processing method, and program
US20230359769A1 (en) Systems and Methods for Anonymizing Large Scale Datasets
Xu et al. Efficient similarity join based on Earth mover’s Distance using Mapreduce
EP2577494A1 (en) Graph authorization
US10657126B2 (en) Meta-join and meta-group-by indexes for big data
US20220253457A1 (en) Block Generation Control Method Applied to Blockchain and Related Apparatus
Xu et al. A multi‐dimensional index for privacy‐preserving queries in cloud computing
Huang et al. Orthogonal mechanism for answering batch queries with differential privacy
CN112598507A (en) Excessive credit granting risk prediction system and method based on knowledge graph
CN111813761A (en) Database management method and device and computer storage medium
JP6973636B2 (en) Safety assessment equipment, safety assessment methods, and programs
CN117633902A (en) OLAP star-type connection workload query differential privacy protection method and system
Wang et al. Multi-dimensional k-anonymity Based on Mapping for Protecting Privacy.
US20220277110A1 (en) Secure computation system, secure computation method, and secure computation program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20230718)