CN116451278A - Star-connection workload query privacy protection method, system, equipment and medium
- Publication number
- CN116451278A (application CN202310725211.0A)
- Authority
- CN
- China
- Prior art keywords
- query
- star connection
- workload
- star
- noisy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
A star connection workload query privacy protection method, system, device and medium. The privacy protection method comprises the following steps: forming a star connection workload query set from a plurality of star connection queries on a data warehouse; partitioning the star connection workload query set into blocks according to query attributes; reducing the dimension of each block separately to form a single-attribute query strategy; adding noise to each attribute query strategy with a differential privacy mechanism; inferring the noisy queries from each noisy attribute query strategy; and merging and aggregating the noisy queries to obtain a noisy star connection workload query set, querying the data warehouse with the noisy star connection workload query set, and sending the obtained query results to an untrusted data analyst. The invention exploits the correlation between query intervals as fully as possible, reduces the global sensitivity of the join query operation, improves scalability, and reduces computational overhead.
Description
Technical Field
The invention belongs to the technical field of privacy protection for data analysis, and particularly relates to a star connection workload query privacy protection method, system, device and medium.
Background
In recent years data has grown explosively, and in the digital economy it has become a key new factor of production. The value of data lies not in the data itself but in the fact that the results of analyzing it can drive enterprise business decisions. An enterprise, acting as the service provider, collects and analyzes data to improve service quality and user experience. A data analyst sends a query request to a trusted server, and the server responds with the query result. An untrusted data analyst may therefore infer users' private information from multiple query results, or even mine user information hidden deep in the data, which creates a risk of privacy disclosure. The main cause of this disclosure is that the server answers the analyst's query request with completely true results. Differential privacy addresses the problem: by adding noise to the results it prevents inference by an untrusted data analyst, and its guarantee holds regardless of the analyst's background knowledge. Although star connection queries are a typical application of relational databases, comparatively little research has addressed differential privacy for them.
The main problem in applying differential privacy to star connection workload queries is how to answer the queries while protecting user privacy. Star connection queries are highly sensitive to the data: the effect of adding or deleting one record on a star connection query depends on the number of dimension tables and is $O(N^{n-1})$, where $N$ is the domain size of each dimension and $n$ is the number of dimension tables (reference [1]). Reference [1] observes that the sensitivity of an n-way join query grows exponentially with n and reduces it with a variant of local sensitivity; however, a star connection query is a join query with many foreign key constraints, so the scheme of reference [1] cannot be applied directly. In a star connection query one fact table is joined with several dimension tables, and a workload usually consists of a large number of individual star connection queries. Differential privacy schemes for workloads of star connection queries therefore tend to suffer from high global sensitivity and heavy noise, so the availability of the data is often low.
For workload queries, current differential privacy schemes generally handle excessive global sensitivity with a matrix mechanism: the query matrix is first reduced in dimension to form a query strategy, which effectively reduces the correlation between queries and usually has sensitivity 1; noise is then added to the answers of the query strategy, and the noisy answers to the original queries are inferred from them (reference [2]). Reference [2] uses the matrix mechanism to optimize a set of linear counting queries under differential privacy. For the excessive global sensitivity of star connection queries, current differential privacy schemes generally substitute a data-dependent sensitivity (such as local sensitivity, smooth sensitivity, elastic sensitivity or residual sensitivity) for global sensitivity. These approaches still have drawbacks. Local sensitivity depends on the data, so noise calibrated directly to it does not satisfy differential privacy. Smooth sensitivity, elastic sensitivity and residual sensitivity are all upper bounds on local sensitivity: elastic sensitivity is cheap to compute but yields a larger value, whereas smooth sensitivity yields the smallest value but is expensive to compute. These substitutes for global sensitivity are therefore not directly applicable to star connection workload queries, and a differential privacy approach better suited to such queries is needed.
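For orientation, the matrix mechanism mentioned above is commonly summarized by the formula below. The symbols are assumptions introduced here for illustration and do not come from this document: $x$ is the vector of cell counts, $Q$ the workload matrix (named $Q$ to avoid confusion with the query set W used later), $A$ the strategy matrix with pseudo-inverse $A^{+}$, $\Delta_A$ the L1 sensitivity of $A$ (its largest column L1 norm), $k$ the number of strategy queries, and $\epsilon$ the privacy budget.

```latex
\tilde{y}_A = A x + \mathrm{Lap}\!\left(\frac{\Delta_A}{\epsilon}\right)^{k},
\qquad
\hat{y}_Q = Q A^{+} \tilde{y}_A ,
\qquad \text{provided } Q A^{+} A = Q .
```

Noise is added only to the $k$ strategy answers, and the workload answers are then derived from them, which is the pattern the present method reuses on query intervals rather than on query answers.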
Moreover, a star connection workload query can involve many dimension tables. Sensitivity grows as dimension tables are added, the noise grows with it, and the availability of the query results drops rapidly. In addition, differential privacy schemes for star connection queries generally join first, then query, and finally add noise, so adding dimension tables leads to high computational overhead. The schemes above consider only how sensitivity is calculated and cannot solve the problems caused by a growing number of dimension tables.
[1] Wei Dong and Ke Yi. A nearly instance-optimal differentially private mechanism for conjunctive queries. In PODS, 2022.
[2] Hay, Michael, McGregor, et al. The matrix mechanism: optimizing linear counting queries under differential privacy. The VLDB Journal, 2015.
Disclosure of Invention
To address the low availability of query results, poor scalability and high computational overhead of the prior art, the invention provides a star connection workload query privacy protection method, system, device and medium that exploit the correlation between query intervals as fully as possible, reduce the global sensitivity of the join query operation, improve scalability, and reduce computational overhead.
To achieve the above purpose, the invention adopts the following technical scheme:
In a first aspect, a star connection workload query privacy protection method is provided, comprising the following steps:
forming a star connection workload query set by a plurality of star connection queries facing the data warehouse;
partitioning the star connection workload query set according to query attributes;
respectively carrying out dimension reduction on each block of the partitioned star connection workload query set to form a single attribute query strategy;
respectively carrying out noise adding on each attribute query strategy by utilizing a differential privacy mechanism;
inferring the noisy queries from each noisy attribute query strategy;
and merging and aggregating each noisy query to obtain a noisy star connection workload query set, querying a data warehouse by using the noisy star connection workload query set, and transmitting the obtained query result to an untrusted data analyzer.
As a preferred embodiment, the data warehouse $D$ comprises $n$ dimension tables $D_1, \dots, D_n$ and a fact table $F$; each dimension table is connected to the fact table $F$, so the whole presents a star connection pattern. The $m$ star connection queries $q_1, \dots, q_m$ compose the star connection workload query set $W = \{q_1, \dots, q_m\}$. The filter conditions of a star connection query $q_i$ apply to the individual dimensions, so $q_i$ is expressed through its query predicates as $q_i = (p_{i1}, \dots, p_{in})$, where any predicate $p_{ij} = [l_{ij}, r_{ij}]$ represents the query interval of $q_i$ on dimension table $D_j$, $l_{ij}$ is the lower limit of the query range, and $r_{ij}$ is the upper limit of the query range.
In a preferred embodiment, in the step of partitioning the star connection workload query set according to query attributes, each element $p_{ij}$ of the star connection workload query set $W$ is the query range of the $i$-th star connection query on dimension table $D_j$, and the query set is written $W = (W_1, \dots, W_n)$, where $W_j$ represents the query intervals of the star connection queries on dimension table $D_j$. Splitting $W$ by the dimension tables of the data warehouse yields $n$ blocks $W_1, \dots, W_n$. The $j$-th block is denoted $W_j = (p_{1j}, \dots, p_{mj})^{T}$, where $p_{ij}$ is the query interval of the $i$-th query on dimension table $D_j$.
As a preferred scheme, in the step of reducing the dimension of each block of the partitioned star connection workload query set, a query strategy $A_j$ is found such that there exists a solution matrix $X_j$ with which every query interval in block $W_j$ can be represented as a linear combination of the query intervals in $A_j$: $W_j = X_j A_j$, $j = 1, \dots, n$.
As a preferred scheme, in the step of adding noise to each attribute query strategy with the differential privacy mechanism, Laplace noise $\mathrm{Lap}(\Delta_{A_j}/\epsilon)$ is added to the query strategy $A_j$, where $\Delta_{A_j}$ is the sensitivity of the query strategy $A_j$ and $\epsilon$ is the privacy budget. The dimension interval of the $i$-th query in the $j$-th block $W_j$ is $[l_{ij}, r_{ij}]$, and the $k$-th noisy query interval in the perturbed query strategy $\tilde{A}_j$ is $[\tilde{l}_{kj}, \tilde{r}_{kj}]$.
As a preferred scheme, in the step of inferring the noisy queries from each noisy attribute query strategy, the perturbed query strategy $\tilde{A}_j$ and the solution matrix $X_j$ are used to compute the noisy queries $\tilde{W}_j = X_j \tilde{A}_j$ of the star connection workload query set $W$ on dimension table $D_j$.
As a preferred scheme, in the step of merging and aggregating the noisy queries, the noisy queries $\tilde{W}_j$ of the star connection workload query set $W$ on the dimension tables $D_j$ are merged and aggregated to obtain the noisy star connection workload query set $\tilde{W} = (\tilde{W}_1, \dots, \tilde{W}_n)$; the data warehouse $D$ is queried with $\tilde{W}$, and the query result $\tilde{W}(D)$ is sent to an untrusted data analyst.
In a second aspect, a star connection workload query privacy protection system is provided, comprising:
the query set construction module is used for constructing a plurality of star connection queries facing the data warehouse into a star connection workload query set;
the query set blocking module is used for blocking the star-shaped connection workload query set according to the query attribute;
the attribute query strategy acquisition module is used for respectively carrying out dimension reduction on each block of the partitioned star connection workload query set to form a single attribute query strategy;
the attribute query strategy noise adding module is used for respectively adding noise to each attribute query strategy by utilizing a differential privacy mechanism;
the noisy query inference module is used for inferring the noisy queries from each noisy attribute query strategy;
and the data warehouse noise adding query module is used for merging and aggregating each noise adding query to obtain a noise adding star-shaped connection workload query set, querying the data warehouse by using the noise adding star-shaped connection workload query set, and transmitting the obtained query result to an untrusted data analyzer.
In a third aspect, there is provided an electronic device comprising: a memory storing at least one instruction; and the processor executes the instructions stored in the memory to realize the star connection workload inquiry privacy protection method.
In a fourth aspect, a computer readable storage medium having stored therein at least one instruction for execution by a processor in an electronic device to implement the star connection workload query privacy preserving method is provided.
Compared with the prior art, the invention has at least the following beneficial effects:
The invention partitions the star connection workload query set into blocks by query attribute. Intervals on the same query dimension are similar to one another, so reducing the dimension of the intervals within a single query dimension exploits the correlation between query intervals as fully as possible. To counter the excessive global sensitivity of differential privacy schemes for star connection queries, the invention perturbs the query interval of each dimension, which reduces the global sensitivity of the join query operation. Scalability improves, and because the perturbation can run in parallel, computational overhead falls. In addition, controlling the degree of perturbation in each dimension allows a better privacy protection effect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of star connection of a data warehouse in accordance with an embodiment of the present invention;
FIG. 2 is a schematic diagram of a scenario application of a star connection workload query privacy protection method according to an embodiment of the present invention;
FIG. 3 is a flow chart of a method for protecting query privacy of a star connection workload according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
Suppose the data warehouse $D$ comprises $n$ dimension tables $D_1, \dots, D_n$ and a fact table $F$. Each dimension table is connected to the fact table $F$ by a foreign key constraint, so the whole presents a star connection pattern, as shown in FIG. 1. A star connection query is a query executed on a data warehouse organized in the star connection pattern, and it is a query type commonly found in data warehouse applications. The filter conditions of a star connection query typically apply to the individual dimension tables, and aggregation is then performed on the fact table. A large number of star connection queries form a star connection workload query set. The data involved in the queries contains private information, which must be protected in practical applications, and differential privacy has become a popular data protection scheme because of the strict privacy guarantee it provides. Differential privacy avoids the risk of privacy disclosure by adding noise. The most common differential privacy mechanism is the Laplace mechanism, which adds to the query result noise drawn from a Laplace distribution $\mathrm{Lap}(\Delta/\epsilon)$, whose scale is positively correlated with the query sensitivity $\Delta$.
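The Laplace mechanism described above can be sketched in a few lines. This is a minimal illustration rather than the patent's implementation: the function names `laplace_noise` and `laplace_mechanism` and their parameters are assumptions, and the sampler relies on the fact that the difference of two independent exponential variables with the same rate follows a Laplace distribution.

```python
import random


def laplace_noise(scale, rng=random):
    """Draw one sample from a zero-mean Laplace distribution with the given scale."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    # The difference of two i.i.d. exponentials with rate 1/scale is Laplace(0, scale).
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def laplace_mechanism(true_answer, sensitivity, epsilon, rng=random):
    """Return true_answer perturbed with Laplace noise of scale sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_answer + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # A count query has sensitivity 1; a smaller epsilon produces more noise.
    print(laplace_mechanism(100, sensitivity=1.0, epsilon=0.5, rng=rng))
```

The noise scale grows with the sensitivity and shrinks as the privacy budget grows, which is why the rest of the method concentrates on keeping the sensitivity small.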
The invention forms a workload query set from a large number of star connection queries on the data warehouse, partitions the query set into blocks by query attribute, reduces the dimension of each block to form a single-attribute query strategy, adds noise to each single-attribute query strategy, infers the noisy queries on the original attribute from it, and finally merges them to obtain the noisy workload query set. FIG. 2 shows the application scenario: an untrusted data analyst submits a large number of star connection queries, i.e. the star connection workload query set $W$, to the server; the server applies differential privacy processing and returns a noisy response to the data analyst, thereby protecting privacy. The differential privacy scheme proceeds as follows: the query set is partitioned into blocks and reduced in dimension to obtain single-attribute query strategies, noise is added to each single-attribute query strategy, the noisy queries on the original attributes are inferred, and the queries are finally merged. The whole process is shown in FIG. 3. When the server receives the star connection workload query set $W$, it first partitions $W$ by column (dimension table) into blocks $W_1, \dots, W_n$, reduces the dimension of each block to generate the query strategies $A_1, \dots, A_n$, adds noise to each $A_j$ with the differential privacy mechanism, infers the noisy queries $\tilde{W}_j$ of $W_j$ from the dimension reduction formula, and finally merges the $\tilde{W}_j$ into the noisy workload query set $\tilde{W}$, answers it to obtain the final perturbed result, and sends that result to the untrusted data analyst.
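The star connection workload query privacy protection method of the embodiment of the invention mainly comprises the following five processes:
a) Blocking, i.e. $W \rightarrow (W_1, \dots, W_n)$, where $n$ is the number of dimensions.
b) Dimension reduction by block, i.e. $W_j = X_j A_j$, where $A_j$ is the query strategy after dimension reduction.
c) Noise addition to the queries by block, i.e. $\tilde{A}_j = A_j + \mathrm{Lap}(\Delta_{A_j}/\epsilon)$.
d) Inference of the noisy queries by block, i.e. $\tilde{W}_j = X_j \tilde{A}_j$.
e) Merging and aggregating the noisy queries, i.e. $\tilde{W} = (\tilde{W}_1, \dots, \tilde{W}_n)$, and answering $\tilde{W}(D)$.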
Suppose the data warehouse $D$ has $n$ dimension tables $D_1, \dots, D_n$, and the domain size of each dimension table attribute is $N$. The data analyst sends $m$ star connection queries $q_1, \dots, q_m$ to the server, i.e. the star connection workload query set $W = \{q_1, \dots, q_m\}$. Because the filter conditions of a star connection query $q_i$ apply to the individual dimensions, $q_i$ can be represented by its query predicates as $q_i = (p_{i1}, \dots, p_{in})$, where $p_{ij} = [l_{ij}, r_{ij}]$ represents the query interval of $q_i$ on dimension table $D_j$ ($l_{ij}$ is the lower limit of the query range and $r_{ij}$ the upper limit). The query set $W$ is thus composed of the query predicates of the $m$ star connection queries $q$. For example, the $i$-th star connection query $q_i$ is formed by the dimension intervals of its $n$ predicates, $q_i = ([l_{i1}, r_{i1}], \dots, [l_{in}, r_{in}])$, and an element of the query set $W$ may be written $p_{ij} = [l_{ij}, r_{ij}]$, i.e. the query range of the $i$-th star connection query on dimension table $D_j$. If dimension table $D_j$ is among the query dimensions, $[l_{ij}, r_{ij}]$ is the queried dimension interval; if $D_j$ is not among the query dimensions, $l_{ij}$ is the minimum of the attribute domain and $r_{ij}$ is the maximum of the attribute domain. Each row of the workload query set $W$ represents a single star connection query $q_i$, and each column represents all query intervals on one dimension table.
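The layout of the query set W described above (one row per query, one column per dimension table, full attribute domain where a dimension is not filtered) can be illustrated as follows. This is a sketch under stated assumptions: the attribute domains in `DOMAINS`, the helper name `build_workload`, and the dictionary encoding of a query are all hypothetical and chosen only for the example.

```python
# Hypothetical attribute domains (minimum, maximum) for three dimension tables.
DOMAINS = [(1, 12), (1, 50), (1, 1000)]


def build_workload(queries, domains):
    """Build the m x n workload W: row i is query i, column j is dimension table j.

    Each query is a dict {dimension index: (lower, upper)}. A dimension that the
    query does not filter on receives the full attribute domain.
    """
    workload = []
    for predicates in queries:
        row = []
        for j, (dom_min, dom_max) in enumerate(domains):
            lower, upper = predicates.get(j, (dom_min, dom_max))
            if not dom_min <= lower <= upper <= dom_max:
                raise ValueError(f"interval {(lower, upper)} outside domain of dimension {j}")
            row.append((lower, upper))
        workload.append(row)
    return workload


W = build_workload(
    [
        {0: (1, 6), 1: (10, 20)},    # first query filters dimensions 0 and 1
        {0: (4, 9)},                 # second query filters dimension 0 only
        {1: (10, 30), 2: (1, 500)},  # third query filters dimensions 1 and 2
    ],
    DOMAINS,
)
for row in W:
    print(row)
```

Filling unfiltered dimensions with the whole domain keeps every row the same length, so the set can later be cut cleanly into columns.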
Each of these processes is further described below:
a) Block division
The star connection workload query set $W$ is partitioned into blocks. Each element $p_{ij}$ of $W$ is the query range of the $i$-th star connection query on dimension table $D_j$, and the query set is $W = (W_1, \dots, W_n)$, where $W_j$ represents the query intervals of all star connection queries in $W$ on dimension table $D_j$. Because the dimension tables of the data warehouse $D$ are mutually independent, $W$ can be split by dimension table into $n$ blocks, i.e. $W \rightarrow (W_1, \dots, W_n)$. The $j$-th block can be expressed as $W_j = (p_{1j}, \dots, p_{mj})^{T}$, where $p_{ij}$ is the query interval of the $i$-th query on dimension table $D_j$.
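Blocking amounts to reading the query set W column by column. A minimal sketch, assuming W is stored as a list of rows of `(lower, upper)` tuples; the helper name `split_into_blocks` and the sample intervals are hypothetical:

```python
def split_into_blocks(workload):
    """Split an m x n workload into n blocks; block j lists every query's interval on table j."""
    if not workload:
        return []
    n = len(workload[0])
    if any(len(row) != n for row in workload):
        raise ValueError("every query must have one interval per dimension table")
    return [[row[j] for row in workload] for j in range(n)]


# Three queries over two dimension tables (illustrative values only).
W = [
    [(1, 6), (10, 20)],
    [(4, 9), (1, 50)],
    [(1, 12), (10, 30)],
]
blocks = split_into_blocks(W)
print(blocks[0])  # intervals of all queries on dimension table 0
print(blocks[1])  # intervals of all queries on dimension table 1
```

Because each block only involves one dimension table, the later steps can process the blocks independently, for example in parallel.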
b) Dimension reduction by blocks
Because the dimension tables are independent of each other, the query predicates $W_j$ of each dimension table are processed with differential privacy separately. The queries in the star connection workload query set $W$ tend to be related, i.e. the intervals $p_{ij}$ and $p_{i'j}$ of different queries on the same dimension table are often correlated, so the amount of noise added can be reduced by means of dimension reduction. A set of queries, i.e. a query strategy $A_j$, is defined such that every query interval in $W_j$ can be represented as a linear combination of the query intervals in $A_j$. In other words, a query strategy $A_j$ is found such that there exists a solution matrix $X_j$ with $W_j = X_j A_j$, $j = 1, \dots, n$.
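One possible way to obtain a query strategy and a solution matrix for a block is sketched below. It is an assumption of this example, not a construction stated in the source: intervals are taken to be closed integer ranges, the strategy consists of the disjoint elementary intervals induced by the endpoints in the block, and "linear combination" is read as a 0/1 combination whose selected strategy intervals tile the query interval exactly. The name `reduce_block` is hypothetical.

```python
def reduce_block(block):
    """Return (strategy, X): disjoint elementary intervals and a 0/1 solution matrix.

    Row i of X marks the strategy intervals whose union is exactly block[i],
    so every query interval is a combination of strategy intervals.
    """
    # Every query interval starts at some l and stops just before some r + 1.
    cuts = sorted({l for l, _ in block} | {r + 1 for _, r in block})
    candidates = [(cuts[k], cuts[k + 1] - 1) for k in range(len(cuts) - 1)]
    # Drop gap intervals that no query covers.
    strategy = [
        (a, b) for a, b in candidates
        if any(l <= a and b <= r for l, r in block)
    ]
    X = [[1 if l <= a and b <= r else 0 for a, b in strategy] for l, r in block]
    return strategy, X


# Five correlated queries on one dimension reduce to two strategy intervals.
block = [(1, 6), (1, 6), (7, 12), (1, 12), (7, 12)]
strategy, X = reduce_block(block)
print(strategy)
for row in X:
    print(row)
```

The more the queries of a block share endpoints, the fewer strategy intervals remain, and only those intervals need to be perturbed in the next step.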
c) Per block query noise addition
Dimension reduction yields the query strategy $A_j$ corresponding to $W_j$. The Laplace mechanism is applied to the query strategy $A_j$ to satisfy differential privacy, i.e. Laplace noise $\mathrm{Lap}(\Delta_{A_j}/\epsilon)$ is added to $A_j$, where $\Delta_{A_j}$ is the sensitivity of the query strategy $A_j$. Because the $n$ dimension tables are independent of each other and, for each dimension, a change to one record affects $N$ intervals, the global sensitivity of every query strategy $A_j$ is $N$. The dimension interval of the $i$-th query in the $j$-th block $W_j$ is $[l_{ij}, r_{ij}]$. The privacy budget is divided evenly into two parts, one for the left endpoint and one for the right endpoint, and the $k$-th noisy query interval in the perturbed query strategy $\tilde{A}_j$ is $[\tilde{l}_{kj}, \tilde{r}_{kj}]$.
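The endpoint perturbation described above can be sketched as follows. The exact formula is not recoverable from this text, so the sketch makes explicit assumptions: each endpoint receives independent Laplace noise, the even split of the budget is read as epsilon / 2 per endpoint (noise scale sensitivity / (epsilon / 2)), and clamping to the attribute domain is added here as post-processing. The names `perturb_strategy` and `laplace_noise` are hypothetical, and the sketch does not by itself establish a privacy guarantee.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def perturb_strategy(strategy, sensitivity, epsilon, domain, rng=None):
    """Perturb both endpoints of every strategy interval.

    Assumption: the budget epsilon is split evenly, epsilon / 2 for the left
    endpoint and epsilon / 2 for the right endpoint, so each endpoint gets
    Laplace noise of scale sensitivity / (epsilon / 2).
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = rng or random.Random()
    scale = sensitivity / (epsilon / 2.0)
    dom_min, dom_max = domain
    noisy = []
    for left, right in strategy:
        new_left = left + laplace_noise(scale, rng)
        new_right = right + laplace_noise(scale, rng)
        # Post-processing (an assumption of this sketch): clamp to the
        # attribute domain and keep the endpoints ordered.
        new_left = min(max(new_left, dom_min), dom_max)
        new_right = min(max(new_right, dom_min), dom_max)
        noisy.append((min(new_left, new_right), max(new_left, new_right)))
    return noisy


noisy_strategy = perturb_strategy(
    [(1, 6), (7, 12)], sensitivity=1.0, epsilon=1.0, domain=(1, 12),
    rng=random.Random(0),
)
print(noisy_strategy)
```

Since every block has its own strategy, this function can be called once per dimension table, which is where the parallelism mentioned in the text would apply.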
d) Block-wise inferred noisy queries
Because there exists a solution matrix $X_j$ such that $W_j = X_j A_j$, the noisy query predicates $\tilde{W}_j = X_j \tilde{A}_j$ are inferred from the noisy query strategy $\tilde{A}_j$, i.e. the noisy predicates of the star connection workload query set $W$ on dimension table $D_j$ are $\tilde{W}_j$.
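Inferring the noisy queries from a noisy strategy can be sketched as below. The sketch assumes a 0/1 solution matrix whose rows select the strategy intervals that make up each query, and it interprets the combination as the span from the smallest selected left endpoint to the largest selected right endpoint; both the interpretation and the name `infer_noisy_block` are assumptions of this example.

```python
def infer_noisy_block(noisy_strategy, X):
    """Recombine noisy strategy intervals into one noisy interval per query.

    Row i of X selects the strategy intervals that make up query i; the noisy
    query interval spans from the smallest selected left endpoint to the
    largest selected right endpoint.
    """
    noisy_block = []
    for row in X:
        chosen = [iv for iv, used in zip(noisy_strategy, row) if used]
        if not chosen:
            raise ValueError("each query must use at least one strategy interval")
        noisy_block.append((min(a for a, _ in chosen), max(b for _, b in chosen)))
    return noisy_block


# Illustrative noisy strategy and solution matrix for three queries.
noisy_strategy = [(1.4, 5.7), (7.2, 11.6)]
X = [[1, 0], [0, 1], [1, 1]]
noisy_block = infer_noisy_block(noisy_strategy, X)
print(noisy_block)
```

No further noise is drawn in this step: queries that share a strategy interval reuse the same perturbed endpoints, so correlated queries stay consistent with one another.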
e) Merging aggregate noisy queries
The noisy blocks $\tilde{W}_j$ are merged to obtain the noisy star connection workload query set $\tilde{W} = (\tilde{W}_1, \dots, \tilde{W}_n)$; the data warehouse $D$ is queried with this query set, and the final query result $\tilde{W}(D)$ is sent to the untrusted data analyst.
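The final step can be illustrated with a toy in-memory star schema. Everything in this sketch is hypothetical: the helper names `merge_blocks` and `star_count`, the encoding of the fact table as tuples of foreign keys, and the sample data. It assumes a counting aggregate and shows the noisy queries being answered directly, with no noise added to the results.

```python
def merge_blocks(noisy_blocks):
    """Turn n per-dimension blocks (each of length m) back into m noisy queries."""
    return [list(row) for row in zip(*noisy_blocks)]


def star_count(fact_rows, dim_tables, query):
    """Count fact rows whose referenced dimension attributes all fall in the query intervals.

    fact_rows: list of tuples of foreign keys, one key per dimension table.
    dim_tables: list of dicts mapping a key to that dimension's attribute value.
    """
    count = 0
    for keys in fact_rows:
        if all(lo <= dim_tables[j][key] <= hi
               for j, (key, (lo, hi)) in enumerate(zip(keys, query))):
            count += 1
    return count


# Two dimension tables (key -> attribute value) and a four-row fact table.
dim_tables = [{1: 3, 2: 8}, {1: 15, 2: 40}]
fact_rows = [(1, 1), (1, 2), (2, 1), (2, 2)]

# Noisy blocks for two queries, one block per dimension table.
noisy_blocks = [
    [(0.8, 6.3), (6.9, 12.0)],
    [(9.5, 20.4), (1.0, 50.0)],
]
noisy_queries = merge_blocks(noisy_blocks)
results = [star_count(fact_rows, dim_tables, q) for q in noisy_queries]
print(results)
```

Merging is the inverse of blocking, so the row order of the original query set is preserved and each result can be returned for the query that produced it.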
The main problems of existing approaches to star connection workload queries are: (1) Low availability of query results. Star connection queries are highly sensitive to the data, so excessive noise is introduced into the query results; in addition, the queries in a workload are often correlated, which raises the sensitivity further, increases the noise, and lowers availability. (2) Poor scalability. A star connection query typically joins multiple dimension tables, and as the number of dimension tables grows, the time complexity of the sensitivity calculation grows exponentially, so current solutions are limited when processing large star connection queries. (3) High computational overhead. Existing differential privacy schemes generally join the dimension tables first and then add noise for protection. Because star connection queries typically involve multiple dimension tables, the computational overhead tends to be very high.
Reference [1] observes that the sensitivity of an n-way join query grows exponentially with n and reduces it with a variant of local sensitivity; however, a star connection query is a join query with many foreign key constraints, so the scheme of reference [1] cannot be applied to it directly. In contrast to reference [1], the method of the invention provides, for star connection queries, a differential privacy mechanism that adds noise to the query predicates before the data warehouse is queried, thereby reducing global sensitivity.
Reference [2] optimizes a set of linear counting queries under differential privacy with a matrix mechanism. The method of the invention likewise uses a matrix mechanism to optimize the star connection workload query set, but differs in two respects: reference [2] targets linear counting queries whereas the invention targets star connection queries, and reference [2] adds noise to the query response results whereas the invention adds noise to the queries themselves.
For star connection workload queries under differential privacy, the invention designs a scheme that first partitions the workload into blocks and reduces their dimension, then adds noise to the queries block by block, and finally restores and merges the queries before answering them. Partitioning and reducing the workload queries exploits the correlation between query intervals as fully as possible. Perturbing the query interval of each dimension reduces the global sensitivity of the join query operation. Scalability improves, and because the perturbation can run in parallel, computational overhead falls. In addition, controlling the degree of perturbation in each dimension allows a better privacy protection effect.
The embodiment of the invention also provides a star connection workload inquiry privacy protection system, which comprises:
the query set construction module is used for constructing a plurality of star connection queries facing the data warehouse into a star connection workload query set;
the query set blocking module is used for blocking the star-shaped connection workload query set according to the query attribute;
the attribute query strategy acquisition module is used for respectively carrying out dimension reduction on each block of the partitioned star connection workload query set to form a single attribute query strategy;
the attribute query strategy noise adding module is used for respectively adding noise to each attribute query strategy by utilizing a differential privacy mechanism;
the noisy query inference module is used for inferring the noisy queries from each noisy attribute query strategy;
and the data warehouse noise adding query module is used for merging and aggregating each noise adding query to obtain a noise adding star-shaped connection workload query set, querying the data warehouse by using the noise adding star-shaped connection workload query set, and transmitting the obtained query result to an untrusted data analyzer.
An embodiment of the invention also provides an electronic device, comprising: a memory storing at least one instruction; and a processor that executes the instructions stored in the memory to implement the star connection workload query privacy protection method.
An embodiment of the invention also provides a computer-readable storage medium in which at least one instruction is stored, the at least one instruction being executed by a processor in an electronic device to implement the star connection workload query privacy protection method.
The instructions stored in the memory may be partitioned into one or more modules/units, which are stored in a computer-readable storage medium and executed by the processor to perform the star connection workload query privacy protection method of the present invention. The one or more modules/units may be a series of computer-readable instruction segments capable of performing a specified function, the segments describing the execution of the computer program in a server.
The electronic equipment can be a smart phone, a notebook computer, a palm computer, a cloud server and other computing equipment. The electronic device may include, but is not limited to, a processor, a memory. Those skilled in the art will appreciate that the electronic device may also include more or fewer components, or may combine certain components, or different components, e.g., the electronic device may also include input and output devices, network access devices, buses, etc.
The processor may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory may be an internal storage unit of the server, such as a hard disk or internal memory of the server. The memory may also be an external storage device of the server, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the server. Further, the memory may include both an internal storage unit and an external storage device of the server. The memory is used to store the computer-readable instructions and other programs and data required by the server. The memory may also be used to temporarily store data that has been output or is to be output.
It should be noted that, because the content of information interaction and execution process between the above module units is based on the same concept as the method embodiment, specific functions and technical effects thereof may be referred to in the method embodiment section, and details thereof are not repeated herein.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the method of the above embodiments may be implemented by a computer program instructing related hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to a photographing device or terminal apparatus, a recording medium, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.
Claims (10)
1. A star connection workload query privacy protection method, characterized by comprising the following steps:
forming a star connection workload query set from a plurality of star connection queries directed at a data warehouse;
partitioning the star connection workload query set into blocks according to query attributes;
performing dimension reduction on each block of the partitioned star connection workload query set to form a single-attribute query strategy;
adding noise to each attribute query strategy by using a differential privacy mechanism;
deducing, from each noisy attribute query strategy, the corresponding noisy query;
and merging and aggregating the noisy queries to obtain a noisy star connection workload query set, querying the data warehouse with the noisy star connection workload query set, and sending the obtained query result to an untrusted data analyst.
2. The star connection workload query privacy protection method according to claim 1, wherein the data warehouse comprises a plurality of dimension tables and a fact table, each dimension table being connected to the fact table so that the whole presents a star connection schema; a plurality of star connection queries compose the star connection workload query set W; the screening conditions of each star connection query involve the individual dimensions, and each star connection query is expressed by its query predicates, any one predicate denoting the query interval of that query on one dimension table, bounded by a lower limit and an upper limit of the query range.
3. The method according to claim 2, wherein in the step of partitioning the star connection workload query set according to query attributes, each unit of the star connection workload query set W is the query range of one star connection query on one dimension table, so that W lists, for every star connection query, the query interval it involves on every dimension table; the star connection workload query set W is split according to the dimension tables of the data warehouse to obtain one block per dimension table, and each element of a block denotes the query interval of one query on the corresponding dimension table.
4. The star connection workload query privacy protection method according to claim 3, wherein in the step of performing dimension reduction on each block of the partitioned star connection workload query set, a query strategy is found for each block such that a solution matrix exists with which every query interval in the block can be represented as a linear combination of the query intervals in the query strategy.
5. The star connection workload query privacy protection method according to claim 4, wherein in the step of adding noise to each attribute query strategy by using a differential privacy mechanism, Laplace noise is added to each query strategy, the noise being determined by the sensitivity of the query strategy and the privacy budget; for the dimension interval of a query in a block, given by its lower and upper limits, the corresponding noisy query interval in the perturbed query strategy is obtained by adding the Laplace noise to that interval.
6. The method of claim 5, wherein in the step of deducing the noisy query for each attribute query strategy, the noisy query of the star connection workload query set W on each dimension table is computed from the perturbed query strategy and the solution matrix.
7. The method of claim 6, wherein in the step of merging and aggregating each noisy query, the noisy queries of the star connection workload query set W on the individual dimension tables are merged and aggregated to obtain the noisy star connection workload query set, the data warehouse is queried using the noisy star connection workload query set, and the obtained query result is sent to an untrusted data analyst.
8. A star connection workload query privacy protection system, comprising:
a query set construction module, configured to form a star connection workload query set from a plurality of star connection queries directed at a data warehouse;
a query set blocking module, configured to partition the star connection workload query set into blocks according to query attributes;
an attribute query strategy acquisition module, configured to perform dimension reduction on each block of the partitioned star connection workload query set to form a single-attribute query strategy;
an attribute query strategy noise adding module, configured to add noise to each attribute query strategy by using a differential privacy mechanism;
a noisy query deducing module, configured to deduce, from each noisy attribute query strategy, the corresponding noisy query;
and a data warehouse noisy query module, configured to merge and aggregate the noisy queries into a noisy star connection workload query set, query the data warehouse with the noisy star connection workload query set, and send the obtained query result to an untrusted data analyst.
9. An electronic device, comprising:
a memory storing at least one instruction; and
a processor executing the instructions stored in the memory to implement the star connection workload query privacy protection method of any of claims 1 to 7.
10. A computer-readable storage medium, characterized by: the computer readable storage medium having stored therein at least one instruction for execution by a processor in an electronic device to implement the star connection workload query privacy preserving method of any of claims 1 to 7.
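The dimension reduction and reconstruction of claims 4 and 6 can be illustrated with the simplest strategy that satisfies the linear-combination condition: the distinct intervals of a block, together with a 0/1 solution matrix that selects one strategy interval per query. The claims do not say how the strategy is found, so this choice, and the names `reduce_block` and `reconstruct`, are assumptions made for illustration only.

```python
def reduce_block(block):
    """Return (strategy, solution) with block[i] == sum_k solution[i][k] * strategy[k].

    Assumption: the strategy is just the list of distinct intervals in the block.
    """
    strategy = []
    index = {}
    for interval in block:
        if interval not in index:
            index[interval] = len(strategy)
            strategy.append(interval)
    solution = [
        [1 if index[interval] == k else 0 for k in range(len(strategy))]
        for interval in block
    ]
    return strategy, solution


def reconstruct(solution, strategy):
    """Rebuild the block intervals as linear combinations of strategy intervals."""
    return [
        (
            sum(w * lo for w, (lo, _) in zip(row, strategy)),
            sum(w * hi for w, (_, hi) in zip(row, strategy)),
        )
        for row in solution
    ]


block = [(1, 5), (3, 9), (1, 5)]
strategy, solution = reduce_block(block)
# strategy == [(1, 5), (3, 9)]; solution == [[1, 0], [0, 1], [1, 0]]

# Stand-in for a perturbed strategy: the first interval has been shifted.
noisy_block = reconstruct(solution, [(1.5, 5.5), (3, 9)])
```

With this strategy, repeated intervals are perturbed once and reused, so identical queries on a dimension receive the same noisy interval after reconstruction.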
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310725211.0A CN116451278A (en) | 2023-06-19 | 2023-06-19 | Star-connection workload query privacy protection method, system, equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116451278A (en) | 2023-07-18 |
Family
ID=87136043
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310725211.0A (Pending), CN116451278A (en) | Star-connection workload query privacy protection method, system, equipment and medium | 2023-06-19 | 2023-06-19 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116451278A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117633902A (en) * | 2024-01-25 | 2024-03-01 | 杭州世平信息科技有限公司 | OLAP star-type connection workload query differential privacy protection method and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103309958A (en) * | 2013-05-28 | 2013-09-18 | 中国人民大学 | OLAP star connection query optimizing method under CPU and GPU mixing framework |
CN105955999A (en) * | 2016-04-20 | 2016-09-21 | 华中科技大学 | Large scale RDF graph Thetajoin query processing method |
CN115858579A (en) * | 2022-10-28 | 2023-03-28 | 杭州世平信息科技有限公司 | Data warehouse star connection query method, system and medium based on differential privacy |
Non-Patent Citations (1)
Title |
---|
CHAO LI et al.: "The matrix mechanism: optimizing linear counting queries under differential privacy", THE VLDB JOURNAL, pages 1 - 25 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20230718 ||