CN117633902A - OLAP star-type connection workload query differential privacy protection method and system - Google Patents
- Publication number
- CN117633902A (application CN202410105845.0A)
- Authority
- CN
- China
- Prior art keywords
- query
- range
- workload
- dimension
- differential privacy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6227—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
- G06F16/244—Grouping and aggregation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24532—Query optimisation of parallel queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Mathematical Physics (AREA)
- Medical Informatics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An OLAP star connection workload query differential privacy protection method and system. The method comprises the following steps: dividing a star connection workload query set by dimension to obtain the query ranges in each dimension; encoding the query ranges in each dimension to obtain a range vector for each query range; applying an approximate transformation to the range vectors, adding noise that satisfies differential privacy, and then reconstructing to obtain noisy range vectors; and aggregating and merging the noisy range vectors by dimension to obtain a noisy workload query set used for the response. The invention also discloses an OLAP star connection workload query differential privacy protection system, an electronic device, and a computer-readable storage medium. The invention exploits the independence of the star connection dimensions to reduce the global sensitivity of the workload query set, improve scalability, and reduce computational cost, and it can achieve a better privacy protection effect by controlling the degree of perturbation in each dimension.
Description
Technical Field
The invention belongs to the technical field of data privacy and security, and particularly relates to an OLAP star connection workload query differential privacy protection method and system.
Background
In the digital economy, data has become a key new factor of production. Its value lies not in the data itself but in the fact that the results of analyzing it can serve as the basis for business decisions. As a service provider, an enterprise collects and analyzes data to improve service quality and user experience. A data analyst sends a query request to a trusted server, and the server responds with the query result. An untrusted data analyst can therefore infer clients' private information from multiple query results, or even mine client information buried deep in the data, which creates a risk of privacy disclosure. The main cause of this disclosure is that the server answers the analyst's query requests with completely true results.
Online analytical processing (OLAP) lets analysts, managers, and executives access information derived from raw data quickly, consistently, and interactively from multiple angles. This information is understandable to the user and reflects the dimensional characteristics of the enterprise, which gives a deeper understanding of the data. OLAP can also be described as a collection of multidimensional data analysis tools. Star connection (star join) queries are queries over a data warehouse organized in a star schema, and they are a common query task in OLAP. The filter conditions of a star connection query typically apply to the dimension tables, and the aggregation is then performed on the fact table. A large number of star connection queries form a star connection workload query set. Star connection queries are highly sensitive to the data: how much adding or deleting one record affects the query result depends on the number of dimension tables. Because one fact table is joined with multiple dimension tables, and a workload usually consists of a large number of single star connection queries, differential privacy schemes for workload queries over star connections often suffer from high global sensitivity and therefore large noise, which leaves the released results with low utility.
Disclosure of Invention
The invention aims to solve the above problems in the prior art by providing an OLAP star connection workload query differential privacy protection method and system that reduce the global sensitivity of the workload query set, improve scalability, reduce computational cost, and achieve a better privacy protection effect by controlling the degree of perturbation in each dimension.
To achieve the above purpose, the invention adopts the following technical scheme:
In a first aspect, an OLAP star connection workload query differential privacy protection method is provided, comprising:
dividing a star connection workload query set by dimension to obtain the query ranges in each dimension;
encoding the query ranges in each dimension to obtain a range vector for each query range;
applying an approximate transformation to the range vectors, adding noise that satisfies differential privacy, and then reconstructing to obtain noisy range vectors;
and aggregating and merging the noisy range vectors by dimension to obtain a noisy workload query set used for the response.
As a preferred scheme, the star connection workload query set consists of $m$ queries; after the workload query set is divided by dimension, each dimension has $m$ query ranges.
Let the data warehouse $D$ comprise $n$ dimension tables, where the attribute domain of each dimension table has size $k$. A set of $m$ queries forms the star connection workload query set $W$. A query $q$ applies to every dimension and is expressed as $q = \{r_1, \dots, r_n\}$, where $r_i = [l_i, u_i]$ denotes the query range of $q$ on dimension table $T_i$, $l_i$ is the lower limit of the query range, and $u_i$ is the upper limit. The star connection workload query set $W$ is composed of the query predicates of the $m$ queries $q$, i.e. $W = \{q_1, \dots, q_m\}$.
As a preferred scheme, dividing the star connection workload query set by dimension to obtain the query ranges in each dimension comprises:
dividing the star connection workload query set by dimension to obtain $n$ query range sets of size $m$, i.e. $W = \{R_1, \dots, R_n\}$, where $R_i$ denotes the $m$ query ranges on dimension table $T_i$.
As a preferred scheme, encoding the query ranges in each dimension to obtain a range vector for each query range comprises:
processing each of the $n$ dimensions with unary encoding, so that the query ranges of a dimension are encoded into a matrix over $\{0,1\}$; applying unary encoding to each query range yields its range vector in $\{0,1\}^{k}$, where $k$ denotes the domain size of the attribute of dimension table $i$.
Preferably, the approximate transformation of the range vectors is a singular value decomposition of the encoding matrix $M_i$, which yields two orthogonal matrices and a diagonal matrix of singular values, i.e. $M_i = U_i \Sigma_i V_i^{\top}$, where $U_i$ and $V_i$ are orthogonal matrices and $\Sigma_i$ is the diagonal matrix of singular values.
As a preferred scheme, adding noise that satisfies differential privacy and then reconstructing to obtain the noisy range vectors comprises: applying a noise perturbation $\mathcal{A}$ that satisfies differential privacy to the diagonal matrix of singular values $\Sigma_i$ to obtain the perturbed diagonal matrix, i.e. $\tilde{\Sigma}_i = \mathcal{A}(\Sigma_i)$;
multiplying the orthogonal matrices ($U_i$, $V_i$) with the perturbed diagonal matrix $\tilde{\Sigma}_i$ to obtain the noisy encoding matrix, i.e. $\tilde{M}_i = U_i \tilde{\Sigma}_i V_i^{\top}$, and reversing the unary encoding of $\tilde{M}_i$ to obtain the noisy query ranges $\tilde{R}_i$.
As a preferred scheme, aggregating and merging the noisy range vectors by dimension to obtain a noisy workload query set used for the response comprises:
aggregating and merging the noisy ranges $\tilde{R}_i$ of all dimensions to obtain the noisy star connection workload query set $\tilde{W}$, i.e. $\tilde{W} = \{\tilde{R}_1, \dots, \tilde{R}_n\}$; querying the data warehouse $D$ with the noisy workload query set $\tilde{W}$ to obtain the final query result $\tilde{W}(D)$, which is sent to the untrusted data analyst.
In a second aspect, an OLAP star connection workload query differential privacy protection system is provided, including:
the dimension segmentation module is used for segmenting the workload query set connected in a star mode according to dimensions to obtain query ranges in each dimension;
the query range coding module is used for coding the query range in each dimension to obtain range vectors corresponding to each query range;
the approximate transformation and reconstruction noise adding module is used for performing approximate transformation on the range vector, adding noise meeting the differential privacy, and then reconstructing to obtain a noisy range vector;
and the convergence and combination module is used for converging and combining the noisy range vectors according to the dimension to obtain a noisy workload query set for response.
In a third aspect, there is provided an electronic device comprising:
a memory storing at least one instruction; and the processor executes the instructions stored in the memory to realize the OLAP star connection workload inquiry differential privacy protection method.
In a fourth aspect, a computer readable storage medium having stored therein at least one instruction for execution by a processor in an electronic device to implement the OLAP star connected workload query differential privacy protection method is provided.
Compared with the prior art, the invention has at least the following beneficial effects:
A star connection workload consists of many correlated star connection queries, and the influence of a single data record on the results of such queries is unbounded, so the privacy loss of user data cannot be bounded directly. Furthermore, star connections in OLAP typically involve multiple dimension tables, so the privacy constraint on user data depends not only on the number of dimension tables but also on every query in the workload. Differential privacy schemes for OLAP star connection workload queries therefore typically suffer from excessive global sensitivity, which results in low utility of the query results. The invention perturbs the query ranges instead: the query ranges in each dimension are obtained by dimension splitting, and an approximate transformation is applied to the range vectors, which effectively reduces the global sensitivity; after noise addition and reconstruction, the noisy ranges are aggregated and merged, and the final query result is sent to the untrusted data analyst. The method splits and encodes the workload query set and thereby makes maximal use of the independence of the star connection dimensions. Applying an approximate transformation to the range vectors of each dimension before adding the perturbation reduces the global sensitivity of the workload query set. Scalability also improves, because the dimensions can be perturbed in parallel, which reduces computational cost. In addition, the method can achieve a better privacy protection effect by controlling the degree of perturbation in each dimension.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an application scenario of an OLAP star connection workload query differential privacy protection method in an embodiment of the present invention;
FIG. 2 is a flowchart of an OLAP star connection workload query differential privacy protection method in an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
The differential privacy model prevents inference by an untrusted data analyst by adding noise to the results, and its guarantee holds regardless of the analyst's background knowledge. Star connection queries are a typical application of relational databases, yet there has been relatively little work on differential privacy for them. The main problem in applying differential privacy to star connection workload queries is how to answer the workload while protecting user privacy. For workload queries, current differential privacy schemes usually address excessive global sensitivity with the matrix mechanism: the query matrix is first reduced in dimension to form a query strategy, which lowers the correlation between queries (the sensitivity of the strategy is generally 1); noise is then added to the answers to the strategy queries, and the noisy answers to the original queries are derived from them. For the excessive global sensitivity of star connection queries, current schemes generally replace global sensitivity with a quantity derived from local sensitivity (such as local sensitivity itself, smooth sensitivity, elastic sensitivity, or residual sensitivity). These approaches still have drawbacks. Local sensitivity depends on the data, so calibrating the noise to it directly does not satisfy differential privacy. Smooth sensitivity, elastic sensitivity, and residual sensitivity are all upper bounds on local sensitivity: elastic sensitivity is cheap to compute but its value is large, whereas smooth sensitivity has the smallest value but is expensive to compute.
These substitutes for global sensitivity are therefore not directly applicable to star connection workload queries, and a differential privacy method better suited to such workloads is needed. In addition, a star connection workload can involve many dimension tables: sensitivity grows as dimension tables are added, the noise grows with it, and the utility of the query results drops rapidly. Differential privacy schemes for star connection queries also generally join first, then query, and finally add noise, so adding dimension tables makes the computation expensive. The schemes above consider only how sensitivity is computed and cannot solve the problems caused by adding dimension tables.
In general, the disadvantages of the prior art are mainly the following three points:
1. Low utility of query results. Star connection queries are highly sensitive to the data, so excessive noise is added to the query results. The queries in a workload are also often correlated, which raises the sensitivity and the noise further and lowers utility.
2. Poor scalability. Star connection queries typically join multiple dimension tables, and the time complexity of the sensitivity computation grows exponentially with the number of dimension tables, so current solutions are limited in processing large star connection queries.
3. High computational overhead. Existing differential privacy schemes generally join the dimension tables first and then add noise for protection. Since star connection queries typically involve multiple dimension tables, the computational overhead tends to be very high.
Suppose the data warehouse $D$ comprises $n$ dimension tables $T_1, \dots, T_n$ and a fact table $F$. Each dimension table is joined to the fact table through a foreign key constraint, forming a star schema.
The data involved in a query contains private information that must be protected in practice. Differential privacy has become a popular way to protect data privacy because it provides a strict privacy guarantee. It avoids privacy disclosure by adding noise. A common differential privacy mechanism is the Laplace mechanism, which adds to the query result noise drawn from a Laplace distribution whose scale is positively correlated with the query sensitivity.
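As a minimal sketch of the Laplace mechanism described above, the following Python draws Laplace noise as the difference of two exponential samples and adds it to a query answer. The function names, the example count of 120, and the values `sensitivity=1.0` and `epsilon=0.5` are illustrative assumptions, not values taken from the patent; the scale `sensitivity / epsilon` is the standard calibration for the Laplace mechanism.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def laplace_mechanism(true_answer, sensitivity, epsilon, rng):
    """Return the answer plus Laplace noise with scale sensitivity / epsilon."""
    return true_answer + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical count query: a larger sensitivity or a smaller epsilon
# gives a larger noise scale, as the text states.
rng = random.Random(0)
noisy_count = laplace_mechanism(true_answer=120, sensitivity=1.0, epsilon=0.5, rng=rng)
```

Because the noise scale grows linearly with sensitivity, any step that lowers sensitivity directly lowers the expected error of the released answer.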
The embodiment of the invention provides an OLAP star connection workload query differential privacy protection method for a workload query set composed of a large number, $m$, of single star connection queries. The workload query set is first divided by dimension to obtain the query ranges on each dimension, so that each dimension has $m$ query ranges. The query ranges in each dimension are then processed in parallel: the $m$ query ranges are encoded to obtain $m$ range vectors, the range vectors are approximately transformed to reduce sensitivity, noise is added to the transformed range vector data, and the noisy range vectors are reconstructed by the inverse transformation. Finally, the noisy range vectors of the $n$ dimensions are simply merged to obtain the noisy workload query set. The application scenario of the invention is shown in FIG. 1: an untrusted data analyst submits a multidimensional star connection query set $W$ to the server; the server performs dimension splitting, range encoding, approximate transformation, vector noise addition, reconstruction of the noisy ranges, dimension merging, and response to the noisy workload query set, and sends the final response to the data analyst. The main process is shown in FIG. 2.
The OLAP star connection workload query differential privacy protection method mainly comprises the following steps:
a) Dimension splitting: divide the star connection workload query set by dimension to obtain the query ranges in each dimension, $W = \{R_1, \dots, R_n\}$, where $m$ is the number of queries, $n$ is the number of dimension tables, and $R_i$ denotes the $m$ query ranges on dimension table $T_i$;
b) Range encoding: encode the query ranges of each dimension to obtain the corresponding range vectors. Taking the $i$-th dimension table $T_i$ as an example, $R_i$ is encoded as $M_i$, where $M_i$ consists of entries in $\{0,1\}$ and $k$ denotes the domain size of the attribute of dimension table $i$;
c) Approximate transformation: apply a transformation, such as singular value decomposition, a linear transformation, or another approximate transformation scheme, to the encoded vector data of each dimension. With singular value decomposition of the encoding matrix, $M_i = U_i \Sigma_i V_i^{\top}$, where $U_i$ and $V_i$ are orthogonal matrices and $\Sigma_i$ is the diagonal matrix of singular values;
d) Data noise addition: add noise to the diagonal matrix of singular values, $\tilde{\Sigma}_i = \mathcal{A}(\Sigma_i)$, where $\mathcal{A}$ denotes a noise perturbation function that satisfies differential privacy, e.g. noise drawn from a Laplace distribution;
e) Reconstruction of the noisy range vectors: compute $\tilde{M}_i = U_i \tilde{\Sigma}_i V_i^{\top}$ and decode $\tilde{M}_i$ into $\tilde{R}_i$, where $\tilde{R}_i$ denotes the noisy query ranges obtained by reversing the encoding;
f) Merging: merge the noisy range vectors to obtain the noisy workload query set, $\tilde{W} = \{\tilde{R}_1, \dots, \tilde{R}_n\}$;
g) Response: access the data and respond to the workload query set, i.e. return $\tilde{W}(D)$.
Suppose a data warehouse $D$ comprises $n$ dimension tables, where the attribute domain of each dimension table has size $k$. A set of $m$ star connection queries forms the workload query set $W$. Because a star connection query $q$ applies to every dimension, it can be represented by its predicate set $q = \{r_1, \dots, r_n\}$, where $r_i = [l_i, u_i]$ denotes the query range of $q$ on dimension table $T_i$, $l_i$ is the lower limit of the query range, and $u_i$ is the upper limit. The query set $W$ is thus composed of the query predicates of the $m$ star connection queries $q$, i.e. $W = \{q_1, \dots, q_m\}$.
In one possible implementation manner, the specific implementation manner of each step is as follows:
a) Segmentation dimension
Divide the workload query set by dimension to obtain $n$ query range sets of size $m$. Because each dimension table of the star connection has its own foreign key constraint with the fact table and the dimension tables are independent of one another, the query set can be divided by dimension into $n$ modules, i.e. $W = \{R_1, \dots, R_n\}$, where $R_i$ denotes the $m$ query ranges on dimension table $T_i$.
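The splitting step can be sketched in a few lines of Python. The representation is an assumption made for illustration: each query is a list of $n$ inclusive `(lower, upper)` index pairs, one per dimension table, and the function name `split_by_dimension` is hypothetical.

```python
def split_by_dimension(workload):
    """Turn m queries (each a list of n ranges) into n lists of m ranges."""
    n = len(workload[0])
    if any(len(query) != n for query in workload):
        raise ValueError("every query needs exactly one range per dimension table")
    return [[query[i] for query in workload] for i in range(n)]


# Hypothetical workload: m = 3 queries over n = 2 dimension tables.
workload = [
    [(0, 1), (2, 3)],
    [(1, 3), (0, 0)],
    [(0, 3), (1, 2)],
]
ranges_per_dim = split_by_dimension(workload)
# ranges_per_dim[0] holds the 3 ranges on the first dimension table,
# ranges_per_dim[1] the 3 ranges on the second.
```

Each of the resulting lists can then be handled independently, which is what allows the per-dimension processing to run in parallel.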
b) Range encoding
Each of the $n$ dimensions is processed with unary encoding and encoded into a matrix over $\{0,1\}$. $R_i$ contains the $m$ query ranges of this dimension, and each query range is encoded with unary encoding. For example, when $m = 2$ and $k = 4$, $R_i$ is encoded as a $2 \times 4$ matrix over $\{0,1\}$.
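One plausible reading of the unary encoding, sketched below, marks every domain position covered by a range with 1 and every other position with 0. The patent text does not spell out the bit layout, so this layout, the function names, and the two example ranges are assumptions.

```python
def encode_range(lower, upper, k):
    """Encode an inclusive index range as a 0/1 vector of length k."""
    if not 0 <= lower <= upper < k:
        raise ValueError("range must satisfy 0 <= lower <= upper < k")
    return [1 if lower <= j <= upper else 0 for j in range(k)]


def encode_dimension(ranges, k):
    """Stack the encodings of m ranges into an m x k matrix (list of rows)."""
    return [encode_range(lower, upper, k) for lower, upper in ranges]


# m = 2 hypothetical ranges on a dimension whose attribute domain has size k = 4.
matrix = encode_dimension([(0, 1), (2, 3)], k=4)
# matrix == [[1, 1, 0, 0], [0, 0, 1, 1]]
```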
c) Approximation transformation
The coding matrix for each dimension is subjected to an approximate transformation, here exemplified by singular value decomposition.
Singular value decomposition of $M_i$ yields two orthogonal matrices and a diagonal matrix of singular values, i.e. $M_i = U_i \Sigma_i V_i^{\top}$, where $U_i$ and $V_i$ are orthogonal matrices and $\Sigma_i$ is the diagonal matrix of singular values; the features of the data are mainly captured by the singular values.
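As a small worked instance, consider a hypothetical $2 \times 4$ encoding matrix $M$ whose two rows cover disjoint halves of a domain of size 4. The matrix is chosen for illustration and is not taken from the patent. Its rows are orthogonal and each has norm $\sqrt{2}$, so a compact singular value decomposition (in which $V$ has orthonormal columns rather than being square) can be written down directly:

```latex
M =
\begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}
=
\underbrace{\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}}_{U}
\underbrace{\begin{pmatrix} \sqrt{2} & 0 \\ 0 & \sqrt{2} \end{pmatrix}}_{\Sigma}
\underbrace{\frac{1}{\sqrt{2}}
\begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}}_{V^{\top}}
```

Multiplying the three factors returns $M$ exactly, and the eight entries of $M$ are summarized by two singular values, which are the quantities perturbed in the next step.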
d) Data noise addition
Since the features of the data are captured by the singular values after the transformation, a noise perturbation $\mathcal{A}$ that satisfies differential privacy is applied directly to the diagonal matrix of singular values, giving the perturbed diagonal matrix $\tilde{\Sigma}_i = \mathcal{A}(\Sigma_i)$. In addition, the transformation reduces the sensitivity of the workload queries, which further reduces the noise required for the perturbation.
e) Reconstruction noise adding range
The orthogonal matrices ($U_i$, $V_i$) are multiplied with the noisy diagonal matrix $\tilde{\Sigma}_i$ to obtain the noisy encoding matrix $\tilde{M}_i = U_i \tilde{\Sigma}_i V_i^{\top}$, and the noisy query ranges $\tilde{R}_i$ are obtained by reversing the unary encoding of $\tilde{M}_i$.
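The noise addition and reconstruction steps can be sketched as follows. Several parts are assumptions made for illustration: the SVD factors are hard-coded for the hypothetical matrix `[[1, 1, 0, 0], [0, 0, 1, 1]]`, the noise `scale` is a free parameter because the patent text does not state how it is calibrated, and decoding by thresholding at 0.5 is only one possible way to reverse the unary encoding. The sketch shows the data flow and is not claimed to satisfy differential privacy as written.

```python
import random


def perturb_singular_values(sigma, scale, rng):
    """Add independent Laplace(0, scale) noise to every singular value."""
    rate = 1.0 / scale
    return [s + rng.expovariate(rate) - rng.expovariate(rate) for s in sigma]


def reconstruct(U, sigma, Vt):
    """Compute U * diag(sigma) * Vt with plain nested lists."""
    r = len(sigma)
    return [
        [sum(U[a][c] * sigma[c] * Vt[c][b] for c in range(r)) for b in range(len(Vt[0]))]
        for a in range(len(U))
    ]


def decode_row(row, threshold=0.5):
    """Assumed decoding: first and last positions at or above the threshold."""
    hits = [j for j, value in enumerate(row) if value >= threshold]
    return (hits[0], hits[-1]) if hits else None


# Compact SVD factors of the hypothetical matrix [[1, 1, 0, 0], [0, 0, 1, 1]].
h = 2 ** -0.5
U = [[1.0, 0.0], [0.0, 1.0]]
sigma = [2 ** 0.5, 2 ** 0.5]
Vt = [[h, h, 0.0, 0.0], [0.0, 0.0, h, h]]

rng = random.Random(7)
noisy_sigma = perturb_singular_values(sigma, scale=0.1, rng=rng)
noisy_matrix = reconstruct(U, noisy_sigma, Vt)
noisy_ranges = [decode_row(row) for row in noisy_matrix]
```

A decoded value of `None` means no position reached the threshold; how such cases should be handled is not specified in the source.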
f) Dimension aggregation and merging
The noisy ranges $\tilde{R}_i$ of all dimensions are merged to obtain the noisy star connection workload query set $\tilde{W} = \{\tilde{R}_1, \dots, \tilde{R}_n\}$.
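Merging is the inverse of the dimension split: the $j$-th range of every dimension is regrouped into the $j$-th query. A minimal sketch, with a hypothetical function name and the same `(lower, upper)` pair representation assumed for illustration:

```python
def merge_dimensions(ranges_per_dim):
    """Regroup n lists of m ranges into m queries with n ranges each."""
    m = len(ranges_per_dim[0])
    if any(len(ranges) != m for ranges in ranges_per_dim):
        raise ValueError("every dimension needs exactly one range per query")
    return [[ranges[j] for ranges in ranges_per_dim] for j in range(m)]


# Hypothetical noisy ranges for n = 2 dimensions and m = 2 queries.
noisy_ranges_per_dim = [
    [(0, 1), (1, 3)],
    [(2, 3), (0, 0)],
]
noisy_workload = merge_dimensions(noisy_ranges_per_dim)
# noisy_workload == [[(0, 1), (2, 3)], [(1, 3), (0, 0)]]
```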
g) Responsive to a workload query set
The data warehouse $D$ is queried with the perturbed workload query set $\tilde{W}$, and the final query result $\tilde{W}(D)$ is sent to the untrusted data analyst.
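To illustrate the response step, the sketch below answers a perturbed workload with COUNT queries over a toy fact table. It assumes, for simplicity, that the join is already resolved so that each fact row directly holds the attribute index of each dimension; the rows, the queries, and the function names are hypothetical. Note that the noise sits in the query ranges, and the counts themselves are computed exactly on the true data.

```python
def answer_count(fact_rows, query):
    """Count fact rows whose value in every dimension lies inside the query range."""
    return sum(
        all(lower <= row[i] <= upper for i, (lower, upper) in enumerate(query))
        for row in fact_rows
    )


def answer_workload(fact_rows, workload):
    """Answer every query of the (already perturbed) workload."""
    return [answer_count(fact_rows, query) for query in workload]


# Toy fact table with two dimension attributes per row.
fact_rows = [(0, 2), (1, 3), (3, 0), (2, 1)]
perturbed_workload = [
    [(0, 1), (2, 3)],
    [(3, 3), (0, 3)],
]
answers = answer_workload(fact_rows, perturbed_workload)
# answers == [2, 1]
```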
The OLAP star connection workload query differential privacy protection method addresses the excessive global sensitivity of current differential privacy schemes for star connection queries by perturbing the query ranges. A workload over star connections involves a large number of single queries, and these queries tend to be correlated within each attribute dimension; the method therefore works per attribute dimension, applies an approximate transformation to reduce the sensitivity of the query set, and then adds noise to the queries. Because each dimension table of the star connection has its own foreign key constraint with the fact table and the dimension tables are independent of one another, the invention divides the star connection workload query set into per-dimension query range sets and encodes them to obtain the range vectors on each dimension. The invention makes maximal use of the independence of the star connection dimensions, reduces the global sensitivity of the workload query set, improves scalability, reduces computational cost, and can achieve a better privacy protection effect by controlling the degree of perturbation in each dimension.
Another embodiment of the present invention further provides an OLAP star connection workload query differential privacy protection system, including:
the dimension segmentation module is used for segmenting the workload query set connected in a star mode according to dimensions to obtain query ranges in each dimension;
the query range coding module is used for coding the query range in each dimension to obtain range vectors corresponding to each query range;
the approximate transformation and reconstruction noise adding module is used for performing approximate transformation on the range vector, adding noise meeting the differential privacy, and then reconstructing to obtain a noisy range vector;
and the convergence and combination module is used for converging and combining the noisy range vectors according to the dimension to obtain a noisy workload query set for response.
Another embodiment of the present invention further provides an electronic device, including: a memory storing at least one instruction; and a processor that executes the instructions stored in the memory to implement the OLAP star connection workload query differential privacy protection method.
Another embodiment of the present invention further provides a computer-readable storage medium storing at least one instruction, the at least one instruction being executed by a processor in an electronic device to implement the OLAP star connection workload query differential privacy protection method.
The instructions stored in the memory may be partitioned into one or more modules/units, which are stored in a computer-readable storage medium and executed by the processor to perform the OLAP star connection workload query differential privacy protection method of the present invention. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specified functions, the instruction segments describing how the computer program is executed in a server.
The electronic device may be a computing device such as a smartphone, a notebook computer, a handheld computer, or a cloud server. The electronic device may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that the electronic device may include more or fewer components, combine certain components, or use different components; for example, it may also include input and output devices, network access devices, buses, and the like.
The processor may be a central processing unit (Central Processing Unit, CPU), but may also be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory may be an internal storage unit of the server, such as a hard disk or a memory of the server. The memory may also be an external storage device of the server, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the server. Further, the memory may include both an internal storage unit and an external storage device of the server. The memory is used to store the computer-readable instructions and other programs and data required by the server. The memory may also be used to temporarily store data that has been output or is to be output.
It should be noted that, because the content of information interaction and execution process between the above module units is based on the same concept as the method embodiment, specific functions and technical effects thereof may be referred to in the method embodiment section, and details thereof are not repeated herein.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
If the integrated units are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods in the above embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to a photographing device/terminal apparatus, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not detailed or illustrated in a particular embodiment, refer to the related descriptions of the other embodiments.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.
Claims (10)
1. An OLAP star connection workload query differential privacy protection method, comprising:
dividing a star-shaped connected workload query set according to dimensions to obtain a query range in each dimension;
encoding the query range in each dimension to obtain a range vector corresponding to each query range;
performing approximate transformation on the range vector, adding noise meeting the differential privacy, and then reconstructing to obtain a noisy range vector;
and converging and merging the noisy range vectors according to the dimensions to obtain a noisy workload query set for response.
2. The OLAP star connection workload query differential privacy protection method of claim 1, wherein the star connection workload query set consists of m queries, and after the star connection workload query set is divided according to dimensions, each dimension has m query ranges;
for a data warehouse comprising n dimension tables, each dimension table having an attribute with a value domain of a given size, m queries constitute the star connection workload query set; each query q involves every dimension and is expressed as a tuple of n query ranges, one for each dimension table, where each query range is given by a lower limit and an upper limit of the query range; the star connection workload query set is thus composed of the query predicates of its m queries q.
3. The OLAP star connection workload query differential privacy protection method of claim 2, wherein the partitioning the star connection workload query set by dimensions to obtain the query scope in each dimension comprises:
dividing the star connection workload query set according to dimensions to obtain n query range sets, each of size m, where the query range set of a dimension table contains the m query ranges on that dimension table.
4. The OLAP star connected workload query differential privacy preserving method of claim 3, wherein the encoding the query scope in each dimension to obtain a scope vector corresponding to each query scope comprises:
processing the n dimensions with unary encoding, so that the query ranges in each dimension are encoded into a matrix composed of {0,1}; each query range is unary-encoded into a range vector, where k denotes the value-domain size of the attribute of dimension table i.
5. The OLAP star connection workload query differential privacy protection method of claim 4, wherein the approximate transformation of the range vectors comprises performing singular value decomposition on the encoding matrix formed by the range vectors to obtain two orthogonal matrices and a diagonal matrix composed of the singular values, the encoding matrix being factored into the product of these matrices.
6. The OLAP star connection workload query differential privacy protection method of claim 5, wherein the adding noise that satisfies differential privacy and then reconstructing to obtain a noisy range vector comprises: adding a noise perturbation that satisfies differential privacy to the diagonal matrix composed of the s singular values to obtain a perturbed diagonal matrix;
multiplying the two orthogonal matrices with the perturbed diagonal matrix to obtain a noisy encoding matrix, and reversing the unary encoding process to obtain the noisy query ranges.
7. The OLAP star connected workload query differential privacy preserving method of claim 6, wherein the aggregating and merging the noisy range vectors according to dimensions to obtain a noisy workload query set response comprises:
aggregating and merging the noisy range vectors of each dimension to obtain a noisy star connection workload query set; and querying the data warehouse with the noisy star connection workload query set to obtain the final query result, which is returned to an untrusted data analyst.
8. An OLAP star connection workload query differential privacy protection system, comprising:
a dimension segmentation module, configured to split the star connection workload query set by dimension to obtain the query ranges in each dimension;
a query range encoding module, configured to encode the query ranges in each dimension to obtain a range vector corresponding to each query range;
an approximate transformation and noisy reconstruction module, configured to apply an approximate transformation to the range vectors, add noise that satisfies differential privacy, and then reconstruct them to obtain noisy range vectors;
and an aggregation and merging module, configured to aggregate and merge the noisy range vectors by dimension to obtain a noisy workload query set used for the response.
9. An electronic device, comprising:
a memory storing at least one instruction; and
a processor executing instructions stored in the memory to implement the OLAP star connection workload query differential privacy protection method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized by: the computer readable storage medium has stored therein at least one instruction for execution by a processor in an electronic device to implement the OLAP star connected workload query differential privacy protection method of any of claims 1 to 7.
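The reversal of the unary encoding and the per-dimension merge recited in claims 6 and 7 can be illustrated with a short Python sketch. The decoding rule used here (round each entry at a cutoff of 0.5 and take the first and last active positions) is an assumption of this example, since the source does not spell out the rule; the names `decode_range` and `merge_dimensions` and the sample noisy rows are likewise hypothetical.

```python
from typing import List, Optional, Tuple

Range = Tuple[int, int]  # inclusive (lower, upper) bounds, zero-based (assumed)


def decode_range(noisy_row: List[float], cutoff: float = 0.5) -> Optional[Range]:
    """Recover an inclusive (lower, upper) range from a noisy row.

    Rounding at `cutoff` and taking the first and last active positions is an
    assumed decoding rule; returns None when no position is active.
    """
    active = [j for j, v in enumerate(noisy_row) if v >= cutoff]
    if not active:
        return None
    return (active[0], active[-1])


def merge_dimensions(per_dim: List[List[Optional[Range]]]) -> List[List[Optional[Range]]]:
    """Regroup n lists of m ranges into m queries with n ranges each."""
    return [list(q) for q in zip(*per_dim)]


# Hypothetical noisy rows for two dimensions and two queries.
dim0 = [decode_range(r) for r in [[0.9, 1.1, 0.7, 0.2], [0.1, 0.8, 1.2, 0.6]]]
dim1 = [decode_range(r) for r in [[0.3, 0.95, 0.1], [1.0, 0.6, 0.55]]]
noisy_workload = merge_dimensions([dim0, dim1])
```

Each entry of `noisy_workload` is one noisy query with one range per dimension, which is the form in which the noisy workload would be run against the data warehouse.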
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410105845.0A CN117633902A (en) | 2024-01-25 | 2024-01-25 | OLAP star-type connection workload query differential privacy protection method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117633902A (en) | 2024-03-01 |
Family
ID=90035842
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410105845.0A Pending CN117633902A (en) | 2024-01-25 | 2024-01-25 | OLAP star-type connection workload query differential privacy protection method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117633902A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101833752A (en) * | 2010-04-20 | 2010-09-15 | 南京航空航天大学 | Pretreatment method for decomposed and reconstituted infrared small targets based on singular values |
CN109284620A (en) * | 2017-07-19 | 2019-01-29 | 中国移动通信集团黑龙江有限公司 | A kind of generation method, device and server for issuing data |
CN114912140A (en) * | 2022-04-15 | 2022-08-16 | 支付宝(杭州)信息技术有限公司 | Method, device, equipment and medium for processing data to be shared based on differential privacy |
CN115858579A (en) * | 2022-10-28 | 2023-03-28 | 杭州世平信息科技有限公司 | Data warehouse star connection query method, system and medium based on differential privacy |
CN116451278A (en) * | 2023-06-19 | 2023-07-18 | 杭州世平信息科技有限公司 | Star-connection workload query privacy protection method, system, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||