WO2023138379A1 - Joint data query method and device based on privacy protection - Google Patents

Joint data query method and device based on privacy protection

Info

Publication number
WO2023138379A1
Authority
WO
WIPO (PCT)
Prior art keywords
row
ciphertext
index
data
attribute
Prior art date
Application number
PCT/CN2023/070474
Other languages
English (en)
French (fr)
Inventor
潘无穷
韦韬
李婷婷
段然
金杯
Original Assignee
支付宝(杭州)信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 支付宝(杭州)信息技术有限公司
Publication of WO2023138379A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation

Definitions

  • One or more embodiments of this specification relate to the field of computer technology, and in particular to a method and device for joint data query based on privacy protection.
  • data party 1 holds the user's physical status data such as height and weight
  • data party 2 holds the user's salary data
  • data party 3 holds the user's loan data.
  • the protection and security of data privacy become issues worthy of attention.
  • Multi-party secure computing can jointly process relevant data without leaking local data to other data parties to achieve certain business purposes.
  • a trusted third party can be used to realize the secure joint processing of data.
  • each data provider usually encrypts the local data provided to the third party, and the data obtained by the third party is usually ciphertext data.
  • When ciphertext data needs to be used repeatedly, for example for queries, a third party can build a ciphertext database for the data of each data party.
  • In database query scenarios, especially range and ranking queries, avoiding leakage of data privacy through database indexes is very important to data security.
  • One or more embodiments of this specification describe a data query method and device based on privacy protection, so as to solve one or more problems mentioned in the background art.
  • a privacy protection-based joint data query method is provided, which is used for a third party to securely query target data from joint data tables of multiple data parties.
  • the joint data table is an attribute ciphertext data table securely established from the attribute data that multiple data parties jointly hold on several business subjects.
  • the method is executed by the third party and includes: obtaining several associated index points of the query target based on a comparison between the ciphertext of the query target and the ciphertexts of a plurality of index points corresponding to the first attribute column, wherein each index point corresponding to the first attribute column is a data segmentation point for indexing according to the sorted attribute values of the first attribute column, and the query target includes the query value on the first attribute column; obtaining the encoded ciphertext of each row identifier corresponding to the associated index points, wherein the row identifier of each row is determined by row encoding of the out-of-order data table obtained in advance by shuffling the joint data table; restoring the corresponding encoded ciphertexts to plaintext row identifiers, thereby determining the candidate rows from the out-of-order data table; and comparing the ciphertext of the attribute value of each candidate row in the first attribute column with the ciphertext of the query target, so as to determine the final target row from the candidate rows and obtain the target data in the target row.
  • the third party has a trusted secret-state computing architecture
  • the trusted secret-state computing architecture includes a plurality of nodes, and each node processes services related to the joint data table through multi-party secure computing.
  • the joint data table is stored in the third party in the form of component ciphertext
  • a single node stores a corresponding single component ciphertext for a single element in the joint data table
  • the query target is split into various components by the querying party, and each node holds the component ciphertext of a single component.
  • a single node in said third party is implemented in a trusted execution environment.
  • the comparison between the ciphertext of the query target and the ciphertexts of multiple index points corresponding to the first attribute column, and obtaining several associated index points of the query target includes: comparing the ciphertext of the query value with the ciphertext of each index point corresponding to the first attribute column in turn; and when the comparison results against two adjacent index points are opposite (the query value is greater than one and smaller than the other), determining the two adjacent index points as the associated index points of the query target.
  • the comparison between the ciphertext of the query target and the ciphertexts of multiple index points corresponding to the first attribute column, and obtaining several associated index points of the query target includes: sequentially comparing the ciphertext of the query value with the ciphertext of each index point corresponding to the first attribute column; and determining the minimum index point or the maximum index point as the associated index point of the query target when the comparison result is that the query value is smaller than all index points or larger than all index points, respectively.
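  • The two selection rules above can be sketched as follows. This is a minimal plaintext illustration, not the patented procedure: the function name `associated_index_points` and the `less_than` callback are hypothetical, and the callback stands in for whatever secure ciphertext comparison the third party actually uses.

```python
def associated_index_points(query, index_points, less_than):
    """Return the index points associated with `query`.

    index_points: index points in ascending order.
    less_than(a, b): comparison oracle; in the secret-state setting this
    would be a secure comparison over ciphertexts (assumption).
    """
    # Query below every index point: the minimum index point is associated.
    if less_than(query, index_points[0]):
        return [index_points[0]]
    # Query above every index point: the maximum index point is associated.
    if less_than(index_points[-1], query):
        return [index_points[-1]]
    # Otherwise scan adjacent pairs in turn and pick the bracketing pair.
    for low, high in zip(index_points, index_points[1:]):
        if not less_than(query, low) and not less_than(high, query):
            return [low, high]
    # Only reached when there is a single index point equal to the query.
    return [index_points[0]]


plain_less_than = lambda a, b: a < b  # plaintext stand-in for the oracle
height_points = [150, 160, 170, 180]
bracket = associated_index_points(165, height_points, plain_less_than)
```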
  • each index point is determined in the following manner: extracting the ciphertext of each attribute value of the first attribute column from the out-of-order data table and forming a first index table together with the encoded ciphertext of the corresponding row identifier, wherein the encoded ciphertext of a single row identifier is obtained by encrypting that row identifier in a predetermined manner; sorting the first index table, after shuffling it, by the size of the attribute values of the first attribute column; and, based on the sorting result, determining each index point according to a predetermined attribute value segmentation condition.
  • the multiple index points corresponding to the first attribute column are primary index points of the first attribute column, a single primary index point corresponds to multiple secondary index points, and the secondary index points divide the row data corresponding to the single primary index point into a plurality of secondary attribute value ranges; comparing the ciphertext of the query target with the ciphertext of each index point corresponding to the first attribute column, and obtaining several associated index points of the query target includes: comparing the ciphertext of the query target with the ciphertext of each primary index point of the first attribute column to obtain several primary index points associated with the query target; and, based on the comparison between the ciphertext of the query target and the ciphertexts of the secondary index points corresponding to those primary index points, determining several associated index points of the query target from the secondary index points.
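  • A two-level lookup of this kind might look like the sketch below. It assumes, purely for illustration, that every index point is the inclusive upper bound of its range, and it uses plaintext `bisect` in place of the in-turn ciphertext comparisons; `two_level_lookup` and the sample salary points are hypothetical.

```python
from bisect import bisect_left


def two_level_lookup(query, primary_points, secondary_points):
    """Narrow a query first by primary, then by secondary index points.

    primary_points: ascending list of primary index points.
    secondary_points: dict mapping each primary index point to the ascending
    list of secondary index points that subdivide its rows.
    Assumption: each point is the inclusive upper bound of its range.
    """
    # First primary point that is >= query (clamped to the last one).
    i = min(bisect_left(primary_points, query), len(primary_points) - 1)
    primary = primary_points[i]
    # Repeat the same search among that primary point's secondary points.
    secondary = secondary_points[primary]
    j = min(bisect_left(secondary, query), len(secondary) - 1)
    return primary, secondary[j]


primary = [5000, 10000, 30000]
secondary = {
    5000: [2000, 3500, 5000],
    10000: [6000, 8000, 10000],
    30000: [15000, 20000, 30000],
}
hit = two_level_lookup(7200, primary, secondary)
```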
  • the query target further includes the query value on the second attribute column
  • restoring the corresponding encoded ciphertexts to plaintext row identifiers, so as to obtain the corresponding candidate rows from the out-of-order data table, further includes: obtaining a first plaintext row identifier set from the plaintext row identifiers recovered from the encoded ciphertexts corresponding to the several associated index points; computing the intersection of the first plaintext row identifier set and a second plaintext row identifier set to obtain several common row identifiers, wherein the second plaintext row identifier set includes the plaintext row identifiers determined via the index points associated with the query target on the second attribute column, and those index points are determined based on a comparison between the ciphertext of the query target and the ciphertexts of a plurality of index points corresponding to the second attribute column; and determining each row corresponding to a common row identifier in the out-of-order data table as a candidate row.
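  • The intersection step can be illustrated with ordinary Python sets. The sketch assumes the row identifiers have already been restored to plaintext; `candidate_rows` and the sample table contents are hypothetical names and data.

```python
def candidate_rows(first_ids, second_ids, out_of_order_table):
    """Keep only rows whose identifier was found via both attribute columns."""
    common = set(first_ids) & set(second_ids)
    return {row_id: out_of_order_table[row_id] for row_id in sorted(common)}


# Row identifier -> (weight ciphertext, salary ciphertext), as opaque strings.
table = {0: ("ct_w0", "ct_s0"), 1: ("ct_w1", "ct_s1"),
         2: ("ct_w2", "ct_s2"), 3: ("ct_w3", "ct_s3")}
ids_from_weight_index = [0, 2, 3]
ids_from_salary_index = [1, 2, 3]
candidates = candidate_rows(ids_from_weight_index, ids_from_salary_index, table)
```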
  • the encoded ciphertext of the plaintext row identifier is updated and used to re-build the index when one of the following is satisfied: the predetermined time arrives; the third-party system is idle; the number of restored plaintext row identifiers reaches a predetermined number threshold.
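  • The three rebuild triggers can be expressed as a single predicate. The class `RebuildPolicy` and its field names below are illustrative assumptions; the source only names the three conditions, not how they are tracked.

```python
from dataclasses import dataclass


@dataclass
class RebuildPolicy:
    rebuild_at: float       # predetermined time, as epoch seconds (assumed unit)
    max_revealed_ids: int   # predetermined threshold on restored row identifiers

    def should_rebuild(self, now, system_idle, revealed_ids):
        """True when any one of the three conditions is satisfied."""
        return (
            now >= self.rebuild_at
            or system_idle
            or revealed_ids >= self.max_revealed_ids
        )


policy = RebuildPolicy(rebuild_at=2000.0, max_revealed_ids=100)
```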
  • a privacy protection-based joint data query device for a third party to securely query target data from joint data tables of multiple data parties
  • the joint data table is an attribute ciphertext data table securely established from the attribute data that multiple data parties jointly hold on several business subjects
  • the device is located at the third party, including:
  • the indexing unit is configured to obtain a plurality of associated index points of the query target based on a comparison between the ciphertext of the query target and the ciphertext of multiple index points corresponding to the first attribute column, wherein each index point corresponding to the first attribute column is a data segmentation point indexed according to the sorting of the attribute values of the first attribute column, and the query target includes the query value on the first attribute column;
  • the row identification unit is configured to obtain the encoded ciphertexts of the row identifiers corresponding to the associated index points, and restore the corresponding encoded ciphertexts to plaintext row identifiers, so as to acquire the corresponding candidate rows from the out-of-order data table obtained in advance by shuffling the joint data table, wherein the row identifier of each row is determined by row encoding of the out-of-order data table;
  • the target determining unit is configured to compare the ciphertext of the attribute value in the first attribute column of each candidate row with the ciphertext of the query target, so as to determine the final target row from each candidate row, so as to obtain the target data in the target row.
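  • The division of work among the three units can be sketched as three functions. Everything here is a simplified stand-in: values are plaintext instead of ciphertext, each index point is assumed to be the inclusive upper bound of its segment, and the XOR with `TOY_KEY` is only a placeholder for the unspecified row identifier encryption (it is not secure).

```python
TOY_KEY = 0x3C  # placeholder key; XOR is NOT a secure encryption


def restore_row_id(encoded):
    """Toy inverse of the placeholder row identifier encoding."""
    return encoded ^ TOY_KEY


def indexing_unit(query, index):
    """Pick the index point whose segment may contain the query value."""
    points = sorted(index)
    for point in points:
        if query <= point:
            return [point]
    return [points[-1]]


def row_identification_unit(points, index):
    """Fetch encoded row identifiers for the points and restore them."""
    return [restore_row_id(enc) for p in points for enc in index[p]]


def target_determining_unit(query, row_ids, table):
    """Compare each candidate row's value with the query value."""
    return [rid for rid in row_ids if table[rid] == query]


# Out-of-order table: row identifier -> height (plaintext stand-in).
table = {0: 172, 1: 155, 2: 168, 3: 181}
# Index: index point -> encoded row identifiers of rows in that segment.
index = {160: [1 ^ TOY_KEY], 170: [2 ^ TOY_KEY], 190: [0 ^ TOY_KEY, 3 ^ TOY_KEY]}

points = indexing_unit(172, index)
row_ids = row_identification_unit(points, index)
targets = target_determining_unit(172, row_ids, table)
```

  • Only the row identifiers of the candidate segment (rows 0 and 3 here) are restored to plaintext; the rest of the index stays encoded.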
  • a computing device including a memory and a processor, wherein executable codes are stored in the memory, and when the processor executes the executable codes, the method of the first aspect is implemented.
  • row identifiers are introduced for the disordered data tables after the joint data table is out of order, and indexes are established through the row identifiers.
  • the row identifier is determined by the third party and exists as ciphertext in the index table, while the row identifiers of candidate rows can be restored to plaintext, which both improves the efficiency of encrypted queries and ensures that the encrypted data in the joint data table is not leaked.
  • Figure 1 shows a schematic diagram of a specific implementation architecture of trusted secret state computing
  • FIG. 2 shows a schematic flow diagram of a third party indexing a joint data table in a secret state according to an embodiment
  • FIG. 3 shows a schematic diagram of a third party establishing an index according to a single attribute column in a secret state for a joint data table according to a specific example
  • FIG. 4 shows a schematic diagram of a data query process based on privacy protection according to an embodiment
  • Fig. 5 shows a schematic block diagram of an apparatus for querying data based on privacy protection according to an embodiment.
  • data party 1 holds the attribute values of the user's height, weight and other physical status attributes
  • data party 2 holds the attribute value of the user's salary attribute (specific salary amount)
  • data party 3 holds the attribute value of the user's loan attribute (specific loan amount).
  • Data party 1 or data party 3 queries the sum of salaries of users whose salary ranks in the top N
  • data party 1 queries the loan status of users whose salary ranks in the top N
  • the data held by a single data party has limitations (not comprehensive enough), and each data party does not want to leak local data.
  • a third party can be used to realize the secure joint processing of data.
  • each data party can send the data it holds to the third party in advance, and the third party performs joint sorting based on the data sent by each data party.
  • When the third party handles a business processing request (such as a query request) from a data party or another business party and needs to obtain corresponding data from the joint data for subsequent business processing, the data query results can be obtained based on the corresponding sorting results.
  • Trusted secret state computing is a safe and efficient secret state computing method, which can calculate a common result for multiple data parties without revealing any party's data.
  • Figure 1 shows an implementation architecture of trusted secret state computing.
  • the third party can use multiple nodes (such as node A, node B, node C, etc.) for joint calculation.
  • Multiple nodes can build a framework for trusted secret-state computing through multi-party secure computing (MPC). That is to say, the various nodes of the third party do not disclose data to each other, and complete various data processing in a safe manner. For example, establish a comprehensive ciphertext database, and obtain data based on the ciphertext database for various business processes.
  • each data party may be in a network environment of a public network
  • each node of a third party may be in a specific network environment.
  • the high-speed network shown in FIG. 1 may be a network capable of high-speed calculation and communication.
  • High-speed computing is usually determined by the central processing unit of the device, and high-speed communication can be achieved by improving communication capabilities (such as quantum communication, etc.) or reducing the number of communication gateways (such as being in the same local area network).
  • multiple nodes are separated from the public network via a specific network, and at the same time can interact with devices in the public network.
  • each data party as shown in Figure 1
  • the implementer of TECC may be called a service party or a third party (hereinafter collectively referred to as the service party), which is used to provide services for joint data processing among various data parties.
  • a single data party can split the local data into multiple components before providing the local data to the TECC architecture.
  • data party 1 can split the local data U into the sum of U1, U2, and U3.
  • a single data party can send the split ciphertext of multiple components to each node respectively. In this way, for each data from the data side, each node holds a component, and the corresponding data can only be restored if all components on multiple nodes are obtained.
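  • The splitting of U into U1, U2 and U3 can be illustrated with additive secret sharing. This is a sketch under stated assumptions: the modulus `2 ** 64` and the function names are not from the source, which only says that the components sum to the original data.

```python
import secrets

MODULUS = 2 ** 64  # assumed ring size; the source does not specify one


def split_into_components(value, n_nodes=3):
    """Split `value` into n_nodes random components summing to it mod MODULUS."""
    components = [secrets.randbelow(MODULUS) for _ in range(n_nodes - 1)]
    last = (value - sum(components)) % MODULUS
    return components + [last]


def restore(components):
    """Recover the value; this needs every component from every node."""
    return sum(components) % MODULUS


u1, u2, u3 = split_into_components(175)  # e.g. a height of 175 cm
recovered = restore([u1, u2, u3])
```

  • Any proper subset of the components is uniformly random on its own, which is why a single node learns nothing from the component it holds.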
  • Each node of the third party performs multi-party secure computing (MPC) on the received data, that is, it obtains corresponding processing results without disclosing local data to each other.
  • MPC methods can be, for example, secret sharing, homomorphic encryption, garbled circuits, and so on.
  • In TECC, to further ensure security and computing performance, each node can be located in a high-speed network (such as a local area network) capable of high-speed computing and communication. The entire computing process has very little interaction with the public network, so trusted secret state computing can strike a good balance between security and computing performance.
  • secure channels may also be established between a single data party and each node to increase data security.
  • the way of establishing the secure channel can be realized through at least one of the agreed protocol, key, dedicated communication interface, etc., which will not be repeated here.
  • each node of Trusted Secret Computing is built on the basis of Trusted Execution Environment (TEE).
  • TEE is a security enclave built by hardware and software methods on a computing platform, which can guarantee that the code and data loaded in the security enclave are protected in terms of confidentiality and integrity.
  • the TEE technology can ensure that the participant's data only exists in the TEE, and the host and owner of the TEE cannot obtain the plaintext of the data (in the case that the TEE is not compromised).
  • each TEE only ever handles data components, so even if an attacker breaks into a TEE and steals or modifies its contents, no valid information can be obtained.
  • the data sent by each data party to a third party can be in ciphertext form.
  • the data ciphertext of each data party can be uploaded to the third party in advance, so the third party can sort and index the ciphertext data in advance.
  • Table 1 shows the joint data table jointly established by the nodes of the third party.
  • it is called a joint data table because it is established from the combined data of multiple data parties.
  • the data provided by multiple data parties to a third party is in ciphertext form, so the joint data table can also be called a ciphertext data table.
  • each data party can agree on data alignment rules.
  • alignment by row means that each row corresponds to the same target object; for example, the first row corresponds to target object 1 (such as user 1), the second row corresponds to target object 2 (such as user 2), and the Nth row corresponds to target object N.
  • alignment by column means that the data of each target object is arranged in the preset order of attribute items; for example, the first column corresponds to attribute item 1, the second column corresponds to attribute item 2, ... the Vth column corresponds to attribute item V, and so on.
  • Each data party can send the local attribute value to the third party according to the alignment rules.
  • target objects can be users, goods, service items, and so on.
  • the target object may not appear in the joint data table, that is, each data party only agrees on the alignment rules of the target object, but does not send the identification of the target object (as shown in Table 1).
  • a single target object can be described by a predetermined target identifier (such as user name, product number, etc.), and each data party can unify the target object ciphertext and send it to a third party.
  • the target object identifier ciphertext can be treated as an attribute column in the joint data table.
  • the joint data table can be stored in multiple nodes in a secure manner.
  • a single node holds a fragment of the joint data table.
  • when the fragments held by the nodes are added element by element, the complete joint data table can be reconstructed.
  • a single element uses the randomly split component from the data party as the component ciphertext, or the ciphertext obtained by encrypting the randomly split component with a predetermined encryption method (such as hash encryption, etc.).
  • sorting and indexing are important steps in the data query process.
  • the conventional technology can adopt a sorting method based on a similar order-preserving encryption method.
  • the third party obtains the attribute value ciphertext uploaded by each data party to obtain the table to be sorted.
  • the user ID and the attribute value of each attribute item in the table are in ciphertext (for example, one row stores one user's attribute value ciphertexts for multiple attribute items, and one column stores every user's attribute value ciphertext for one attribute item), but the order of the rows and columns is known (which helps initial data alignment); for example, the first row corresponds to user 1, the second row corresponds to user 2, and so on.
  • the third party sorts according to one or more attribute items, so as to build an index.
  • Sorting can be done by row or by column.
  • row-by-row is taken as an example for illustration, but the case of sorting by column is not excluded.
  • sorting by row means that the data of a row is fixed, and the position of the row as a whole is arranged on the column.
  • each row in the database corresponds to a piece of business data
  • each column corresponds to a data item in each piece of business data.
  • a row corresponds to a user
  • a column corresponds to a business attribute.
  • Sorting by row is for user sorting, and the sorting basis can be one or more business attributes. When the sorting of a single user changes, the sorting of the entire row of data changes together.
  • an index can be built based on the data of a certain column or multiple columns.
  • methods for establishing index points include, for example, bucketing, quick sort and truncation, and the like.
  • the bucketing points that is, the dividing points of the buckets, such as height 150cm, 160cm, etc.
  • the specific bucketing point can be selected according to historical experience, or "find the maximum and minimum values first, and then divide the range between them equally".
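  • The "find the maximum and minimum, then divide equally" option can be sketched in plaintext as below. In the secret-state setting the minimum and maximum would themselves have to be found by secure computation; `equal_width_bucket_points` is a hypothetical helper name.

```python
def equal_width_bucket_points(values, n_buckets):
    """Return the interior dividing points of n_buckets equal-width buckets."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_buckets
    # Only the interior boundaries are bucketing points; lo and hi are the ends.
    return [lo + step * i for i in range(1, n_buckets)]


heights_cm = [150, 158, 163, 171, 180, 190]
bucket_points = equal_width_bucket_points(heights_cm, 4)
```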
  • the data party 1 can send the ciphertext of each component of U to each node of the third party in a secure manner, and each node of the third party receives the ciphertext of each component respectively, and jointly uses each component to query in the ciphertext database, thereby feeding back the query result.
  • the service party needs to jointly sort the data in the ciphertext database through multi-party secure computing, and then obtain the data query results based on the corresponding sorting results.
  • Each node feeds back the query result to the single participant through multi-party secure computation.
  • the data party can also send other business processing requests to the third party, for example, using the credit status of the user with the lowest salary in the past three years to predict their default risk, and so on. It can be understood that in the case where the third party needs to continue to perform subsequent business processing, the third party may not feed back the corresponding data, but continue to perform corresponding data processing within the scope of the local LAN.
  • data party 1 queries the loan status of users whose salary is in the top N positions, or queries the sum of salaries of users whose height is in the top M position and the sum of salaries of users whose weight is in the top M position, and so on.
  • If a third party uses the above sorting method to index data, the order relationship of the rows in the sorted column may be exposed when the data in the table is queried.
  • the query target attribute items include the user's weight attribute item and salary attribute item.
  • this specification provides a technical solution for joint data query based on privacy protection.
  • This technical solution is applicable to the scenario where the data of multiple data parties is safely held by a third party in the corresponding ciphertext database, and the third party securely obtains target data from it according to business needs.
  • the technical solution establishes an index based on the pre-sorted sorting results of the joint data of multiple data parties, and prevents a single node of a third party from speculating on private data based on the sorting results.
  • the technical solution in this specification is described on the basis of the third party's TECC architecture, but practical application is not limited to that architecture.
  • the third party can also be a single device or take other secure forms, which are not limited in this specification.
  • the following firstly describes the process of establishing an index for the sorting of the joint data table by a third party.
  • Fig. 2 shows the process of building an index table executed by a third party according to an embodiment. It is worth noting that, when the third party is a device, this process is completed for a single device of the third party. When the third party is a cluster similar to TECC, this process can be completed by the corresponding cluster in a multi-party secure computing manner.
  • the process of establishing an index table performed by a third party may include: step 201, shuffling the joint data table to obtain an out-of-order data table; step 202, performing plaintext encoding on the out-of-order data table row by row, so that each row of data corresponds to a plaintext row identifier; step 203, constructing a first index table from the ciphertexts of the first attribute column and the encoded ciphertexts of the row identifiers; step 204, sorting the first index table according to the first attribute column, thereby obtaining each index point corresponding to the first attribute column.
  • In step 201, the joint data table is shuffled to obtain an out-of-order data table.
  • the joint data table is jointly established by a third party based on the data of multiple data parties, and will not be repeated here.
  • the reordering can be done in a predetermined way or randomly, so that the position of a row of business data after reordering is random. As shown in FIG. 3, in each row of the out-of-order table, the correspondence between the target object and the ciphertexts of its attribute values does not change, but the order of the rows changes. In particular, under the TECC architecture, reordering can be performed in a multi-party secure computing manner. For example, the nodes can securely synchronize which position a given row is moved to.
  • In step 202, plaintext encoding is performed row by row on the out-of-order data table, so that each row of data corresponds to a row identifier in plaintext.
  • the row identifier may be generated in a predetermined manner and identifies the corresponding row. It may consist of at least one of numbers, letters, symbols, etc. For example, the rows in FIG. 3 are coded as 0, 1, 2, 3... in sequence. A single row identifier locates the row of data it identifies.
  • a data table 301 is used to represent an out-of-sequence data table with plaintext row identifiers added. It is worth noting that, under the technical conception of this specification, table 301 is the basis for constructing indexes of each column. During the effective use of the index constructed according to the current row identifier, table 301 can be the always-existing query basis. In actual scenarios, the table 301 can be used as a basic table for indexing each attribute column, such as a parent table for indexing.
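  • Steps 201 and 202 can be sketched as a shuffle followed by sequential numbering. This is a single-device plaintext illustration with hypothetical names; under TECC the shuffle would be carried out by the nodes through multi-party secure computing.

```python
import random


def shuffle_and_label(joint_rows, rng=None):
    """Shuffle the rows, then key each row by a plaintext row identifier."""
    rng = rng or random.SystemRandom()
    shuffled = list(joint_rows)      # copy so the input order is untouched
    rng.shuffle(shuffled)            # step 201: random reordering of whole rows
    # Step 202: code the shuffled rows 0, 1, 2, ... in sequence.
    return {row_id: row for row_id, row in enumerate(shuffled)}


# Each row keeps its own attribute ciphertexts together (opaque strings here).
joint_rows = [("ct_h1", "ct_s1"), ("ct_h2", "ct_s2"), ("ct_h3", "ct_s3")]
base_table = shuffle_and_label(joint_rows, random.Random(7))
```

  • The returned mapping plays the role of table 301: row contents are unchanged, only their positions and identifiers are new.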
  • In step 203, a first index table is constructed using the ciphertexts of the first attribute column and the ciphertext of each row identifier. Since the plaintext row identifier is closely tied to the row data, directly using it for indexing may leak data privacy. For this reason, the ciphertext of the row identifier is used to build the related tables in the subsequent indexing process.
  • the ciphertext of the row identifier can be obtained by encrypting the row identifier with a predetermined encryption method, or by splitting the row identifier into random components that are stored on the respective nodes. In this way, no single device of the third party can deduce the plaintext row identifier from its locally held data alone.
  • the relevant index is usually established according to the column.
  • a column that may be used as an index (such as the first attribute column) can be extracted, and a corresponding index table can be constructed together with the ciphertext of the row identification, which is referred to as the first index table here.
  • each attribute value ciphertext of C1 column and the ciphertext column of row identification can be extracted to form the first index table. In this way, the size of the table for sorting is greatly reduced.
  • the ciphertext of each attribute value of the C2 column and the ciphertext column of the row identifier can be extracted to form a second index table.
  • corresponding index tables can be established for each attribute column.
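  • Extracting one column plus the row identifier ciphertexts into a per-column index table might look like this. The XOR in `encode_row_id` is only a toy placeholder for the unspecified "predetermined encryption method" and is not secure; all names are hypothetical.

```python
def encode_row_id(row_id, key):
    """Toy placeholder for row identifier encryption (NOT secure)."""
    return row_id ^ key


def build_index_table(base_table, column, key):
    """Pair each attribute value ciphertext in `column` with its encoded row id.

    base_table: {row_id: {column_name: value_ciphertext}}
    """
    return [
        (row[column], encode_row_id(row_id, key))
        for row_id, row in base_table.items()
    ]


base_table = {
    0: {"C1": "ct_a", "C2": "ct_x"},
    1: {"C1": "ct_b", "C2": "ct_y"},
}
first_index_table = build_index_table(base_table, "C1", key=0x5A5A)
second_index_table = build_index_table(base_table, "C2", key=0x5A5A)
```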
  • In step 204, the first index table is sorted according to the first attribute column, so as to obtain each index point corresponding to the first attribute column, and an index is established for the joint data table according to the first attribute column.
  • the sorting here is usually done by the size of the attribute values. For example, when column C1 holds salaries, rows are ranked by salary from high to low or from low to high; when column C1 holds weight data, rows are sorted by weight.
  • the sorting method here can be bucketing, quick sort and truncation, etc., which will not be described here. After sorting, the rows of the resulting target sorting table are arranged in descending (or ascending) order of the attribute values corresponding to the target attribute item.
  • the row of the largest attribute value ciphertext corresponding to the target attribute item is located in the first row; the row of the second largest attribute value ciphertext corresponding to the target attribute item is located in the second row, and so on, the row of the smallest attribute value ciphertext corresponding to the target attribute item is located in the Nth row.
  • the size of each attribute value is safely compared according to the encryption rules, which will not be repeated here.
  • a third party can obtain the size comparison result of the attribute value ciphertext. For example, through security comparison, if the attribute value of the third row is larger than the attribute value of the second row, each node can exchange the order of the third row and the second row.
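  • The compare-and-exchange idea above can be sketched as a simple bubble sort driven by a comparison oracle. The `Ciphertext` wrapper and `secure_greater` are hypothetical stand-ins: a real deployment would run the comparison as a secure protocol among the nodes and would likely use a more efficient sorting method.

```python
class Ciphertext:
    """Opaque stand-in for an attribute value ciphertext."""

    def __init__(self, hidden_value):
        self._hidden_value = hidden_value


def secure_greater(a, b):
    """Stand-in for a secure comparison: reveals only whether a > b."""
    return a._hidden_value > b._hidden_value


def sort_index_table(index_table):
    """Sort (value_ciphertext, encoded_row_id) pairs in descending order."""
    rows = list(index_table)
    for end in range(len(rows) - 1, 0, -1):
        for i in range(end):
            # If the lower row holds the larger value, exchange the two rows.
            if secure_greater(rows[i + 1][0], rows[i][0]):
                rows[i], rows[i + 1] = rows[i + 1], rows[i]
    return rows


table = [(Ciphertext(3000), "r0"), (Ciphertext(9000), "r1"), (Ciphertext(5000), "r2")]
ordered_ids = [encoded_id for _, encoded_id in sort_index_table(table)]
```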
  • An index point can be understood as a data split point determined for the convenience of query.
  • Two adjacent index points define an attribute value range. For example, to build an index on user height, one index point can be set every 10 cm. When querying user data for a given height (such as 165 cm) or a given height range (such as 161 cm to 168 cm), the given value, or the endpoints of the given range, can be compared with each index point to quickly obtain candidate data in a smaller range.
  • the index point is usually related to the attribute value corresponding to the attribute column, which can be predetermined or divided according to the number of target objects.
  • the given index point is, for example, 2,000 yuan, 5,000 yuan, 10,000 yuan, 30,000 yuan, etc.
  • the segmented attribute value ranges are, for example, 0 to 2,000 yuan (exclusive), 2,000 yuan (inclusive) to 5,000 yuan (exclusive), 5,000 yuan (inclusive) to 10,000 yuan (exclusive), 10,000 yuan (inclusive) to 30,000 yuan (inclusive), and more than 30,000 yuan (inclusive).
  • the attribute value of the salary attribute column can be compared with the given index point, so as to determine which index point or points the corresponding row belongs to. In this case, the number of data pieces corresponding to each index point may not be equal.
  • each index point is determined based on the salary amount corresponding to the 1000th person, the salary amount corresponding to the 2000th person, the salary amount corresponding to the 3000th person...etc.
  • Each index point then covers 1,000 people.
  • the index point may be the salary amount corresponding to the corresponding ranking, or other numerical value determined based on the salary amount that can be distinguished from the next salary amount.
  • For example, the salary of the 1,000th person can be used as an index point, so that the 1,000 rows from 0 up to that salary (inclusive) form one block; or the salary of the 1,001st person can be used as an index point, so that the 1,000 rows from 0 up to that salary (exclusive) form one block; or a value between the salaries of the 1,000th and 1,001st persons (such as their average) can be used as an index point.
  • each index point actually divides the data into multiple data ranges, for example, several pieces of data between two adjacent index points are a data range (such as the range of salary from 2000 yuan to 5000 yuan).
  • the index points of a single attribute column may not contain endpoints, but only include intermediate partition points that divide the data. In this way, the number of index points is 1 less than the number of attribute value ranges that separate the data.
  • the index points of a single attribute column may also contain an endpoint, so that the number of index points corresponds to the number of ranges separating the data.
  • each index point may correspond to an attribute value range constituted by adjacent index points smaller or larger than it.
  • the index points of a single attribute column may also include two endpoints, so that the number of index points is one more than the number of attribute value ranges that separate the data.
  • the attribute value range can be described by two adjacent index points.
  • FIG. 3 shows an example in which a single index point corresponds to a single index subtable.
  • the data in a single index sub-table may be arranged according to the size of the attribute values in the attribute columns of the corresponding index.
  • the first index point according to the attribute column C1 corresponds to the first subtable, which contains several row identification ciphertexts arranged in order of size.
  • the data in the first sub-table may be data that is divided according to index points and then reordered.
  • each sub-table described above is regarded as a first-level index, and the corresponding index points are regarded as a first-level index point, and then a second-level index can also be constructed for each sub-table.
  • secondary indexes can divide data by secondary index points.
  • the determination method of the second-level index point is similar to that of the first-level index point, and the sub-tables under the index are divided at a finer granularity. It can be understood that each index sub-table of the primary index is established on the basis of sorting the target attribute column, therefore, the secondary index can be directly established according to the sorting result.
  • an index point can be established for every 200 attribute value ciphertexts corresponding to the 1000 attribute value ciphertexts of the primary index point in order.
  • the secondary index can further narrow the data range when the data volume of the primary index is large, thereby speeding up the data query process.
  • an index of three or more levels may also be established to further speed up data query.
  • Figure 2 and Figure 3 show the process of building an index based on a single attribute column in different forms, and in practice, it is also possible to build an index for multiple attribute columns in a similar manner.
  • step 203 and step 204 may be repeated for at least one of the age attribute column, height attribute column, etc. to determine corresponding index points and construct corresponding retrieval tables.
  • the index tables corresponding to multiple attribute columns can be constructed according to the data table 301 .
  • a third party can pre-build an index on one attribute column or multiple attribute columns.
  • one or another business scenario that requires sorted data may arise.
  • the current business process requires the sum of the top 10 salaries of the salary attribute item, or the annual income of the top 10 users by weight, and so on.
  • These business scenarios may be query requests initiated by a single data party or other business parties that are allowed to use joint data, or it may be that a third party needs to obtain corresponding business data when processing other businesses.
  • the third party can query according to the corresponding index according to the query requirements.
  • Fig. 4 shows a process of joint data query based on privacy protection according to an embodiment.
  • This process can query data from the joint data table based on the indexes established in the manners shown in Fig. 2 and Fig. 3 .
  • the third party may perform the following steps: step 401, based on the comparison between the ciphertext of the query target and the ciphertexts of multiple index points corresponding to the first attribute column, obtain several associated index points of the query target, wherein each index point corresponding to the first attribute column is a data segmentation point indexed according to the order of the attribute values of the first attribute column, and the query target includes the query value on the first attribute column;
  • the sequence data table is determined by row coding;
  • Step 403 the corresponding encoded ciphertext is restored to the plaintext row identifier, thereby determining the candidate row from the out-of-sequence data table;
  • Step 404 comparing the ciphertext of the attribute value of each candidate row in the first attribute column with the cip
  • step 401 based on the comparison between the ciphertext of the query target and the ciphertext of multiple index points corresponding to the first attribute column, several associated index points of the query target are obtained.
  • The query target can be an exact value or a query range.
  • The first attribute column may be any attribute column in the joint data table; it is the target column of the query. For example, to query the salary of a user who weighs 200 catties, the query target is the exact value 200 catties and the first attribute column is the weight attribute column.
  • When the query target is the range of 10 to 20 years of service, the first attribute column is the years-of-service attribute column; to filter employees aged 35 to 45, the query target is the range 35 to 45 years old and the first attribute column is the age attribute column.
  • the query target is in the form of ciphertext, and the ciphertext of the query target can be determined based on the query request of the data party or other business parties, or can be determined based on the intermediate results of the business currently processed by the third party.
  • the query party can convert the query target (such as the attribute value of the attribute column to be queried) into the corresponding ciphertext according to the requirements of the third-party architecture and send it to the third party.
  • the query party can randomly split the value corresponding to the query target into multiple components, and each component is sent to a node of the third party, so that the value of the attribute column to be queried is kept confidential for each node of the third party (each node only obtains one component).
  • For example, if data party 1 wants to query the salary of a user who weighs 200 catties, it can send the ciphertext of the weight attribute value (200 catties) to the third party.
  • the third party can obtain the corresponding intermediate results as the query target when processing other services based on requests from data parties or other business parties. It is understandable that, in order to protect data privacy, the third party may use ciphertext during data processing. Then the query target as an intermediate result is also in ciphertext form. At this point, the corresponding intermediate result is the ciphertext of the query target.
  • For example, the intermediate result may be that the age of first onset for patients with the "three highs" (hypertension, hyperglycemia, and hyperlipidemia) is concentrated between 40 and 50; the income of people aged 40 to 50 is then computed by querying the joint data table, and so on.
  • the query result for the query target can also be directly used as an intermediate result for subsequent business processing. For example, statistics are made on the top 5 pension spending items among those who have retired in the past five years. Then the third party may first query the row data of the retired personnel in the past five years according to the age value of each user as an intermediate result, and then calculate the sum of the expenditure amount of the pension spent on the attribute items of each expenditure item in these rows.
  • each index point can divide each attribute value range of the corresponding attribute column, so by comparing the ciphertext of the query target with the ciphertext of the index point, the index point associated with the query target in the first attribute column corresponding to the query target can be determined as the associated index point.
  • The query target and the index points are both held by the third party in the form of components, and the third party can compare the two ciphertexts by secure comparison or similar methods.
  • The result of comparing the query target ciphertext with an index point ciphertext can be revealed in plaintext, so that the attribute value range into which the query target falls can be determined according to business requirements.
  • the range of attribute values that the query target falls into can be separated by associated index points.
  • the way of determining the associated index point is related to the way the index point divides the attribute value and the form of the query target. For example, when a single index point corresponding to the first attribute column is used as an intermediate separation point to separate the attribute value of the first attribute column into multiple attribute value ranges, the ciphertext of the query value can be compared with each index point corresponding to the first attribute column in turn, and then according to the size comparison result, the index point corresponding to the attribute value range where the query value is located is determined as the associated index point of the query target. For example, in the case that the query value is greater than the first index point but smaller than the second index point, the adjacent first index point and second index point may be determined as associated index points.
  • If the comparison result is that the query value is smaller than all index points, the minimum index point can be determined as the associated index point of the query target; if the query value is greater than the maximum index point, the maximum index point can be determined as the associated index point. In other cases, the associated index point may be determined in other ways, which are not listed exhaustively here.
  • the ciphertext of the query target can be compared with the ciphertext of the index point of the first-level index to determine the first-level index associated with the query target.
  • Once the index point ciphertexts of the last-level index (such as the secondary index) have been compared, the last-level index points found are determined as the associated index points of the query target. In this way, the range of candidate rows is narrowed level by level, reducing the amount of subsequent data processing.
  • step 402 the encoded ciphertext of each row identifier corresponding to several associated index points is obtained.
  • The associated index points delimit a set of rows whose attribute values include the query target.
  • each attribute value row also corresponds to a row identifier obtained by encoding the row for the out-of-sequence data table, and the row identifier appears in the form of encoded ciphertext in the index table. Therefore, the encoded ciphertext corresponding to each row can be obtained.
  • the corresponding coded ciphertext is restored to the plaintext row identification, so as to determine the candidate row from the out-of-sequence data table.
  • the encoded ciphertext can be restored to a plaintext identifier by a third party.
  • Since each node stores the encoded ciphertext in the form of a random component, the plaintext row identifier can be obtained by the nodes disclosing their local components to one another.
  • the associated index points (such as 40 and 50 years old) corresponding to the first index range (such as 30 to 45 years old) can be determined in the index of the first attribute column.
  • The plaintext row identifiers recovered from the encoded ciphertexts of those rows form the first plaintext row identifier set. Similarly, the associated index point (such as 150 catties) corresponding to the second index range (such as a weight above 150 catties) is determined in the index of the second attribute column, the corresponding candidate rows (such as rows whose weight attribute value exceeds 150 catties) are obtained, and the plaintext row identifiers recovered from the encoded ciphertexts of the rows in the second index range form the second plaintext row identifier set. The intersection of the two sets is then computed to obtain the row identifiers they share. Finally, the rows of the out-of-sequence data table (such as table 301 in FIG. 3) that correspond to the shared row identifiers may be determined as candidate rows.
  • step 404 the ciphertext of the attribute value in the first attribute column of each candidate row is compared with the ciphertext of the query target, so as to determine the final target row from each candidate row, so as to obtain the target data in the target row.
  • The candidate rows are obtained by comparing the query target ciphertext with a relatively small number of index points, so they greatly narrow the search range for the target row.
  • the comparison between the ciphertext of the query target and the ciphertext of the attribute value in the first attribute column of the candidate row may also include at least one of value-to-value comparison, value-to-range comparison, and range-to-range comparison.
  • a value-to-value comparison may check whether two values are the same;
  • a value-to-range comparison may check whether a value falls within a range;
  • a range-to-range comparison may cross-compare the endpoints of the two ranges, which will not be repeated here.
  • For example, the candidate rows may be rows determined from the index points of the salary attribute column, such as rows whose salary is between 5,000 and 10,000 yuan. Comparing the salary attribute value of each candidate row with the ciphertext of a query target ranging from 5,000 to 6,000 yuan yields the candidate rows whose salary falls within 5,000 to 6,000 yuan as the target rows, whose ciphertext data can then be obtained. In some embodiments, the number of target rows satisfying the query condition may also be zero. Under the TECC framework, ciphertext values can be compared by secure comparison, which will not be described in detail here.
  • the ciphertext data can be fed back to the query party as a query result, or can be used as an intermediate result of the current business for subsequent processing. For example, take users whose salary is between 5,000 and 6,000 as intermediate results, average their ages or determine data charts, and so on.
  • the sorting data used is data sorted by age attribute column. Assume that in the data sorted by the age attribute column, a single index point corresponds to an age group of 10 years old.
  • The third party can compare the index point ciphertexts with 35 and 45 to determine that the first index point greater than 35 is 40 and the first index point greater than 45 is 50. The encoded ciphertexts of the row identifiers in the two index blocks of index point 40 (data for ages 31 to 40) and index point 50 (data for ages 41 to 50) can then be obtained.
  • the third party can further narrow down the scope according to the secondary index, which will not be repeated here.
  • the third party can convert the corresponding encoded ciphertext into plaintext, and determine the row identifier of the plaintext, so as to obtain the corresponding row from the out-of-sequence data table as a candidate row, for example, there are 10,000 pieces of data.
  • the third party can compare the attribute value of the age attribute column in the 10,000 candidate row data with the numerical ciphertext at the endpoints of the two ranges of 35 and 45, and determine the candidate row in the age range greater than 35 and less than or equal to 45 as the target row.
  • the attribute value ciphertext (intermediate result) of each target row in the salary attribute column can be averaged in a secure encrypted manner (subsequent business processing).
  • a single node may also obtain some information involving plaintext recovery and range determination.
  • If the number of queries is large enough and a single node accumulates enough information, there is a risk of data leakage should that node be compromised by an attacker.
  • the number of row identifiers restored to plain text can be controlled.
  • all indexes are rebuilt. For example, when only one column-related index is used, index reconstruction is not required; once an index with more than one column is used, index reconstruction is performed based on predetermined conditions.
  • the number of row identifiers that have been restored to plain text can be recorded, and the index will be rebuilt when the predetermined number threshold is reached.
  • the third party can also perform index rebuilding at regular intervals (such as at intervals of 1 day) or when the system is idle.
  • the third party can also rebuild when the local system is idle, which is not limited here.
  • With the method provided by the embodiments of this specification, when a third party processes a joint data table composed of data from multiple data parties and relevant data must be obtained based on the ordering of several attribute columns, row identifiers are introduced for the out-of-sequence data table obtained by shuffling the joint data table, and indexes are built through those row identifiers when the joint data table is sorted by the attribute values of an attribute column.
  • The row identifiers are determined by the third party and exist as ciphertext in the index table, while the row identifiers of candidate rows can be restored to plaintext. This speeds up encrypted queries while ensuring that the encrypted data in the joint data table is not leaked.
  • the third party uses a trusted cryptographic computing (TECC) architecture
  • the data security in the joint data table can be ensured.
  • a device for querying data based on privacy protection may be located in a third party that handles joint data services of multiple data parties.
  • the third party uses a TECC architecture
  • each node in the TECC architecture is equipped with a data query device based on privacy protection, and these devices cooperate with each other to complete the data query in the joint data table through multi-party secure computing.
  • the data query device 500 based on privacy protection includes: an indexing unit 501 configured to obtain a plurality of associated index points of the query target based on a comparison between the ciphertext of the query target and a plurality of index point ciphertexts corresponding to the first attribute column, wherein each index point corresponding to the first attribute column is a data segmentation point indexed according to the sorting of the attribute values of the first attribute column, and the query target includes the query value on the first attribute column; the row identification unit 502 is configured to obtain the encoded ciphertext corresponding to each row identifier of the plurality of associated index points, and The corresponding encoded ciphertext is restored to the plaintext row identifier, thereby obtaining each corresponding candidate row from the disordered data table obtained in advance for the disordered data table of the joint data table, wherein the row identifier of each row is determined by row encoding for the disordered data table; the target determination unit 503 is configured to compare the ciphertext
  • the third party has a trusted cryptographic computing (TECC) architecture
  • the TECC architecture includes a plurality of nodes, each node is equipped with a device, and the devices process the business related to the joint data table among themselves through secure multi-party computation.
  • a single device stores a corresponding single component ciphertext for a single element in the joint data table
  • the query target is split into components by the querying party
  • each device in each node holds the component ciphertext of a single component
  • the device 500 also includes an index construction unit (not shown), configured to determine each index point for the first attribute column in the following manner:
  • each index point is determined according to a predetermined attribute value segmentation method.
  • the index construction unit is further configured to update the encoded ciphertext of the plaintext row identifier and re-build the index when one of the following is satisfied: the predetermined time arrives; the third-party system is idle; the number of restored plaintext row identifiers reaches a predetermined number threshold.
  • the multiple index points corresponding to the first attribute column are each primary index point of the first attribute column, a single primary index point corresponds to multiple secondary index points, and each secondary index point divides the row data corresponding to the single primary index point into multiple secondary index attribute value ranges;
  • the index unit 501 is further configured as:
  • the query target further includes the query value on the second attribute column
  • the row identifying unit 502 is further configured to:
  • the second plaintext row identifier set includes each plaintext row identifier determined based on each index point associated with the query target in the second attribute column, and each index point associated with the query target in the second attribute column is determined based on a comparison between the ciphertext of the query target and a plurality of index point ciphertexts corresponding to the second attribute column;
  • Each row corresponding to each common row identifier in the out-of-sequence data table is determined as a candidate row.
  • the device 500 shown in FIG. 5 corresponds to the method described in FIG. 4 , and the corresponding descriptions in the method embodiment in FIG. 4 are also applicable to the device 500 , which will not be repeated here.
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed in a computer, the computer is instructed to execute the methods described in conjunction with FIG. 2 and FIG. 4 .
  • a computing device including a memory and a processor, where executable code is stored in the memory, and when the processor executes the executable code, implements the methods described in conjunction with FIG. 2 and FIG. 4 .


Abstract

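Embodiments of this specification provide a privacy-preserving data query method and apparatus. When a third party processes a joint data table composed of data from multiple data parties and relevant data must be obtained based on the ordering of several attribute columns, row identifiers are introduced for the out-of-sequence data table obtained by shuffling the joint data table, and indexes are built through those row identifiers when the joint data table is sorted by the attribute values of an attribute column. The row identifiers are determined by the third party and exist as ciphertext in the index table, while the row identifiers of candidate rows can be restored to plaintext. This speeds up encrypted queries while ensuring that the encrypted data in the joint data table is not leaked.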

Description

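Joint data query method and apparatus based on privacy protection
This application claims priority to Chinese patent application No. 202210068068.8, entitled "Joint data query method and apparatus based on privacy protection", filed with the Patent Office of the China National Intellectual Property Administration on January 20, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
One or more embodiments of this specification relate to the field of computer technology, and in particular to a joint data query method and apparatus based on privacy protection.
Background
In the context of big data, business data from different data parties often needs to be processed together. For example, in a user information analysis scenario, data party 1 holds physical status data such as users' height and weight, data party 2 holds users' salary data, and data party 3 holds users' loan data. When multi-party data is processed jointly, protecting data privacy and security becomes an important concern.
Joint data processing can be implemented through secure multi-party computation (MPC) and similar techniques. MPC allows the data parties to jointly process the relevant data and achieve certain business purposes without any data party revealing its local data to the others. In particular, when data exchange among the data parties is not sufficient to guarantee data security, or device performance cannot meet the timeliness requirements of data processing, a trusted third party can be used to process the data jointly and securely. With a third party involved, the local data that each data party provides to the third party is usually encrypted to safeguard data security, so the data obtained by the third party is usually ciphertext. When the ciphertext data needs to be used repeatedly, for example for queries, the third party can build a ciphertext database over the data of the data parties. In database query scenarios, especially queries on ranges or rankings, avoiding leakage of data privacy through database indexes and the like is very important for data security.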
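Summary
One or more embodiments of this specification describe a privacy-preserving data query method and apparatus to address one or more of the problems mentioned in the Background.
According to a first aspect, a joint data query method based on privacy protection is provided, by which a third party securely queries target data from a joint data table of multiple data parties. The joint data table is an attribute ciphertext data table securely built from the joint attribute data of the multiple data parties about several business subjects. The method is performed by the third party and includes: obtaining several associated index points of a query target based on a comparison between the ciphertext of the query target and the ciphertexts of multiple index points corresponding to a first attribute column, where each index point corresponding to the first attribute column is a data split point of an index built by sorting the attribute values of the first attribute column, and the query target includes a query value on the first attribute column; obtaining the encoded ciphertext of each row identifier corresponding to the associated index points, where the row identifier of each row is determined in advance by row-wise encoding of the out-of-sequence data table obtained by shuffling the joint data table; restoring the corresponding encoded ciphertexts to plaintext row identifiers, thereby determining candidate rows from the out-of-sequence data table; and comparing the ciphertext of each candidate row's attribute value in the first attribute column with the ciphertext of the query target, thereby determining the final target rows among the candidate rows and obtaining the target data in the target rows.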
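The last two steps of the method (restoring row identifiers, then filtering candidate rows against the query target) can be illustrated with a minimal plaintext sketch. Everything here is a stand-in: `decode_row_id` is a hypothetical placeholder for the nodes jointly recovering a plaintext row identifier, the `enc(...)` string format is invented for illustration, and the ordinary `<=` comparisons stand in for secure comparisons over ciphertexts.

```python
def decode_row_id(encoded):
    """Hypothetical stand-in for recovering a plaintext row identifier
    from its encoded ciphertext (here simply the string 'enc(<id>)')."""
    return int(encoded[4:-1])


def select_target_rows(shuffled_table, column, encoded_ids, low, high):
    """Restore row identifiers, fetch candidate rows, then keep the rows
    whose value in `column` lies in the queried range [low, high]."""
    # Step 3: encoded ciphertexts -> plaintext row identifiers -> candidate rows.
    candidate_ids = [decode_row_id(e) for e in encoded_ids]
    candidates = [shuffled_table[i] for i in candidate_ids]
    # Step 4: compare each candidate's attribute value with the query target.
    # In the described scheme this comparison is done on ciphertexts.
    return [row for row in candidates if low <= row[column] <= high]


# Rows of the out-of-sequence table; the list position plays the role of the row identifier.
table = [{"salary": 7000}, {"salary": 5500}, {"salary": 9000}, {"salary": 5900}]
# Suppose the associated index points cover salaries from 5,000 to 10,000 yuan.
block_ids = ["enc(0)", "enc(1)", "enc(2)", "enc(3)"]
targets = select_target_rows(table, "salary", block_ids, 5000, 6000)
```

Only the row identifiers of candidate rows are ever decoded in this sketch, which mirrors the idea that the rest of the index stays in ciphertext form.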
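In one embodiment, the third party has a trusted cryptographic computing (TECC) architecture that includes multiple nodes, and the nodes process business related to the joint data table through secure multi-party computation.
In a further embodiment, the joint data table is stored at the third party in the form of component ciphertexts: a single node stores a single component ciphertext for each element of the joint data table. The query target is split into components by the querying party, and each node holds the component ciphertext of a single component.
In one embodiment, a single node of the third party is implemented in a trusted execution environment.
In one embodiment, obtaining the associated index points of the query target based on the comparison between the ciphertext of the query target and the ciphertexts of the multiple index points corresponding to the first attribute column includes: comparing the ciphertext of the query value in turn with each index point corresponding to the first attribute column; and, when the comparison results of the query value against two adjacent index points are opposite, determining those two adjacent index points as the associated index points of the query target.
In one embodiment, obtaining the associated index points of the query target based on the comparison between the ciphertext of the query target and the ciphertexts of the multiple index points corresponding to the first attribute column includes: comparing the ciphertext of the query value in turn with each index point corresponding to the first attribute column; and, when the comparison result is that the query value is smaller than all index points (or greater than all index points), determining the minimum index point (or the maximum index point) as the associated index point of the query target.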
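The two embodiments above amount to locating the position at which the comparison result flips. The following is a minimal sketch in plaintext, assuming the index points are sorted in ascending order; the function name is illustrative, the `>` operator stands in for a secure comparison whose result is revealed, and treating a query value equal to an index point as "not greater" is an assumption of this sketch.

```python
def associated_index_points(query_value, index_points):
    """Return the index points associated with `query_value`.

    `index_points` must be sorted ascending. Each `>` below stands in for
    a secure comparison between two ciphertexts.
    """
    greater = [query_value > point for point in index_points]
    if not any(greater):
        # Query value is not above any index point: use the minimum index point.
        return [index_points[0]]
    if all(greater):
        # Query value is above every index point: use the maximum index point.
        return [index_points[-1]]
    # The comparison flips between two adjacent index points.
    flip = greater.index(False)
    return [index_points[flip - 1], index_points[flip]]


salary_points = [2000, 5000, 10000, 30000]
pair = associated_index_points(7000, salary_points)
```

With the salary index points used as an example elsewhere in the description, a query value of 7,000 falls between the adjacent points 5,000 and 10,000.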
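In one embodiment, the index points for the first attribute column are determined as follows: extracting the attribute value ciphertexts of the first attribute column from the out-of-sequence data table and combining them with the encoded ciphertexts of the corresponding row identifiers to form a first index table, where the encoded ciphertext of a single row identifier is obtained by encrypting that row identifier in a predetermined manner; shuffling the first index table and then sorting it by the attribute values of the first attribute column; and determining the index points from the sorting result according to a predetermined attribute value splitting condition.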
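The index construction steps can be sketched in plaintext as follows. This is only an illustration under stated assumptions: `encode_row_id` is a hypothetical placeholder for the predetermined encryption of a row identifier, the splitting condition is assumed to be "one index point every `rows_per_point` rows", and each index point is taken to be the largest value of its block. In the described scheme the sort would operate on ciphertexts through secure comparison.

```python
import random


def encode_row_id(row_id):
    """Hypothetical placeholder for encrypting a row identifier."""
    return f"enc({row_id})"


def build_index(shuffled_table, column, rows_per_point):
    """Build a list of (index_point, [encoded row ids]) blocks for `column`."""
    # 1. Extract the column values together with the encoded row identifiers.
    index_table = [
        (row[column], encode_row_id(row_id))
        for row_id, row in enumerate(shuffled_table)
    ]
    # 2. Shuffle the index table, then sort it by attribute value.
    random.shuffle(index_table)
    index_table.sort(key=lambda entry: entry[0])
    # 3. Split the sorted table into blocks; the last value of each block
    #    serves as the index point (an inclusive upper bound).
    index = []
    for start in range(0, len(index_table), rows_per_point):
        block = index_table[start:start + rows_per_point]
        index.append((block[-1][0], [encoded for _, encoded in block]))
    return index


rows = [{"salary": s} for s in [7000, 3000, 9000, 1000, 5000, 8000]]
salary_index = build_index(rows, "salary", rows_per_point=2)
```

Because only the chosen column and the encoded row identifiers are sorted, the table being sorted is much smaller than the full joint data table.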
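In one embodiment, the multiple index points corresponding to the first attribute column are the primary index points of the first attribute column; a single primary index point corresponds to multiple secondary index points, and the secondary index points divide the row data corresponding to that primary index point into multiple secondary-index attribute value ranges. Obtaining the associated index points of the query target based on the comparison between the ciphertext of the query target and the ciphertexts of the index points corresponding to the first attribute column then includes: comparing the ciphertext of the query target with the ciphertexts of the primary index points of the first attribute column to obtain several primary index points associated with the query target; and determining the associated index points of the query target from among the secondary index points, based on a comparison between the ciphertext of the query target and the ciphertexts of the secondary index points corresponding to those primary index points.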
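A two-level lookup of this kind might look like the sketch below. It assumes, purely for illustration, that each index point is the inclusive upper bound of its block (as in the age example where index point 40 covers ages 31 to 40), that the index is a nested list of `(point, payload)` pairs, and that the query is a single exact value; `bisect_left` on plaintext stands in for a sequence of secure comparisons.

```python
from bisect import bisect_left


def lookup_two_level(query_value, primary_index):
    """Find the encoded row identifiers for `query_value`.

    `primary_index` is a list of (primary_point, secondary_index) pairs and
    each `secondary_index` is a list of (secondary_point, encoded_ids) pairs,
    all sorted ascending by point.
    """
    primary_points = [point for point, _ in primary_index]
    # First primary point >= query value; clamp to the last block if none.
    i = min(bisect_left(primary_points, query_value), len(primary_points) - 1)
    secondary_index = primary_index[i][1]
    secondary_points = [point for point, _ in secondary_index]
    # Same search again, but only inside the selected primary block.
    j = min(bisect_left(secondary_points, query_value), len(secondary_points) - 1)
    return secondary_index[j][1]


age_index = [
    (40, [(35, ["r1", "r2"]), (40, ["r3"])]),
    (50, [(45, ["r4"]), (50, ["r5", "r6"])]),
]
ids_for_43 = lookup_two_level(43, age_index)
```

The second search only touches the secondary points of one primary block, which is how the candidate range is narrowed level by level.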
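In one embodiment, the query target further includes a query value on a second attribute column, and restoring the corresponding encoded ciphertexts to plaintext row identifiers so as to obtain the candidate rows from the out-of-sequence data table further includes: obtaining a first plaintext row identifier set from the plaintext row identifiers recovered from the encoded ciphertexts corresponding to the associated index points; computing the intersection of the first plaintext row identifier set and a second plaintext row identifier set to obtain several shared row identifiers, where the second plaintext row identifier set includes the plaintext row identifiers determined from the index points associated with the query target in the second attribute column, and those index points are determined based on a comparison between the ciphertext of the query target and the ciphertexts of multiple index points corresponding to the second attribute column; and determining the rows of the out-of-sequence data table that correspond to the shared row identifiers as candidate rows.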
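Once both row identifier sets are in plaintext, the multi-column case reduces to a set intersection. A minimal sketch follows; the function name and the use of list positions as row identifiers are assumptions made for illustration.

```python
def candidate_rows(first_ids, second_ids, shuffled_table):
    """Intersect two plaintext row identifier sets and fetch the shared rows.

    Returns a dict mapping each shared row identifier to its row in the
    out-of-sequence table (indexed here by list position).
    """
    shared = set(first_ids) & set(second_ids)
    return {row_id: shuffled_table[row_id] for row_id in sorted(shared)}


table = [
    {"age": 38, "weight": 160},
    {"age": 44, "weight": 140},
    {"age": 47, "weight": 170},
    {"age": 25, "weight": 180},
]
age_ids = [0, 1, 2]      # rows covered by the age index points 40 and 50
weight_ids = [0, 2, 3]   # rows covered by the weight index point 150
shared_rows = candidate_rows(age_ids, weight_ids, table)
```

The shared rows are still only candidates: each one is compared against the query target again in the final step.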
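In one embodiment, the encoded ciphertexts of the plaintext row identifiers are updated and used to rebuild the index when one of the following is satisfied: a predetermined time arrives; the third-party system is idle; the number of row identifiers restored to plaintext reaches a predetermined threshold.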
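The three rebuild conditions can be tracked with a small policy object. The sketch below is hypothetical: the class name, field names, and the use of seconds for time are all assumptions, and the source does not prescribe how the conditions are monitored.

```python
from dataclasses import dataclass


@dataclass
class RebuildPolicy:
    max_revealed_ids: int       # assumed threshold on revealed row identifiers
    rebuild_interval_s: float   # assumed fixed interval, e.g. 86400 for one day
    revealed_ids: int = 0
    last_rebuild_s: float = 0.0

    def record_revealed(self, count):
        """Count row identifiers that were restored to plaintext by a query."""
        self.revealed_ids += count

    def should_rebuild(self, now_s, system_idle):
        """True if any one of the three conditions holds."""
        interval_elapsed = now_s - self.last_rebuild_s >= self.rebuild_interval_s
        threshold_reached = self.revealed_ids >= self.max_revealed_ids
        return interval_elapsed or system_idle or threshold_reached

    def mark_rebuilt(self, now_s):
        """Reset the counters after row identifiers are re-encoded."""
        self.revealed_ids = 0
        self.last_rebuild_s = now_s


policy = RebuildPolicy(max_revealed_ids=100, rebuild_interval_s=86400)
policy.record_revealed(100)
needs_rebuild = policy.should_rebuild(now_s=10.0, system_idle=False)
```

Passing the current time in explicitly keeps the policy easy to test and leaves the choice of clock to the caller.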
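According to a second aspect, a joint data query apparatus based on privacy protection is provided, by which a third party securely queries target data from a joint data table of multiple data parties. The joint data table is an attribute ciphertext data table securely built from the joint attribute data of the multiple data parties about several business subjects. The apparatus is deployed at the third party and includes:
an indexing unit, configured to obtain several associated index points of a query target based on a comparison between the ciphertext of the query target and the ciphertexts of multiple index points corresponding to a first attribute column, where each index point corresponding to the first attribute column is a data split point of an index built by sorting the attribute values of the first attribute column, and the query target includes a query value on the first attribute column;
a row identification unit, configured to obtain the encoded ciphertext of each row identifier corresponding to the associated index points, and to restore the corresponding encoded ciphertexts to plaintext row identifiers, thereby obtaining the corresponding candidate rows from the out-of-sequence data table obtained in advance by shuffling the joint data table, where the row identifier of each row is determined by row-wise encoding of the out-of-sequence data table;
a target determination unit, configured to compare the ciphertext of each candidate row's attribute value in the first attribute column with the ciphertext of the query target, thereby determining the final target rows among the candidate rows and obtaining the target data in the target rows.
According to a third aspect, a computing device is provided, including a memory and a processor, wherein executable code is stored in the memory, and the processor, when executing the executable code, implements the method of the first aspect.
With the method and apparatus provided by the embodiments of this specification, when a third party processes a joint data table composed of data from multiple data parties and relevant data must be obtained based on the ordering of several attribute columns, row identifiers are introduced for the out-of-sequence data table obtained by shuffling the joint data table, and indexes are built through those row identifiers when the joint data table is sorted by the attribute values of an attribute column. The row identifiers are determined by the third party and exist as ciphertext in the index table, while the row identifiers of candidate rows can be restored to plaintext. This speeds up encrypted queries while ensuring that the encrypted data in the joint data table is not leaked.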
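Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a specific implementation architecture of trusted cryptographic computing;
FIG. 2 is a schematic flowchart of a third party building an index on a joint data table in the encrypted state, according to one embodiment;
FIG. 3 is a schematic diagram of a third party building an index on a joint data table in the encrypted state by a single attribute column, according to a specific example;
FIG. 4 is a schematic flowchart of a privacy-preserving data query, according to one embodiment;
FIG. 5 is a schematic block diagram of a privacy-preserving data query apparatus, according to one embodiment.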
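Detailed Description
The technical solutions provided in this specification are described below with reference to the drawings.
First, the business scenario on which the technical concept of this specification is based is described.
When data from multiple data parties is processed jointly, joint storage and query scenarios are common. For example, in a user information analysis scenario, data party 1 holds the attribute values of physical status attributes such as users' height and weight, data party 2 holds the attribute values of users' salary attribute (the specific salary amounts), and data party 3 holds the attribute values of users' loan attribute (the specific loan amounts). Data party 1 or data party 3 may query the total salary of the users with the top N salaries; data party 1 may query the loan status of the users with the top N salaries; or a query may compare the total salary of the users with the top M heights against the total salary of the users with the top M weights; and so on.
The data held by a single data party is limited (not comprehensive enough), yet the data parties do not want to reveal their local data to one another. When data exchange among the data parties is not sufficient to guarantee data security, or device performance cannot meet the timeliness requirements of data processing, a third party can be used to process the data jointly and securely. When a third party assists with joint data processing, each data party can send its data to the third party in advance, and the third party performs joint sorting based on the data sent by the data parties. When the third party needs to obtain data from the joint data for subsequent business processing in response to a business processing request (such as a query request) from a data party or another business party, it can obtain the query result based on the corresponding sorting result.
The technical solution of this specification is proposed for the scenario in which the third party builds a database using a trusted cryptographic computing (TrustEd Cryptographic Computing, TECC) architecture. Trusted cryptographic computing is a secure and efficient method of computing on encrypted data that can compute a common result for multiple data parties without leaking any party's data. FIG. 1 shows one implementation architecture of trusted cryptographic computing. As shown in FIG. 1, the third party can use multiple nodes (such as node A, node B, node C, etc.) for joint computation. The nodes can form a trusted cryptographic computing framework through secure multi-party computation (MPC). In other words, the nodes of the third party do not reveal data to one another and complete all kinds of data processing in a secure manner, for example building a combined ciphertext database and obtaining data from it for various kinds of business processing.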
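In FIG. 1, the number of third-party nodes and the number of data parties are illustrative and can be determined according to the actual situation. The data parties can be located in a public network environment, while the nodes of the third party can be located in a specific network environment. The high-speed network shown in FIG. 1 can be a network capable of high-speed computation and communication. High-speed computation is usually determined by the central processing units of the devices, while high-speed communication can be achieved by improving communication capability (such as quantum communication) or by reducing the number of communication gateways (such as placing the nodes in the same local area network). In this way, the nodes are separated from the public network by the specific network, yet can still interact with devices on the public network. Specifically, interaction with public-network devices (such as the data parties shown in FIG. 1) takes place by receiving or sending data, while data processing takes place inside the specific network through MPC among the node devices. The implementer of TECC can be called the service party or the third party (hereinafter collectively the service party) and provides services for joint data processing among the data parties.
As shown in FIG. 1, to prevent any single node from learning a data party's data, a data party can split its local data into multiple components before providing it to the TECC architecture. For example, data party 1 can split local data U into the sum of U1, U2, and U3, which are the three components of U. The data party can then send the ciphertexts of the components to the respective nodes. In this way, for each piece of data from a data party, every node holds one component, and the data can be restored only by obtaining all components from the nodes.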
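Splitting U into components that sum to U is commonly realized as additive secret sharing. The sketch below is a minimal illustration of that idea, not the scheme fixed by the source: the modulus `MODULUS`, the function names, and the use of Python's `secrets` module are all assumptions.

```python
import secrets

MODULUS = 2 ** 64  # assumed ring size; the source does not specify one


def split_into_shares(value, num_nodes=3):
    """Split `value` into `num_nodes` random components that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(num_nodes - 1)]
    # The last component is chosen so that all components add up to the value.
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recover the value; this requires the components from all nodes."""
    return sum(shares) % MODULUS


# Data party 1 splits U = 10,000 into U1, U2, U3, one per node.
u1, u2, u3 = split_into_shares(10_000)
recovered = reconstruct([u1, u2, u3])
```

Each individual component is uniformly random, so a single node learns nothing about U from the component it holds.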
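The nodes of the third party perform secure multi-party computation (MPC) on the received data, that is, they obtain the processing result without revealing their local data to one another. MPC can be implemented by, for example, secret sharing, homomorphic encryption, garbled circuits, and so on. In TECC, to further ensure security and computing performance, the nodes can be placed in a high-speed network (such as a local area network) capable of fast computation and communication, so that the whole computation interacts very little with the public network. Trusted cryptographic computing can thus balance security against computing performance and reach a relatively ideal state.
In one optional implementation, a secure channel can also be established between a single data party and each node to increase data security. The secure channel can be established through at least one of an agreed protocol, a key, a dedicated communication interface, and the like, which will not be described further here. In another optional implementation, to better balance security and computing performance, each node of trusted cryptographic computing is built on a trusted execution environment (TEE). A TEE is a secure area on a computing platform, built by software and hardware methods, that protects the confidentiality and integrity of the code and data loaded inside it. TEE technology ensures that a participant's data exists only inside the TEE, so that neither the host nor the owner of the TEE can obtain the plaintext data (provided the TEE is not breached). Moreover, each TEE only ever handles data components, so even if an attacker breaches one TEE and steals or modifies its contents, no useful information can be obtained.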
为了不泄露数据隐私,各个数据方发送至第三方的数据可以是密文形式。各个数据方的数据密文可以预先上传至第三方,因此第三方可以预先对密文数据进行排序和建立索引。第三方的各个节点联合建立的联合数据表形如表1所示。这里说联合数据表是因为该数据库表是由多个数据方的数据联合建立的。另外,根据前文的描述,多个数据方向第三方提供的数据为密文形式,因此联合数据库也可以称为密文数据表。
Table 1: Schematic joint data table
[Table 1 is provided as an image in the original publication: Figure PCTCN2023070474-appb-000001]
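When the joint data table shown in Table 1 is built, the data of the multiple data parties must be aligned, and the data parties may agree on alignment rules. On one hand, the data is aligned by row, that is, each row corresponds to the same target object across parties: the first row always corresponds to target object 1 (such as user 1), the second row to target object 2 (such as user 2), ..., and the N-th row to target object N. On the other hand, the data is aligned by column, that is, the data of each target object is arranged in a preset order of attribute items: the first column always corresponds to attribute item 1, the second column to attribute item 2, ..., and the V-th column to attribute item V. Each data party sends its local attribute values to the third party according to these alignment rules. In practice, a target object may be a user, an item, a service, and so on. In one embodiment, the target objects do not appear in the joint data table; the data parties only agree on how target objects are aligned and do not send target object identifiers (the case shown in Table 1). In another embodiment, each target object is described by a predetermined target identifier (such as a user name or product number), and the data parties send the target object ciphertexts to the third party in a unified form; the target object identifier ciphertext can then be treated as one attribute column of the joint data table.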
在第三方为TECC形式的架构的情况下,联合数据表可以以安全形式存储于多个节点。例如,单个节点持有联合数据表的一个分片,当各个节点的联合数据表按照元素对应相加的情况下,可合成一个完整的联合数据表。单个节点持有的联合数据表分片中,单个元素以数据方的随机拆分分量作为分量密文,或者以预定加密方式(如哈希加密等)对随机拆分分量加密得到的密文。
本领域技术人员可以理解,在数据查询过程中,排序和建立索引是重要步骤。第三方对各个数据方所发送的数据进行联合排序时,常规技术可以采用基于类似保序加密方式的排序方式。该排序方式中,第三方获得各数据方上传的属性值密文,以获得待排序的表格,该表格中用户标识和各属性项的属性值是密文的(例如,表格中一行存储一个用户针对多个属性项的属性值密文,一列存储各用户针对一个属性项的属性值密文),但是各行、各列之间的顺序关系是明确的(有利于初始的数据对齐),例如第一行对应用户1、第二行对应用户2等。后续的,第三方按照一个或多个属性项进行排序,从而建立索引。
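Sorting may be performed by row or by column. For ease of description, this specification takes sorting by row as the example, without excluding sorting by column. Sorting by row means that the data within a row stays fixed and the row as a whole is moved to a new position in the table. Usually, when sorting by row, each row of the database corresponds to one business record and each column corresponds to one data item of the records. In the preceding example, a row corresponds to a user and a column corresponds to a business attribute. Sorting by row sorts the users; the sort key may be one or more business attributes, and when a user's position changes, the whole row moves with it. When sorting by row, an index can be built from the data of one or more columns.
In conventional techniques, index points are established by methods such as bucketing and truncated quicksort. With bucketing, bucket points (the split points between buckets, for example heights of 150 cm, 160 cm, and so on) are selected and used as index points. Bucket points may be chosen from historical experience, or by "first finding the maximum and minimum and then dividing the range between them equally". By comparing each element of the column to be sorted (an attribute ciphertext) with these bucket points, the bucket to which each element belongs is determined. This approach requires every business subject's attribute value to be compared with the split points, which may leak the value distribution of the business attribute. In addition, when a split interval (the range covered by one split step) contains many business subjects, retrieval can be inefficient. Truncated quicksort works recursively: recursion stops once a segment is smaller than a given threshold (for example, fewer than 1000 elements), and all the resulting split points are used as index points. With this approach the exact rank of each split point is leaked, and if indexes are built on several columns, the approximate rank of each entry on those columns is revealed.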
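The equal-width variant of conventional bucketing ("find the maximum and minimum, then divide the range equally") can be sketched on plaintext values as follows. The function names are hypothetical, and a real deployment would operate on ciphertexts with secure comparison rather than on plain numbers.

```python
from bisect import bisect_right


def equal_width_split_points(values, num_buckets):
    """Return the interior split points that divide [min, max] into equal-width buckets."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_buckets
    return [lo + width * i for i in range(1, num_buckets)]


def bucket_of(value, split_points):
    """Bucket number of `value`; a value equal to a split point goes to the upper bucket."""
    return bisect_right(split_points, value)


heights_cm = [150, 158, 165, 171, 190]
points = equal_width_split_points(heights_cm, 4)   # [160.0, 170.0, 180.0]
buckets = [bucket_of(h, points) for h in heights_cm]
```

Every element is compared against the split points, which is exactly why the paragraph above notes that this method can leak the value distribution.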
根据一个具体场景,例如数据方1向第三方查询月工资为1万的用户的借贷情况,数据方1可以将查询目标的数据标识(如月工资数额U=1万)拆分为多个分量,并将各个分量(如U1、U2、U3)通过安全信道分别发送至各个节点。可选地,数据方1可以通过安全方式向第三方各个节点分别发送U的各个分量的密文,第三方的各个节点分别接收各个分量的密文,并联合利用各个分量在密文数据库中进行查询,从而反馈查询结果。作为示例,前文的业务场景下,在数据方1查询工资处于前N位的用户的借贷情况,或者,查询身高处于前M位的用户的工资总和与体重处于前M位的用户的工资总和的大小等查询需求的情况下,需要服务方对密文数据库中的数据通过多方安全计算进行联合排序,进而基于相应的排序结果得到数据查询结果。各个节点通过多方安全计算向该单个参与者反馈查询结果。
在其他业务场景中,数据方还可以向第三方发送其他业务处理请求,例如,用工资最低的用户近三年的信贷情况预测其违约风险,等等。可以理解,这种需要在第三方继续进行后续业务处理的情况下,第三方可以不反馈相应数据,而继续在本地局域网范围内进行相应数据处理。
在各种数据查询场景中,难免遇到涉及多个业务项(例如前文例子中的多个属性项)的数据查询。例如,数据方1查询工资处于前N位的用户的借贷情况,或者,查询身高处于前M位的用户的工资总和与体重处于前M位的用户的工资总和的大小,等等。第三方在使用上述排序方式对数据建立索引的过程中,可能会导致查询表格中的数据时,各行在排序列上的顺序关系暴露的情况。例如:查询目标属性项包括用户的体重属性项和工资属性项,第三方分别针对表格中体重属性项和工资属性项的属性值密文进行排序后,第三方的单个节点可能会得到形如体重第5名的人同时工资是第1名的隐私信息。这种不是很直观的信息的泄露仍然不利于隐私数据保护。
为此,本说明书对此提供一种基于隐私保护的联合数据查询的技术方案。该技术方案适用于多个数据方的数据被第三方对相应密文数据库安全持有,且由第三方根据业务需求 从中安全获取目标数据的场景。该技术方案基于对多个数据方的联合数据预先排序的排序结果建立索引,避免第三方的单个节点根据排序结果推测隐私数据。本说明书的技术问题基于第三方为TECC架构提出,但实际应用不限于第三方为TECC架构,第三方也可以为单个设备或其他安全形式,本说明书对此不做限定。
下面首先描述第三方对联合数据表排序建立索引的过程。
图2示出根据一个实施例的由第三方执行的建立索引表的流程。值得说明的是,在第三方为一个设备的情况下,该流程为该第三方的单个设备完成,在第三方为类似于TECC的集群形式时,该流程可以由相应集群以多方安全计算方式完成。
如图2所示,该由第三方执行的建立索引表的流程可以包括:步骤201,对联合数据表进行乱序,得到乱序数据表;步骤202,为乱序数据表按行进行明文编码,使得每行数据对应一个明文的行标识;步骤203,用第一属性列与明文编码的行标识的编码密文构建第一索引表;步骤204,对第一索引表按照第一属性列排序,从而得到第一属性列对应的各个索引点。
首先,在步骤201中,对联合数据表进行乱序,得到乱序数据表。其中,联合数据表由第三方根据多个数据方的数据联合建立,在此不再赘述。
对联合数据表进行乱序,即打乱表格中行的前后顺序关系。在对密文数据表构建索引过程中,乱序和排序通常是配合使用的。在排序之前进行一次乱序,这样,可以使得排序时,数据表中前后行的位置不能追踪,避免了数据表中各行在排序列上的顺序关系(例如处于第一属性项排序列第a位的目标对象,即为处于第二属性项排序列第b位的目标对象)的暴露。
乱序可以是预定方式进行的,也可以是随机的,即一行业务数据在乱序后排在第几行是随机的。如图3所示,在乱序表格中的每一行,目标对象及其对应的各属性项的属性值密文之间的对应关系未改变,而行的排列顺序发生改变。特别地,在TECC架构下,乱序可以以多方安全计算方式进行。例如对某一行调到哪一行进行安全同步。
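Next, in step 202, the shuffled data table is plaintext-encoded row by row so that each row corresponds to a plaintext row identifier. A row identifier is generated in a predetermined manner and identifies its row. It may consist of at least one of digits, letters, symbols, and the like; in FIG. 3, for example, the rows are encoded in turn as 0, 1, 2, 3, .... A single row identifier locates the row it identifies. Throughout subsequent processing, the correspondence between a row identifier and its row does not change: however the data table is shuffled, the row originally labeled by a given identifier can still be found from that identifier. In FIG. 3, data table 301 denotes the shuffled data table with plaintext row identifiers added. Notably, under the technical concept of this specification, table 301 is the basis for building the index of every column, and it remains available as the lookup basis for as long as the indexes built from the current row identifiers are in use. In practice, table 301 may serve as the base table for indexing each attribute column, for example called the parent table of the indexes.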
然后,根据步骤203,用第一属性列与各个行标识的密文构建第一索引表。由于明文编码的行标识与行数据紧密结合,因此,直接使用明文的行标识进行索引,可能泄露数据隐私。为此,在后续索引过程中,可以使用行标识的密文建立相关表格。行标识的密文可以是使用预定加密方式对行标识进行加密得到,也可以是通过随机分量形式拆分的分量分别存储在各个节点进行加密。这样,第三方的任意设备不能直接单独从本地持有的数据反推明文行标识。
在本说明书的实施例中,在查询数据是哪一行的情况下,通常是按照列建立相关索引的,例如,查询工资排名前100的用户,则按照“工资”对应的属性项的列进行排序查询。因此,可以将可能被作为索引的列(如记为第一属性列)抽取出来,与行标识的密文一起构建相应的索引表,如这里称为第一索引表。
参考图3所示,假设标识列为0、1、2、3……,相应的密文记为E0、E1、E2、E3……,建立索引的属性列为C1属性列,则可以将C1列的各个属性值密文和行标识的密文列提取出来,形成第一索引表。这样,排序所针对的表格体量大大减小。
如果还需要对第二列,如C2属性列建立索引,则可以将C2列的各个属性值密文和行标识的密文列提取出来,形成第二索引表。以此类推,针对各个属性列都可以建立相应的索引表。
进一步地,在步骤204中,对第一索引表按照第一属性列排序,从而得到第一属性列对应的各个索引点,为联合数据表按照第一属性列建立索引。
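Sorting here is usually by attribute value. For example, when column C1 is salary, rows are ranked by salary from high to low or from low to high; when column C1 is weight, rows are sorted by weight. The sorting may use bucketing, truncated quicksort, and the like, which is not detailed here. In the resulting sorted table, rows are ordered by the attribute value of the target attribute item from largest to smallest (or from smallest to largest). For example, the row containing the largest attribute value ciphertext of the target attribute item is in row 1, the row containing the second largest is in row 2, and so on, with the row containing the smallest in row N. Because the attribute values are ciphertexts, their magnitudes are compared securely according to the encryption rules, which is not detailed here. To perform the sort, the third party may obtain the comparison results of the attribute value ciphertexts. For example, if a secure comparison shows that the attribute value in the third row is greater than that in the second row, every node swaps the third row and the second row.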
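Steps 201 to 204 can be sketched end to end on plaintext data. This is only a structural illustration under stated assumptions: `build_index_table` and `encrypt_row_id` are hypothetical names, attribute values are plain numbers instead of ciphertexts, and Python's ordinary sort stands in for the secure comparison that the nodes would run under MPC.

```python
import random


def build_index_table(joint_table, column, encrypt_row_id, rng=None):
    """Return (base_table, index_table) for one attribute column.

    base_table  : plaintext row id -> row (plays the role of table 301)
    index_table : list of (attribute value, encoded row id), sorted by value
    """
    rng = rng or random.Random()
    # Step 201: shuffle the rows so original positions cannot be traced.
    shuffled = list(joint_table)
    rng.shuffle(shuffled)
    # Step 202: assign plaintext row identifiers 0, 1, 2, ... to the shuffled rows.
    base_table = {row_id: row for row_id, row in enumerate(shuffled)}
    # Step 203: pair the chosen column with the *encoded* row identifier.
    index_table = [(row[column], encrypt_row_id(row_id))
                   for row_id, row in base_table.items()]
    # Step 204: sort the small index table by attribute value.
    index_table.sort(key=lambda entry: entry[0])
    return base_table, index_table


rows = [{"C1": 30, "C2": 7}, {"C1": 10, "C2": 9}, {"C1": 20, "C2": 8}]
base, index = build_index_table(rows, "C1", lambda i: f"E{i}", random.Random(0))
```

Only the two-column index table is sorted, not the full joint table, which is the size reduction the description points out. The placeholder encoding `E0`, `E1`, ... mirrors the notation of FIG. 3 and is not an encryption scheme.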
索引点可以理解为为了方便查询而确定的数据分割点。两个相邻索引点可以限定一个属性值范围。例如,对用户的身高建立索引,可以每10厘米确定一个索引点,则在查询给定身高(如161厘米至168厘米)的用户数据时,可以将给定身高值(如165)与各个索引点比较,或者给定身高范围的端点与各个索引点比较,从而快速获得较小范围的候选数据。
索引点通常与属性列对应的属性取值相关,其可以预先给定,也可以根据目标对象的数量划分。例如,在工资属性列,给定索引点的情况下,给定的索引点例如为2000元、5000元、1万元、3万元等等,分割出的属性值范围如为0至2000元(不含)、2000元(含)至5000元(不含)、5000元(含)至1万元(不含)、1万元(含)至3万元(不含)、3万元(含)以上。则可以将工资属性列的属性值与给定的索引点进行比较,从而确定相应行属于被哪个或哪些索引点分割出的范围。这种情况下,每个索引点对应的数据条数可以不相等。
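With the given salary index points from the paragraph above, finding the range a value falls into is a binary search. The sketch below works on plaintext numbers; in the described system the comparison would be a secure comparison over ciphertexts. The name `range_of` is an assumption made for this example.

```python
from bisect import bisect_right

# Index points taken from the example: 2000, 5000, 10000, 30000 (yuan).
INDEX_POINTS = [2000, 5000, 10000, 30000]


def range_of(value, index_points=INDEX_POINTS):
    """Return the range number of `value`.

    Range 0 is [0, 2000), range 1 is [2000, 5000), ..., range 4 is [30000, inf).
    Lower bounds are inclusive and upper bounds exclusive, as in the text.
    """
    return bisect_right(index_points, value)
```

Four index points yield five ranges, and the number of rows in each range is generally unequal.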
再例如,按照目标对象的数量划分,按照每1000人建立一个索引点,则10万个人可建立100个索引点,各个索引点如基于排序结果的第1000个人对应的工资额、第2000个人对应的工资额、第3000个人对应的工资额……等确定。每个索引点可以分割出1000个人。索引点可以是相应排名对应的工资额,也可以是基于该工资额确定的可以和下一个工资额区分开来的其他数值。举例而言,可以将第1000个人对应的工资额作为一个索引点,分割出0至该工资额(含)的1000条数据,也可以将第1001个人的工资额作为一个索引点,分割出0至该工资额(不含)的1000条数据,还可以取第1000个人对应的工资额与第1001个人的工资额之间的一个数值(如均值)作为一个索引点……
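The count-based alternatives described above (take the value at every k-th rank, or a value between rank k and rank k+1) can be sketched as follows. Both function names are hypothetical, and the input is assumed to be an already sorted plaintext list.

```python
def index_points_by_count(sorted_values, block_size):
    """Use the value at every `block_size`-th rank as an (inclusive) index point."""
    return [sorted_values[i]
            for i in range(block_size - 1, len(sorted_values), block_size)]


def midpoint_index_points(sorted_values, block_size):
    """Use the mean of rank k and rank k+1 as the index point between two blocks."""
    return [(sorted_values[i] + sorted_values[i + 1]) / 2
            for i in range(block_size - 1, len(sorted_values) - 1, block_size)]


salaries = list(range(1, 11))                    # ten sorted values: 1..10
inclusive = index_points_by_count(salaries, 3)   # values at ranks 3, 6, 9
between = midpoint_index_points(salaries, 3)     # values between ranks 3/4, 6/7, 9/10
```

With 100,000 rows and a block size of 1000, the first function returns 100 index points, matching the figure given in the text.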
值得说明的是,各个索引点实际上将数据分为多个数据范围,例如两个相邻索引点之间的若干条数据是一个数据范围(如工资2000元至5000元的范围)。其中,根据一个实施例,单个属性列的索引点可以不含端点,而只包含将数据分割开的中间分割点,这样,索引点的数量比其将数据分隔开的属性值范围数量少1。根据另一个实施例,单个属性列的索引点也可以包含一个端点,这样,索引点的数量与将数据分隔开的范围数量一致。此时,可选地,每个索引点可以对应一个与比其小或比其大的相邻索引点构成的属性值范围。根据再一个实施例,单个属性列的索引点也可以包含两个端点,这样,索引点的数量比其将数据分隔开的属性值范围数量多1。可选地,可以通过两两相邻索引点来描述属性值范围。
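According to one possible design of this specification, the data may further be split at the index points to obtain index sub-tables, using the splitting described in the preceding paragraph. FIG. 3 shows an example in which each index point corresponds to one index sub-table. In an optional embodiment, the data in an index sub-table is arranged by the magnitude of the attribute values in the indexed attribute column. In FIG. 3, for example, the first index point of attribute column C1 corresponds to the first sub-table, which contains row identifier ciphertexts arranged in order of magnitude. In another optional embodiment, the data in the first sub-table is shuffled again after being split at the index points.
According to one possible design, if the sub-tables described above are regarded as a first-level index and the corresponding index points as first-level index points, a second-level index may be built for each sub-table. As shown in FIG. 3, the second-level index splits the data at second-level index points. Second-level index points are determined in the same way as first-level index points and split the sub-tables under the first-level index at a finer granularity. Because the index sub-tables of the first-level index are built on the sorted target attribute column, the second-level index can be built directly from the sort result. For example, if one sub-table of the first-level index corresponds to 1000 users and one second-level index point corresponds to 200 users, then an index point is established for every 200 attribute value ciphertexts, in order, among the 1000 attribute value ciphertexts under the first-level index point.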
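The 1000/200 example can be sketched as a nested structure built directly from the sorted index table. The dictionary layout and the name `build_two_level_index` are assumptions for illustration; here each index point is taken to be the largest value in its block.

```python
def build_two_level_index(sorted_entries, first_size, second_size):
    """Split sorted (value, encoded_row_id) entries into first- and second-level blocks."""
    index = []
    for start in range(0, len(sorted_entries), first_size):
        sub_table = sorted_entries[start:start + first_size]
        second_level = []
        for s in range(0, len(sub_table), second_size):
            chunk = sub_table[s:s + second_size]
            second_level.append({
                "index_point": chunk[-1][0],             # largest value in the chunk
                "row_ids": [enc for _, enc in chunk],    # encoded row identifiers
            })
        index.append({"index_point": sub_table[-1][0],
                      "second_level": second_level})
    return index


entries = [(v, f"E{v}") for v in range(1, 2001)]   # 2000 sorted entries
two_level = build_two_level_index(entries, first_size=1000, second_size=200)
```

Each first-level block of 1000 entries ends up with five second-level index points, so a lookup only has to open one block of 200 encoded row identifiers.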
通常,二级索引可以在一级索引数据量较大的情况下,进一步缩小数据范围,从而加速数据查询过程。在可选的实施例中,还可以建立三级及以上级数的索引,以进一步加快数据查询速度。
图2、图3以不同形式示出了基于单个属性列构建索引的过程,在实践中,还可以针对多个属性列按照相似的方式构建索引。例如,除了工资属性列,还可以对年龄属性列、身高属性列等中的至少一列,各自重复步骤203和步骤204,确定相应的索引点,构建相应的检索表。其中,图3中,多个属性列对应的索引表均可以依据数据表301进行构建。
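Based on the index construction process above, the third party can build indexes on one or more attribute columns in advance. During actual business processing, various scenarios arise that need data based on a sort order. For example, the current business process may need the sum of the top 10 salaries under the salary attribute item, or the annual income of the users ranked in the top 10 by weight. Such a scenario may be a query request initiated by a single data party or by another business party permitted to use the joint data, or the third party may need the data while processing other business. When querying, the third party looks up the corresponding index according to the query requirement.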
图4示出了根据一个实施例的基于隐私保护的联合数据查询流程。该流程可以基于图2、图3等方式建立的索引,从联合数据表中查询数据。具体地,第三方可以执行以下步骤:步骤401,基于查询目标的密文与第一属性列对应的多个索引点密文的对比,得到所述查询目标的若干关联索引点,其中,与第一属性列对应的各个索引点是按照第一属性列 的属性值排序而建立索引的数据分割点,查询目标包括在第一属性列上的查询值;步骤402,获取与若干关联索引点对应的各个行标识各自的编码密文,各个行的行标识预先针对联合数据表乱序得到的乱序数据表按行编码确定;步骤403,将相应的编码密文恢复明文行标识,从而从乱序数据表中确定候选行;步骤404,针对各个候选行各自在第一属性列的属性值的密文与查询目标的密文进行比较,从而从各个候选行中确定最终的目标行,以获取目标行中的目标数据。
在步骤401中,基于查询目标的密文与第一属性列对应的多个索引点密文的对比,得到所述查询目标的若干关联索引点。查询目标可以对应一个精准数值,也可以对应一个查询范围。第一属性列可以是联合数据表中的任一个属性列,这里是查询的目标列。例如,查询体重为200斤的用户的工资,则查询目标对应一个精准值200斤,对应第一属性列为体重属性列。再例如,筛选工作年限为10至20年的员工,查询目标对应的是范围10-20年,第一属性列为工龄属性列;筛选年龄为35至45岁的员工,查询目标对应的是范围35至45岁,第一属性列为年龄属性列。这里,查询目标是密文形式,查询目标的密文可以基于数据方或其他业务方的查询请求确定,也可以基于第三方当前处理的业务的中间结果确定。
其中,查询目标的密文基于数据方或其他业务方(也可以统称查询方)的查询请求确定的情况下,查询方可以将查询目标(如要查询的属性列的属性值)按照第三方架构的需求转化为相应的密文发送至第三方。例如TECC架构下,查询方可以将要查询目标对应的值随机拆分为多个分量,每个分量发送至第三方的一个节点,从而要查询的属性列的值对于第三方的各个节点而言,是保密的(每个节点仅获取一个分量)。这些分量可以称为分量密文。比如,数据方1要查询体重为200斤的用户的工资,可以将体重属性200斤的属性值密文发送至第三方。
在查询目标的密文基于第三方当前处理的业务的中间结果确定的情况下,第三方在处理基于数据方或其他业务方请求的其他业务时,可以得到相应的中间结果作为查询目标。可以理解的是,为了保护数据隐私,第三方在进行数据处理过程中可以采用密文形式。则作为中间结果的查询目标也是密文形式。此时,相应的中间结果就是查询目标的密文。举例而言,第三方当前所处理的业务是统计三高患者的收入状况,则中间结果可以是三高患者的初始发病年龄集中在40至50岁,接下来要通过查询联合数据表统计年龄在40至50岁的人群收入状况,等等。
同样,针对查询目标的查询结果也可以直接作为中间结果进行后续业务处理。例如, 对近五年退休的人员中养老金花费排名前5的花费项目进行统计。则第三方首先可能先要根据各个用户的年龄值,查询出近五年退休的人员的行数据作为中间结果,然后将这些行中养老金花费在各个支出项目属性项上的支出金额之和进行统计。
由于索引点是基于排序的,且各个索引点可以分割出相应属性列的各个属性值范围,因此通过查询目标的密文与索引点的密文的比对,可以确定查询目标对应的第一属性列中,与查询目标相关联的索引点,作为关联索引点。其中,在第三方采用分量密文的情况下,查询目标和索引点均采用分量形式存储在第三方,第三方可以通过安全比较等方式比较两个密文的大小。查询目标和索引点的密文比较结果可以以明文方式确定,以根据业务需求确定查询目标落入的属性值范围。查询目标落入的属性值范围可以由关联索引点分割出来。
可以理解的是,关联索引点的确定方式与索引点对属性值的分割方式、查询目标的形式相关。例如,第一属性列对应的单个索引点作为中间分隔点将第一属性列的属性值分隔为多个属性值范围的情况下,可以将查询值的密文依次与第一属性列对应的各个索引点进行大小比较,然后根据大小比较结果,将查询值所在的属性值范围对应的索引点确定为查询目标的关联索引点。例如,在查询值大于第一索引点而小于第二索引点的情况下,可以将相邻的第一索引点和第二索引点确定为关联索引点。如果比较结果为查询值小于最小索引点,可以将最小索引点确定为查询目标的关联索引点,或者比较结果为查询值大于最大索引点,还可以将最大索引点确定为查询目标的关联索引点。在其他情况下,还可以通过其他方式确定关联索引点,在此不再穷举。
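The three cases above (between two adjacent index points, below the smallest, above the largest) can be sketched on plaintext values. The function name is hypothetical, and the handling of a query value exactly equal to an index point is an assumption, since the text does not specify it.

```python
from bisect import bisect_left


def associated_index_points(query_value, index_points):
    """Return the index points associated with `query_value`.

    `index_points` must be sorted ascending. A query value equal to an index
    point is grouped with the range below it (an assumption of this sketch).
    """
    pos = bisect_left(index_points, query_value)   # count of points < query
    if pos == 0:
        return [index_points[0]]                   # not above the smallest point
    if pos == len(index_points):
        return [index_points[-1]]                  # above the largest point
    return [index_points[pos - 1], index_points[pos]]
```

In the described system each comparison would be a secure comparison between ciphertexts, with only the comparison result revealed in plaintext.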
在一个可选的实现方式中,在第三方预先建立了多级索引的情况下,可以先将查询目标的密文与一级索引的索引点密文对比,确定与查询目标相关联的一级索引,如得到相应的一级索引位置或者索引子表,然后将查询目标的密文与该相关联的一级索引对应的二级索引的索引点密文对比,得到与查询目标相关联的二级索引。以此类推,直至对比到末级索引(如二级索引)的索引点密文,将末级索引点确定为查询目标的关联索引点。这样可以一步一步缩小索引的候选行范围,减少后续数据处理量。
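Descending through a multi-level index can be sketched as two successive binary searches. The index layout below is hand-built and hypothetical: each index point is assumed to be the inclusive upper bound of its block, and `E0`, `E1`, ... stand in for encoded row identifiers.

```python
from bisect import bisect_left


def lookup_two_level(index, query_value):
    """Return the encoded row ids of the second-level block covering `query_value`."""
    first_points = [entry["index_point"] for entry in index]
    # First block whose upper bound is >= query; clamp to the last block.
    i = min(bisect_left(first_points, query_value), len(index) - 1)
    second = index[i]["second_level"]
    second_points = [entry["index_point"] for entry in second]
    j = min(bisect_left(second_points, query_value), len(second) - 1)
    return second[j]["row_ids"]


age_index = [
    {"index_point": 40, "second_level": [
        {"index_point": 35, "row_ids": ["E3", "E7"]},
        {"index_point": 40, "row_ids": ["E1"]},
    ]},
    {"index_point": 50, "second_level": [
        {"index_point": 45, "row_ids": ["E0"]},
        {"index_point": 50, "row_ids": ["E2"]},
    ]},
]
```

Only the index points on the path are compared with the query, so the number of comparisons grows with the number of levels rather than with the number of rows.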
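Then, in step 402, the encoded ciphertext of each row identifier corresponding to the associated index points is obtained. The associated index points delimit the rows of multiple attribute values, including the rows that match the query target. During index construction, each attribute value row is also associated with a row identifier obtained by encoding the shuffled data table row by row, and this row identifier appears in the index table as an encoded ciphertext. The encoded ciphertext corresponding to each of these rows can therefore be obtained.
Next, in step 403, the encoded ciphertexts are recovered into plaintext row identifiers, and candidate rows are determined from the shuffled data table. To obtain the data of the relevant rows in the ciphertext data table, the third party recovers the encoded ciphertexts into plaintext identifiers. Under the TECC architecture, for example, an encoded ciphertext stored across the nodes as random components is recovered into the plaintext row identifier when the nodes disclose their local components to one another.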
在第三方建立索引过程中,提取索引表之前有一个添加行标识的乱序数据表,如图3中的表301,记录有明文行标识与乱序数据表中的各行数据的关系,因此,通过查询目标对应的明文行标识,可以获取相应行即候选行的数据。
在第三方预先建立了多列索引且当前查询为多列联合条件的查询(如查询年龄在30至45岁且体重高于150斤的用户的健康状况)的情况下,可以在第一属性列的索引中确定第一索引范围(如年龄在30至45岁)对应的关联索引点(如40岁、50岁),以及相应的行(如年龄属性值在30至40岁及40岁至50岁的行),并根据第一索引范围相应各行的编码密文恢复的明文行标识得到第一明文行标识集,在第二属性列的索引中确定第二索引范围(如体重在150斤以上)对应的关联索引点(如150斤),以及相应的候选行(如体重属性值在150斤以上的行),并根据第二索引范围相应各行的编码密文恢复的明文行标识得到第二明文行标识集。然后检测第一明文行标识集和第二明文行标识集的交集,得到若干共有行标识的明文数据。进一步地可以将乱序数据表(如图3中的表301)中与各个共有行标识对应的行确定为候选行。
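The intersection step for a multi-column query can be sketched as a set operation on the recovered plaintext row identifiers. The table contents and the name `candidate_rows` are made up for illustration, and the attribute values are shown in plaintext although they would be ciphertexts in the described system.

```python
def candidate_rows(base_table, first_ids, second_ids):
    """Rows of the shuffled base table whose row id appears in both id sets."""
    shared = set(first_ids) & set(second_ids)
    return {row_id: base_table[row_id] for row_id in sorted(shared)}


# Shuffled base table (the role of table 301).
base_table = {
    0: {"age": 38, "weight": 160},
    1: {"age": 62, "weight": 170},
    2: {"age": 44, "weight": 120},
    3: {"age": 47, "weight": 155},
}
age_ids = [0, 2, 3]      # rows under the age index blocks covering 30 to 50
weight_ids = [0, 1, 3]   # rows under the weight index blocks above 150
candidates = candidate_rows(base_table, age_ids, weight_ids)
```

Row 3 (age 47) survives the intersection even though it does not satisfy "30 to 45", because index blocks are coarser than the query range; the exact comparison of step 404 is what removes it.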
在步骤404,针对各个候选行各自在第一属性列的属性值的密文与查询目标的密文进行比较,从而从各个候选行中确定最终的目标行,以获取目标行中的目标数据。可以理解,候选行基于查询目标密文与相对较少数量的索引点的比较,得到的候选行可以大大缩小目标行的查找范围。然而,实践中,还需要将查询目标的密文与候选行在第一属性列的属性值的密文进一步进行比较,以筛选出目标行。
查询目标的密文与候选行在第一属性列的属性值的密文的比较也可以包括值对值的比较、值对范围、范围对范围的比较中的至少一项。其中,值对值的比较可以是两个值是否相同的比较,值对范围的比较可以是值和范围的两端点的比较,范围对范围的比较可以是两范围的端点交叉比较,在此不再赘述。
作为示例,假设查询目标为工资在5000元至6000元的用户数据,则候选行可以是基于工资属性列的索引点确定的若干行,如工资为5000元至1万元的行。将单个候选行的工资属性值与查询目标的范围5000元至6000元的密文进行比较,则可以得到工资属性值落入5000元至6000元的候选行作为目标行,并获取目标行的密文数据。在一些实施例中,满足查询条件的目标行数量还可能为0。其中,在TECC架构下,密文数值的比较可以采用安全比较,在此不做赘述。
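The final filtering of step 404 for the salary example can be sketched as a range check over the candidate rows. Plain numbers replace the ciphertexts, ordinary comparison replaces secure comparison, and treating both endpoints as inclusive is an assumption of this sketch.

```python
def filter_target_rows(candidates, column, low, high):
    """Keep the candidate rows whose value in `column` lies in [low, high]."""
    return {row_id: row for row_id, row in candidates.items()
            if low <= row[column] <= high}


# Candidate rows from the index block covering 5000 to 10000 yuan.
salary_candidates = {4: {"salary": 5200}, 9: {"salary": 8000}, 11: {"salary": 6000}}
targets = filter_target_rows(salary_candidates, "salary", 5000, 6000)
no_match = filter_target_rows(salary_candidates, "salary", 9000, 9500)
```

As the text notes, the result can legitimately be empty, which is the `no_match` case here.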
第三方可以获取目标行的密文数据后,该密文数据可以作为查询结果反馈至查询方,也可以作为当前业务的中间结果进行后续处理。例如,将工资在5000至6000的用户作为中间结果,对其年龄进行平均或确定数据图表,等等。
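As an example, suppose the current business process needs the mean of the top 10 salaries among employees aged 35 to 45; the sorted data used is then the data sorted by the age attribute column. Suppose that in this sorted data each index point covers a 10-year age band. By comparing the index point ciphertexts with 35 and 45, the third party determines that the first index point greater than 35 is 40 and the first index point greater than 45 is 50. It then obtains the encoded ciphertexts of the row identifiers in two index blocks: the block of index point 40 (ages 31 to 40) and the block of index point 50 (ages 41 to 50). If the third party has also built a second-level index on the age attribute column, it can narrow the range further with that index, which is not detailed here. The third party then converts the encoded ciphertexts into plaintext row identifiers and fetches the corresponding rows from the shuffled data table as candidate rows, say 10,000 rows. Next, the third party compares the age attribute value of each of the 10,000 candidate rows with the ciphertexts of the two range endpoints, 35 and 45, and determines the candidate rows whose age is greater than 35 and less than or equal to 45 as target rows. Finally, the attribute value ciphertexts of the target rows in the salary attribute column (the intermediate result) are averaged in a secure, encrypted manner (the subsequent business processing).
When the indexes built above are used, some encoded ciphertexts under a column's index are recovered into plaintext. If all recovered row identifiers come from the index of a single column, no correlation information is leaked. If indexes on several columns are used and the same row identifier appears under different column indexes, the approximate ranks of that row on several columns are revealed, for example that someone whose weight ranks between 3000 and 4000 has a salary ranking between 5000 and 6000. A single such fact may not seem serious, but an attacker who analyzes a large amount of this information may learn considerably more.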
在TECC架构的数据查询场景下,涉及明文的恢复、范围的确定等,单个节点也可能获取一些信息。当查询次数足够多的情况下,单个节点获得的信息足够多,则单个节点被攻击者攻破的情况下,也存在数据泄露风险。
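For this reason, in a possible design, the number of row identifiers recovered into plaintext is controlled, and all indexes are rebuilt while that number is still small enough. For example, when only the index of a single column is used, no rebuild is needed; once the indexes of more than one column are used, the indexes are rebuilt based on a predetermined condition. In one optional implementation, the number of row identifiers that have been recovered into plaintext is recorded, and the indexes are rebuilt when it reaches a predetermined threshold. In other optional implementations, because index rebuilding can be carried out asynchronously by the third party alone and need not be completed when a query request arrives, the third party may also rebuild the indexes periodically (for example, once a day) or when its system is idle, which is not limited here.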
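The threshold-based trigger can be sketched as a small counter object. The class name and methods are hypothetical, and counting distinct row identifiers (rather than every recovery event) is an assumption, since the text only says that the number of recovered identifiers is recorded.

```python
class RebuildPolicy:
    """Track recovered plaintext row ids and signal when indexes should be rebuilt."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.revealed = set()

    def record(self, row_ids):
        """Record newly recovered row ids; return True once the threshold is reached."""
        self.revealed.update(row_ids)
        return len(self.revealed) >= self.threshold

    def reset(self):
        """Call after re-shuffling, re-encoding row ids, and rebuilding the indexes."""
        self.revealed.clear()


policy = RebuildPolicy(threshold=3)
first = policy.record([1, 2])    # two distinct ids revealed so far
second = policy.record([2, 3])   # three distinct ids: threshold reached
```

A timer or an idle-time hook could call the same rebuild routine, which covers the periodic and idle-time variants mentioned above.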
回顾以上过程,本说明书实施例提供的方法,在对由第三方对多个数据方的数据构成的联合数据表处理过程中,需要基于数据表若干属性列的排序获取相关数据的情况下,在联合数据表按照属性列的属性值排序时,针对联合数据表乱序后的乱序数据表引入行标识,通过行标识建立索引。行标识由第三方确定,并在索引表中以密文形式存在,而候选行的行标识可以被恢复为明文,从而既加快密态查询效率,又能确保联合数据表中的密态数据不泄露。
特别地,在第三方为可信密态计算的情况下,即使一个节点被攻击,也能确保联合数据表中的数据安全。此时,在数据查询涉及多个属性列的情况下,为了避免恢复成明文的行标识泄露不同属性列之间的对应关系,还可以按照预定更新规则更新行标识,以及用更新后的行标识建立新的索引,从而在第三方加强数据的隐私保护。
根据另一方面的实施例,还提供一种基于隐私保护的数据查询装置。其中,该装置可以设于处理多个数据方的联合数据业务的第三方。特别地,在第三方为TECC架构的情况下,TECC架构中的各个节点分别设置有基于隐私保护的数据查询装置,这些装置通过多方安全计算,相互配合完成联合数据表中的数据查询。
如图5所示,基于隐私保护的数据查询装置500包括:索引单元501,配置为基于查询目标的密文与第一属性列对应的多个索引点密文的对比,得到所述查询目标的若干关联索引点,其中,与第一属性列对应的各个索引点是按照第一属性列的属性值排序而建立索引的数据分割点,查询目标包括在第一属性列上的查询值;行识别单元502,配置为获取与若干关联索引点对应的各个行标识各自的编码密文,以及,将相应的编码密文恢复明文行标识,从而从预先针对联合数据表乱序得到的乱序数据表中获取相应的各个候选行,其中,各个行的行标识针对乱序数据表按行编码确定;目标确定单元503,配置为针对各个候选行各自在第一属性列的属性值的密文与查询目标的密文进行比较,从而从各个候选行中确定最终的目标行,以获取目标行中的目标数据。
在一个实施例中,第三方具有可信密态计算架构,可信密态计算架构包括多个节点,各个节点均设有装置,各个装置之间通过多方安全计算处理联合数据表相关的业务。
在一个进一步的实施例中,单个装置针对联合数据表中的单个元素存储相应的单个分量密文,查询目标经由查询方拆分为各个分量,并由各个节点中的装置各自持有单个分量的分量密文。
根据一个可能的设计,装置500还包括索引构建单元(未示出),配置为针对第一属 性列按照以下方式确定各个索引点:
从乱序数据表中提取第一属性列的各个属性值密文,与相应行标识的编码密文构成第一索引表,单个行标识的编码密文基于对该单个行标识按照预定方式加密得到;
对第一索引表乱序后按照第一属性列的属性值大小进行排序;
基于排序结果,按照预定属性值分割方式确定各个索引点。
在一个实施例中,索引构建单元还配置为,在满足以下中的一项时更新明文行标识的编码密文并重新建立索引:预定时刻到达;第三方系统空闲;被恢复的明文行标识条数达到预定条数阈值。
在一个可选的实现方式中,第一属性列对应的多个索引点为第一属性列的各个一级索引点,单个一级索引点对应多个二级索引点,各个二级索引点将该单个一级索引点对应的行数据分割为多个二级索引属性值范围;索引单元501进一步配置为:
将查询目标的密文与第一属性列的各个一级索引点密文进行对比,得到与查询目标相关联的若干一级索引点;
基于查询目标的密文与若干一级索引点对应的二级索引点密文的对比,从各个二级索引点中确定查询目标的若干关联索引点。
根据一个实施例,查询目标还包括第二属性列上的查询值,行识别单元502进一步配置为:
根据若干关联索引点对应的编码密文恢复的明文行标识得到第一明文行标识集;
检测第一明文行标识集和第二明文行标识集的交集,得到若干共有行标识,其中,第二明文行标识集包括基于与查询目标在第二属性列相关联的各个索引点确定的各个明文行标识,查询目标在第二属性列相关联的各个索引点基于查询目标的密文与第二属性列对应的多个索引点密文的对比确定;
将乱序数据表中与各个共有行标识分别对应的各个行确定为候选行。
值得说明的是,图5所示的装置500与图4描述的方法相对应,图4的方法实施例中的相应描述同样适用于装置500,在此不再赘述。
根据另一方面的实施例,还提供一种计算机可读存储介质,其上存储有计算机程序,当所述计算机程序在计算机中执行时,令计算机执行结合图2、图4等所描述的方法。
根据再一方面的实施例,还提供一种计算设备,包括存储器和处理器,所述存储器中存储有可执行代码,所述处理器执行所述可执行代码时,实现结合图2、图4等所描述的方法。
本领域技术人员应该可以意识到,在上述一个或多个示例中,本说明书实施例所描述的功能可以用硬件、软件、固件或它们的任意组合来实现。当使用软件实现时,可以将这些功能存储在计算机可读介质中或者作为计算机可读介质上的一个或多个指令或代码进行传输。
以上所描述的具体实施方式,对本说明书的技术构思的目的、技术方案和有益效果进行了进一步详细说明,所应理解的是,以上描述仅为本说明书的技术构思的具体实施方式而已,并不用于限定本说明书的技术构思的保护范围,凡在本说明书实施例的技术方案的基础之上,所做的任何修改、等同替换、改进等,均应包括在本说明书的技术构思的保护范围之内。

Claims (18)

  1. 一种基于隐私保护的联合数据查询方法,用于第三方从多个数据方的联合数据表中安全查询目标数据,所述联合数据表为基于多个数据方关于若干个业务主体的联合属性数据安全建立的属性密文数据表,所述方法由所述第三方执行,包括:
    基于查询目标的密文与第一属性列对应的多个索引点密文的对比,得到所述查询目标的若干关联索引点,其中,与第一属性列对应的各个索引点是按照所述第一属性列的属性值排序而建立索引的数据分割点,所述查询目标包括在所述第一属性列上的查询值;
    获取与所述若干关联索引点对应的各个行标识各自的编码密文,各个行的行标识预先针对所述联合数据表乱序得到的乱序数据表按行编码确定;
    将相应的编码密文恢复明文行标识,从而从所述乱序数据表中确定候选行;
    针对各个候选行各自在所述第一属性列的属性值的密文与查询目标的密文进行比较,从而从各个候选行中确定最终的目标行,以获取所述目标行中的目标数据。
  2. 根据权利要求1所述的方法,其中,所述第三方具有可信密态计算架构,所述可信密态计算架构包括多个节点,各个节点之间通过多方安全计算处理所述联合数据表相关的业务。
  3. 根据权利要求2所述的方法,其中,所述联合数据表在所述第三方以分量密文的形式存储,单个节点针对所述联合数据表中的单个元素存储相应的单个分量密文,所述查询目标经由查询方拆分为各个分量,并由各个节点各自持有单个分量的分量密文。
  4. 根据权利要求2或3所述的方法,其中,所述第三方中的单个节点在可信执行环境中实现。
  5. 根据权利要求1所述的方法,其中,所述基于查询目标的密文与第一属性列对应的多个索引点密文的对比,得到所述查询目标的若干关联索引点包括:
    将所述查询值的密文依次与所述第一属性列对应的各个索引点进行大小比较;
    在比较结果为所述查询值与相邻两个索引点的大小相反的情况下,将该相邻两个索引点确定为所述查询目标的关联索引点。
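  6. The method according to claim 1, wherein obtaining the associated index points of the query target based on the comparison of the ciphertext of the query target with the ciphertexts of the plurality of index points corresponding to the first attribute column comprises:
    comparing the ciphertext of the query value in turn with each index point corresponding to the first attribute column;
    when the comparison result is that the query value is less than all index points, determining the smallest index point as the associated index point of the query target; or, when the comparison result is that the query value is greater than all index points, determining the largest index point as the associated index point of the query target.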
  7. 根据权利要求1所述的方法,其中,针对所述第一属性列,按照以下方式确定各个索引点:
    从所述乱序数据表中提取所述第一属性列的各个属性值密文,与相应行标识的编码密文构成第一索引表,单个行标识的编码密文基于对该单个行标识按照预定方式加密得到;
    对所述第一索引表乱序后按照所述第一属性列的属性值大小进行排序;
    基于排序结果,按照预定属性值分割条件确定各个索引点。
  8. 根据权利要求1所述的方法,其中,所述第一属性列对应的多个索引点为所述第一属性列的各个一级索引点,单个一级索引点对应多个二级索引点,各个二级索引点将该单个一级索引点对应的行数据分割为多个二级索引属性值范围;所述基于查询目标的密文与第一属性列对应的各个索引点密文的对比,得到所述查询目标的若干关联索引点包括:
    将所述查询目标的密文与所述第一属性列的各个一级索引点密文进行对比,得到与所述查询目标相关联的若干一级索引点;
    基于所述查询目标的密文与所述若干一级索引点对应的二级索引点密文的对比,从各个二级索引点中确定所述查询目标的若干关联索引点。
  9. 根据权利要求1所述的方法,其中,所述查询目标还包括第二属性列上的查询值,所述将相应的编码密文恢复明文行标识,从而从所述乱序数据表中获取相应的各个候选行还包括:
    根据所述若干关联索引点对应的编码密文恢复的明文行标识得到第一明文行标识集;
    检测第一明文行标识集和第二明文行标识集的交集,得到若干共有行标识,其中,所述第二明文行标识集包括基于与所述查询目标在所述第二属性列相关联的各个索引点确定的各个明文行标识,所述查询目标在所述第二属性列相关联的各个索引点基于所述查询目标的密文与所述第二属性列对应的多个索引点密文的对比确定;
    将所述乱序数据表中与各个共有行标识分别对应的各个行确定为候选行。
  10. 根据权利要求1所述的方法,其中,所述明文行标识的编码密文在满足以下中的一项时被更新并用于重新建立索引:
    预定时刻到达;
    第三方系统空闲;
    被恢复的明文行标识条数达到预定条数阈值。
  11. 一种基于隐私保护的联合数据查询装置,用于第三方从多个数据方的联合数据表中安全查询目标数据,所述联合数据表为基于多个数据方关于若干个业务主体的联合属性 数据安全建立的属性密文数据表,所述装置设于所述第三方,包括:
    索引单元,配置为基于查询目标的密文与第一属性列对应的多个索引点密文的对比,得到所述查询目标的若干关联索引点,其中,与第一属性列对应的各个索引点是按照所述第一属性列的属性值排序而建立索引的数据分割点,所述查询目标包括在所述第一属性列上的查询值;
    行识别单元,配置为获取与所述若干关联索引点对应的各个行标识各自的编码密文,以及,将相应的编码密文恢复明文行标识,从而从预先针对所述联合数据表乱序得到的乱序数据表中获取相应的各个候选行,其中,各个行的行标识针对所述乱序数据表按行编码确定;
    目标确定单元,配置为针对各个候选行各自在所述第一属性列的属性值的密文与查询目标的密文进行比较,从而从各个候选行中确定最终的目标行,以获取所述目标行中的目标数据。
  12. 根据权利要求11所述的装置,其中,所述第三方具有可信密态计算架构,所述可信密态计算架构包括多个节点,各个节点均设有所述装置,各个所述装置之间通过多方安全计算处理所述联合数据表相关的业务。
  13. 根据权利要求12所述的装置,其中,单个装置针对所述联合数据表中的单个元素存储相应的单个分量密文,所述查询目标经由查询方拆分为各个分量,并由各个节点中的所述装置各自持有单个分量的分量密文。
  14. 根据权利要求11所述的装置,其中,所述装置还包括索引构建单元,配置为针对所述第一属性列按照以下方式确定各个索引点:
    从所述乱序数据表中提取所述第一属性列的各个属性值密文,与相应行标识的编码密文构成第一索引表,单个行标识的编码密文基于对该单个行标识按照预定方式加密得到;
    对所述第一索引表乱序后按照所述第一属性列的属性值大小进行排序;
    基于排序结果,按照预定属性值分割方式确定各个索引点。
  15. 根据权利要求11所述的装置,其中,所述第一属性列对应的多个索引点为所述第一属性列的各个一级索引点,单个一级索引点对应多个二级索引点,各个二级索引点将该单个一级索引点对应的行数据分割为多个二级索引属性值范围;所述索引单元进一步配置为:
    将所述查询目标的密文与所述第一属性列的各个一级索引点密文进行对比,得到与所述查询目标相关联的若干一级索引点;
    基于所述查询目标的密文与所述若干一级索引点对应的二级索引点密文的对比,从各个二级索引点中确定所述查询目标的若干关联索引点。
  16. 根据权利要求11所述的装置,其中,所述查询目标还包括第二属性列上的查询值,所述行识别单元进一步配置为:
    根据所述若干关联索引点对应的编码密文恢复的明文行标识得到第一明文行标识集;
    检测第一明文行标识集和第二明文行标识集的交集,得到若干共有行标识,其中,所述第二明文行标识集包括基于与所述查询目标在所述第二属性列相关联的各个索引点确定的各个明文行标识,所述查询目标在所述第二属性列相关联的各个索引点基于所述查询目标的密文与所述第二属性列对应的多个索引点密文的对比确定;
    将所述乱序数据表中与各个共有行标识分别对应的各个行确定为候选行。
  17. 根据权利要求14所述的装置,其中,所述索引构建单元还配置为,在满足以下中的一项时更新所述明文行标识的编码密文并重新建立索引:
    预定时刻到达;
    第三方系统空闲;
    被恢复的明文行标识条数达到预定条数阈值。
  18. 一种计算设备,包括存储器和处理器,其特征在于,所述存储器中存储有可执行代码,所述处理器执行所述可执行代码时,实现权利要求1-10中任一项所述的方法。
PCT/CN2023/070474 2022-01-20 2023-01-04 基于隐私保护的联合数据查询方法及装置 WO2023138379A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210068068.8 2022-01-20
CN202210068068.8A CN114090638B (zh) 2022-01-20 2022-01-20 基于隐私保护的联合数据查询方法及装置

Publications (1)

Publication Number Publication Date
WO2023138379A1 true WO2023138379A1 (zh) 2023-07-27

Family

ID=80308954

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/070474 WO2023138379A1 (zh) 2022-01-20 2023-01-04 基于隐私保护的联合数据查询方法及装置

Country Status (2)

Country Link
CN (1) CN114090638B (zh)
WO (1) WO2023138379A1 (zh)

Also Published As

Publication number Publication date
CN114090638A (zh) 2022-02-25
CN114090638B (zh) 2022-04-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23742695

Country of ref document: EP

Kind code of ref document: A1