CN110674524A

CN110674524A - Mixed ciphertext indexing method and system

Info

Publication number: CN110674524A
Application number: CN201910940962.8A
Authority: CN
Inventors: 翟建军; 邢亚君; 陈青民; 孟铭; 郑敏波; 彭海龙
Original assignee: Beijing An Xin Tian Xing Technology Co Ltd
Current assignee: Beijing An Xin Tian Xing Technology Co Ltd
Priority date: 2019-09-30
Filing date: 2019-09-30
Publication date: 2020-01-10

Abstract

The invention discloses a hybrid ciphertext indexing method and system. The method comprises the following steps: acquiring ciphertext data stored in a database; performing bucket partitioning on the ciphertext data having the same attribute to obtain bucket-partitioned data; and building a B+ tree for the bucket-partitioned data, with data pointers stored in the leaf nodes. The hybrid ciphertext indexing method and system can improve the indexing speed.

Description

Mixed ciphertext indexing method and system

Technical Field

The invention relates to the field of indexes, in particular to a method and a system for indexing a mixed ciphertext.

Background

Database encryption is the last line of defense for database security and removes the insecurity caused by storing plaintext data directly in the database. At the same time, encryption introduces a new problem: it significantly reduces the efficiency of data queries. With the existing bucket-partition ciphertext indexing method, the initial query result returned to the client contains records that do not satisfy the query condition. The bucket data that the server returns to the client has a linear structure, and because the data is encrypted and has lost its original partial order, the client can only search it sequentially. Sequential search is inefficient, especially for large amounts of data.

Disclosure of Invention

The invention aims to provide a method and a system for indexing a mixed ciphertext, which improve the indexing speed.

In order to achieve the purpose, the invention provides the following scheme:

a method of hybrid ciphertext indexing, comprising:

acquiring ciphertext data stored in a database;

performing bucket partitioning on the ciphertext data having the same attribute to obtain bucket-partitioned data;

building a B+ tree for the bucket-partitioned data and storing data pointers in the leaf nodes.

Optionally, performing bucket partitioning on the ciphertext data having the same attribute to obtain bucket-partitioned data specifically includes:

determining split points for the ciphertext data having the same attribute, taking the minimum total number of false-detection elements as the objective function; the total number of false-detection elements is the sum of the false-detection elements of all bucket intervals; the number of false-detection elements is the number of data items, other than the target data, in the bucket interval retrieved during indexing;

and performing bucket partitioning on the ciphertext data of the corresponding attribute according to the split points.

Optionally, determining split points for the ciphertext data having the same attribute, taking the minimum total number of false-detection elements as the objective function, specifically includes:

calculating the maximum value and the minimum value of the ciphertext data to be partitioned, and determining the interval in which the ciphertext data lies;

adding a split point to be inserted into the interval in which the ciphertext data lies;

calculating the total number of false-detection elements of the bucket intervals produced when the split point to be inserted is placed at each candidate position;

determining the position of the split point to be inserted at which the total number of false-detection elements is minimal, to obtain the insertion position of the split point;

judging whether the number of bucket intervals produced after the split point is inserted is greater than or equal to a preset maximum threshold, to obtain a judgment result;

if the judgment result is yes, the determination of insertion positions is complete, and all insertion positions are recorded;

and if the judgment result is no, returning to the step of adding a split point to be inserted into the interval in which the ciphertext data lies.

Optionally, the number of false-detection elements corresponding to any one bucket interval is calculated as:

$$BC(\alpha, \beta) = (\beta - \alpha + 1) \sum_{t=\alpha}^{\beta} f_t$$

where $BC(\alpha, \beta)$ is the number of false-detection elements in the bucket interval bounded by attribute values $v_\alpha$ and $v_\beta$; $f_t$ is the frequency with which attribute value $v_t$ occurs within that bucket interval; and $\alpha$, $t$ and $\beta$ are the sequence numbers of attribute values $v_\alpha$, $v_t$ and $v_\beta$, respectively.

A hybrid ciphertext indexing system, comprising:

the acquisition module is used for acquiring the ciphertext data stored in the database;

the bucket partitioning module is used for performing bucket partitioning on the ciphertext data having the same attribute to obtain bucket-partitioned data;

and the tree structure building module is used for building a B+ tree for the bucket-partitioned data and storing data pointers in the leaf nodes.

Optionally, the bucket dividing module includes:

the split point determining submodule is used for determining split points for the ciphertext data having the same attribute, taking the minimum total number of false-detection elements as the objective function; the total number of false-detection elements is the sum of the false-detection elements of all bucket intervals; the number of false-detection elements is the number of data items, other than the target data, in the bucket interval retrieved during indexing;

and the partitioning submodule is used for performing bucket partitioning on the ciphertext data of the corresponding attribute according to the split points.

Optionally, the partition point determining sub-module specifically includes:

the ciphertext interval determining unit is used for calculating the maximum value and the minimum value of the ciphertext data to be partitioned and determining the interval in which the ciphertext data lies;

a split point adding unit, configured to add a split point to be inserted into the interval in which the ciphertext data lies;

a false-detection total calculating unit, configured to calculate the total number of false-detection elements of the bucket intervals produced when the split point to be inserted is placed at each candidate position;

an insertion position determining unit, configured to determine the position of the split point to be inserted at which the total number of false-detection elements is minimal, to obtain the insertion position of the split point;

a judging unit, configured to judge whether the number of bucket intervals produced after the split point is inserted is greater than or equal to a preset maximum threshold, to obtain a judgment result;

an insertion position determination completion unit, configured to complete the determination of insertion positions and record all insertion positions if the judgment result is yes;

and a returning unit, configured to return to the step of adding a split point to be inserted into the interval in which the ciphertext data lies if the judgment result is no.

According to the specific embodiments provided by the invention, the invention discloses the following technical effects: the hybrid ciphertext indexing method and system build the index with a B+ tree on top of bucket partitioning, replacing the sequential search used for indexing in the prior art. Building the index with a B+ tree on top of bucket partitioning enables fast indexing and improves the indexing speed. In addition, the invention determines the split points of the bucket partitioning with the goal of minimizing the total number of false-detection elements, which optimizes the bucket partitioning process, effectively reduces the total number of false-detection elements during indexing, achieves the best query hit rate, reduces the query cost, and balances the security of the ciphertext index against query efficiency.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a method for hybrid ciphertext indexing according to an embodiment 1 of the present invention;

FIG. 2 is a schematic diagram of bucket partitioning when N = 4;

FIG. 3 is a B + tree structure diagram built for the names in Table 2.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.

Example 1:

FIG. 1 is a flowchart of a hybrid ciphertext indexing method according to embodiment 1 of the present invention.

Referring to FIG. 1, the hybrid ciphertext indexing method includes:

Step 101: Acquire the ciphertext data stored in a database.

Step 102: Perform bucket partitioning on the ciphertext data having the same attribute to obtain bucket-partitioned data.

Bucket partitioning performs a coarse query over the ciphertext data and filters out irrelevant records. When the data is stored, a bucket number is attached to the ciphertext data to indicate that its plaintext value falls within a certain interval. At query time, the server first uses the bucket number to locate the ciphertext partition containing the data being queried and then performs the next stage of the query, which narrows the query range and improves query efficiency.
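The coarse query described above can be illustrated with a minimal Python sketch. The bucket boundaries, the `ct-…` placeholder strings standing in for encrypted tuples, and the function names are all hypothetical and are not taken from the patent; the sketch only shows how a range condition is turned into a set of bucket numbers that the server can filter on without seeing plaintext.

```python
from bisect import bisect_left

# Hypothetical inclusive upper bounds of four score buckets:
# [0, 59], [60, 74], [75, 89], [90, 100]
UPPER_BOUNDS = [59, 74, 89, 100]

def bucket_no(value):
    """Client side: map a plaintext value to the position of its bucket."""
    if not 0 <= value <= UPPER_BOUNDS[-1]:
        raise ValueError("value outside the attribute domain")
    return bisect_left(UPPER_BOUNDS, value)

def buckets_for_range(lo, hi):
    """Client side: bucket numbers that a range query [lo, hi] touches."""
    return set(range(bucket_no(lo), bucket_no(hi) + 1))

def coarse_filter(index_rows, wanted):
    """Server side: keep the encrypted tuples whose bucket number matches."""
    return [etuple for etuple, b in index_rows if b in wanted]

# Placeholder "ciphertexts" paired with the bucket number stored at insert time.
scores = {"ct-a": 55, "ct-b": 62, "ct-c": 70, "ct-d": 91}
rows = [(ct, bucket_no(score)) for ct, score in scores.items()]
result = coarse_filter(rows, buckets_for_range(60, 65))
```

In this example the query asks for scores 60 to 65, yet the server also returns `ct-c` (score 70) because it shares a bucket with `ct-b`; the client has to discard it after decryption.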

In practical applications, a plaintext relation $R(A_1, A_2, \ldots, A_m)$ usually contains multiple sensitive attributes, so the corresponding ciphertext relation $R^s(\mathit{etuple}, A_1^s, A_2^s, \ldots, A_m^s)$ also contains multiple ciphertext index columns. Here etuple is the ciphertext string obtained by encrypting a plaintext tuple, and $A_1^s$ is the index column of attribute $A_1$ in relation $R$.

Table 1 is a student information table, and a storage model of the ciphertext index database is established using the table as an example.

TABLE 1 student information Table

Wherein id, name and score represent the number, name and score of the student respectively. The table is encrypted and indexed using a bucket partitioning technique, and the format of the index table is shown in table 2.

TABLE 2 bucket index Table

The attribute String in the index table holds the string obtained by encrypting the corresponding tuple. Index-id, Index-name and Index-score are the index columns of the corresponding attributes. The numbers in the table are the bucket numbers assigned to the corresponding data. Bucket numbers may be assigned as collision-free random numbers.
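One way to obtain collision-free random bucket numbers is to draw them without replacement from a fixed identifier space. The sketch below assumes Python's standard `random.SystemRandom`; the function name and the size of the identifier space are hypothetical choices for illustration, not details given in the patent.

```python
import random

def assign_bucket_numbers(n_buckets, id_space=1_000_000, rng=None):
    """Return n_buckets distinct random numbers, one per bucket interval.

    Sampling without replacement guarantees that no two buckets collide.
    id_space is an assumed upper bound on bucket numbers.
    """
    rng = rng or random.SystemRandom()  # OS-backed randomness by default
    return rng.sample(range(id_space), n_buckets)

# Position i in the list is the number stored for the i-th bucket interval.
ids = assign_bucket_numbers(8)
```

Because the numbers are unrelated to the order of the intervals, the stored bucket number by itself does not reveal which interval is larger or smaller.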

Different attributes may use different partitioning strategies depending on the situation. When security requirements limit the number of buckets, the query efficiency of the ciphertext index depends on the bucket partitioning function. Bucket partitioning is defined on a non-negative integer attribute domain, and all queries are assumed to be equally likely.

Existing bucket partitioning techniques essentially partition the ciphertext data of an attribute uniformly over the interval it spans, so that every bucket has the same interval size. To improve the query hit rate of bucket partitioning and reduce the query cost, the invention optimizes the bucket partitioning algorithm. The goal of the optimization is to reduce the number of false-detection elements, so it is first necessary to determine which parameters of the bucket partitioning strategy that number depends on. Let a bucket interval $B$ contain $N$ attribute values $V = \{v_1, v_2, \ldots, v_N\}$, and let the occurrence frequencies of the attribute values be $F = \{f_1, f_2, \ldots, f_N\}$, where $f_t$ ($1 \le t \le N$) is the frequency with which attribute value $v_t$ occurs within the bucket interval. Let $Q_k$ denote the set of all queries whose range size is $k$, and let $q[l, p]$ be a query in $Q_k$, so that $p - l + 1 = k$. The number of queries of size $k$ associated with a bucket interval is $N + k - 1$. As shown in FIG. 2, when $k = 2$ and $N = 4$, the number of range queries of size 2 associated with the bucket interval is $N + k - 1 = 5$, namely $q_1$, $q_2$, $q_3$, $q_4$ and $q_5$.

When only some (not all) of the element values in a bucket interval satisfy the condition of a range query, all attribute values carrying that bucket's index number are retrieved, because every value in the same bucket interval has the same index number. The query result therefore contains falsely detected tuples: a falsely detected attribute value $v_i$ contributes $f_i$ falsely detected elements to the ciphertext query result set. Take $q_2$ as an example: when query $q_2$ is executed, the bucket interval contains not only $v_1$ and $v_2$, which $q_2$ asks for, but also $v_3$ and $v_4$. The number of false-detection elements for $k = 2$ and $N = 4$ can be calculated in this way and is shown in Table 3.

TABLE 3 wrong detection element number table

When the query range size is $k = 2$ and the bucket interval contains $N = 4$ attribute values, the total number of false-detection elements is expressed as follows:

Similarly, when $N = 5$, the total number of false-detection elements is expressed as follows:

By induction, when the bucket interval contains $N$ attribute values, the total number of false-detection elements is as shown in formula (3):

Therefore, the total number of false-detection elements depends only on the number $N$ of attribute values contained in the bucket interval and the sum $F$ of their occurrence frequencies, and is independent of the size $k$ of the range query. When the number of bucket sub-intervals is limited, the optimal bucket partitioning strategy is the one that minimizes the total number of falsely detected tuples, $N \times F$ summed over all buckets, and thereby maximizes the query hit rate. The goal of the optimization algorithm is to minimize

$$\sum_{j=1}^{M} N_{B_j} F_{B_j}$$

where $M$ is the upper limit on the number of buckets (partition sub-intervals) and $N_{B_j} F_{B_j}$ is the total number of falsely detected attribute values in the $j$-th bucket interval $B_j$. Let the attribute value set be $V = \{v_1, v_2, \ldots, v_n\}$ ($v_1 < \cdots < v_n$), where each attribute value $v_t$ ($1 \le t \le n$) occurs at least once in the table.
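The claim that the false-detection total does not depend on the query size k can be checked by brute force. The sketch below is an illustration under a simplified model that is assumed here rather than taken from the patent: the bucket holds the consecutive values 1..N, a size-k query is a window [l, l + k - 1], and the server returns the whole bucket whenever the window overlaps it. The function name is hypothetical.

```python
def false_detections(freqs, k):
    """Count falsely detected elements over all size-k queries touching one bucket.

    freqs[t - 1] is the frequency f_t of attribute value v_t (values 1..N).
    Returns (number_of_queries, total_false_detections).
    """
    n = len(freqs)
    queries = 0
    total = 0
    # Windows [l, p] that overlap 1..N start anywhere from 2 - k up to N.
    for l in range(2 - k, n + 1):
        p = l + k - 1
        queries += 1
        # Every bucket value outside the window is returned but not wanted.
        total += sum(f for t, f in enumerate(freqs, start=1) if not l <= t <= p)
    return queries, total

freqs = [3, 1, 4, 2]           # hypothetical frequencies, F = 10, N = 4
q2, total_k2 = false_detections(freqs, 2)
q3, total_k3 = false_detections(freqs, 3)
```

Under this model each value lies inside exactly k of the N + k - 1 windows, so it is a false detection in the remaining N - 1 of them and the total is (N - 1)·F for every k. Like the N × F objective used in the text, this quantity depends only on N and F.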

From the above, an optimized bucket partitioning process can be obtained. Step 102 then specifically includes:

determining split points for the ciphertext data having the same attribute, taking the minimum total number of false-detection elements as the objective function; the total number of false-detection elements is the sum of the false-detection elements of all bucket intervals; the number of false-detection elements is the number of data items, other than the target data, in the bucket interval retrieved during indexing; and performing bucket partitioning on the ciphertext data of the corresponding attribute according to the split points.

In the above step, the specific process of determining split points for the ciphertext data having the same attribute, taking the minimum total number of false-detection elements as the objective function, includes:

A. Calculate the maximum value and the minimum value of the ciphertext data to be partitioned, and determine the interval in which the ciphertext data lies.

B. Add a split point to be inserted into the interval in which the ciphertext data lies.

C. Calculate the total number of false-detection elements of the bucket intervals produced when the split point to be inserted is placed at each candidate position. A position that already holds a split point is not a candidate, so that two or more split points are never inserted at the same position.

The number of false-detection elements corresponding to any one bucket interval is calculated as:

$$BC(\alpha, \beta) = (\beta - \alpha + 1) \cdot F_{\alpha\beta}, \qquad F_{\alpha\beta} = \sum_{t=\alpha}^{\beta} f_t$$

where $BC(\alpha, \beta)$ is the number of false-detection elements in the bucket interval bounded by attribute values $v_\alpha$ and $v_\beta$; $f_t$ is the frequency with which attribute value $v_t$ occurs within that bucket interval; $\alpha$, $t$ and $\beta$ are the sequence numbers of attribute values $v_\alpha$, $v_t$ and $v_\beta$, respectively; and $F_{\alpha\beta}$ is the sum of the occurrence frequencies of all attribute values in the bucket interval bounded by $v_\alpha$ and $v_\beta$.

D. Determine the position of the split point to be inserted at which the total number of false-detection elements is minimal, to obtain the insertion position of the split point. Once obtained, the insertion position is stored in the set of insertion positions.

The minimum total number of false-detection elements can be calculated by taking an assumed insertion position of the current split point as a boundary point. The calculation formula is:

$$MOB(1, n, M) = \min\left[\, MOB(1, i, j) + MOB(i + 1, n, M - j) \,\right]$$

where $n$ is the number of attribute values; $M$ is the total number of bucket intervals formed after the current split point is inserted; $i$ is the sequence number of the attribute value at the assumed insertion position of the current split point; $j$ is the number of bucket intervals formed by attribute values 1 to $i$; and $M - j$ is the number of bucket intervals formed by the remaining $n - i$ attribute values. $MOB(1, n, M)$ is the minimum total number of false-detection elements obtained after attribute values 1 to $n$ are divided into $M$ bucket intervals. $MOB(1, i, j)$ is the number of false-detection elements obtained after attribute values 1 to $i$ are divided into $j$ bucket intervals, and $MOB(i + 1, n, M - j)$ is the number of false-detection elements obtained after attribute values $i + 1$ to $n$ are divided into $M - j$ bucket intervals.

E. Judge whether the number of bucket intervals produced after the split point is inserted is greater than or equal to a preset maximum threshold, to obtain a judgment result.

F. If the judgment result is yes, the determination of insertion positions is complete, and all insertion positions are recorded.

G. If the judgment result is no, return to step B.
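Steps A to G can be read as a loop that inserts one split point at a time. The following Python sketch is one possible interpretation, not the patent's implementation: it assumes the per-bucket cost is (number of values in the bucket) × (sum of their frequencies), works on sequence numbers 1..n instead of ciphertext values, and uses hypothetical names throughout.

```python
from itertools import accumulate

def greedy_split_points(freqs, max_buckets):
    """Insert split points one by one, each at the position that minimizes
    the total false-detection count, until max_buckets buckets exist.

    freqs[t - 1] is the frequency of the t-th attribute value.
    A split point p means "cut between value p and value p + 1".
    """
    n = len(freqs)
    prefix = [0, *accumulate(freqs)]

    def bc(a, b):
        # Assumed bucket cost: (number of values) * (sum of frequencies).
        return (b - a + 1) * (prefix[b] - prefix[a - 1])

    def total(splits):
        edges = [0, *sorted(splits), n]
        return sum(bc(lo + 1, hi) for lo, hi in zip(edges, edges[1:]))

    splits = []
    while len(splits) < n - 1:                                   # step B
        # Step C: positions already holding a split point are skipped.
        candidates = [p for p in range(1, n) if p not in splits]
        best = min(candidates, key=lambda p: total(splits + [p]))  # step D
        splits.append(best)
        if len(splits) + 1 >= max_buckets:                       # steps E, F
            break                                                # else step G
    return sorted(splits)

two = greedy_split_points([1, 1, 1, 9], 2)
four = greedy_split_points([1, 1, 1, 9], 4)
```

With frequencies [1, 1, 1, 9], the first split isolates the heavy value: cutting after the third value costs 3·3 + 1·9 = 18, against 24 and 34 for the other two positions.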

After the ciphertext data of the corresponding attribute has been partitioned into buckets according to the split points, the end points and the number of each bucket interval are recorded.
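The MOB recurrence can also be evaluated exactly with dynamic programming. The sketch below is a hedged illustration: it fixes the right-hand part of the recurrence to a single bucket, which reaches the same minimum because every partition into M contiguous buckets has a last bucket, and it assumes the bucket cost (number of values) × (sum of frequencies). Function and variable names are hypothetical. It returns both the minimum cost and the end points of each bucket interval, which is the information recorded after partitioning.

```python
from functools import lru_cache
from itertools import accumulate

def optimal_buckets(freqs, m):
    """Split attribute values 1..n into exactly m contiguous buckets with the
    minimum total false-detection count.

    Returns (min_cost, [(first, last), ...]) using 1-based sequence numbers.
    """
    n = len(freqs)
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= number of attribute values")
    prefix = [0, *accumulate(freqs)]

    def bc(a, b):
        # Assumed bucket cost: (number of values) * (sum of frequencies).
        return (b - a + 1) * (prefix[b] - prefix[a - 1])

    @lru_cache(maxsize=None)
    def mob(i, j):
        """Best (cost, start of last bucket) for values 1..i in j buckets."""
        if j == 1:
            return bc(1, i), 1
        best = None
        for s in range(j, i + 1):          # last bucket covers s..i
            cost = mob(s - 1, j - 1)[0] + bc(s, i)
            if best is None or cost < best[0]:
                best = (cost, s)
        return best

    # Walk back through the recorded choices to recover the end points.
    bounds = []
    i, j = n, m
    while j >= 1:
        _, s = mob(i, j)
        bounds.append((s, i))
        i, j = s - 1, j - 1
    bounds.reverse()
    return mob(n, m)[0], bounds

cost, bounds = optimal_buckets([1, 1, 1, 9], 2)
```

Pairing each returned interval with a bucket number gives the lookup table the client needs to translate a plaintext range into bucket numbers.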

Step 103: Build a B+ tree for the bucket-partitioned data and store data pointers in the leaf nodes.

In the original bucket-based ciphertext retrieval method, the server returns all the data in a bucket, so many ciphertext records are returned to the client for filtering. In practical applications the data volume is very large, and this processing increases the load on the client and slows its response time.

Therefore, the data in the bucket is searched on the server side until the records satisfying the condition are found and returned to the client, which reduces the amount of data the client has to process. To speed up indexing of the data within a bucket, a B+ tree is built on the ciphertext after the bucketed data has been encrypted.

The B+ tree is a dynamic multi-level index structure in which data pointers are stored only in the leaf nodes of the tree; the structure of a leaf node therefore differs from that of an internal node. If the lookup field is a key field, then for each value of the lookup field there is an entry in a leaf node together with a pointer to the record (or to the block containing the record). For a non-key lookup field, the pointer points to a block that contains pointers to the data file records, which adds an extra level of indirection.

The leaf nodes of the B+ tree are typically linked together to provide ordered access to the records based on the lookup field. These leaf nodes are analogous to the first-level (base) index. The internal nodes of the B+ tree correspond to the other levels of the multi-level index. Some of the lookup field values of the leaf nodes are repeated in the internal nodes of the B+ tree to guide the lookup.

FIG. 3 is a B + tree structure diagram built for the names in Table 2.

Referring to FIG. 3, when the value being looked up is less than or equal to the left key of the current node, the search moves to the lower-left child; when it is greater than or equal to the right key of the current node, the search moves to the lower-right child; this continues until a leaf node is reached.
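The structure and the descent described above can be sketched as a small, statically built B+ tree. This is a simplified illustration and not the structure shown in FIG. 3: it assumes unique, mutually comparable keys, builds the tree bottom-up from sorted input instead of supporting dynamic insertion, and sends a key equal to a separator to the right-hand child. The class and function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

class Leaf:
    def __init__(self, keys, pointers):
        self.keys = keys          # sorted lookup-field values
        self.pointers = pointers  # data pointers live only in leaves
        self.next = None          # link to the next leaf for ordered scans

class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # keys[i] = smallest key under children[i + 1]
        self.children = children

def bulk_load(items, order=4):
    """Build a B+ tree from (key, pointer) pairs sorted by unique key.

    The last node of a level may hold fewer entries than a strict B+ tree
    would allow; that is accepted here to keep the sketch short.
    """
    if not items or order < 2:
        raise ValueError("need at least one item and order >= 2")
    leaves = []
    for i in range(0, len(items), order):
        chunk = items[i:i + order]
        leaves.append(Leaf([k for k, _ in chunk], [p for _, p in chunk]))
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    level = [(leaf.keys[0], leaf) for leaf in leaves]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), order):
            group = level[i:i + order]
            node = Internal([k for k, _ in group[1:]], [c for _, c in group])
            parents.append((group[0][0], node))
        level = parents
    return level[0][1]

def search(root, key):
    """Descend from the root to a leaf and return the data pointer, or None."""
    node = root
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.pointers[i]
    return None

root = bulk_load([(k, f"row{k}") for k in range(0, 20, 2)], order=3)
```

Each lookup touches one node per level, so the number of comparisons grows with the height of the tree instead of with the number of records in the bucket.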

As can be seen from FIG. 3, using the hybrid index built from bucket partitioning and the B+ tree for operations such as querying and inserting data in the ciphertext database improves the server's hit rate for ciphertext queries. The higher the server's query accuracy, the lower the network bandwidth load between server and client and the lower the cost the client spends on decryption and query processing, which shortens the client's response time.
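The two-level lookup can be sketched end to end as follows. This is an illustration under stated assumptions, not the patent's implementation: each record is assumed to carry an opaque, comparable search key (the patent does not say how such a key is derived from the ciphertext), a sorted list searched with `bisect` stands in for the per-bucket B+ tree, and the class name, key strings and row numbers are hypothetical.

```python
from bisect import bisect_left, insort

class HybridIndex:
    """Bucket number first, ordered search inside the bucket second."""

    def __init__(self):
        # bucket number -> sorted list of (search_key, pointer)
        self._buckets = {}

    def insert(self, bucket_no, search_key, pointer):
        """Add an entry while keeping the bucket's entries sorted."""
        insort(self._buckets.setdefault(bucket_no, []), (search_key, pointer))

    def lookup(self, bucket_no, search_key):
        """Return the pointers of all entries in the bucket with this key."""
        entries = self._buckets.get(bucket_no, [])
        # (search_key,) sorts before every (search_key, pointer) pair.
        i = bisect_left(entries, (search_key,))
        pointers = []
        while i < len(entries) and entries[i][0] == search_key:
            pointers.append(entries[i][1])
            i += 1
        return pointers

idx = HybridIndex()
idx.insert(7, "c3", 12)
idx.insert(7, "a1", 10)
idx.insert(7, "c3", 11)
idx.insert(2, "a1", 99)
hits = idx.lookup(7, "c3")
```

Only the matching pointers leave the server, so the client decrypts the records it asked for and not the whole bucket.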

Example 2:

an embodiment provides a mixed ciphertext indexing system, comprising:

Optionally, the bucket dividing module includes:

Optionally, the partition point determining sub-module specifically includes:

According to the specific embodiment provided by the invention, the invention discloses the following technical effects:

1. When a ciphertext index is built with bucket partitioning, the fewer the buckets, the lower the query hit rate; the more buckets, the higher the query hit rate, but the greater the information leakage, and when the plaintext space of the indexed attribute values is relatively small, data security can be greatly reduced. The core of building a ciphertext index with bucket partitioning therefore lies in determining the bucket partitioning function, and the invention provides an optimal bucket partitioning strategy for a given number of buckets.

2. In the original bucket-partition ciphertext indexing method, the bucket data that the server returns to the client has a linear structure, and because the data is encrypted and has lost its original partial order, the client can only search it sequentially. Sequential search is inefficient, especially for large amounts of data. Introducing a B+ tree structure within each bucket enables fast indexing and improves the indexing speed of the data in the bucket.

The embodiments in this description are described progressively: each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments can be cross-referenced. Because the system disclosed in the embodiment corresponds to the method disclosed in the embodiment, its description is relatively brief; for the relevant details, refer to the description of the method.

The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims

1. A method for indexing a hybrid ciphertext, comprising:

acquiring ciphertext data stored in a database;

2. The method for indexing a mixed ciphertext according to claim 1, wherein the performing bucket partitioning on the ciphertext data with the same attribute to obtain bucket partitioned data specifically comprises:

3. The method for indexing mixed ciphertext according to claim 2, wherein the determining the partition point for the ciphertext data having the same attribute using the minimum total number of error detectors as an objective function specifically comprises:

4. The method of claim 3, wherein the calculation formula of the number of false positives corresponding to any one bucket interval is:

5. A hybrid ciphertext indexing system, comprising:

6. The hybrid ciphertext indexing system of claim 5, wherein the bucket partitioning module comprises:

7. The mixed ciphertext indexing system of claim 6, wherein the partition point determining sub-module specifically comprises:

8. The mixed ciphertext indexing system of claim 7, wherein the error detector number corresponding to any bucket interval is calculated as: