CN110166221B

CN110166221B - Ciphertext data compression storage structure RCPE and data dynamic read-write method

Info

Publication number: CN110166221B
Application number: CN201910165159.1A
Authority: CN
Inventors: 齐赛宇; 张萌; 袁浩然; 陈晓峰; 张夫猷
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2019-03-05
Filing date: 2019-03-05
Publication date: 2022-02-22
Anticipated expiration: 2039-03-05
Also published as: CN110166221A

Abstract

The invention belongs to the field of data security and discloses a ciphertext data compression storage structure RCPE and a data dynamic read-write method. Following the design concept of the RCFile storage structure, a two-layer ciphertext key-value storage structure RCPE is designed, covering the data storage format, the data compression method and optimizations for data reading. The data is first divided horizontally into a number of row groups whose size is specified by the user; each row group is then divided vertically, and the data in the row group is compressed and encrypted column by column. A read-write interface designed on top of the ciphertext storage structure supports rich query modes and dynamic write operations. The invention achieves a data loading speed and workload adaptability comparable to row storage, avoids reading unnecessary columns, and reduces the client's decryption and decompression time. It performs better than the other structures compared, and its column-wise compression within row groups effectively improves storage space utilization and thus system performance.

Description

Ciphertext data compression storage structure RCPE and data dynamic read-write method

Technical Field

The invention belongs to the field of data security, and particularly relates to a ciphertext data compression storage structure RCPE and a data dynamic read-write method.

Background

In the current state of the art, cloud computing integrates the computing, networking and storage resources of a large number of physical devices and provides users with corresponding services flexibly and on demand over the Internet. To make full use of these services, however, tasks that used to be performed on local devices must be delegated to a cloud service provider. To manage the ever-increasing amount of data, distributed key-value (KV) storage has become the backbone of many public cloud services, because KV stores offer high performance, linear scalability and continuous availability, and even have great potential for advanced support of rich queries. Data compression allows the server to hold more data in main memory, which reduces the number of accesses to persistent storage and improves system performance; for this reason many big data stores compress their data. To protect data confidentiality, a natural solution is to encrypt the data stored on the server and keep the key at the client, so that only a user who possesses the key can decrypt the data. An ideal system incorporates both compression and encryption into its design.

Because compression and encryption are in tension, most current schemes adopt only one of the two. First, compressing encrypted data is not feasible, because ciphertext is pseudo-random and pseudo-random data is incompressible. Second, encrypting compressed data works well in some systems but causes problems in database settings: compressing a single row usually yields only a limited compression ratio, whereas compressing multiple rows together means the server can no longer maintain fine-grained access to the attributes of those rows, which makes correct semantics harder to maintain. The prior art proposes MiniCrypt, the first scheme to combine compression and encryption in key-value storage. MiniCrypt compresses a small number of rows together into a packet, achieving a higher compression ratio while preserving data usability, and uses the minimum key of each packet as the packet ID to support whole-row queries and range queries. However, because whole rows are compressed together, a query that needs only one column must still return the data of all columns, which leads to high communication overhead and slow decryption and decompression. Moreover, because a packet contains multiple rows, concurrent access easily causes conflicts, so MiniCrypt's update scheme leads to a large amount of repeated work.

In summary, the problems of the prior art are as follows:

(1) Most of the prior art supports only one of compression and encryption. In MiniCrypt, which currently supports both, each row group is packaged as a whole, so a query on a single column or a few columns must still return the data of all columns in the row group. This causes a large amount of unnecessary communication overhead, and the client must decrypt and decompress the whole row group, which increases its decryption and decompression time. Solving this problem reduces the communication cost, increases transmission speed, relieves the computational burden on the client and improves system performance.

(2) The prior art has a low compression ratio and poor support for dynamic databases. Multiple rows of heterogeneous data types are compressed as a whole, which limits the compression ratio. Because one row group holds multiple rows, several users are likely to process the same packet simultaneously, which easily causes lost updates; and the optimistic-locking approach of recording the packet hash value easily forces users to download and update the packet repeatedly. Improving the compression ratio reduces communication overhead and increases system throughput, and a new method that prevents lost updates during concurrent updates reduces the overhead of repeated downloads and updates and improves the dynamic behavior of the database.

The difficulty in solving these technical problems lies in designing a new key-value data storage structure, based on the design concept of the RCFile storage structure, that supports data compression and encryption while maintaining high availability and high query efficiency.

The significance of solving these technical problems is that it provides better confidentiality and efficiency guarantees for big data storage, making the method better suited to data warehouses.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a ciphertext data compression storage structure RCPE and a data dynamic read-write method.

The invention is realized as follows. The ciphertext data compression storage structure RCPE and data dynamic read-write method partition data according to the design idea of the RCFile storage structure and use a two-layer key-value storage structure: the first layer consists of row group keys and row group values, and the second layer consists of the column attribute names and column attribute values inside each row group. The invention covers the data storage format, the data compression method and optimizations for data reading. The data is divided horizontally into a number of row groups whose size is specified by the user; each row group is then divided vertically, and the data in the row group is compressed and encrypted column by column. A read-write interface on the ciphertext storage structure supports rich query modes and dynamic write operations. Because a row group contains many records, multiple users are quite likely to modify the same packet simultaneously during writes, which easily causes lost updates. The invention therefore uses a locking mechanism: when data is written, an update lock is placed on the matched row group, and other users may read the row group but not update it. When the update times out or the user holding the lock finishes the update, the update lock is released and other users can proceed with their updates.

Further, the ciphertext data compression storage structure RCPE and the data dynamic read-write method comprise the following steps:

First, the configuration phase Setup(λ, DB, Q): the original database DB of size Q is initialized, the public parameter λ is defined, and the secret keys K_c and K_o are generated; the data is divided into row groups, each column in a row group is compressed and AES-encrypted, and an inverted index x is built for each column that needs to be indexed; the encoded data S and the index are uploaded to the cloud server for storage and querying. The configuration phase is completed at the client.

Second, the query phase Query(S, x, key, K_c, C[k]): the client processes the query request to generate a query label and sends the label, together with the hash values C[k] of the names of the columns to be returned, to the server; the server searches the ciphertext database and returns the k required column packets to the client; the client decrypts and decompresses the column packets and searches for the matching values.

Third, the data update phase: the client applies order-preserving encryption to the key to generate a label and sends an update request to the server; the server uses the label to find the matching column packets, returns them to the client and places an update lock on the row group; the user decrypts and decompresses the column packets, modifies the data, repackages it, uploads it to the server and releases the update lock.

Fourth, the data addition phase Insert(S, key, C_value[n], Row_max): the server uses the query tag generated from the key provided by the user to find the matching row group, places an update lock on it and returns the whole row group; the user first checks the size of the row group. If it exceeds the threshold, the data splitting phase is executed; otherwise the data is added directly. The repackaged data is then uploaded to the server and the update lock is released.

Fifth, the data splitting phase Split(RowID, C_pack[n]): every column packet is split into a left column packet and a right column packet at the midpoint of the key-value pairs in the row group (rounded); the left column packets overwrite the original row group, and a new row group is constructed from all the right column packets.

Sixth, the data deletion phase Delete(key): the row group containing the data is located through the key label generated by the client, an update lock is placed on the matched row group and the row group is returned to the client; the user decrypts and decompresses it, deletes the data, repackages it, uploads it to the server and releases the update lock.

Further, the specific process of the first step configuration algorithm comprises:

Setup(λ, DB, Q): runs at the client; the security parameter λ is input, and the keys K_c and K_o are generated with a key generation algorithm;

the key generation algorithm outputs (K_c, K_o), and its input parameters correspond to those of the configuration algorithm. The original database DB is sorted by key; each key is hashed and its hash value is reduced modulo 8, dividing the data into eight partitions, each identified by a PartID:

PartID = Hash(key) mod 8

The records in each bucket (partition) are divided into row groups. The ID of each row group is derived from the minimum key in the row group by order-preserving encryption:

RowID = OPE(key_min, K_o)
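The partitioning and row-group construction above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: keys are integers, `Hash` is taken to be SHA-256, and `toy_ope` is a hypothetical, insecure stand-in for OPE(key, K_o) that merely preserves order; a real deployment would use an actual order-preserving encryption scheme keyed with K_o.

```python
import hashlib

NUM_PARTITIONS = 8  # PartID = Hash(key) mod 8


def toy_ope(key: int) -> int:
    # Hypothetical placeholder for OPE(key, K_o): strictly increasing, NOT secure.
    return 3 * key + 7


def part_id(key: int) -> int:
    # Hash(key) mod 8, with SHA-256 assumed as the hash function.
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_PARTITIONS


def build_row_groups(records: dict, rows_per_group: int) -> dict:
    """records maps integer key -> tuple of column values.
    Returns {PartID: [(RowID, [(key, row), ...]), ...]}."""
    buckets = {}
    for key, row in records.items():
        buckets.setdefault(part_id(key), []).append((key, row))
    layout = {}
    for pid, items in buckets.items():
        items.sort()  # sort the records of each bucket by key
        groups = []
        for start in range(0, len(items), rows_per_group):
            chunk = items[start:start + rows_per_group]
            row_id = toy_ope(chunk[0][0])  # RowID = OPE(key_min, K_o)
            groups.append((row_id, chunk))
        layout[pid] = groups
    return layout
```

Because each bucket is sorted before chunking, the first key of every chunk is its minimum, and the RowIDs within a partition come out in ascending order.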

In each row group, each column is zlib-compressed and then AES-encrypted.

An index x is built for conditional queries:

α = H(C_value)

β = (PartID||RowID)

x ← (α, β)
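A sketch of the vertical split of one row group into compressed column packets, together with the index entries x ← (α, β) defined above. Assumptions: column values are serialized as newline-separated text (so values must not contain newlines), H is SHA-256 as in the column-query description, and the AES encryption step with K_c is deliberately omitted so that the example needs only the Python standard library; the function names are hypothetical.

```python
import hashlib
import zlib


def pack_columns(rows, column_names):
    """Vertically split one row group and zlib-compress each column.
    rows: list of tuples, one per record, all in the same column order.
    Returns {column name: compressed column packet}.
    In RCPE each packet would then be AES-encrypted with K_c (omitted here)."""
    packets = {}
    for i, name in enumerate(column_names):
        # Serialize one column as newline-separated text (assumed format).
        column_text = "\n".join(str(row[i]) for row in rows)
        packets[name] = zlib.compress(column_text.encode("utf-8"))
    return packets


def unpack_column(packet):
    """Inverse of the per-column packing (applied after decryption in RCPE)."""
    return zlib.decompress(packet).decode("utf-8").split("\n")


def index_entries(rows, col, part_id, row_id):
    """Index entries x <- (alpha, beta) for the column at position `col`:
    alpha = SHA-256 of the value, beta = PartID||RowID."""
    beta = f"{part_id}||{row_id}"
    return [(hashlib.sha256(str(row[col]).encode("utf-8")).hexdigest(), beta)
            for row in rows]
```

Compressing each column separately keeps values of one type together, which is the reason the text gives for the higher compression ratio than whole-row compression.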

Further, the query algorithm Query(S, x, key, K_c, C[k]) in the second step supports the following queries:

(1) Get(key): takes a key as input and returns the value corresponding to the key;

Because key-value pairs are packed and encrypted in RCPE, the client can only retrieve data at the granularity of column packets, so the challenge is to determine, for a given key, the row group that contains it. Since each RowID is the order-preserving encryption of the smallest key in its row group, it is less than or equal to the encryption of every key in that row group; hence the row group containing the key is the one with the largest RowID among all RowIDs not exceeding the encrypted key. The lookup then proceeds to the second layer to find the specified column packets. The specific operation is as follows:

Client: inputs the key, applies order-preserving encryption to it to generate key^*, and sends key^* to the server:

key^* = OPE(key, K_o)

Server: finds the row groups whose RowID is not greater than key^*, sorts them, and returns the matching column packets of the row group with the largest RowID; the client decrypts and decompresses these column packets and filters out the matching values.
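The server-side rule "largest RowID not exceeding key^*" is a predecessor search over the sorted RowIDs of a partition, so it can be sketched with binary search. `toy_ope` is again a hypothetical, insecure order-preserving placeholder for OPE(key, K_o), and the packet dictionaries keyed by column-name hashes are an assumed layout, not taken from the source.

```python
import bisect


def toy_ope(key: int) -> int:
    # Hypothetical placeholder for OPE(key, K_o): order-preserving, NOT secure.
    return 3 * key + 7


def server_get(row_ids, key_star):
    """Index of the row group with the largest RowID <= key_star, or None.
    row_ids must be sorted in ascending order (one partition)."""
    pos = bisect.bisect_right(row_ids, key_star)
    return pos - 1 if pos > 0 else None


def server_get_columns(row_groups, key_star, wanted):
    """row_groups: list of (RowID, {column-name hash: packet}) sorted by RowID.
    Returns only the requested column packets C[k] of the matching row group."""
    idx = server_get([rid for rid, _ in row_groups], key_star)
    if idx is None:
        return {}
    packets = row_groups[idx][1]
    return {c: packets[c] for c in wanted if c in packets}
```

For row groups whose smallest keys are 0, 10 and 20, a lookup for key 15 lands in the group starting at 10, since OPE(10) is the largest RowID not exceeding OPE(15); only the packets named in `wanted` are returned.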

(2) Get(key_low, key_high): takes a range as input and returns all values in the range;

Range queries fit naturally into the design of the invention. In fact, for large range queries involving many keys, the invention uses less network bandwidth than conventional key-value storage, because the many keys in a range are compressed together in RCPE, and the column-wise partitioning inside row groups avoids reading unnecessary columns. The specific operation is as follows:

Client: inputs (key_low, key_high), applies order-preserving encryption to both bounds to generate key_low^* and key_high^*, and sends them to the server;

Server: finds the row groups whose RowID lies between key_low^* and key_high^* (inclusive) and adds them to the result set; if key_low^* is less than the minimum RowID in the result set, RowID_min, or the result set is empty, it executes Get(key_low) and adds that row group to the result set as well, because key_low may lie in the row group preceding RowID_min. It then returns the matching column packets of the row groups in the result set.
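The range rule can be sketched the same way over the sorted RowIDs of one partition. Treating the RowID bounds as inclusive and covering the empty-result case (the whole range lies inside one row group) are assumptions made here for completeness; `toy_ope` is the same hypothetical, insecure placeholder for OPE.

```python
import bisect


def toy_ope(key: int) -> int:
    # Hypothetical placeholder for OPE(key, K_o): order-preserving, NOT secure.
    return 3 * key + 7


def server_range(row_ids, low_star, high_star):
    """row_ids: sorted RowIDs of one partition.
    Returns the indices of the row groups that may hold keys in [key_low, key_high]."""
    lo = bisect.bisect_left(row_ids, low_star)
    hi = bisect.bisect_right(row_ids, high_star)
    result = list(range(lo, hi))  # row groups with RowID in [low_star, high_star]
    # Extra Get(key_low) step: key_low may lie in the row group just before
    # the first hit, i.e. the group with the largest RowID <= low_star.
    if not result or low_star < row_ids[result[0]]:
        prev = bisect.bisect_right(row_ids, low_star) - 1
        if prev >= 0:
            result.insert(0, prev)
    return result
```

With row groups starting at keys 0, 10, 20 and 30, the range [12, 25] returns the groups starting at 10 and 20: the one starting at 20 falls inside the encrypted bounds, and the Get(key_low) step adds the one starting at 10 that holds key 12.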

(3) Column-value query: searches for one or more key-value pairs whose value in a specific column matches a required value.

An inverted index is built for each column that needs to be indexed: each (hashed) value of the column attribute maps to the partition key concatenated with the row group key. The specific operation is as follows:

Client: inputs the value C_value of column C_i, hashes it with SHA-256, and sends the resulting α to the server;

Server: first looks up α in the index table to find β, which yields the partition key and the row group key; it then retrieves the required column packets using these keys and returns them to the client, which decrypts, decompresses and filters the data to obtain the matching values.
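A sketch of the inverted-index flow: the client sends only α = H(C_value) with H = SHA-256, and the server maps α to the β = PartID||RowID strings under which the matching row groups are stored. The dictionary-based index and the function names are hypothetical; in the embodiment the index is stored alongside the data in the Cassandra database.

```python
import hashlib


def alpha(value) -> str:
    # alpha = H(C_value), with H = SHA-256 as in the text
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def build_index(entries):
    """entries: iterable of (value, PartID, RowID) for one indexed column.
    Returns the inverted index alpha -> set of beta strings "PartID||RowID"."""
    index = {}
    for value, part_id, row_id in entries:
        index.setdefault(alpha(value), set()).add(f"{part_id}||{row_id}")
    return index


def server_lookup(index, a):
    """The server sees only alpha; it returns the (PartID, RowID) pairs whose
    column packets must be fetched and sent back to the client."""
    out = []
    for beta in sorted(index.get(a, ())):
        part_id, row_id = beta.split("||")
        out.append((int(part_id), int(row_id)))
    return out
```

Because the hash is deterministic, equal column values map to the same α, which is what makes the server-side lookup possible; the client still decrypts and filters the returned packets to find the exact matching records.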

Further, the specific process of the update algorithm in the third step comprises:

The value of column attribute C_i corresponding to the key in database S is updated to a new value as follows. First, the client obtains the relevant column packets of the database by calling the query algorithm, which also places an update lock on the row group, and receives the query result τ. The client then decrypts and decompresses the packets by calling the decryption and decompression algorithm, updates the value, re-compresses, re-encrypts and repackages the columns, uploads them to the server to update the column packets, and releases the update lock.
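The update lock described above can be sketched as a per-row-group lock table with a timeout. This is a hypothetical server-side helper, not taken from the source: readers never consult it (so reads stay unblocked), a second writer is refused while an unexpired lock is held, and a lock whose timeout has passed may be taken over. The explicit `now` argument only makes the timeout testable.

```python
import threading
import time


class UpdateLockTable:
    """Hypothetical per-row-group update locks with a timeout.
    Readers never call this class, so reads are never blocked; only writers
    (Update/Insert/Delete) acquire and release locks."""

    def __init__(self, timeout: float = 5.0):
        self.timeout = timeout
        self._holders = {}  # RowID -> (owner, expiry time)
        self._mutex = threading.Lock()

    def acquire(self, row_id, owner, now=None) -> bool:
        """Try to lock a row group for `owner`; False if another writer holds it."""
        now = time.monotonic() if now is None else now
        with self._mutex:
            holder = self._holders.get(row_id)
            if holder is not None and holder[0] != owner and holder[1] > now:
                return False  # unexpired lock held by someone else
            # Free, expired, or re-acquired by the same owner: (re)start the timer.
            self._holders[row_id] = (owner, now + self.timeout)
            return True

    def release(self, row_id, owner) -> None:
        """Release the lock if `owner` still holds it (no-op otherwise)."""
        with self._mutex:
            holder = self._holders.get(row_id)
            if holder is not None and holder[0] == owner:
                del self._holders[row_id]
```

A writer calls `acquire` when the server returns the row group and `release` after uploading the repackaged column packets; if it crashes, the timeout lets other users proceed.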

Further, the specific process of the insertion algorithm in the fourth step comprises:

Insert(S, key, C_value[n], Row_max): the client uses the label generated from the key to find the matching row group, an update lock is placed on the row group, and the row group is returned to the client. The client first determines whether the number of records in the row group exceeds the threshold; if it does (Keys_sum > Keys_max), the packet splitting phase is executed; otherwise the data is inserted directly. The data is then repackaged and uploaded to the server, and the update lock is released.
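A sketch of the client-side insertion logic on a decrypted, decompressed row group, following the order given above: check Keys_sum > Keys_max first, split if necessary, then insert. Assumptions: the key is not already present, a row group is represented as a sorted key list plus one value list per column, the left half receives the extra key when the count is odd, and recompression, re-encryption and upload are omitted; the function names are hypothetical.

```python
import bisect


def split_group(keys, columns):
    """Deterministic split: the left half gets the first ceil(n/2) keys."""
    mid = (len(keys) + 1) // 2
    return ((keys[:mid], [c[:mid] for c in columns]),
            (keys[mid:], [c[mid:] for c in columns]))


def insert_record(keys, columns, key, values, keys_max):
    """Insert one record into a decrypted, decompressed row group.
    keys: sorted key list; columns: one value list per column, aligned with keys;
    values: the new record's column values. May mutate its inputs; returns the
    resulting row groups as a list of (keys, columns): one group, or two after a split."""
    if len(keys) > keys_max:  # Keys_sum > Keys_max: split first
        groups = list(split_group(keys, columns))
    else:
        groups = [(keys, columns)]
    # Target: the last group whose smallest key <= key (first group otherwise).
    target = groups[0]
    for group in groups:
        if group[0] and group[0][0] <= key:
            target = group
    g_keys, g_cols = target
    pos = bisect.bisect_left(g_keys, key)
    g_keys.insert(pos, key)
    for col, value in zip(g_cols, values):
        col.insert(pos, value)  # same position in every column keeps rows aligned
    return groups
```

Inserting at the same position in every column list keeps the vertical layout consistent, so each column can be recompressed into its own packet afterwards.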

Further, the specific process of the packet splitting algorithm in the fifth step includes:

Split(RowID_ε, C_pack[n]): the splitting algorithm must be deterministic for every column packet in a row group, so that every client reading the same packet splits it in exactly the same way. For each column packet, the client creates a left packet from the first half of the keys (rounded) and a right packet from the remaining keys. The client then compresses and encrypts each packet as usual. The right packet is inserted first, followed by the left packet.
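A sketch of the deterministic split over column packets and of the prescribed write order. Assumptions: packets are zlib-compressed, newline-joined values (AES omitted), the left half gets the extra key when the count is odd, and `store` is a hypothetical dictionary from RowID to the packets of a row group. Writing the right row group before overwriting the original one means that, if the client fails between the two writes, the original row group still holds every record; this is one plausible reason for the order, not a statement from the source.

```python
import zlib


def split_packets(keys, column_packets):
    """Split every column packet of a row group at the same point.
    keys: sorted keys; column_packets: {column: zlib-compressed, newline-joined values}.
    Returns (left_keys, left_packets), (right_keys, right_packets)."""
    mid = (len(keys) + 1) // 2  # assumed rounding: the left half gets the extra key
    left, right = {}, {}
    for name, packet in sorted(column_packets.items()):
        values = zlib.decompress(packet).decode("utf-8").split("\n")
        left[name] = zlib.compress("\n".join(values[:mid]).encode("utf-8"))
        right[name] = zlib.compress("\n".join(values[mid:]).encode("utf-8"))
    return (keys[:mid], left), (keys[mid:], right)


def apply_split(store, row_id, new_row_id, left, right):
    # Write the new right row group first, then overwrite the original
    # row group with the left packets.
    store[new_row_id] = right
    store[row_id] = left
```

In RCPE the new row group's ID would be the order-preserving encryption of the first right key, following RowID = OPE(key_min, K_o).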

Further, the specific process of the data deletion algorithm in the sixth step includes:

Delete(key): the row group containing the data is located through the key label generated by the client, an update lock is placed on the matched row group and the row group is returned to the client; the user decrypts and decompresses it, deletes the data, repackages it, uploads it to the server and releases the update lock;
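The data-manipulation part of Delete(key) can be sketched on the decrypted, decompressed row group; locking, recompression, re-encryption and upload are omitted, and the function name is hypothetical. One observation (not stated in the source): because lookup only requires RowID ≤ OPE(k) for every key k in the group, deleting the smallest key leaves the existing RowID valid, so this sketch does not recompute it.

```python
def delete_record(keys, columns, key):
    """Remove `key` from a decrypted, decompressed row group (in place).
    keys: sorted keys; columns: one value list per column, aligned with keys.
    Returns True if the key was found and removed."""
    try:
        pos = keys.index(key)
    except ValueError:
        return False  # key not in this row group
    del keys[pos]
    for col in columns:
        del col[pos]  # drop the same position in every column
    return True
```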

Another object of the invention is to provide a dynamic ciphertext database storage system that applies the ciphertext data compression storage structure RCPE and the data dynamic read-write method.

In summary, the advantages and positive effects of the invention are as follows. The invention designs a ciphertext data storage structure RCPE based on the design idea of RCFile. RCPE comprises a data storage format, a data compression method and optimizations for data reading, and meets all four requirements of data storage: (1) fast data loading, (2) fast query processing, (3) high storage space utilization, and (4) strong adaptability to dynamic data access patterns. RCPE follows the design concept of "first horizontal partitioning, then vertical partitioning". Dividing the data into row groups by rows guarantees that all data of one row is stored on the same cluster node; each row group is then divided vertically. Horizontal partitioning first divides the table into row groups according to the row group size, a user-specified value. RCPE supports flexible row group sizes, which trade off data compression performance against query performance. RCPE partitions the data using the RCFile design idea, then compresses and encrypts the columns in each row group, and the encryption key is kept at the client.

RCPE preprocesses the data at the client as follows: the hash of each key is reduced modulo 8, dividing the data into 8 buckets, and the data in each bucket is sorted. The row group size is set to m records, each with n columns. The data Q in each bucket is divided horizontally into row groups. The data in each row group is compressed column by column with a compression algorithm, producing one column packet per column, and each column packet is encrypted with AES (Advanced Encryption Standard).

The minimum key in each row group is then order-preserving encrypted to serve as the row group ID, and each column attribute forms a new key-value pair with its column packet.

RCPE achieves a data loading speed and workload adaptability comparable to row storage, avoids reading unnecessary columns, and reduces the client's decryption and decompression overhead. It performs better than the other structures compared, and its column-wise compression within row groups effectively improves storage space utilization and thus system performance.

Data is distributed across nodes using consistent hashing. The two-layer key-value storage structure RCPE, designed according to the RCFile design idea, partitions big data both horizontally and vertically; column-wise compression within row groups significantly raises the compression ratio and reduces the communication overhead of queries on a few columns, further improving system performance. In addition, the compressed data is encrypted with AES to ensure confidentiality. To improve data availability, the minimum key of each row group is used as the row group ID, enabling KV queries and range queries over the compressed and encrypted data. The invention also builds inverted indexes on other column attributes, so that one or more key-value pairs can be matched on a specific column, and supports dynamic addition and deletion of data. Finally, to avoid lost updates during writes, the invention uses a locking mechanism: when data is written, an update lock is placed on the matched row group, and other users may read it but not update it. When the update times out or the user holding the lock finishes the update, the lock is released and other users can continue their updates.

The method was tested on the Cassandra database. During writes, Cassandra first writes data to an in-memory cache table and flushes it to disk when it exceeds its size limit. By applying the RCFile design idea and compressing data column by column within row groups, the invention significantly raises the compression ratio and thereby improves system performance. Its advantages are most evident for queries on only a few columns, because such queries return only the required column packets instead of the whole row group, which reduces communication overhead and the client's decryption and decompression time. The invention is compared with the MiniCrypt scheme and the EDSK scheme. MiniCrypt compresses multiple rows as a whole and then encrypts them; EDSK is an encrypted, distributed and searchable key-value store that uses no compression and generates keys from row names and column attributes with a pseudorandom function, which introduces a large storage overhead.

Drawings

Fig. 1 is a flowchart of a ciphertext data compression storage structure RCPE and a data dynamic read-write method according to an embodiment of the present invention.

Fig. 2 is a diagram of an inverted index structure for column attribute query according to an embodiment of the present invention.

Fig. 3 is a distribution structure diagram applied to a distributed database according to an embodiment of the present invention.

Fig. 4 is a schematic diagram of a ciphertext key value storage structure provided by the embodiment of the present invention.

Fig. 5 is a comparison chart of the compression ratios of the RCPE storage structure and the MiniCrypt storage structure according to an embodiment of the present invention.

Fig. 6 is a comparison diagram of storage overhead of three schemes provided by the embodiment of the present invention.

Fig. 7 is a comparison diagram of the communication overhead of single-column and whole-row queries between the present invention and the MiniCrypt scheme according to an embodiment of the present invention.

Fig. 8 is a comparison diagram of communication overhead during range query between the present invention and the MiniCrypt scheme according to the embodiment of the present invention.

Fig. 9 is a comparison graph of how the throughput of the three schemes changes as the number of threads increases during single-column queries, according to an embodiment of the present invention.

Fig. 10 is a graph comparing throughput of the DCCE and MiniCrypt schemes in performing range query according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

In order to realize higher compression ratio and avoid unnecessary column reading, the invention designs a new ciphertext data compression storage scheme based on RCFile.

The following detailed description of the principles of the invention is provided in connection with the accompanying drawings.

As shown in fig. 1, the ciphertext data compression storage structure RCPE and the data dynamic read-write method provided by the embodiment of the present invention include the following steps:

S101: Configuration phase: the client initializes the data, defines the public parameters, divides the data into row groups, applies zlib compression and AES encryption to each column in each row group, and builds an inverted index for each column that needs one. The data and the index are uploaded to the cloud server and stored in a Cassandra database;

S102: Query phase: the client processes the query request to generate a query label and sends the label, together with the hash values C[k] of the names of the columns to be returned, to the server; the server searches the ciphertext database and returns the k required column packets;

S103: Client update phase: the client applies order-preserving encryption to the key to generate a label and sends an update request to the server; the server uses the label to find the matching column packets, returns them to the client and places an update lock on the row group; the user decrypts and decompresses the column packets, modifies the data, repackages it, uploads it to the server and releases the update lock;

S104: Data addition phase: the server uses the query tag generated from the key provided by the user to find the matching row group, places an update lock on it and returns the whole row group; the user first checks the size of the row group. If it exceeds the threshold, the data splitting phase is executed; otherwise the data is added directly. The repackaged data is then uploaded to the server and the update lock is released;

S105: Data splitting phase Split(RowID, C_pack[n]): every column packet is split into a left column packet and a right column packet at the midpoint of the key-value pairs in the row group (rounded); the left column packets overwrite the original row group, and a new row group is constructed from all the right column packets;

S106: Data deletion phase: the row group containing the data is located through the key label generated by the client, an update lock is placed on the matched row group and the row group is returned to the client; the user decrypts and decompresses it, deletes the data, repackages it, uploads it to the server and releases the update lock.

The dynamic key-value storage database supporting compression and encryption mainly achieves the following goals over the prior art:

Security: the confidentiality of user data is guaranteed; only a user holding the secret key can decrypt the data, and attacks by external adversaries and malicious servers can be resisted.

Efficiency: for any database DB ∈ [q] × {0,1}^*, where q = poly(k), the computational and storage resources consumed by the client, the server and other users do not exceed O(q).

Dynamics: rich queries are supported, as are data addition, deletion and modification operations.

The invention consists of two main parts: the data storage structure RCPE, constructed with the RCFile design idea, and the read-write interface of the database.

1. A data storage structure:

RCPE follows the design concept of "first horizontal partitioning, then vertical partitioning". The client initializes the data, defines the public parameters and generates the keys. It divides the data table into row groups according to the row group size, a user-specified value; RCPE supports flexible row group sizes, which must balance data compression performance against query performance. Each row group is then divided vertically: each column in the row group is zlib-compressed and AES-encrypted, and an inverted index is built for each column that needs one. The data and the index are uploaded to the cloud server for storage and querying. Horizontal partitioning avoids the drawback of column storage, in which the data of one row is stored across nodes; vertical partitioning avoids the I/O overhead that traditional row storage incurs by reading unnecessary columns. In addition, compressing the columns within a row group yields a higher compression ratio than compressing rows, so more data fits in memory. When only a few columns are queried, only the required column packets are returned, avoiding the communication overhead of returning the whole row group.

PartID = Hash(key) mod 8

RowID = OPE(key_min, K_o)

An index x is built for conditional queries:

α = H(C_value)

β = (PartID||RowID)

x ← (α, β)

Design of the database read-write interface

2. Design of the read interface:

(1) Get(key): takes a key as input and returns the value corresponding to the key;

key^* = OPE(key, K_o)

Server: finds the row groups whose RowID is not greater than key^*, sorts them, and returns the matching column packets of the row group with the largest RowID. The algorithm is described as follows:

select C_1, C_2, ..., C_k from table

where RowID ≤ key^*

order by RowID desc limit 1

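Client: decrypts and decompresses the returned column packets, filters out the values matching the key, and returns them:

Return C_value[k]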
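The pseudo-SQL above can be exercised against an in-memory SQLite table standing in for the ciphertext table. The table name `rowgroups`, the column names `row_id`, `c1`, `c2`, and the byte-string packets are hypothetical; the embodiment stores the data in Cassandra, whose query language differs, so this only illustrates the selection logic.

```python
import sqlite3


def get_row_group(conn, key_star, columns):
    """Return the requested column packets of the row group with the largest
    row_id <= key_star, or None if no such row group exists."""
    # Column names are interpolated, so they must come from a trusted list.
    col_list = ", ".join(columns)
    sql = (f"SELECT {col_list} FROM rowgroups "
           "WHERE row_id <= ? ORDER BY row_id DESC LIMIT 1")
    return conn.execute(sql, (key_star,)).fetchone()


# Hypothetical table: one row per row group, one BLOB column per column packet.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rowgroups (row_id INTEGER PRIMARY KEY, c1 BLOB, c2 BLOB)")
conn.executemany("INSERT INTO rowgroups VALUES (?, ?, ?)",
                 [(7, b"p1", b"q1"), (37, b"p2", b"q2"), (67, b"p3", b"q3")])
```

Here `get_row_group(conn, 52, ["c1"])` selects the row group with row_id 37 and returns only its `c1` packet, mirroring how only the requested columns C_1, ..., C_k are sent back to the client.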

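(2) Get(key_low, key_high): takes a range as input and returns all values in the range;

Client: applies order-preserving encryption to key_low and key_high to generate key_low^* and key_high^* and sends them to the server;

Server: finds the row groups whose RowID lies between key_low^* and key_high^* and adds them to the result set; if key_low^* is less than the minimum RowID in the result set, RowID_min, it executes Get(key_low) and adds the result to the result set, then returns the matching column packets of these row groups.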

Client:

C_value[k]←Filter(result)

Return C_value[k]
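The range query can be sketched in the same style. This is an illustration under assumptions: the range is treated as half-open, [key_low, key_high), which matches the strict upper bound in the text, the identity function stands in for OPE in the test, and the function names are hypothetical. Note that every RowID placed in the result set is strictly greater than the encrypted key_low, so the RowID_min test succeeds whenever the set is non-empty; the code keeps it to mirror the described algorithm.

```python
import bisect


def range_row_ids(sorted_row_ids, lo_star, hi_star):
    """Server side of Get(key_low, key_high) over OPE-encrypted bounds."""
    result = [r for r in sorted_row_ids if lo_star < r < hi_star]
    row_id_min = result[0] if result else None
    # With a strict lower bound this test always holds for a non-empty result;
    # it would only skip work if the comparison were relaxed to RowID >= lo_star.
    if row_id_min is None or lo_star < row_id_min:
        pos = bisect.bisect_right(sorted_row_ids, lo_star)  # Get(key_low)
        if pos > 0:
            result.insert(0, sorted_row_ids[pos - 1])
    return result


def client_range(key_low, key_high, row_groups, ope):
    """Client: encrypt both bounds, fetch candidate row groups, then Filter(result)."""
    ids = range_row_ids(sorted(row_groups), ope(key_low), ope(key_high))
    return [k for rid in ids for k in row_groups[rid] if key_low <= k < key_high]
```

The client-side Filter step is what makes the coarse server answer exact: the first and last row groups may contain keys outside the range, and only the key holder can discard them after decryption.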

(3) Conditional query: searches for one or more key-value pairs whose value in a particular column matches the required value;

Client: inputs the value of column C_i, hashes it with SHA-256, and sends the resulting hash α to the server;

Server: looks up α in the index table to obtain β = (PartID_ε || RowID_ε), routes the request to the node holding partition PartID_ε, and returns the column packets C_1, C_2, ..., C_k of the row group whose RowID equals RowID_ε;

Client:

Return C_value[k]
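A sketch of the conditional query, assuming the index table maps α to a list of β = (PartID, RowID) pairs and that each partition is served by one node, represented here by a plain dictionary keyed by PartID. The names are hypothetical, and decryption and decompression of the returned packets are omitted before the client-side filter.

```python
import hashlib


def alpha_of(value) -> str:
    """Client: alpha = SHA-256 of the queried column value."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def server_conditional_query(alpha, index_table, nodes):
    """Server: alpha -> beta = (PartID, RowID) via the index table, then fetch the
    row group with RowID = RowID_e from the node holding partition PartID_e."""
    groups = []
    for part_id, row_id in index_table.get(alpha, []):
        groups.append(nodes[part_id][row_id])
    return groups


def client_conditional_query(column, value, index_table, nodes):
    groups = server_conditional_query(alpha_of(value), index_table, nodes)
    # A row group may also hold non-matching rows, so the client filters them out.
    return [row for group in groups for row in group if row[column] == value]
```

The server only ever sees the hash α, never the queried value; the trade-off is that equal values always produce equal hashes, so the server can tell when two queries ask for the same value.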

3. Design of the write interface:

(1) Update: the client updates the target value, then recompresses, re-encrypts, and repackages the affected columns, uploads them to the server to replace the old column packets, and releases the update lock. The algorithm is as follows:

Client: select C_key, C_i from table

where RowID ≤ key^*

order by RowID desc limit 1 (updlock)

Server:

where RowID = RowID_ε (unlock);
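The update-lock protocol can be sketched with one threading.Lock per row group. This is a single-process illustration with hypothetical class and method names; a real deployment would need a lock held by the server on behalf of a remote client, and AES encryption is omitted so that only zlib compression of the column packets is shown. Only the C_key and C_i packets travel to the client, as in the select statement above.

```python
import threading
import zlib


def pack(values):
    return zlib.compress("\n".join(values).encode("utf-8"))


def unpack(blob):
    return zlib.decompress(blob).decode("utf-8").split("\n")


class RowGroupStore:
    """Hypothetical server: column packets per RowID, one update lock per row group."""

    def __init__(self):
        self.packets = {}  # RowID -> {column name: packet bytes}
        self.locks = {}    # RowID -> threading.Lock

    def put(self, row_id, packets):
        self.packets[row_id] = dict(packets)
        self.locks.setdefault(row_id, threading.Lock())

    def select_for_update(self, row_id, columns):
        self.locks[row_id].acquire()  # (updlock)
        return {c: self.packets[row_id][c] for c in columns}

    def write_and_unlock(self, row_id, new_packets):
        self.packets[row_id].update(new_packets)
        self.locks[row_id].release()  # (unlock)

    def abort(self, row_id):
        self.locks[row_id].release()


def client_update(store, row_id, key, column, new_value):
    """Fetch only C_key and C_i, change one value, repackage, upload, unlock."""
    packets = store.select_for_update(row_id, ["key", column])
    try:
        keys, values = unpack(packets["key"]), unpack(packets[column])
        values[keys.index(key)] = new_value  # ValueError if key is not in this group
    except Exception:
        store.abort(row_id)  # release the lock on any failure
        raise
    store.write_and_unlock(row_id, {column: pack(values)})
```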

(2) Insert(S, key, C_value[n], Row_max): using the tag the client generates from the key, the server finds the matching row group, places an update lock on it, and returns it to the client. The client first determines whether the number of records in the row group exceeds the threshold. If it does (Keys_sum > Keys_max), the packet splitting phase is executed; otherwise the data is inserted directly, repackaged, and uploaded to the server, and the update lock is released.
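A sketch of the client-side insertion check. The text does not say whether Keys_sum counts the record being inserted; this sketch assumes it does, so a row group never grows beyond Keys_max. Compression, encryption, and locking are omitted, and the function name and return values are hypothetical.

```python
import bisect


def insert_record(keys, columns, key, record, keys_max):
    """Insert into a decrypted row group, keeping every column aligned with keys.
    Returns "split" if the group would exceed keys_max, else "inserted"."""
    keys_sum = len(keys) + 1  # assumption: the count includes the new record
    if keys_sum > keys_max:
        return "split"  # caller runs Split, then retries on the half covering key
    pos = bisect.bisect_left(keys, key)  # keep keys sorted inside the row group
    keys.insert(pos, key)
    for col, values in columns.items():
        values.insert(pos, record[col])  # same position in every column packet
    return "inserted"
```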

(3) Split(RowID_ε, C_pack[n]): the splitting algorithm must be deterministic for every column packet in a row group, so that every client that reads the same packet splits it in exactly the same way. During the split, the client divides each column packet into a left packet, built from the first half of the keys (rounded), and a right packet, built from the remaining keys. The client then compresses and encrypts each packet as usual. The right packet is inserted first, followed by the left packet.

where RowID = RowID_ε
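A sketch of the deterministic split. The text says the first half of the keys is taken "rounded" without fixing the direction; this sketch assumes the left packet receives ceil(n/2) keys. Because every column is cut at the same index, all clients splitting the same row group produce identical packets. The write order (right packet first) is kept; one plausible reason, not stated in the text, is that between the two writes a Get for a key in the right half already finds the new row group, while a key in the left half still finds the old, complete one. Compression and encryption are omitted, and the names are hypothetical.

```python
def split_columns(keys, columns):
    """Deterministic split: every column packet is cut at the same index, so any
    client splitting the same row group gets identical halves."""
    if len(keys) < 2:
        raise ValueError("need at least two keys to split")
    mid = (len(keys) + 1) // 2  # left gets ceil(n/2) keys (rounding assumption)
    left = (keys[:mid], {c: v[:mid] for c, v in columns.items()})
    right = (keys[mid:], {c: v[mid:] for c, v in columns.items()})
    return left, right


def apply_split(table, row_id, keys, columns, ope):
    """Write the right packet first under RowID = OPE(its min key), then overwrite
    the old row group, which keeps its RowID, with the left packet."""
    left, right = split_columns(keys, columns)
    right_row_id = ope(right[0][0])
    table[right_row_id] = right
    table[row_id] = left
    return right_row_id
```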

(4) Delete(key): using the key tag generated by the client, the server finds the row group containing the data, places an update lock on the matched row group, and returns it to the client. The user decrypts and decompresses it, deletes the data, repackages it, and uploads it to the server, and the update lock is released;

select C[n] from table

where RowID ≤ key^*

order by RowID desc limit 1 (updlock)

Client:

result_new ← Delete from result where key = key

Server:

where RowID = RowID_ε (unlock)
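The client-side deletion step, result_new ← Delete from result where key = key, amounts to removing the same position from every column packet. A minimal sketch with hypothetical helper names, assuming newline-joined column values with no empty-string entries, and omitting encryption and the update lock.

```python
import zlib


def pack(values):
    return zlib.compress("\n".join(values).encode("utf-8"))


def unpack(blob):
    text = zlib.decompress(blob).decode("utf-8")
    return text.split("\n") if text else []  # assumes no empty-string values


def delete_from_packets(packets, key, key_column="key"):
    """Client side of Delete(key): drop the record's position from every column
    packet and repackage each column."""
    columns = {c: unpack(p) for c, p in packets.items()}
    pos = columns[key_column].index(key)  # ValueError if key is not in this group
    for values in columns.values():
        del values[pos]
    return {c: pack(v) for c, v in columns.items()}
```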

The above description covers only preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A ciphertext data compression storage structure RCPE and a data dynamic read-write method, characterized in that the ciphertext data compression storage structure RCPE and the data dynamic read-write method divide data using the design idea of RCFile; the invention includes optimization of the data storage format, the data compression method, and data reading; the data is divided horizontally into a plurality of row groups, the size of each row group being specified by the user; each row group is then divided vertically; and the data in each row group is compressed and encrypted column by column;

The ciphertext data compression storage structure RCPE and the data dynamic read-write method comprise the following steps:

Second, the query phase Query(S, x, key, K_c, C[k]): the client processes the query request to generate a query tag and sends the tag, together with the hash values C[k] of the names of the columns to be returned, to the server; the server searches the ciphertext database for the k required column packets;

Third, the data update phase:

The client applies order-preserving encryption to the key to generate a tag and sends an update request to the server; the server uses the tag provided by the client to find the matching column packets, returns them to the client, and places an update lock on the row group; the user decrypts and decompresses the column packets, modifies the data, repackages it, uploads it to the server, and releases the update lock;

Fourth, the data insertion phase Insert(S, key, C_value[n], Row_max): the server uses the query tag generated from the user-supplied key to find the matching row group, places an update lock on it, and returns all the data of the whole row group; the user first checks the size of the row group; if it exceeds the threshold, the data splitting phase is executed, otherwise the data is added directly; after the data is repackaged, it is uploaded to the server for updating, and the update lock is released;

Sixth, the data deletion phase Delete(key): key is the tag that the client generates from the record key shared by all of the record's columns; the server finds the row group containing the data, places an update lock on the matched row group, and returns it to the client; the user decrypts and decompresses it, deletes the data, repackages it, and uploads it to the server, and the update lock is released.

2. The ciphertext data compression storage structure RCPE and data dynamic read-write method according to claim 1, wherein the configuration algorithm of the first step specifically includes:

The configuration algorithm outputs the keys (K_c, K_o), and its input parameters correspond to those defined for the configuration algorithm; the original database DB is sorted by key, each key is hashed, and the hash value of each key is reduced modulo 8 to divide the data into eight partitions, the ID of each partition being denoted PartID;

PartID = Hash(key) mod 8

RowID = OPE(key_min, K_o)

Establishing the index x for conditional queries:

α = H(C_value)

β = (PartID || RowID)

x ← (α, β).

3. The ciphertext data compression storage structure RCPE and data dynamic read-write method according to claim 1, wherein the query algorithm Query(S, x, key, K_c, C[k]) of the second step specifically includes the following queries:

(1) Get(key): takes a key as input and returns the value corresponding to that key;

Client: inputs the key, applies order-preserving encryption to it to generate key^*, and sends key^* to the server;

key^* = OPE(key, K_o)

select C_1, C_2, …, C_k from table

where RowID ≤ key^*

order by RowID desc limit 1

Client:

Return C_value[k]

(2) Get(key_low, key_high): takes a range as input and returns all values within that range;

Client: inputs (key_low, key_high), applies order-preserving encryption to both bounds, and sends the two encrypted bounds to the server;

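Server: finds the row groups whose RowID is greater than the encrypted key_low and smaller than the encrypted key_high, and adds them to the result set; if the encrypted key_low is smaller than the minimum RowID in the result set, it also executes Get(key_low) and adds that result to the result set: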

select C_1, C_2, …, C_k from table

result←Get(key_low)

Client:

C_value[k]←Filter(result)

Return C_value[k]

(3) Conditional query:

Client: inputs the value of column C_i, hashes it with SHA-256, and sends the resulting hash α to the server;

Server:

select value from Indextable

where key = α allow filtering

(PartID_ε, RowID_ε) ← β

Node[i] ← PartID_ε

select C_1, C_2, …, C_k from table

where RowID = RowID_ε

Client:

Return C_value[k].

4. The ciphertext data compression storage structure RCPE and data dynamic read-write method according to claim 1, wherein the update algorithm of the third step includes:

the client obtains the query result τ, calls the decryption and decompression algorithm to decrypt and decompress the data packets, and updates the target value:

Client: select C_key, C_i from table

where RowID ≤ key^*

order by RowID desc limit 1 (updlock)

Server:

where RowID = RowID_ε (unlock).

5. The ciphertext data compression storage structure RCPE and data dynamic read-write method according to claim 1, wherein the insertion algorithm of the fourth step specifically includes:

Insert(S, key, C_value[n], Row_max): using the tag the client generates from the key, the server finds the matching row group, places an update lock on it, and returns it to the client; the client first determines whether the number of records in the row group exceeds the threshold; if it does (Keys_sum > Keys_max), the packet splitting phase is executed; otherwise the data is inserted directly, repackaged, and uploaded to the server, and the update lock is released.

6. The ciphertext data compression storage structure RCPE and data dynamic read-write method according to claim 1, wherein the packet splitting algorithm of the fifth step includes:

Split(RowID_ε, C_pack[n]): all column packets are split according to the number of keys using a rounding rule, each column packet being divided into a left packet and a right packet;

where RowID = RowID_ε.

7. The ciphertext data compression storage structure RCPE and data dynamic read-write method according to claim 1, wherein the data deletion algorithm of the sixth step specifically includes:

select C[n] from table

where RowID ≤ key^*

order by RowID desc limit 1 (updlock)

Client:

result_new ← Delete from result where key = key

Server:

where RowID = RowID_ε (unlock).