CN110166221B - Ciphertext data compression storage structure RCPE and data dynamic read-write method - Google Patents

Info

Publication number
CN110166221B
Authority
CN
China
Prior art keywords
data
key
client
column
row group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910165159.1A
Other languages
Chinese (zh)
Other versions
CN110166221A (en)
Inventor
齐赛宇
张萌
袁浩然
陈晓峰
张夫猷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN201910165159.1A
Publication of CN110166221A
Application granted
Publication of CN110166221B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/04 Protocols for data compression, e.g. ROHC
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/06 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
    • H04L9/0618 Block ciphers, i.e. encrypting groups of characters of a plain text message using fixed encryption transformation
    • H04L9/0631 Substitution permutation network [SPN], i.e. cipher composed of a number of stages or rounds each involving linear and nonlinear transformations, e.g. AES algorithms
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/06 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
    • H04L9/0643 Hash functions, e.g. MD5, SHA, HMAC or f9 MAC

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Power Engineering (AREA)
  • Storage Device Security (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of data security and discloses a ciphertext data compression storage structure, RCPE, and a dynamic data read-write method. Following the design concept of the RCFile storage structure, a two-layer ciphertext key-value storage structure RCPE is designed. The invention covers the data storage format, the data compression method and the optimization of data reading. The data is first divided horizontally into a plurality of row groups, the size of each row group being specified by the user; each row group is then divided vertically, and the data in a row group is compressed and encrypted column by column. A read-write interface is designed on top of the ciphertext storage structure to support rich query modes and dynamic write operations. The invention has a data loading speed and workload adaptability comparable to row storage, avoids unnecessary column reads when reading data, and reduces the client's decryption and decompression time. Compared with other structures it performs better: column-wise compression within row groups effectively improves storage space utilization and thereby system performance.

Description

Ciphertext data compression storage structure RCPE and data dynamic read-write method
Technical Field
The invention belongs to the field of data security, and particularly relates to a ciphertext data compression storage structure RCPE and a data dynamic read-write method.
Background
The current state of the art commonly used in the industry is as follows. Computing technology integrates the computing, networking and storage resources of a large number of physical devices and provides users with corresponding services flexibly and on demand over the Internet. However, to make full use of these services, tasks that used to be performed on local devices must be delegated to a cloud service provider. Because key-value (KV) storage offers high performance, linear scalability and continuous availability, and even has great potential for advanced support of rich queries, distributed KV storage has become the backbone of many public cloud services for managing ever-increasing amounts of data. Meanwhile, data compression lets a server hold more data in main memory, which reduces the number of accesses to persistent storage and improves system performance. Many big-data stores therefore use compression software to compress data. To protect data confidentiality, a natural solution is to encrypt the data stored on the server and keep the key at the client, so that only a user who holds the key can decrypt the data. An ideal system incorporates both compression and encryption into its design.
Most current schemes adopt only one of compression or encryption because the two are in tension. First, compressing encrypted data is not feasible, because encrypted data is pseudo-random and therefore incompressible. Second, encrypting compressed data works well in some systems but is problematic in a database setting: compressing a single row generally yields a limited compression ratio, while compressing multiple rows together means the server can no longer maintain fine-grained access to the attributes of those rows, which makes it harder to maintain correct semantics. MiniCrypt, proposed in the prior art, is the first scheme to combine compression and encryption on top of key-value storage. It compresses a small number of rows together, which improves data usability while achieving a higher compression ratio, and it uses the minimum key of each pack as the pack ID to support whole-row queries and range queries. However, because the scheme compresses whole rows, all columns must still be returned even when only one column is queried, so communication overhead is large and decryption and decompression are slow. In addition, because a pack holds multiple rows, concurrent updates easily conflict, so MiniCrypt's update scheme causes a large number of repeated operations.
In summary, the problems of the prior art are as follows:
(1) Most of the prior art supports only one of compression or encryption. In the existing MiniCrypt scheme, which supports both, each row group is packed as a whole, so a query on one or a few columns still returns the data of all columns in the row group. This causes a large amount of unnecessary communication overhead, and the client must decrypt and decompress the whole row group, which increases its time overhead. Solving this problem reduces communication cost, improves transmission speed, relieves the computational load on the client and improves system performance.
(2) The prior art has a low compression ratio and poor support for dynamic database operations. Multiple rows of data are compressed as a whole even though their data types differ, so the compression ratio is not high. Because one row group holds multiple rows, several users may process the same pack at the same time, which easily causes lost updates. Recording the pack's hash value under optimistic locking in turn easily forces users to download and update the pack again and again. Improving the compression ratio reduces communication overhead and raises system throughput. Finding a new way to prevent lost updates during concurrent updates reduces the overhead of repeated downloads and updates and improves the dynamic behaviour of the database.
The difficulty of solving the above technical problems lies in how to design a new key-value data storage structure, based on the design concept of the RCFile storage structure, that maintains high availability and high query efficiency while supporting data compression and encryption.
The significance of solving the above technical problems is that doing so provides better confidentiality and efficiency guarantees for big-data storage and makes the approach better suited to data warehouses.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a ciphertext data compression storage structure RCPE and a data dynamic read-write method.
The invention is realized as follows. The ciphertext data compression storage structure RCPE and dynamic data read-write method divide data following the design concept of the RCFile storage structure and define a two-layer key-value storage structure: the first layer consists of row group keys and row group values, and the second layer consists of the column attribute names and column attribute values inside each row group. The invention covers the data storage format, the data compression method and the optimization of data reading. The data is first divided horizontally into a plurality of row groups, the size of each row group being specified by the user; each row group is then divided vertically, and the data in a row group is compressed and encrypted column by column. A read-write interface is designed on top of the ciphertext storage structure to support rich query modes and dynamic write operations. Because a row group holds many rows of records, multiple users are likely to modify the same pack at the same time during write operations, which easily causes lost updates. The invention therefore designs a locking mechanism: when data is written, an update lock is placed on the matched row group, so that other users may read the data but may not update it. When the update times out or the lock holder finishes its update, the update lock is released and other users can proceed with their updates.
Further, the ciphertext data compression storage structure RCPE and the data dynamic read-write method comprise the following steps:
first, the configuration phase Setup(λ, DB, Q): initialize the original database DB of size Q, define the public parameter λ and generate the secret keys K_c and K_o; divide the data into row groups, compress each column in each row group and encrypt it with AES, and build an inverted index x for the columns that need an index; upload the encoded data S and the index to a cloud server for data storage and query; the configuration phase is completed at the client;
second, the query phase Query(S, x, key, K_c, C[k]): the client processes the query request to generate a query tag and sends the tag, together with the hash values C[k] of the names of the columns to be returned, to the server; the server searches the ciphertext database, finds the k required column packets and returns them to the client; the client decrypts and decompresses the column packets and searches for the matching value;
third, the data update phase: the client applies order-preserving encryption to the key to generate a tag and sends an update request to the server; the server uses the tag to find the matching column packet, returns it to the client and places an update lock on the row group; the user decrypts and decompresses the column packet, changes the data, repackages it and uploads it to the server, and the update lock is released;
fourth, the data addition phase Insert(S, key, C_value[n], Row_max): the server matches the query tag generated from the user-supplied key to a row group, places an update lock on the row group and returns all data of the whole row group; the user first checks the size of the row group; if the size exceeds a threshold, the data splitting phase is executed, otherwise the data is added directly; after the data is repackaged it is uploaded to the server, and the update lock is released;
fifth, the data splitting phase Split(RowID, C_pack[n]): all column packets are split at the midpoint (rounded) of the number of key-value pairs in the row group, each column packet being split into a left column packet and a right column packet; all left column packets overwrite the original row group, and all right column packets form a new row group;
sixth, the data deletion phase Delete(key): the row group containing the data is located using the key tag generated by the client, an update lock is placed on the matched row group, and the row group is returned to the client; the user decrypts and decompresses it, deletes the data, repackages the data and uploads it to the server, and the update lock is released.
Further, the specific process of the first step configuration algorithm comprises:
Setup(λ, DB, Q): runs at the client; takes the security parameter λ as input and generates the keys K_c and K_o with a key generation algorithm, wherein the input parameters correspond to those of the configuration algorithm; sorts the original database DB by key, hashes each key, and takes the hash value modulo 8 to divide the records into eight partitions, the ID of each partition being denoted PartID:
PartID = Hash(key) mod 8
the records in each partition are divided into row groups; the row group ID is the order-preserving encryption of the minimum key in the row group:
RowID = OPE(key_min, K_o)
in each row group, each column is compressed with zlib and encrypted with AES;
an index x for conditional queries is built:
α = H(C_value)
β = (PartID || RowID)
x ← (α, β)
Further, the query algorithm Query(S, x, key, K_c, C[k]) in the second step specifically comprises the following queries:
(1) Get(key): input a key and return the value corresponding to the key;
Because key-value pairs are packed and encrypted in RCPE, the client can only fetch data at the granularity of column packets. The challenge is how to determine which row group a given key belongs to. Since the RowID is chosen to be less than or equal to the smallest key in the row group, the row group corresponding to a key is the one whose RowID is the largest among all RowIDs not exceeding the key. The specified column packet is then looked up in the second layer. The specific operation is as follows:
Client: input the key, apply order-preserving encryption to it to generate key*, and send key* to the server:
key* = OPE(key, K_o)
Server: find the row groups whose RowID does not exceed key*, sort them, and return the matching column packets in the row group with the largest RowID; the client decrypts and decompresses the matching column packets and filters out the matching value.
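The row group lookup described above amounts to a predecessor search over the sorted RowIDs. A minimal sketch, assuming the server keeps the RowIDs of one partition in a sorted list; `toy_ope` is a hypothetical, insecure stand-in for OPE(key, K_o) used only so that the example runs.

```python
import bisect


def toy_ope(key: int) -> int:
    """Hypothetical stand-in for OPE(key, K_o): order-preserving, NOT secure."""
    return 3 * key + 7


def find_row_group(sorted_row_ids, key_star):
    """Return the largest RowID that does not exceed key_star, or None."""
    pos = bisect.bisect_right(sorted_row_ids, key_star)
    return sorted_row_ids[pos - 1] if pos else None


# Row groups whose smallest keys are 10, 20 and 30.
row_ids = sorted(toy_ope(k) for k in (10, 20, 30))
group_for_25 = find_row_group(row_ids, toy_ope(25))  # the group starting at key 20
```

Because order-preserving encryption keeps the ordering of the plaintext keys, the server can run this comparison on ciphertexts without learning the keys themselves.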
(2) Get(key_low, key_high): input a range and return all values in the range;
Range queries fit the design of the present invention easily. In fact, for a large range query involving many keys, the invention uses less network bandwidth than conventional key-value storage, because RCPE compresses multiple keys in the query range together, and the column-wise partitioning inside row groups avoids unnecessary column reads. The specific operation is as follows:
Client: input (key_low, key_high), apply order-preserving encryption to generate key_low* and key_high*, and send them to the server:
key_low* = OPE(key_low, K_o)
key_high* = OPE(key_high, K_o)
Server: find the row groups whose RowID is greater than key_low* and smaller than key_high* and add them to the result set; if key_low* is less than the minimum RowID in the result set, RowID_min, execute Get(key_low) and add its result to the result set; return the matching column packets in these row groups.
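The server-side range selection can be sketched as two binary searches over the sorted RowIDs. This is an illustrative sketch only: the function name is hypothetical, the inputs are already order-preserving-encrypted values, and treating a row group whose RowID equals key_high* as part of the result is an assumption made here so that the upper boundary key is not missed.

```python
import bisect


def range_row_groups(sorted_row_ids, low_star, high_star):
    """Return the RowIDs of all row groups that may hold keys in the range."""
    lo = bisect.bisect_right(sorted_row_ids, low_star)
    hi = bisect.bisect_right(sorted_row_ids, high_star)
    result = sorted_row_ids[lo:hi]  # low_star < RowID <= high_star
    if lo > 0:
        # Get(key_low) step: the group starting at or before low_star
        # may still contain keys at the lower end of the range.
        result = [sorted_row_ids[lo - 1]] + result
    return result
```

Only the column packets requested by the client would then be read from each selected row group.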
(3) Query by column value: search for one or more key-value pairs according to the value that a particular column must match.
An inverted index is built for the columns that need to be indexed. Each data field of the column attribute maps to the partition key concatenated with the row group key. The specific operation is as follows:
Client: input the value C_value of column C_i, hash it with SHA256, and send the hash result α to the server:
α = H(C_value)
Server: first look up α in the index table to find β, which yields the row group key and the partition key; then locate the required column packets by the row group key and partition key and return them to the client; after decrypting and decompressing, the client filters the data to obtain the matching values.
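The index lookup above can be sketched with an in-memory dictionary. The sketch assumes that β is stored as a (PartID, RowID) tuple rather than a concatenated string and that values are hashed from their UTF-8 encoding; the function names are hypothetical.

```python
import hashlib


def alpha(value: str) -> str:
    """Client side: alpha = H(C_value), with SHA-256 as H."""
    return hashlib.sha256(value.encode()).hexdigest()


def build_index(entries):
    """entries: iterable of (column_value, part_id, row_id) triples."""
    index = {}
    for value, part_id, row_id in entries:
        index.setdefault(alpha(value), set()).add((part_id, row_id))
    return index


def lookup(index, a):
    """Server side: return every beta = (PartID, RowID) recorded for alpha."""
    return sorted(index.get(a, ()))


index = build_index([("xian", 1, 37), ("beijing", 1, 37), ("xian", 4, 67)])
locations = lookup(index, alpha("xian"))
```

The server only ever sees the hash α, and the client still has to decrypt the returned column packets and filter out rows of the same row group that do not match.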
Further, the specific process of updating the algorithm in the third step includes:
The update algorithm updates the value of the column attribute C_i corresponding to the key in the database S to a new value. The update process is as follows: first, the client obtains the column packet from the database by calling the query algorithm, and an update lock is placed on the row group. Having obtained the query result τ, the client calls the decryption and decompression algorithm on the data packet, updates the value, compresses, encrypts and packages the column again, uploads it to the server to update the column packet, and releases the update lock.
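The update lock with timeout described in this document can be sketched as a small lock table keyed by RowID. This is a single-process illustration under stated assumptions: the class and method names are hypothetical, the timeout value is arbitrary, and a real server would need its own synchronisation and persistence, which the source does not specify.

```python
import time


class RowGroupLocks:
    """Update locks per row group; reads are never blocked by this table."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self._held = {}  # row_id -> (owner, expiry time)

    def acquire(self, row_id, owner) -> bool:
        """Grant the lock unless another owner holds an unexpired one."""
        now = self.clock()
        holder = self._held.get(row_id)
        if holder is not None and holder[0] != owner and holder[1] > now:
            return False
        self._held[row_id] = (owner, now + self.timeout_s)
        return True

    def release(self, row_id, owner) -> bool:
        """Release the lock; only the current holder may do so."""
        holder = self._held.get(row_id)
        if holder is not None and holder[0] == owner:
            del self._held[row_id]
            return True
        return False
```

A writer would call `acquire` before downloading the column packet and `release` after uploading the repackaged one; a writer that stalls loses the lock once the timeout passes, so other users are not blocked indefinitely.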
Further, the specific process of adding the algorithm in the fourth step includes:
Insert(S, key, C_value[n], Row_max): the matching row group is located using the tag that the client generates from the key, an update lock is placed on the row group, and the row group is returned to the client. The client first determines whether the number of records in the row group exceeds the threshold (Keys_sum > Keys_max); if so, it executes the packet splitting phase, otherwise it inserts the data directly. The data is then repackaged and uploaded to the server, and the update lock is released.
Further, the specific process of the packet splitting algorithm in the fifth step includes:
Split(RowIDε, C_pack[n]): the splitting algorithm must be deterministic for each column packet in a row group, so that every client reading the same packet splits it in exactly the same way. During splitting, for each column packet, the client creates a left packet from the first half of the keys (rounded) and a right packet from the remaining keys. The client then compresses and encrypts each packet as usual. The right packet is inserted first, and then the left packet.
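A deterministic split of this kind can be sketched as follows. The sketch works on already decrypted and decompressed column packets, represented here as lists of (key, value) pairs, and assumes that "rounding" means the left packet takes the extra element when the count is odd; both the representation and the rounding direction are assumptions.

```python
def split_packet(pairs):
    """Split one column packet into (left, right) halves ordered by key."""
    ordered = sorted(pairs, key=lambda pair: pair[0])
    mid = (len(ordered) + 1) // 2  # first half, rounded up (assumed)
    return ordered[:mid], ordered[mid:]


def split_row_group(packets):
    """packets: {column: [(key, value), ...]} -> (left_group, right_group)."""
    left, right = {}, {}
    for column, pairs in packets.items():
        left[column], right[column] = split_packet(pairs)
    return left, right


packets = {
    "name": [(3, "c"), (1, "a"), (2, "b")],
    "city": [(1, "x"), (2, "y"), (3, "z")],
}
left, right = split_row_group(packets)
```

Since every column packet of a row group holds the same keys, sorting by key and cutting at the same index keeps the columns aligned: the left group retains the original RowID and the right group is stored under the RowID derived from its own smallest key.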
Further, the specific process of the data deletion algorithm in the sixth step includes:
Delete(key): the row group containing the data is located using the key tag generated by the client, an update lock is placed on the matched row group, and the row group is returned to the client; the user decrypts and decompresses it, deletes the data, repackages the data and uploads it to the server, and the update lock is released.
The invention also aims to provide a dynamic ciphertext database storage system applying the ciphertext data compression storage structure RCPE and the data dynamic read-write method.
In summary, the advantages and positive effects of the invention are: the invention designs a cipher text data storage structure RCPE based on the design idea of RCFile, and the RCPE is defined as follows: the RCPE structure comprises a data storage format, a data compression method and an optimization technology of data reading; all four requirements of data storage can be met: (1) the method has the advantages of high data loading speed, (2) high query processing speed, (3) high utilization rate of storage space, and (4) strong adaptability to dynamic data access modes. RCPE follows the design concept of "dividing horizontally and then dividing vertically". The data is divided into row groups according to row levels, so that the data in one row can be guaranteed to be stored in the same cluster node; then the row group is divided vertically. Horizontal partitioning first divides the table into a number of row groups according to the row group size, which is a user-specified value. RCPE supports flexible row group sizes that require a tradeoff between data compression performance and query performance. The RCPE divides data by using the design idea of RCFile, then compresses and encrypts columns in each row group, and the encryption key is stored in the client.
RCPE preprocesses the data at the client as follows. A mod 8 operation on the keys divides the data into 8 buckets, and the data in each bucket is sorted. The row group size is set to m records, each record having n columns. Within each bucket, the data Q is divided horizontally into row groups. The data in each row group is compressed column by column with a compression algorithm, producing one column packet per column, and each column packet is encrypted with AES (Advanced Encryption Standard). The minimum key of the row group is then encrypted with order-preserving encryption and used as the row group ID, and each column attribute forms a new key-value pair with its column packet.
RCPE has a data loading speed and workload adaptability comparable to row storage, avoids unnecessary column reads when reading data, and reduces the client's decryption and decompression time overhead. Compared with other structures it performs better: column-wise compression within row groups effectively improves storage space utilization and thereby system performance.
Distributed storage of the data is realized with a consistent hashing algorithm. A two-layer key-value storage structure RCPE is designed following the design concept of the RCFile storage structure, realizing horizontal and vertical partitioning of big data. Column compression within row groups markedly improves the compression ratio and reduces communication overhead when only a few columns are queried, which further improves system performance. In addition, the compressed data is encrypted with AES, ensuring data confidentiality. To enhance data availability, the invention uses the minimum key in each row group as the row group ID, which enables KV queries and range queries over the compressed and encrypted data. The invention also builds inverted indexes on other column attributes, supporting the matching of one or more key-value pairs against a specific column, and supports dynamic addition and deletion of data. To avoid lost updates during write operations, the invention designs a lock mechanism: when data is written, an update lock is placed on the matched row group, so that other users may read the data but may not update it. When the update times out or the lock holder finishes its update, the update lock is released and other users can proceed with their updates.
The method was tested on the Cassandra database. During a write, data is first written to a cache table held in memory, and when the data exceeds the limit it is written to disk. The invention follows the design concept of RCFile and compresses data by column within row groups, which markedly improves the compression ratio and thereby system performance. The advantages of the method are most apparent when only a few columns are queried: the data of the whole row group need not be returned, only the required column packets, which reduces the communication overhead and the client's decryption and decompression time. The invention is compared with the MiniCrypt scheme and the EDSK scheme. MiniCrypt stores data by compressing multiple rows as a whole and then encrypting them. EDSK is an encrypted, distributed and searchable key-value storage scheme that does not use compression; it uses a pseudorandom function to generate keys from row names and column attributes, which introduces large storage overhead.
Drawings
Fig. 1 is a flowchart of a ciphertext data compression storage structure RCPE and a data dynamic read-write method according to an embodiment of the present invention.
Fig. 2 is a diagram of an inverted index structure for column attribute query according to an embodiment of the present invention.
Fig. 3 is a distribution structure diagram applied to a distributed database according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a ciphertext key value storage structure provided by the embodiment of the present invention.
Fig. 5 is a comparison chart of the RCPE storage structure and the MiniCrypt storage structure compression ratio according to the embodiment of the present invention.
Fig. 6 is a comparison diagram of storage overhead of three schemes provided by the embodiment of the present invention.
Fig. 7 is a comparison diagram of the communication overhead of the present invention and the MiniCrypt scheme for single-column queries and whole-row queries, according to an embodiment of the present invention.
Fig. 8 is a comparison diagram of communication overhead during range query between the present invention and the MiniCrypt scheme according to the embodiment of the present invention.
Fig. 9 is a comparison graph of how throughput changes as the number of threads increases when the three schemes perform a single-column query, according to an embodiment of the present invention.
Fig. 10 is a graph comparing throughput of the DCCE and MiniCrypt schemes in performing range query according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In order to realize higher compression ratio and avoid unnecessary column reading, the invention designs a new ciphertext data compression storage scheme based on RCFile.
The following detailed description of the principles of the invention is provided in connection with the accompanying drawings.
As shown in fig. 1, the ciphertext data compression storage structure RCPE and the data dynamic read-write method provided by the embodiment of the present invention include the following steps:
S101: configuration phase: the client initializes the data, defines the public parameters, divides the data into row groups, compresses each column in each row group with zlib and encrypts it with AES, and builds an inverted index for the columns that need an index. The data and the index are uploaded to a cloud server and stored in a Cassandra database;
S102: query phase: the client processes the query request to generate a query tag and sends the tag, together with the hash values C[k] of the names of the columns to be returned, to the server; the server searches the ciphertext database, finds the k required column packets and returns them to the client; the client decrypts and decompresses the column packets and searches for the matching value;
S103: update phase: the client applies order-preserving encryption to the key to generate a tag and sends an update request to the server; the server uses the tag to find the matching column packet, returns it to the client and places an update lock on the row group; the user decrypts and decompresses the column packet, changes the data, repackages it and uploads it to the server, and the update lock is released;
S104: data addition phase: the server matches the query tag generated from the user-supplied key to a row group, places an update lock on the row group and returns all data of the whole row group; the user first checks the size of the row group; if the size exceeds a threshold, the data splitting phase is executed, otherwise the data is added directly; after the data is repackaged it is uploaded to the server, and the update lock is released;
S105: data splitting phase Split(RowID, C_pack[n]): all column packets are split at the midpoint (rounded) of the number of key-value pairs in the row group, each column packet being split into a left column packet and a right column packet; all left column packets overwrite the original row group, and all right column packets form a new row group;
S106: data deletion phase: the row group containing the data is located using the key tag generated by the client, an update lock is placed on the matched row group, and the row group is returned to the client; the user decrypts and decompresses it, deletes the data, repackages the data and uploads it to the server, and the update lock is released.
Building on the prior art, the dynamic key-value storage database supporting compression and encryption mainly achieves the following goals:
Security: the confidentiality of user data is guaranteed; only a user holding the secret key can decrypt the data, and attacks by external adversaries and malicious servers can be resisted.
Efficiency: for an arbitrary database DB ∈ [q] × {0,1}*, where q = poly(k), the computational and storage resources expended by clients, servers and other users do not exceed O(q).
Dynamics: rich queries are supported, as are data insertion, deletion and modification.
The invention mainly consists of two parts: the data storage structure RCPE, built on the design idea of RCFile, and the design of the read-write interface of the database.
1. Data storage structure:
RCPE follows the design concept of "partition horizontally first, then vertically". The client initializes the data, defines the public parameters, and generates the keys. The client divides the data table into a number of row groups according to the row group size, a value specified by the user; RCPE supports flexible row group sizes, and the chosen size must balance data compression performance against query performance. Each row group is then partitioned vertically: each column in the row group is compressed with zlib and encrypted with AES, and an inverted index is built for the columns that need to be indexed. The data and the index are uploaded to a cloud server for storage and query. Horizontal partitioning avoids the drawback of pure column storage, in which a single row of data may be stored across nodes; vertical partitioning avoids the I/O overhead that traditional row storage incurs by reading unneeded columns. In addition, columns within a row group compress better than rows, so more data fits in memory, and when only a few columns are queried only the required column packets need to be returned, avoiding the communication overhead of returning the whole row group.
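The horizontal-then-vertical partitioning described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the JSON serialization of a column, and the sample rows are assumptions made for the example, and the AES encryption step is left out because the Python standard library has no AES primitive.

```python
import json
import zlib

def to_row_groups(rows, group_size):
    """Horizontal split: consecutive rows (assumed sorted by key) form one row group."""
    return [rows[i:i + group_size] for i in range(0, len(rows), group_size)]

def to_column_packets(row_group, columns):
    """Vertical split: serialize each column of the row group and compress it with zlib.
    In RCPE each packet would additionally be AES-encrypted (omitted in this sketch)."""
    packets = {}
    for col in columns:
        values = [row[col] for row in row_group]
        packets[col] = zlib.compress(json.dumps(values).encode("utf-8"))
    return packets

def read_column(packet):
    """Inverse of the packing step: decompress and deserialize one column packet."""
    return json.loads(zlib.decompress(packet).decode("utf-8"))

# Hypothetical sample data: ten records with three columns.
rows = [{"key": k, "city": "Xi'an", "score": k % 3} for k in range(10)]
groups = to_row_groups(rows, group_size=4)
packets = to_column_packets(groups[0], ["key", "city", "score"])
```

A query that needs only the `city` column would fetch and unpack `packets["city"]` alone, which is the communication saving the paragraph above refers to.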
Setup(λ, DB, Q): runs at the client. The security parameter λ is input and the keys K_c and K_o are generated with a key generation algorithm, whose input parameters correspond to those of the configuration algorithm. The original database DB is sorted by key, each key is hashed, and the hash value is taken modulo 8, so that the data is divided into eight partitions; the ID of each partition is denoted PartID:
PartID = Hash(key) mod 8
The records in each partition are divided into row groups; the row group ID is the minimum key in the row group, encrypted with order-preserving encryption:
RowID = OPE(key_min, K_o)
In each row group, zlib compression and AES encryption are applied to each column.
An index x is established for conditional queries:
α = H(C_value)
β = (PartID || RowID)
x ← (α, β)
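The partition ID and the index entry x = (α, β) can be illustrated with a short sketch. The text does not name the hash used for PartID, so SHA-256 is an assumption here (it is the hash the text uses for column values in the query interface); the `row_id` argument is a plain integer standing in for the OPE ciphertext, and the `"||"` string separator is only a readable rendering of concatenation.

```python
import hashlib

NUM_PARTITIONS = 8  # the text fixes eight partitions

def part_id(key: str) -> int:
    """PartID = Hash(key) mod 8; SHA-256 is an assumed choice of hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_PARTITIONS

def index_entry(column_value: str, part: int, row_id: int):
    """Build x = (alpha, beta): alpha = H(C_value), beta = PartID || RowID.
    row_id is a stand-in for OPE(key_min, K_o)."""
    alpha = hashlib.sha256(column_value.encode("utf-8")).hexdigest()
    beta = f"{part}||{row_id}"
    return alpha, beta

alpha, beta = index_entry("Xi'an", 3, 1000)
```

Because α depends only on the column value, equal values in different row groups map to the same index key, which is what lets the server resolve a value query without seeing the plaintext.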
Design of the database read-write interface:
2. Read interface:
(1) Get(key): takes a key as input and returns the value corresponding to the key;
Client: the client applies order-preserving encryption to the input key to generate key* and sends key* to the server.
key* = OPE(key, K_o)
Server: finds the row groups whose RowID is not greater than key*, sorts them, and returns the matching column packets of the row group with the largest RowID. The algorithm is as follows:
select C1, C2, ..., Ck from table
where RowID ≤ key*
order by RowID desc limit 1
Client: decrypts and decompresses the returned column packets and searches them for the matching value:
Return C_value[k]
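The server-side lookup in Get(key) amounts to finding the largest RowID not exceeding key*. A minimal sketch, assuming the server keeps the RowIDs of a partition in a sorted list and using plain integers in place of OPE ciphertexts (order-preserving encryption keeps the same ordering, so the comparison logic is unchanged); the function name is hypothetical:

```python
import bisect

def find_row_group(sorted_row_ids, key_star):
    """Return the largest RowID <= key_star, or None if every RowID is larger.
    Equivalent to: where RowID <= key* order by RowID desc limit 1."""
    pos = bisect.bisect_right(sorted_row_ids, key_star)
    return sorted_row_ids[pos - 1] if pos > 0 else None

row_ids = [100, 250, 400]  # hypothetical RowIDs (minimum key of each row group)
```

For example, a key of 300 falls into the row group that starts at 250, since that group holds every key from 250 up to the next RowID.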
(2) Get(key_low, key_high): takes a range as input and returns all values in the range;
Client: the client applies order-preserving encryption to the input (key_low, key_high) to generate (key_low*, key_high*) and sends them to the server:
key_low* = OPE(key_low, K_o)
key_high* = OPE(key_high, K_o)
Server: finds the row groups whose RowID is greater than key_low* and smaller than key_high* and adds them to the result set; if key_low* is less than the minimum RowID in the result set, RowID_min, it executes Get(key_low) and adds the result to the result set; it then returns the matching column packets of these row groups.
Client: decrypts and decompresses the returned column packets and filters out the values that lie in the requested range:
C_value[k] ← Filter(result)
Return C_value[k]
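The range query can be sketched in the same style. Plain integers again stand in for OPE ciphertexts and the names are hypothetical. One deliberate assumption: the upper bound is treated as inclusive (RowID ≤ key_high*) so that a row group starting exactly at key_high is not missed, whereas the text says "smaller than".

```python
import bisect

def range_row_groups(sorted_row_ids, low_star, high_star):
    """Server side: RowIDs of every row group that may hold keys in the range."""
    lo = bisect.bisect_right(sorted_row_ids, low_star)   # first RowID > low_star
    hi = bisect.bisect_right(sorted_row_ids, high_star)  # RowIDs <= high_star (assumed inclusive)
    result = sorted_row_ids[lo:hi]
    # Get(key_low): the group with the largest RowID <= low_star starts at or
    # before the range and may still contain keys inside it.
    if lo > 0:
        result = [sorted_row_ids[lo - 1]] + result
    return result

def filter_keys(keys, low, high):
    """Client side Filter(result): keep only the keys inside the requested range."""
    return [k for k in keys if low <= k <= high]

row_ids = [100, 250, 400]  # hypothetical RowIDs
```

The client-side filter is needed because the first and last row groups returned generally contain keys outside the requested range.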
(3) Query by the value of a column C_i: searches for one or more key-value pairs whose value in a particular column matches the required value;
Client: the client hashes the input value of column C_i with SHA256 and sends the resulting hash α to the server;
Server: looks up α in the index table to obtain β, and then looks up the data table. The algorithm is as follows:
select value from Indextable
where key = α allow filtering
(PartID_ε, RowID_ε) ← β
Node[i] ← PartID_ε
select C1, C2, ..., Ck from table
where RowID = RowID_ε
Client: decrypts and decompresses the returned column packets and searches them for the matching value:
Return C_value[k]
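The two-step lookup (index table first, data table second) can be sketched with in-memory dictionaries. The dictionaries are hypothetical stand-ins for the Cassandra index and data tables, the byte strings are placeholders for encrypted column packets, and all names are assumptions made for the example.

```python
import hashlib

def alpha_of(value: str) -> str:
    """Client side: alpha = SHA256(value), sent to the server instead of the plaintext."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

index_table = {}  # alpha -> list of beta = (PartID, RowID)
data_table = {}   # (PartID, RowID) -> {column name: column packet}

def add_index(value, part, row_id):
    index_table.setdefault(alpha_of(value), []).append((part, row_id))

def search_by_value(alpha, wanted_columns):
    """Server side: resolve alpha to beta, then fetch only the requested column packets."""
    hits = []
    for part, row_id in index_table.get(alpha, []):
        row_group = data_table[(part, row_id)]
        hits.append({c: row_group[c] for c in wanted_columns})
    return hits

# Placeholder packets; in RCPE these would be compressed and encrypted columns.
data_table[(3, 1000)] = {"key": b"<key packet>", "city": b"<city packet>", "score": b"<score packet>"}
add_index("Xi'an", 3, 1000)
hits = search_by_value(alpha_of("Xi'an"), ["key", "city"])
```

The server never sees the queried value itself, only its hash, and returns whole column packets that the client must still decrypt, decompress and scan.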
3. Write interface:
(1) Update: updates the value of the column attribute C_i corresponding to the key in database S to a new value. The update process is as follows: first, the client obtains the column packets from the database by calling the query algorithm, and an update lock is placed on the row group; having obtained the query result τ, the client calls the decryption and decompression algorithm to decrypt and decompress the packets; the client then updates the value of C_i, compresses, encrypts and packs the column again, and uploads it to the server to update the column packet, after which the update lock is released. The algorithm is as follows:
Client: select C_key, C_i from table
where RowID ≤ key*
order by RowID desc limit 1 (updlock)
The client decrypts and decompresses the returned packets, updates the value of C_i, and compresses, encrypts and uploads the new column packet.
Server: overwrites the column packet of the row group where RowID = RowID_ε and releases the lock (unlock);
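The lock, fetch, modify, repack, upload, unlock cycle can be sketched as below. `RowGroupStore` is a hypothetical in-memory stand-in for the server table, `threading.Lock` stands in for the update lock, and AES encryption is omitted (the packets are only zlib-compressed JSON), so this illustrates the control flow rather than the actual scheme.

```python
import json
import threading
import zlib

def pack(values):
    return zlib.compress(json.dumps(values).encode("utf-8"))

def unpack(packet):
    return json.loads(zlib.decompress(packet).decode("utf-8"))

class RowGroupStore:
    """Hypothetical server-side table: RowID -> {column name: packet}."""
    def __init__(self):
        self.packets = {}
        self.locks = {}

    def fetch_for_update(self, row_id, columns):
        # Select the packets first so a missing row or column fails before locking.
        selected = {c: self.packets[row_id][c] for c in columns}
        self.locks.setdefault(row_id, threading.Lock()).acquire()  # updlock
        return selected

    def commit(self, row_id, new_packets):
        self.packets[row_id].update(new_packets)
        self.locks[row_id].release()  # unlock

def update(store, row_id, key, column, new_value):
    """Client side: change one value inside a column packet of the locked row group."""
    fetched = store.fetch_for_update(row_id, ["key", column])
    keys, values = unpack(fetched["key"]), unpack(fetched[column])
    new_packets = {}
    if key in keys:
        values[keys.index(key)] = new_value
        new_packets[column] = pack(values)
    store.commit(row_id, new_packets)  # releases the lock even if nothing changed

store = RowGroupStore()
store.packets[100] = {"key": pack([100, 101, 102]), "city": pack(["a", "b", "c"])}
update(store, 100, 101, "city", "z")
```

Only the key column and the column being changed travel to the client, which mirrors the `select C_key, C_i` statement above.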
(2) Insert(S, key, C_value[n], Row_max): the matching row group is located using the tag that the client generates from the key; an update lock is placed on the row group and the row group is returned to the client. The client first determines whether the number of records in the row group exceeds the threshold (Keys_sum > Keys_max); if so, the packet splitting stage is executed; otherwise the data is inserted directly, repacked and uploaded to the server, and the update lock is released.
(3) Split(RowID_ε, C_pack[n]): the splitting algorithm must be deterministic for each column packet in a row group, so that every client reading the same packet splits it in exactly the same way. During a split, for each column packet, the client creates a left packet from the first half of the keys (with rounding) and a right packet from the remaining keys. The client then compresses and encrypts each packet as usual. The right packets are inserted first, as a new row group, and then the left packets, which overwrite the original row group where RowID = RowID_ε.
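Insertion with a deterministic split can be sketched on already decrypted and decompressed columns. Several points are assumptions made for the example: the threshold is checked on the record count including the new record, the first half is rounded up, the new row group takes the minimum key of the right half as its RowID (standing in for OPE(key_min, K_o)), and the dictionary `row_groups` stands in for the server table.

```python
import bisect

def split_row_group(columns):
    """Deterministic split of every column at the same position.
    columns: {name: list}, with columns['key'] sorted ascending."""
    n = len(columns["key"])
    mid = (n + 1) // 2  # first half, rounded up (assumed rounding rule)
    left = {name: values[:mid] for name, values in columns.items()}
    right = {name: values[mid:] for name, values in columns.items()}
    return left, right

def insert(row_groups, row_id, record, max_keys):
    """Insert one record into the matched row group, splitting it if it grows too large."""
    columns = row_groups[row_id]
    pos = bisect.bisect_left(columns["key"], record["key"])
    for name, values in columns.items():
        values.insert(pos, record[name])
    if len(columns["key"]) > max_keys:  # Keys_sum > Keys_max
        left, right = split_row_group(columns)
        row_groups[right["key"][0]] = right  # right packets first, as a new row group
        row_groups[row_id] = left            # left packets overwrite the original

row_groups = {10: {"key": [10, 20, 30, 40], "v": ["a", "b", "c", "d"]}}
insert(row_groups, 10, {"key": 25, "v": "x"}, max_keys=4)
```

Because the split position depends only on the number of keys, any client holding the same row group produces the same left and right packets.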
(4) Delete(key): the row group containing the data is located using the key tag generated by the client; an update lock is placed on the matching row group, which is returned to the client; the client decrypts and decompresses it, deletes the data, repacks it and uploads it to the server, and the update lock is released;
select C[n] from table
where RowID ≤ key*
order by RowID desc limit 1 (updlock)
Client: decrypts and decompresses the returned row group, deletes the record, and compresses, encrypts and uploads the result:
result_new ← Delete from result where key = key
Server: overwrites the row group where RowID = RowID_ε with the uploaded packets and releases the lock (unlock)
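The client-side step of Delete, removing one record from every column of a decrypted row group, can be sketched as follows. The function name and the sample columns are hypothetical; locking, compression and encryption are omitted.

```python
def delete_key(columns, key):
    """Return the row group's columns without the record whose key matches.
    columns: {name: list}; all lists have the same length and order."""
    if key not in columns["key"]:
        return columns  # nothing to delete; the row group is uploaded unchanged
    pos = columns["key"].index(key)
    return {name: values[:pos] + values[pos + 1:] for name, values in columns.items()}

result = {"key": [10, 20, 30], "v": ["a", "b", "c"]}
result_new = delete_key(result, 20)
```

The same position is removed from every column so that the columns of the row group stay aligned after repacking.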
The above description covers only preferred embodiments of the present invention and is not to be construed as limiting the invention; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (7)

1. A ciphertext data compression storage structure RCPE and a dynamic data read-write method, characterized in that the data is partitioned using the design idea of RCFile; the invention comprises optimization of the data storage format, the data compression method and data reading; the data is first divided horizontally into a plurality of row groups, the size of each row group being specified by the user; each row group is then divided vertically; the data in each row group is compressed and encrypted column by column;
the ciphertext data compression storage structure RCPE and the dynamic data read-write method comprise the following steps:
first, a configuration stage Setup(λ, DB, Q): the original database DB of size Q is initialized, the public parameter λ is defined, and the keys K_c and K_o are generated; the data is divided into row groups, each column in each row group is compressed and AES-encrypted, and an inverted index x is built for the columns that need to be indexed; the encoded data S and the index are uploaded to a cloud server for data storage and query; the configuration stage is completed at the client;
second, a query stage Query(S, x, key, K_c, C[k]): the client processes the query request to generate a query tag and sends the tag, together with the hash values C[k] of the names of the columns to be returned, to the server; the server searches the ciphertext database and returns the k required column packets to the client; the client decrypts and decompresses the column packets and searches them for the matching value;
third, a data update stage: the client applies order-preserving encryption to the key to generate a tag and sends an update request to the server; the server uses the tag provided by the client to find the matching column packets, returns them to the client, and places an update lock on the row group; the client decrypts and decompresses the column packets, modifies the data, repacks it, uploads it to the server, and releases the update lock;
fourth, a data addition stage Insert(S, key, C_value[n], Row_max): the server uses the query tag generated from the user-supplied key to find the matching row group, places an update lock on it, and returns all data of the row group; the client first checks the size of the row group; if the size exceeds a threshold, the data splitting stage is executed, otherwise the data is added directly; the repacked data is then uploaded to the server for updating, and the update lock is released;
fifth, a data splitting stage Split(RowID, C_pack[n]): all column packets are split according to the number of key-value pairs in the row group, with rounding, each column packet being split into a left column packet and a right column packet; the left column packets overwrite the original row group, and the right column packets form a new row group;
sixth, a data deletion stage Delete(key): wherein the tag of the key is generated by the client, and the key is the key of each column; the row group containing the data is located, an update lock is placed on the matching row group, which is returned to the client; the client decrypts and decompresses it, deletes the data, repacks it and uploads it to the server, and the update lock is released.
2. The ciphertext data compression storage structure RCPE and dynamic data read-write method according to claim 1, characterized in that the configuration algorithm of the first step specifically comprises:
Setup(λ, DB, Q): runs at the client; the security parameter λ is input and the keys K_c and K_o are generated with a key generation algorithm, whose input parameters correspond to those of the configuration algorithm; the original database DB is sorted by key, each key is hashed, and the hash value is taken modulo 8, so that the data is divided into eight partitions, the ID of each partition being denoted PartID;
PartID = Hash(key) mod 8
the records in each partition are divided into row groups, the row group ID being the minimum key in the row group, encrypted with order-preserving encryption:
RowID = OPE(key_min, K_o)
in each row group, zlib compression and AES encryption are applied to each column;
an index x is established for conditional queries:
α = H(C_value)
β = (PartID || RowID)
x ← (α, β).
3. The ciphertext data compression storage structure RCPE and dynamic data read-write method according to claim 1, characterized in that the query algorithm Query(S, x, key, K_c, C[k]) of the second step specifically comprises the following queries:
(1) Get(key): takes a key as input and returns the value corresponding to the key;
client: applies order-preserving encryption to the input key to generate key* and sends key* to the server;
key* = OPE(key, K_o)
server: finds the row groups whose RowID is not greater than key*, sorts them, and returns the matching column packets of the row group with the largest RowID, the algorithm being as follows:
select C1, C2, …, Ck from table
where RowID ≤ key*
order by RowID desc limit 1
client: decrypts and decompresses the returned column packets and searches them for the matching value:
Return C_value[k]
(2) Get(key_low, key_high): takes a range as input and returns all values in the range;
client: applies order-preserving encryption to the input (key_low, key_high) to generate (key_low*, key_high*) and sends them to the server;
key_low* = OPE(key_low, K_o)
key_high* = OPE(key_high, K_o)
server: finds the row groups whose RowID is greater than key_low* and smaller than key_high* and adds them to the result set; if key_low* is less than the minimum RowID in the result set, RowID_min, it executes Get(key_low) and adds the result to the result set; it then returns the matching column packets of these row groups;
client: decrypts and decompresses the returned column packets and filters out the values that lie in the requested range:
C_value[k] ← Filter(result)
Return C_value[k]
(3) query by the value of a column C_i: searches for one or more key-value pairs whose value in a particular column matches the required value;
client: hashes the input value of column C_i with SHA256 and sends the resulting hash α to the server;
server: looks up α in the index table to obtain β, and then looks up the data table, the algorithm being as follows:
select value from Indextable
where key = α allow filtering
(PartID_ε, RowID_ε) ← β
Node[i] ← PartID_ε
select C1, C2, …, Ck from table
where RowID = RowID_ε
client: decrypts and decompresses the returned column packets and searches them for the matching value:
Return C_value[k].
4. The ciphertext data compression storage structure RCPE and dynamic data read-write method according to claim 1, characterized in that the update algorithm of the third step comprises:
updating the value of the column attribute C_i corresponding to the key in database S to a new value, the update process being as follows: first, the client obtains the column packets from the database by calling the query algorithm, and an update lock is placed on the row group; having obtained the query result τ, the client calls the decryption and decompression algorithm to decrypt and decompress the packets; the client then updates the value of C_i, compresses, encrypts and packs the column again, and uploads it to the server to update the column packet, after which the update lock is released, the algorithm being as follows:
client: select C_key, C_i from table
where RowID ≤ key*
order by RowID desc limit 1 (updlock)
the client decrypts and decompresses the returned packets, updates the value of C_i, and compresses, encrypts and uploads the new column packet;
server: overwrites the column packet of the row group where RowID = RowID_ε and releases the lock (unlock).
5. The ciphertext data compression storage structure RCPE and dynamic data read-write method according to claim 1, characterized in that the addition algorithm of the fourth step specifically comprises:
Insert(S, key, C_value[n], Row_max): the matching row group is located using the tag that the client generates from the key; an update lock is placed on the row group and the row group is returned to the client; the client first determines whether the number of records in the row group exceeds the threshold (Keys_sum > Keys_max); if so, the packet splitting stage is executed; otherwise the data is inserted directly, repacked and uploaded to the server, and the update lock is released.
6. The ciphertext data compression storage structure RCPE and dynamic data read-write method according to claim 1, characterized in that the packet splitting algorithm of the fifth step comprises:
Split(RowID_ε, C_pack[n]): all column packets are split according to the number of keys, with rounding, each column packet being split into a left packet and a right packet; the right packets form a new row group, and the left packets overwrite the original row group where RowID = RowID_ε.
7. The ciphertext data compression storage structure RCPE and dynamic data read-write method according to claim 1, characterized in that the data deletion algorithm of the sixth step specifically comprises:
Delete(key): the row group containing the data is located using the key tag generated by the client; an update lock is placed on the matching row group, which is returned to the client; the client decrypts and decompresses it, deletes the data, repacks it and uploads it to the server, and the update lock is released;
select C[n] from table
where RowID ≤ key*
order by RowID desc limit 1 (updlock)
client: decrypts and decompresses the returned row group, deletes the record, and compresses, encrypts and uploads the result:
result_new ← Delete from result where key = key
server: overwrites the row group where RowID = RowID_ε with the uploaded packets and releases the lock (unlock).
CN201910165159.1A 2019-03-05 2019-03-05 Ciphertext data compression storage structure RCPE and data dynamic read-write method Active CN110166221B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910165159.1A CN110166221B (en) 2019-03-05 2019-03-05 Ciphertext data compression storage structure RCPE and data dynamic read-write method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910165159.1A CN110166221B (en) 2019-03-05 2019-03-05 Ciphertext data compression storage structure RCPE and data dynamic read-write method

Publications (2)

Publication Number Publication Date
CN110166221A CN110166221A (en) 2019-08-23
CN110166221B true CN110166221B (en) 2022-02-22

Family

ID=67644880

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910165159.1A Active CN110166221B (en) 2019-03-05 2019-03-05 Ciphertext data compression storage structure RCPE and data dynamic read-write method

Country Status (1)

Country Link
CN (1) CN110166221B (en)

Also Published As

Publication number Publication date
CN110166221A (en) 2019-08-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant