CN109165226B

CN109165226B - Searchable encryption method for ciphertext large data set

Info

Publication number: CN109165226B
Application number: CN201811194140.1A
Authority: CN
Inventors: 周福才; 贾强; 秦诗悦; 张宗烨
Original assignee: Northeastern University China
Current assignee: Northeastern University China
Priority date: 2018-10-15
Filing date: 2018-10-15
Publication date: 2021-03-02
Anticipated expiration: 2038-10-15
Also published as: CN109165226A

Abstract

The invention provides a searchable encryption method for large-scale ciphertext data sets and relates to the technical field of the Internet. The method comprises the following steps. The data owner completes the file upload process: the original file set F is preprocessed, the ciphertext data is divided equally into N parts and uploaded to the data servers, and the encrypted index is uploaded to the index server S_I. The data owner completes the keyword search process: a search token τ_w for a keyword w is issued to the index server S_I; S_I computes, from τ_w and the security index DB, the data server on which the ciphertext for w resides, and the ciphertext data is returned to the data owner. The data owner completes the file download process: the data owner downloads the ciphertext data set corresponding to the keyword w and decrypts it with the key to obtain the data file set. The invention optimizes the data structure of the security index and adopts indirect addressing, so that search time complexity remains within an acceptable range even when the security index is very large.

Description

Searchable encryption method for ciphertext large data set

Technical Field

The invention relates to the technical field of internet, in particular to a searchable encryption method for a ciphertext large-scale data set.

Background

With the rapid development of cloud computing, the cloud storage technology is widely applied, and users gradually migrate data to a cloud server to avoid local huge storage overhead and cumbersome data management and obtain more convenient services. The openness and sharing of the cloud itself also pose a significant challenge to the security of data stored in a distributed environment. In order to ensure data security and user privacy, data is generally stored in a cloud server in a form of a ciphertext. However, after the plaintext data is encrypted into the ciphertext, although confidentiality and security of the data are guaranteed, original characteristics of a plurality of plaintext data are lost, so that keyword search on the ciphertext becomes a difficult problem. The Searchable Encryption (SE) technology is a cryptology primitive developed in recent years and supporting keyword search on a ciphertext, which saves a large amount of computing and network overhead for users, and makes full use of distributed storage and computing power in a cloud environment to search keywords on the ciphertext. With the development of cloud computing, under the application scenarios of massive users and massive data, providing a safe, flexible and efficient SE mechanism will be one of the targets that researchers pursue to the utmost.

In a searchable encryption scheme, the user first encrypts the data with an encryption algorithm and stores the ciphertext on a cloud server. When the user initiates a search request, a keyword trapdoor is sent to the cloud server; the server tentatively matches each file against the received trapdoor, and a successful match indicates that the file contains the keyword. Finally, the server sends the ciphertext of the matching files back to the user, who only needs to decrypt the returned files. In terms of security, the cloud server learns nothing about the searched keyword or the plaintext beyond the access pattern, the search pattern, the file ciphertexts, the ciphertext sizes, the number of files, and the like.

Although most indexes in current symmetric searchable encryption schemes achieve optimal search time in theory, their performance on large data sets is not ideal. I/O latency, storage utilization, and distributed storage of the data set all reduce the practical performance of symmetric searchable encryption schemes. For large-scale data sets, the constructed security index becomes very large, and searching by matching keywords sequentially against the security index is an important reason for low search efficiency in practice.

Disclosure of Invention

The technical problem to be solved by the present invention is to provide a searchable encryption method for large ciphertext data sets that optimizes the storage structure of the security index by applying hierarchical indirect addressing to the index in the security index generation algorithm, so that good time complexity is maintained even when the security index is very large.

In order to solve the technical problems, the technical scheme adopted by the invention is as follows: a searchable encryption method for ciphertext large data sets comprises the following specific steps:

Step 1: the data owner completes the file upload process at the client. The data owner first preprocesses the original file set F; the preprocessing comprises generating ciphertext data by symmetric encryption, performing semantic analysis on the original file set, extracting keywords, constructing an inverted index for the keywords, and generating the security index DB. After preprocessing, the data owner divides the ciphertext data equally into N parts, uploads the N parts to the data servers, and uploads the encrypted index to the index server S_I;

Step 2: the data owner completes the keyword search process; the process comprises searching and file updating;

The file update process comprises file addition and file deletion. After files are added or deleted, the search for the keyword w is converted into a search over D + D_add − D_del, where D is the dictionary without file additions and deletions, D_add is the dictionary of added files, and D_del is the dictionary of deleted files; the search results of the three parts are merged and the ciphertext data set is returned;

The search process is as follows: the data owner sends a search request for the keyword w to the index server S_I and provides S_I with the search token τ_w of w; from τ_w and the security index DB, S_I computes the data server on which the ciphertext for w resides, namely the ν-th of the N data servers, where 1 ≤ ν ≤ N;

the ciphertext data set corresponding to τ_w is returned to the data owner, its size being the total number of ciphertext data items;

Step 3: the data owner completes the file download process at the client. During the download, the data owner downloads the ciphertext data set corresponding to the keyword w and decrypts it using the key to obtain the set of data files containing w.

The step 1 comprises the following substeps:

step 1.1: a data owner generates a secret key K at a client through an initialization algorithm, and then ciphertext data c are generated through symmetric encryption;

step 1.1.1: input a security parameter k, where k ∈ {0,1}^k；

Step 1.1.2: using a pseudo-random number generator, generate 3 random numbers K₁, K₂, K₃ as keys of the pseudo-random function PRF;

wherein the pseudo-random function PRF is represented as PRF: {0,1}^k×{0,1}^*→{0,1}^k;

Step 1.1.2.1: input the client key K ∈ {0,1}^k and the keyword w ∈ {0,1}^*, and output the encryption keys K₁ ∈ {0,1}^k and K₂ ∈ {0,1}^k generated for w;

Step 1.1.2.2: compute K₃ ← SKE.Gen(1^k) to obtain the key of the symmetric encryption algorithm, which is used to encrypt the original file set F; SKE = (Gen, Enc, Dec) is a symmetric encryption scheme, where Gen denotes the key generation algorithm, Enc the encryption algorithm, and Dec the decryption algorithm;

Step 1.1.3: output K = (K₁, K₂, K₃) as the key;
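As a rough illustration of steps 1.1.1 to 1.1.3, the following Python sketch generates K = (K₁, K₂, K₃) and instantiates the PRF. The use of HMAC-SHA256, the choice k = 256 bits, and the names `setup` and `prf` are assumptions made for illustration only; the method does not prescribe them, and K₃ is drawn here as a random string in place of SKE.Gen.

```python
import hashlib
import hmac
import secrets

SECURITY_PARAM_BYTES = 32  # assumed: k = 256 bits, matching the HMAC-SHA256 output size


def setup(k_bytes: int = SECURITY_PARAM_BYTES) -> tuple:
    """Return K = (K1, K2, K3) as three independent random k-bit strings."""
    return tuple(secrets.token_bytes(k_bytes) for _ in range(3))


def prf(key: bytes, message: bytes) -> bytes:
    """PRF: {0,1}^k x {0,1}^* -> {0,1}^k, instantiated (as an assumption) with HMAC-SHA256."""
    return hmac.new(key, message, hashlib.sha256).digest()


K = setup()
K1, K2, K3 = K
# Evaluating the PRF on a keyword is deterministic for a fixed key.
sample = prf(K1, "cloud".encode("utf-8"))
```

The PRF output has the same length as the keys only because k was fixed to 256 bits; a different security parameter would need a different instantiation.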

step 1.2: encrypting a file, namely inputting an original file set F, and encrypting the F into ciphertext data c by using a symmetric encryption algorithm;

step 1.2.1: inputting an original file set F;

Step 1.2.2: for each file F_η in F, where 0 < η ≤ |F|, execute the encryption algorithm SKE.Enc under K₃ to generate the ciphertext c_η, c_η ∈ c;

Step 1.2.3: divide c into N parts and send them to the data servers;
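Steps 1.2.1 to 1.2.3 can be sketched as follows. The cipher below is a toy SHA-256 counter-mode XOR stream used only as a stand-in for SKE.Enc and SKE.Dec, so that the example runs with the standard library; it is not the cipher of the method and should not be used in practice. The helper names `ske_enc`, `ske_dec` and `split_equally` are hypothetical.

```python
import hashlib
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from key, nonce and a running counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def ske_enc(key: bytes, plaintext: bytes) -> bytes:
    """Toy stand-in for SKE.Enc: a fresh nonce followed by the XOR-masked plaintext."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def ske_dec(key: bytes, ciphertext: bytes) -> bytes:
    """Toy stand-in for SKE.Dec: strip the nonce and undo the XOR mask."""
    nonce, body = ciphertext[:16], ciphertext[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


def split_equally(items: list, n: int) -> list:
    """Split items into n contiguous parts whose sizes differ by at most one."""
    q, r = divmod(len(items), n)
    parts, start = [], 0
    for i in range(n):
        size = q + (1 if i < r else 0)
        parts.append(items[start:start + size])
        start += size
    return parts


K3 = secrets.token_bytes(32)
files = [b"file-%d" % i for i in range(7)]
c = [ske_enc(K3, f) for f in files]   # step 1.2.2
parts = split_equally(c, 3)           # step 1.2.3 with N = 3
```

The same `ske_dec` call applied to each downloaded ciphertext corresponds to the decryption performed in step 3.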

Step 1.3: the data owner performs semantic analysis on F, extracts the keywords w, constructs an inverted index for w, and generates the security index DB; after DB is classified, an array A and a block Block storing the block information are obtained; a list L is created, Block and the encrypted tag generated from K are stored in L, and L is uploaded to S_I; D ← Create(L) is executed to generate the dictionary D, and K, D and A are output. The specific steps are as follows:

Step 1.3.1: create an array A for storing the data produced by the first blocking of the inverted index, and a list L for storing the pointers produced by the second blocking;

Step 1.3.2: for each keyword w, apply the pseudo-random function to K and w to obtain K₁ and K₂;

Step 1.3.3: determine the security index DB and the blocking parameters B and b, and classify DB(w) into three types, Small, Medium and Large, according to the inverted index length |DB(w)| of the keyword w:

Step 1.3.3.1: when the security index DB(w) is of type Small, the number of blocks is Num_BS = 1, i.e., no blocking operation is needed; when |DB(w)| < b, DB(w) is padded with random data up to size b, and the block is recorded as Block_S; then execute L ← (α, β) and upload L to S_I;

Step 1.3.3.2: when the security index DB(w) is of type Medium, DB(w) is divided into blocks of size B, and the number of blocks Num_BM satisfies Num_BM ≤ b; when the last block is smaller than B, it is padded to size B;

for each block BM_i, 1 ≤ i ≤ Num_BM, its tag is computed using symmetric encryption; the result is stored at a random position in S_I and its pointer is recorded, yielding a two-tuple for each i with 1 ≤ i ≤ Num_BM; this process in effect performs an indirect addressing operation;

an array A is created and the two-tuples are written into A; A is partitioned into blocks of size b; since Num_BM ≤ b, the number of partitions is Num_bM = 1, and this block is recorded as Block_M; if the block is smaller than b, it is padded with random data to size b; then execute L ← (α, β) and upload L to S_I;

Step 1.3.3.3: when the security index DB(w) is of type Large, DB(w) is divided into blocks of size B, and the number of blocks Num_BL satisfies b < Num_BL ≤ Bb;

after the array A is computed, the Num_BL entries in A are blocked again by size B, which performs a second level of indirect addressing; the number of blocks Num_BL' satisfies Num_BL' ≤ b; when the last block is smaller than B, it is padded with random numbers to size B;

for each block BL_j', 1 ≤ j ≤ Num_BL', its tag is computed using a symmetric encryption algorithm; the result is stored at a random position in S_I and its pointer is recorded, yielding a two-tuple for each j with 1 ≤ j ≤ Num_BL';

an array A is created and the two-tuples are written into A; A is partitioned into blocks of size b; since Num_BL' ≤ b, the number of partitions is Num_bM = 1, and this block is recorded as Block_L; if the block is smaller than b, it is padded with random data to size b; then execute L ← (α, β) and upload L to S_I;

Step 1.3.4: after L is uploaded to S_I, D ← Create(L) is executed to generate the dictionary D;

step 1.3.5: outputting K, D and A;
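The classification and padding in step 1.3.3 can be sketched as follows. The thresholds for Medium and Large follow the block-count conditions stated above (Num_BM ≤ b and b < Num_BL ≤ Bb); the Small threshold |DB(w)| ≤ b is an assumption, and the helper names `classify` and `chunk_and_pad` are hypothetical. Encryption and tag computation are omitted.

```python
import math
import secrets


def classify(db_len: int, B: int, b: int) -> str:
    """Classify DB(w) by its length; the Small threshold (<= b) is an assumption."""
    if db_len <= b:
        return "Small"
    num_blocks = math.ceil(db_len / B)   # number of size-B blocks
    if num_blocks <= b:
        return "Medium"
    if num_blocks <= B * b:
        return "Large"
    raise ValueError("DB(w) exceeds the two-level capacity of B*B*b identifiers")


def chunk_and_pad(items: list, size: int) -> list:
    """Split items into blocks of `size`, padding the last block with random filler."""
    blocks = [items[i:i + size] for i in range(0, len(items), size)]
    if blocks and len(blocks[-1]) < size:
        filler = [secrets.token_hex(4) for _ in range(size - len(blocks[-1]))]
        blocks[-1] = blocks[-1] + filler
    return blocks


B, b = 4, 8
kinds = [classify(n, B, b) for n in (5, 20, 100)]
blocks = chunk_and_pad(list(range(10)), B)
```

In this sketch the filler is a random hex string; a real construction would need a way for the client to tell filler from identifiers after decryption.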

Step 1.4: generate the trapdoor τ_w corresponding to the keyword w, mainly comprising the following steps:

Step 1.4.1: input the search keyword w;

Step 1.4.2: execute the token computation to obtain τ₁ and τ₂, where τ₁, τ₂ ∈ τ_w;

Step 1.4.3: upload (τ₁, τ₂) to S_I as the search token for w;
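One plausible reading of step 1.4 is sketched below: the two token components are obtained by evaluating the PRF under K₁ and K₂ on the keyword, and the index server turns τ₁ into a dictionary label. The exact token formula is not legible in the text, so this derivation, the HMAC-SHA256 instantiation, and the names `stoken` and `label_for` are all assumptions.

```python
import hashlib
import hmac


def prf(key: bytes, message: bytes) -> bytes:
    """Assumed PRF instantiation: HMAC-SHA256."""
    return hmac.new(key, message, hashlib.sha256).digest()


def stoken(k1: bytes, k2: bytes, w: str) -> tuple:
    """Return the search token (tau1, tau2) for keyword w (assumed derivation)."""
    msg = w.encode("utf-8")
    return prf(k1, msg), prf(k2, msg)


def label_for(tau1: bytes, counter: int = 0) -> bytes:
    """Server side: derive the dictionary label for the counter-th entry from tau1."""
    return prf(tau1, counter.to_bytes(4, "big"))


k1, k2 = b"\x01" * 32, b"\x02" * 32
tau1, tau2 = stoken(k1, k2, "cloud")
label = label_for(tau1)
```

Because the token is deterministic in w, repeated searches for the same keyword produce the same token, which is the search pattern leakage mentioned in the Background.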

The step 2 comprises the following substeps:

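Step 2.1: input (τ₁, τ₂) and DB;

Step 2.2: execute the lookup on the security index to obtain the classification of DB(w);

Step 2.2.1: when DB(w) is of type Small, execute c_res ← Get(S_F; Block_S);

Step 2.2.2: when DB(w) is of type Medium, execute the corresponding retrieval through Block_M;

Step 2.2.3: when DB(w) is of type Large, execute the corresponding retrieval through Block_L;

Step 2.2.4: output the ciphertext data set corresponding to (τ₁, τ₂);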
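The three branches of step 2.2 differ in how many levels of pointers are followed. The sketch below models this with plain Python containers and no encryption or padding: the layout of `D` (label to class and payload) and of `A` (a list of blocks) is an assumed simplification, and `search` is a hypothetical name.

```python
def search(D: dict, A: list, label: bytes) -> list:
    """Resolve a dictionary entry to file identifiers via 0, 1 or 2 levels of pointers."""
    kind, payload = D[label]
    if kind == "Small":
        # Direct addressing: the entry itself holds the identifiers.
        return list(payload)
    if kind == "Medium":
        # One level: the entry holds pointers to identifier blocks in A.
        return [ident for ptr in payload for ident in A[ptr]]
    if kind == "Large":
        # Two levels: pointers -> pointer blocks in A -> identifier blocks in A.
        return [ident for p2 in payload for p1 in A[p2] for ident in A[p1]]
    raise ValueError(f"unknown class {kind!r}")


# A[0] and A[1] are identifier blocks; A[2] is a pointer block referring to them.
A = [[1, 2], [3, 4], [0, 1]]
D = {
    b"s": ("Small", [9]),
    b"m": ("Medium", [0, 1]),
    b"l": ("Large", [2]),
}
small_ids = search(D, A, b"s")
medium_ids = search(D, A, b"m")
large_ids = search(D, A, b"l")
```

The resulting identifiers would then be used to fetch the matching ciphertexts from the data servers.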

The step 3 comprises the following steps:

Step 3.1: file decryption: input the ciphertext data set and use the symmetric encryption algorithm to restore it to the file set containing w;

Step 3.1.1: input the ciphertext data;

Step 3.1.2: for each ciphertext in the set, execute the decryption algorithm SKE.Dec under K₃ to restore the corresponding data;

Step 3.1.3: output the data file set containing w;

File addition in step 2: at the client, input the original file set F_add to be added, execute Enc_K(F_add) to generate the ciphertext c_add, and upload it to S_F; input the keyword set W_add to be added and the inverted index set DB(W_add); execute Enc_K(W_add, DB(W_add)) to generate L_add and upload it to S_I; output K, D_add, A_add;

File deletion: at the client, input the original file set F_del to be deleted, extract the keyword set W_del of F_del, and generate the inverted index set DB(W_del); execute Enc_K(W_del, DB(W_del)) to generate L_del and upload it to S_I; output K, D_del, A_del.
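The effect of searching over D + D_add − D_del can be illustrated with plain sets. In this sketch the three dictionaries are keyed by the plaintext keyword for readability, which is an assumption; in the method they would be keyed by encrypted labels. The function name `search_with_updates` is hypothetical.

```python
def search_with_updates(w: str, D: dict, D_add: dict, D_del: dict) -> set:
    """Evaluate the query for keyword w over D + D_add - D_del."""
    base = set(D.get(w, ()))
    added = set(D_add.get(w, ()))
    deleted = set(D_del.get(w, ()))
    return (base | added) - deleted


D = {"cloud": ["f1", "f2"]}
D_add = {"cloud": ["f3"], "index": ["f4"]}
D_del = {"cloud": ["f1"]}
result = search_with_updates("cloud", D, D_add, D_del)
```

Keeping additions and deletions in separate dictionaries means the original dictionary D does not have to be rebuilt on every update.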

The beneficial effects of the above technical scheme are as follows: the invention provides a searchable encryption method for large ciphertext data sets that optimizes the data structure of the security index by partitioning it into blocks and, depending on the size of the security index, addresses directly or indirectly during keyword search, thereby overcoming the defect of traditional searchable encryption schemes that the whole security index must be traversed. Once the security index grows beyond a certain threshold, search time no longer increases linearly but sub-linearly, which improves keyword search efficiency.

Drawings

Fig. 1 is a schematic diagram of a system model of a searchable encryption method for a large ciphertext data set according to an embodiment of the present invention;

FIG. 2 is a diagram illustrating a keyword-document inverted index structure according to an embodiment of the present invention;

fig. 3 is a graph illustrating a relationship between search time and a security index size of a searchable encryption method for a large ciphertext data set according to an embodiment of the present invention;

Detailed Description

The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.

As shown in fig. 1, the method of the present embodiment is as follows.

The searchable encryption method for large ciphertext data sets involves three types of entities: the data owner (holding the original file set, the security index, the keyword trapdoors and the key), the index server (holding the security index), and the data server (holding the encrypted data set). First, the data owner encrypts the original file set locally and uploads the encrypted files and the security index to the data server and the index server respectively. To perform a keyword search, the data owner sends a keyword search request to the index server; the index server then uses the security index to find the data server holding the ciphertext corresponding to the search keyword; finally, the data server returns the search results to the data owner.
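The message flow between the three entities can be sketched as follows. The classes `DataServer` and `IndexServer`, the function `run_search`, and the representation of the index as a mapping from an opaque token to (server number, file id) pairs are illustrative assumptions; encryption and the blocked index layout are left out.

```python
from dataclasses import dataclass, field


@dataclass
class DataServer:
    store: dict = field(default_factory=dict)   # file id -> ciphertext

    def get(self, file_ids):
        """Return the stored ciphertexts for the requested file ids."""
        return [self.store[i] for i in file_ids if i in self.store]


@dataclass
class IndexServer:
    index: dict = field(default_factory=dict)   # token -> list of (server number, file id)

    def locate(self, token):
        """Group the matching file ids by the data server that holds them."""
        by_server = {}
        for server_no, file_id in self.index.get(token, []):
            by_server.setdefault(server_no, []).append(file_id)
        return by_server


def run_search(token, index_server, data_servers):
    """Data owner -> index server -> data servers -> data owner."""
    results = []
    for server_no, file_ids in index_server.locate(token).items():
        results.extend(data_servers[server_no].get(file_ids))
    return results


servers = {1: DataServer({"f1": b"c1"}), 2: DataServer({"f2": b"c2", "f3": b"c3"})}
idx = IndexServer({b"tok": [(1, "f1"), (2, "f3")]})
found = run_search(b"tok", idx, servers)
```

The index server only ever sees tokens and locations, while the data servers only see ciphertexts, which mirrors the separation of roles described above.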

The method comprises a key generation algorithm setup(k), a file encryption algorithm Enc_K(F), a security index generation algorithm Enc_K(W, DB(W)), a trapdoor generation algorithm SToken_K(w), a search algorithm Search((τ₁, τ₂), I), an update algorithm Update_K(add, del), and a file decryption algorithm Dec_K(c). The specific steps are as follows:

And uploads the encrypted index to the index server S_I；

The file update process comprises file addition and file deletion. After files are added or deleted, the search for the keyword w is converted into a search over D + D_add − D_del, where D is the dictionary without file additions and deletions, D_add is the dictionary of added files, and D_del is the dictionary of deleted files; the search results of the three parts are merged and the ciphertext data set is returned;

where 1 ≤ ν ≤ N;

the ciphertext data set corresponding to τ_w is returned to the data owner, its size being the total number of ciphertext data items;

the data owner decrypts it using the key to obtain the set of data files containing w.

The step 1 comprises the following substeps:

step 1.1.1: input a security parameter k, where k ∈ {0,1}^k；

Step 1.1.2: using a pseudo-random number generator, generate 3 random numbers K₁, K₂, K₃ as keys of the pseudo-random function PRF;

wherein the pseudo-random function PRF is represented as PRF: {0,1}^k×{0,1}^*→{0,1}^k;

Step 1.1.2.1: input the client key K ∈ {0,1}^k and the keyword w ∈ {0,1}^*, and output the encryption keys K₁ ∈ {0,1}^k and K₂ ∈ {0,1}^k generated for w;

Step 1.1.3: output K = (K₁, K₂, K₃) as the key;

In this embodiment, no third party is involved other than the client (data owner) and the servers (index server and data server); the client key is generated at the client through the initialization algorithm, and no key distribution process is involved. However, if the client key is lost, the client can no longer interact with the servers and cannot retrieve previously uploaded documents, and if the key is stolen, the client's data can be stolen.

step 1.2.1: inputting an original file set F;

Step 1.2.2: for each file F_η in F, execute the encryption algorithm SKE.Enc under K₃ to generate the ciphertext c_η;

Step 1.2.3: divide c into N parts and send them to the data servers;

Step 1.3: the data owner performs semantic analysis on F, extracts the keywords w, constructs an inverted index for w, and generates the security index DB; after DB is classified, an array A and a block Block storing the block information are obtained; a list L is created, Block and the encrypted tag generated from K are stored in L, and L is uploaded to S_I; D ← Create(L) is executed to generate the dictionary D, and K, D and A are output;

This embodiment targets the global search scenario. A study of the security index generation process in traditional searchable encryption schemes shows that the search time of a traditional scheme is dominated by traversal of the security index; reducing the time complexity therefore only requires reducing the traversal time, and one way to do so is to optimize the storage structure of the security index. The traditional security index generation algorithm is modified so that several identifiers are encrypted in each ciphertext. Specifically, a block size B is fixed; when the result list is constructed, B identifiers are processed at a time, the last block of identifiers is padded to the same length, and each block is encapsulated into a ciphertext d, using the same tags as before. The search process is exactly the same as before, except that the server decrypts and parses the results block by block rather than one identifier at a time.
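The packing of B identifiers per ciphertext can be sketched as follows. The label derivation PRF(k1, counter), the XOR masking used in place of a real cipher, the fixed filler value `PAD` (the text pads with random data), and the function names are all assumptions made so that the example is self-contained; k1 and k2 stand for the per-keyword keys.

```python
import hashlib
import hmac

PAD = 0xFFFFFFFF  # hypothetical filler marking unused slots in the last block


def prf(key: bytes, msg: bytes) -> bytes:
    """Assumed PRF instantiation: HMAC-SHA256."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def xor_mask(key: bytes, counter: int, data: bytes) -> bytes:
    """Toy encryption: XOR with a PRF-derived pad; applying it twice restores data."""
    pad, i = b"", 0
    while len(pad) < len(data):
        pad += prf(key, counter.to_bytes(4, "big") + i.to_bytes(4, "big"))
        i += 1
    return bytes(x ^ y for x, y in zip(data, pad))


def build_packed(k1: bytes, k2: bytes, ids: list, B: int) -> dict:
    """Pack B identifiers per ciphertext; block number c is stored under PRF(k1, c)."""
    D = {}
    for c, start in enumerate(range(0, len(ids), B)):
        block = ids[start:start + B]
        block += [PAD] * (B - len(block))            # pad the last block to length B
        raw = b"".join(i.to_bytes(4, "big") for i in block)
        D[prf(k1, c.to_bytes(4, "big"))] = xor_mask(k2, c, raw)
    return D


def search_packed(k1: bytes, k2: bytes, D: dict) -> list:
    """Walk the counter until a label is missing, decrypting one block per lookup."""
    out, c = [], 0
    while (label := prf(k1, c.to_bytes(4, "big"))) in D:
        raw = xor_mask(k2, c, D[label])
        out += [int.from_bytes(raw[j:j + 4], "big") for j in range(0, len(raw), 4)]
        c += 1
    return [i for i in out if i != PAD]


k1, k2 = b"a" * 32, b"b" * 32
ids = list(range(10))
packed = build_packed(k1, k2, ids, 4)
recovered = search_packed(k1, k2, packed)
```

With B = 4, ten identifiers occupy three dictionary entries instead of ten, so the server performs three lookups and three decryptions.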

To reduce the time needed to retrieve from the security index, an additional level of indexing is adopted: the inverted index is partitioned into blocks of size B, and a tag is extracted from each block for searching. If the information is divided into t blocks in total, the block containing the keyword can be found with a single lookup, and the corresponding file information can then be found. This is the first blocking; the blocked data is then blocked again by size b. As in the first blocking, the tags of the blocks extracted at this stage are stored in L. As shown in fig. 2, the specific steps are as follows:

Step 1.3.2: for each keyword w, apply the pseudo-random function to K and w to obtain K₁ and K₂;

Step 1.3.3.1: when the security index DB(w) is of type Small, L is uploaded to S_I;

Step 1.3.3.2: when the security index DB(w) is of type Medium, the number of blocks is taken; the result for each block is stored at a random position in S_I and its pointer is recorded, yielding a two-tuple; an array A is created; then execute L ← (α, β) and upload L to S_I;

Step 1.3.3.3: when the security index DB(w) is of type Large, the number of blocks Num_BL satisfies b < Num_BL ≤ Bb; the result for each block is stored at a random position in S_I and its pointer is recorded, yielding a two-tuple for each j with 1 ≤ j ≤ Num_BL'; an array A is created; since Num_BL' ≤ b, Num_bM = 1, and this block is recorded as Block_L; if the block is smaller than b, it is padded with random data to size b; then execute L ← (α, β) and upload L to S_I;

Step 1.3.5: output K, D and A;

step 1.4.1: inputting a search keyword w;

Step 1.4.2: execute the token computation to obtain τ₁ and τ₂, where τ₁, τ₂ ∈ τ_w;

Step 1.4.3: upload (τ₁, τ₂) to S_I as the search token for w;

The step 2 comprises the following substeps:

Step 2.1: input (τ₁, τ₂) and DB;

Step 2.2: execute the lookup on the security index to obtain the classification of DB(w);

Step 2.2.1: when DB(w) is of type Small, execute c_res ← Get(S_F; Block_S);

Step 2.2.2: when DB(w) is of type Medium, execute the corresponding retrieval through Block_M;

Step 2.2.3: when DB(w) is of type Large, execute the corresponding retrieval through Block_L;

Step 2.2.4: output the ciphertext data set corresponding to (τ₁, τ₂);

The step 3 comprises the following steps:

Step 3.1: file decryption: input the ciphertext data set and use the symmetric encryption algorithm to restore it to the file set containing w;

Step 3.1.1: input the ciphertext data;

Step 3.1.2: for each ciphertext in the set, execute the decryption algorithm SKE.Dec under K₃ to restore the corresponding data;

Step 3.1.3: output the data file set containing w;

File addition in step 2: at the client, input the original file set F_add to be added, execute Enc_K(F_add) to generate the ciphertext c_add, and upload it to S_F; input the keyword set W_add to be added and the inverted index set DB(W_add); execute Enc_K(W_add, DB(W_add)) to generate L_add and upload it to S_I; output K, D_add, A_add;

File deletion: at the client, input the original file set F_del to be deleted, extract the keyword set W_del of F_del, and generate the inverted index set DB(W_del); execute Enc_K(W_del, DB(W_del)) to generate L_del and upload it to S_I; output K, D_del, A_del;

In this embodiment, keyword search performance is studied by measuring the search time for security indexes of different sizes. The size of the security index reflects, to a certain extent, the number of keyword-to-file mapping pairs: as the security index grows, the number of mapping pairs grows as well.

This embodiment uses part of the news data generated from June to July 2012, provided by a laboratory, together with the public mail data of a company. Under the same set of search keywords, security indexes of 5 sizes are used for the search queries; the sizes are shown in Table 1:

TABLE 1 secure index Classification Table

Classification	A	B	C	D	E
Size/Kb	100	200	300	400	500

In this embodiment, the traditional scheme applies no storage-structure processing to the security index and traverses it directly during search; in the scheme provided by the invention, the security index is organized into blocks once before it is stored, and the search is carried out through the block storage structure. Comparing the keyword search times obtained for the five security index sizes shows that, relative to the traditional method of traversing the whole security index, the method has no obvious advantage when the security index is small, and its search time is even slightly higher than that of the traditional traversal scheme; as the security index grows, however, the advantage of the scheme gradually emerges, and its search time becomes increasingly shorter than that of the traditional method, as shown in fig. 3.

The invention optimizes the data structure of the security index by partitioning it into blocks and, depending on the size of the security index, addresses directly or indirectly during keyword search, thereby overcoming the defect of traditional searchable encryption schemes that the whole security index must be traversed. Once the security index grows beyond a certain threshold, search time no longer increases linearly but sub-linearly, which improves keyword search efficiency.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions and scope of the present invention as defined in the appended claims.

Claims

1. A searchable encryption method for ciphertext large-scale data sets is characterized in that: the method comprises the following steps:

and uploads the encrypted index to the index server S_I; the method specifically comprises the following steps:

step 1.1.1: input a security parameter k, where k ∈ {0,1}^k；

Step 1.1.2: using a pseudo-random number generator, generate 3 random numbers K₁, K₂, K₃ as keys of the pseudo-random function PRF;

Step 1.1.2.2: compute K₃ ← SKE.Gen(1^k) to obtain the key of the symmetric encryption algorithm, which is used to encrypt the original file set F; SKE = (Gen, Enc, Dec) is a symmetric encryption scheme, where Gen denotes the key generation algorithm, Enc the encryption algorithm, and Dec the decryption algorithm;

Step 1.1.3: output K = (K₁, K₂, K₃) as the key;

Step 1.2.1: input the original file set F;

Step 1.2.2: for each file F_η in F, execute the encryption algorithm SKE.Enc under K₃ to generate the ciphertext c_η;

Step 1.2.3: divide c into N parts and send them to the data servers;

Step 1.3.3.1: when the security index DB(w) is of type Small, execute L ← (α, β) and upload L to S_I;

Step 1.3.3.2: when the security index DB(w) is of type Medium, the number of blocks is taken; the result for each block is stored at a random position in S_I and its pointer is recorded, yielding a two-tuple; an array A is created; then execute L ← (α, β) and upload L to S_I;

Step 1.3.3.3: when the security index DB(w) is of type Large, the number of blocks Num_BL satisfies b < Num_BL ≤ Bb; the number of second-level blocks Num_BL' satisfies Num_BL' ≤ b; when the last block is smaller than B, it is padded with random numbers to size B; the result for each block is stored at a random position in S_I and its pointer is recorded, yielding a two-tuple for each j with 1 ≤ j ≤ Num_BL'; an array A is created; then execute L ← (α, β) and upload L to S_I;

Step 1.3.5: output K, D and A;

step 1.4.1: inputting a search keyword w;

Step 1.4.2: execute the token computation to obtain τ₁ and τ₂, where τ₁, τ₂ ∈ τ_w;

Step 1.4.3: upload (τ₁, τ₂) to S_I as the search token for w;

where 1 ≤ ν ≤ N;

the ciphertext data set corresponding to τ_w is returned to the data owner, its size being the total number of ciphertext data items;

the method specifically comprises the following steps:

Step 2.1: input (τ₁, τ₂) and DB;

Step 2.2: execute the lookup on the security index to obtain the classification of DB(w);

Step 2.2.1: when DB(w) is of type Small, execute c_res ← Get(S_F; Block_S);

Step 2.2.2: when DB(w) is of type Medium, execute the corresponding retrieval through Block_M;

Step 2.2.3: when DB(w) is of type Large, execute the corresponding retrieval through Block_L;

Step 2.2.4: output the ciphertext data set corresponding to (τ₁, τ₂);

the data owner decrypts it using the key to obtain the set of data files containing w.

2. The searchable encryption method for the large ciphertext data set according to claim 1, wherein: the step 3 comprises the following steps:

Step 3.1: file decryption: input the ciphertext data set and use the symmetric encryption algorithm to restore it to the file set containing w;

Step 3.1.1: input the ciphertext data;

Step 3.1.2: for each ciphertext in the set, execute the decryption algorithm SKE.Dec under K₃ to restore the corresponding data;

Step 3.1.3: output the data file set containing w.

3. The searchable encryption method for the large ciphertext data set according to claim 1, wherein the file addition comprises: at the client, inputting the original file set F_add to be added, executing Enc_K(F_add) to generate the ciphertext c_add, and uploading it to S_F; inputting the keyword set W_add to be added and the inverted index set DB(W_add); executing Enc_K(W_add, DB(W_add)) to generate L_add and uploading it to S_I; and outputting K, D_add, A_add;