CN113722366B

CN113722366B - Secure data retrieval method based on oblivious ciphertext inverted index

Info

Publication number: CN113722366B
Application number: CN202111072425.XA
Authority: CN
Inventors: 吴志强; 蔡竹斌; 李文军; 李睿; 张经宇; 郑少莹; 王进
Original assignee: Changsha University of Science and Technology
Current assignee: Changsha University of Science and Technology
Priority date: 2021-09-14
Filing date: 2021-09-14
Publication date: 2024-05-03
Anticipated expiration: 2041-09-14
Also published as: CN113722366A

Abstract

The invention discloses a secure data retrieval method based on an oblivious ciphertext inverted index, comprising the following steps: step 1, constructing an oblivious ciphertext inverted index; step 2, writing data into the oblivious ciphertext inverted index; step 3, retrieving cloud data according to a communication protocol. Step 1 comprises constructing a cloud-side oblivious random access tree (ORAM tree), a client-side buffer, and a client-side keyword hash table, wherein the ORAM tree stores the ciphertext inverted index, and the client-side buffer and keyword hash table temporarily store data. Step 2 comprises writing the data in the client-side buffer into paths of the cloud ORAM tree through an eviction operation.

Description

Secure data retrieval method based on oblivious ciphertext inverted index

Technical Field

The invention belongs to the technical field of information security, and particularly relates to a secure data retrieval method based on an oblivious ciphertext inverted index, which can be applied to secure cloud computing networks.

Background

To obtain inexpensive storage resources, many enterprises outsource sensitive data to cloud servers. Users typically employ a Dynamic Searchable Encryption (DSE) scheme to encrypt the data and protect personal privacy. DSE provides efficient search and update services over encrypted data while protecting users' sensitive information, such as files and query content. DSE schemes typically achieve good query and update efficiency at the cost of leaking the search pattern and the access pattern. The search pattern mainly refers to the frequency information revealed when a user searches for a given keyword, and the access pattern mainly refers to the information revealed about the results of a search. However, a recent series of attacks has shown that attackers can exploit this query leakage to recover sensitive user data. The more queries are issued, the more is leaked, and in extreme cases the cloud can learn all of a user's query information. The conventional way to hide the search and access patterns is to use Oblivious Random Access Machine (ORAM) technology, a relatively high-overhead technique for reading and writing encrypted remote data: each time data is accessed, its storage location is changed and the accessed data is re-encrypted, so the search and access patterns are not revealed. Using ORAM directly to construct an oblivious DSE scheme raises new problems, such as a large client-side position map, multiple rounds of interaction with the cloud server per update, and high computational overhead.

A traditional inverted index quickly maps a keyword to a set of keyword-file identifier pairs; that is, it converts the mapping from file identifiers to keywords into a mapping from keywords to file identifiers, so that each keyword maps to a set of files. An inverted index typically reduces the time needed to find files by keyword. To better protect the search pattern and the access pattern, the inverted index needs to be combined with ORAM and other technologies.

The invention provides a secure data retrieval method based on an oblivious ciphertext inverted index, which has the following advantages and technical effects: 1) single-round-trip interaction: the user obtains or updates data by sending one request and receiving one response; 2) strong forward and backward privacy: no data query exposes the search pattern, the update pattern, or the operation type; 3) support for large-scale insertion with high insertion efficiency.

Disclosure of Invention

The invention is realized by adopting the following technical scheme:

A secure data retrieval method based on an oblivious ciphertext inverted index comprises the following steps: step 1, constructing an oblivious ciphertext inverted index; step 2, writing data into the oblivious ciphertext inverted index; step 3, retrieving cloud data according to a communication protocol, wherein:

Step 1 comprises: constructing a cloud-side oblivious random access tree (ORAM tree), constructing a client-side buffer, and constructing a client-side keyword hash table, wherein the ORAM tree is used for storing the ciphertext inverted index, and the client-side buffer and the client-side keyword hash table are used for temporarily storing data;

Step 2 comprises: writing the data in the client-side buffer into paths of the cloud ORAM tree through an eviction operation.

In the secure data retrieval method based on the oblivious ciphertext inverted index: the ORAM tree is an encrypted full binary tree; an ORAM tree of height L has (2^L - 1) nodes in total; each tree node stores Z triplets (key, value, leaf), where (key, value) is a fixed-size key-value pair whose data is derived from the encrypted inverted index, value is called a data block, and leaf is a leaf identifier that identifies the leaf-to-root path on which the triplet resides. In the ORAM tree, the path from the root node to any node is represented by a string: starting from the root node, a left branch is written '0' and a right branch is written '1', and the resulting string of 0s and 1s denotes the path. Tree nodes are encoded by the following rule: the root node is encoded as 0, and for a node encoded as x, the left child is encoded as (2x+1) and the right child as (2x+2). An ORAM tree of height L can store at most Z(2^L - 1) triplets, and under this encoding the leaf values range from (2^(L-1) - 1) to (2^L - 2).

In the secure data retrieval method based on the oblivious ciphertext inverted index: the client-side keyword hash table, denoted ht, is a data structure that maps a keyword w to keyword information KI. KI consists of two parts, wLength and sCounter, i.e., KI = (wLength, sCounter), where wLength is the number of data blocks returned by a search for the keyword w and sCounter is the number of times the keyword w has been searched and updated. Each time the user searches or updates the keyword w, sCounter = sCounter + 1 is executed, so that different pseudo-random numbers are generated.

In the secure data retrieval method based on the oblivious ciphertext inverted index: the client-side buffer (Stash) is a stateful hash table that stores triplets (key, value, leaf) by key; that is, given a triplet t, the buffer stores it under the key t.key. After an eviction operation, the buffer records the state of each triplet, i.e., whether the triplet resides on the path from t.leaf to the root of the ORAM tree or in the buffer. Any triplet t resides either on the path from t.leaf to the root or in Stash.

In the secure data retrieval method based on the oblivious ciphertext inverted index, the eviction operation in step 2 satisfies the following:

Let x be the identifier of an accessed leaf node, and let P(x, j) denote the path from the root node to the level-j node on the path from node x to the root. A triplet in the buffer can be evicted to the cloud ORAM tree only if it satisfies equation (1):

P(x, j) = P(triplet.leaf, j)    (1)

Leaf calculation: suppose the result set of the keyword w after inverted-index processing is DB(w) = {id_1, id_2, …, id_r}, where each id is a file identifier. Let F be a keyed hash function, let k be the user's key, and let c be the sCounter value stored for the keyword in the hash table ht, which records the number of times the keyword w has been searched and updated, written c = ht[w].sCounter. The i-th file identifier id_i obeys the following principle: the keyword-file identifier pair (w, id_i) is either uploaded to the path from leaf node x_i to the root node or remains in the buffer, and the leaf x_i is calculated by equation (2):

x_i = token(w, i, c) = (2^(L-1) - 1) + (F_k(w||i||c) mod 2^(L-1))    (2)

Here the result token(w, i, c) is the leaf identifier for the keyword-file identifier pair (w, id_i), (2^(L-1) - 1) is the value of the first leaf identifier of the cloud ORAM tree, and (F_k(w||i||c) mod 2^(L-1)) is an integer computed from the keyed hash function F and 2^(L-1).

In the secure data retrieval method based on the oblivious ciphertext inverted index, the eviction operation in step 2 comprises K nearest neighbor eviction:

Step A, initializing a hash table pathNodes for storing tree nodes: a hash table pathNodes is defined for temporarily storing the tree node information to be uploaded to the cloud ORAM tree, wherein the index of the hash table is the string formed by the path from the root node to the current node, and this path information locates the corresponding node in the ORAM tree;

Step B, copying the triplets in the client-side buffer into an ordered array whose elements are sorted by leaf;

Step C, performing the following operations, from the bottom level upward, on the paths corresponding to a group of leaf nodes, wherein the input of KNNEA is a group of leaf nodes leaves = {x_1, x_2, …, x_r}, each element is a leaf, and the group corresponds to r paths from leaf nodes to the root node;

Step D, finding in the array the K triplets T nearest to leaf and checking them one by one, wherein the search for the K triplets is described in step E and the check is described in step F;

Step E, for a node N on a path, selecting, to the left of leaf, (Z+1) triplets whose leaf values are less than or equal to the current leaf value, and, to the right of leaf, (Z+1) triplets whose leaf values are greater than or equal to the current leaf value, so that a total of K = (2Z+1) nearest-neighbor triplets are selected from the client-side buffer as the candidate triplets T for node N;

Step F, determining whether a triplet can be uploaded to the tree node by checking whether it satisfies equation (1) of the eviction principle, i.e., comparing whether P(leaf, j) equals P(triplet.leaf, j), where j is the level number counted from the bottom level of the tree to the top level, i.e., j ∈ [L, 1], and j is decreased by 1 after the bottom-level nodes are filled; if the triplet cannot be uploaded to the tree node, repeating this step for the next triplet; otherwise checking whether pathNodes is full and, if it is not full, storing the triplet in pathNodes and deleting the stored triplet from the ordered array and the buffer; and repeating this step until all of the K triplets T have been checked;

Step G, encrypting pathNodes and uploading it to the cloud.

In the secure data retrieval method based on the oblivious ciphertext inverted index, the eviction operation in step 2 comprises a partition-based eviction operation:

Step A, sorting the triplets in the buffer and the input group of leaf identifiers leaves by leaf value;

Step B, dividing the leaf identifier range into a group of fixed-size partitions, wherein the partition size is a power of 2;

Step C, assigning the triplets in the buffer and the leaves to the corresponding partitions according to their leaf values, to obtain a group of non-empty partitions and a group of non-empty paths;

Step D, initializing a global hash table variable pathNodes for storing tree nodes;

Step E, performing the following operation on each non-empty partition: calling KNNEA* to evict the triplets in the non-empty partition to the paths of that partition and saving them in pathNodes, where KNNEA* performs the same data operations as steps B, C, D, E, and F of the K nearest neighbor eviction algorithm;

Step F, calling KNNEA to evict the remaining triplets;

Step G, encrypting pathNodes and uploading it to the cloud.

In the secure data retrieval method based on the oblivious ciphertext inverted index, data retrieval comprises the following steps:

Step A, generating leaf identifiers: the user creates a group of empty blocks for storing the result data, generates a group of leaf identifiers leaves with the token(w) method, and sends them to the cloud; the token(w) method generates the group of leaf identifiers from the keyword information in the keyword hash table by using the leaf calculation method token(w, i, c), where i refers to the i-th keyword-file identifier pair and c is the number of times the keyword w has been searched and updated;

Step B, reading tree nodes: the READPATHS(tree, {x_1, …, x_r}) sub-method is called to read a group of tree nodes in the ORAM tree, taking the group of leaf identifiers leaves = {x_1, …, x_r} generated in the previous step as input and returning the tree nodes read to the user; the READPATHS algorithm is as follows: given a group of leaf identifiers {x_1, …, x_r}, all nodes on the paths from each leaf node to the root node are obtained in turn and stored in a node set nodes, which finally yields a group of tree nodes;

Step C, rebuilding leaf identifiers: the user decrypts the tree nodes read from the cloud to obtain the triplets stored in them and stores the triplets in the local buffer by key; each time a search operation accesses w, the user executes ht[w].sCounter + 1, and each leaf identifier is remapped to a new random leaf identifier; the user generates a unique key for each data block corresponding to the keyword with the key(w, i) method and computes the new leaf identifier with the token(w, i, c) method, where key(w, i) equals G(w||i), G is a pseudo-random function, and i ranges over [1, ht[w].wLength];

Step D, evicting nodes: the user extracts the identifier set from the retrieved data blocks, this identifier set being the final search result; the user then calls the KNNEA algorithm to re-encrypt the data in the buffer and evict it to the cloud;

Step E, updating nodes: the cloud replaces the nodes on the original paths with the new nodes.

Drawings

FIG. 1 is a diagram of an exemplary configuration of an OBI;

FIG. 2 is a flow chart of the KNNEA algorithm;

FIG. 3 is a flow chart of the PBEA algorithm;

FIG. 4 is a flow chart of searching for a keyword.

Detailed Description

Specific embodiments of the present invention are described in detail below with reference to FIGS. 1-4.

The invention involves two parties: a user and a cloud server. The user encrypts private data, builds an oblivious ciphertext inverted index over it, and stores the encrypted data on the cloud server. The user communicates with the cloud server through a specific communication protocol to privately query and update data dynamically. The user is assumed to be trusted, and the cloud server is assumed to be "honest but curious": it faithfully executes the user's instructions but may also try to illegitimately learn the user's private data.

The secure data retrieval method based on the oblivious ciphertext inverted index involves three aspects: a new ciphertext index structure, two data eviction methods, and a communication protocol. The ciphertext index structure is the oblivious ciphertext inverted index; the two eviction methods are two algorithms for writing data into the oblivious ciphertext inverted index; and the communication protocol is the READANDREPLACE protocol used to read and write data throughout the communication process. The method is described in detail in three parts: 1. constructing the oblivious ciphertext inverted index; 2. writing data into the oblivious ciphertext inverted index through an eviction algorithm; 3. reading and writing data according to the communication protocol.

1. Construction of the oblivious ciphertext inverted index

The oblivious ciphertext inverted index (Oblivious Inverted Index, OBI for short) of the present invention is an encrypted data structure comprising three storage sub-structures and a communication protocol. The three sub-structures are a cloud-side oblivious random access tree (ORAM tree), a client-side keyword hash table (ht), and a client-side buffer (Stash). The ORAM tree stores the ciphertext inverted index, and the client-side buffer and keyword hash table temporarily store data.

The ORAM tree is an encrypted full binary tree: each non-leaf node has two child nodes, each node is either encrypted or null, and an ORAM tree of height L has (2^L - 1) nodes in total. Each tree node holds Z triplets (key, value, leaf), where (key, value) is a fixed-size key-value pair whose data is derived from the encrypted inverted index. The value is called a data block, and leaf is a leaf identifier that identifies the leaf-to-root path on which the triplet resides. In the ORAM tree, the path from the root node to any node can be represented by a string: starting from the root node, a left branch is written '0' and a right branch is written '1', and the resulting string of 0s and 1s denotes the path. Tree nodes are encoded by the following rule: the root node is encoded as 0, and for a node encoded as x, the left child is encoded as (2x+1) and the right child as (2x+2). An ORAM tree of height L can store at most Z(2^L - 1) triplets, and under this encoding the leaf values range from (2^(L-1) - 1) to (2^L - 2).
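The node encoding and path strings above can be illustrated with a short Python sketch. The function names are hypothetical, and the parent formula (x - 1) // 2 is simply the inverse of the child encoding (2x+1, 2x+2) stated in the text; nothing here is taken from the patent's own implementation.

```python
def leaf_range(L: int) -> range:
    """Leaf identifiers of a full binary tree of height L (root encoded as 0)."""
    # Covers 2^(L-1) - 1 through 2^L - 2 inclusive.
    return range(2 ** (L - 1) - 1, 2 ** L - 1)


def path_to_root(x: int) -> list:
    """Node codes from node x up to the root, using parent(x) = (x - 1) // 2."""
    nodes = [x]
    while x > 0:
        x = (x - 1) // 2
        nodes.append(x)
    return nodes


def path_string(x: int) -> str:
    """Root-to-x path as a string: '0' for a left branch, '1' for a right branch."""
    bits = []
    while x > 0:
        # Left children have odd codes (2p + 1), right children even codes (2p + 2).
        bits.append("0" if x % 2 == 1 else "1")
        x = (x - 1) // 2
    return "".join(reversed(bits))
```

For a tree of height L = 4 the leaves are 7 through 14, and leaf 8 lies on the path 0, 1, 3, 8, which is written "001".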

The client-side keyword hash table (ht) is a data structure that maps a keyword w to keyword information (Keyword Information, abbreviated KI). KI consists of two parts, wLength and sCounter, i.e., KI = (wLength, sCounter), where wLength is the number of data blocks returned by a search for the keyword w and sCounter is the number of times the keyword w has been searched and updated. Each time the user searches or updates w, sCounter = sCounter + 1 is executed, so that different pseudo-random numbers are generated.

The client-side buffer (Stash) is a stateful hash table that stores triplets (key, value, leaf) by key; that is, given a triplet t, the buffer stores it under the key t.key. After an eviction operation, the buffer records the state of each triplet, i.e., whether the triplet resides on the path from t.leaf to the root of the ORAM tree or in the buffer. Any triplet t resides either on the path from t.leaf to the root or in the buffer Stash.
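The two client-side structures can be sketched in Python as follows. This is a minimal illustration under assumptions: the class and method names (KeywordInfo, Triplet, Stash, touch, mark_evicted) are hypothetical, and the boolean in_tree flag is only one possible way to record the triplet state that the text describes.

```python
from dataclasses import dataclass


@dataclass
class KeywordInfo:
    w_length: int = 0   # wLength: number of data blocks returned for the keyword
    s_counter: int = 0  # sCounter: number of searches and updates of the keyword


@dataclass
class Triplet:
    key: bytes
    value: bytes
    leaf: int
    in_tree: bool = False  # assumed state flag: True once evicted to its leaf-to-root path


def touch(ht: dict, w: str) -> int:
    """Increment sCounter for keyword w (creating the entry if needed)."""
    ki = ht.setdefault(w, KeywordInfo())
    ki.s_counter += 1
    return ki.s_counter


class Stash:
    """Client-side buffer: triplets stored by key, each with a recorded state."""

    def __init__(self) -> None:
        self._items = {}

    def put(self, t: Triplet) -> None:
        t.in_tree = False
        self._items[t.key] = t

    def mark_evicted(self, key: bytes) -> None:
        self._items[key].in_tree = True

    def in_buffer(self) -> list:
        """Triplets that currently reside in the buffer rather than in the tree."""
        return [t for t in self._items.values() if not t.in_tree]
```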

FIG. 1 shows an example OBI, which includes the ORAM tree in the cloud and the hash table and buffer on the client. The ORAM tree nodes store the encrypted result of processing the inverted index shown on the right; the dotted line indicates that the inverted index exists only logically. The example contains two files with identifiers 50 and 52: the file with identifier 50 contains three keywords {a, b, c}, and the file with identifier 52 contains two keywords {a, b}. This yields the inverted index shown on the right of FIG. 1: a corresponds to the two files 50 and 52, b corresponds to the two files 50 and 52, and c corresponds to the single file 50. The keyword-file identifier sets are DB(a) = {(a, 50), (a, 52)}, DB(b) = {(b, 50), (b, 52)}, and DB(c) = {(c, 50)}. File 50 consists of the three keyword-identifier pairs {(a, 50), (b, 50), (c, 50)}, and file 52 consists of the two pairs {(a, 52), (b, 52)}. OBI places these pairs in the nodes of the ORAM tree; in the example, the pairs for keyword a are placed on the paths with leaf values 8 and 10. To obtain the results for keyword a, the user first computes the leaves {8, 10} and then calls the communication protocol READANDREPLACE({8, 10}) described in part 3: the nodes on the two paths with leaf values 8 and 10 are read and decrypted locally to obtain the data, and after shuffling the nodes are uploaded to replace the nodes of the original paths.

2. Eviction algorithm

The eviction operation in the invention is an algorithm that writes data from the client-side buffer into the paths from several leaves of the cloud ORAM tree to the root, with the aim of reducing the space occupied by data stored on the client. The eviction operations can be used to insert, search, and modify data in bulk. Because the bottom level of the cloud tree has more nodes than the upper levels and can therefore hold more triplets, triplets evicted from the client-side buffer are stored starting from the bottom level of the tree, i.e., the leaf nodes.

The eviction algorithm of the invention must satisfy the following condition. Let x be the identifier of an accessed leaf node, and let P(x, j) denote the path from the root node to the level-j node on the path from node x to the root. A triplet in the buffer can be evicted to the cloud ORAM tree only if it satisfies the following equation:

P(x, j) = P(triplet.leaf, j)    (1)
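Equation (1) amounts to asking whether the accessed leaf and the triplet's leaf share the same ancestor at level j. The Python sketch below makes that check concrete. It assumes that the root is level 1 and the leaves are level L, which matches the range j ∈ [L, 1] used later in the text, and it identifies a path by the code of its level-j node rather than by a 0/1 string; the function names are hypothetical.

```python
def ancestor_at_level(leaf: int, j: int, L: int) -> int:
    """Code of the level-j node on the path from leaf to the root.

    Assumes the root is level 1 and the leaves are level L, with the node
    encoding root = 0, left child = 2x + 1, right child = 2x + 2.
    """
    node = leaf
    for _ in range(L - j):
        node = (node - 1) // 2
    return node


def can_evict(x: int, triplet_leaf: int, j: int, L: int) -> bool:
    """Equation (1): P(x, j) == P(triplet.leaf, j)."""
    return ancestor_at_level(x, j, L) == ancestor_at_level(triplet_leaf, j, L)
```

In a tree of height 4, leaves 8 and 10 first share a node at level 2 (node 1), so a triplet with leaf 10 may be placed on the path of leaf 8 only at level 2 or level 1.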

The leaf calculation method of the invention computes the leaf values used to execute the communication protocol READANDREPLACE({x_1, x_2, …, x_r}), and is described in detail below.

Assume that the result set obtained for the keyword w through the inverted index is DB(w) = {id_1, id_2, …, id_r}, where each id is a file identifier. Let F be a keyed hash function, let k be the user's key, and let c be the sCounter value stored for the keyword in the hash table ht, which records the number of times the keyword w has been searched and updated, written c = ht[w].sCounter. The i-th file identifier id_i obeys the following principle: the keyword-file identifier pair (w, id_i) is either uploaded to the path from leaf node x_i to the root node or remains in the buffer, and the leaf x_i can be calculated by the following formula:

x_i = token(w, i, c) = (2^(L-1) - 1) + (F_k(w||i||c) mod 2^(L-1))    (2)
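As an illustration of the leaf calculation, the following Python sketch computes the first leaf identifier (2^(L-1) - 1) plus F_k(w||i||c) reduced modulo 2^(L-1). The choice of HMAC-SHA256 for the keyed hash function F and the "|"-separated byte encoding of w||i||c are assumptions made for this example; the source does not specify either.

```python
import hashlib
import hmac


def token(k: bytes, w: str, i: int, c: int, L: int) -> int:
    """Leaf identifier for the i-th keyword-file identifier pair of keyword w.

    F is instantiated with HMAC-SHA256 (an assumption), and w || i || c is
    encoded as the UTF-8 string "w|i|c" (also an assumption).
    """
    msg = f"{w}|{i}|{c}".encode("utf-8")
    digest = hmac.new(k, msg, hashlib.sha256).digest()
    first_leaf = 2 ** (L - 1) - 1
    return first_leaf + (int.from_bytes(digest, "big") % (2 ** (L - 1)))
```

Because c = ht[w].sCounter is incremented on every search or update, the same pair (w, i) is mapped to a freshly derived leaf on each access.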

The eviction operation of the invention uses two multi-path eviction algorithms: the K nearest neighbor eviction algorithm (K-Nearest Neighbor Eviction Algorithm, KNNEA) and the partition-based eviction algorithm (Partition-Based Eviction Algorithm, PBEA).

The K nearest neighbor eviction algorithm is a multi-path eviction algorithm: in one execution it evicts triplets from the client-side buffer into r paths of the cloud ORAM tree, and it is suited to smaller result sets. The main idea of KNNEA is to choose the Z triplets to evict into a tree node from among the K nearest-neighbor triplets, without scanning the entire buffer. As shown in FIG. 2, the K nearest neighbor eviction algorithm proposed by the invention is described below.

Step A, initialize the hash table pathNodes for storing tree nodes. A hash table pathNodes is defined to temporarily hold the tree node information that will later be uploaded to the cloud ORAM tree. Its index (path information) is the string formed by the path from the root node to the current node, so the corresponding node in the ORAM tree can be located directly from it.

Step B, copy the triplets in the buffer into an ordered array held locally on the client. The triplets in the client-side buffer are stored in an ordered array whose elements are sorted by leaf.

Step C, perform the following operations, from the bottom level upward, on the paths corresponding to a group of leaves. The input of KNNEA is a group of leaf nodes leaves = {x_1, x_2, …, x_r}, where each element is a leaf; these correspond to r paths from leaf nodes to the root node, namely x_1-to-root, x_2-to-root, …, x_r-to-root. KNNEA uses a bottom-up strategy to evict triplets from the client-side buffer into these r paths.

Step D, find in the array the K triplets T nearest to leaf and check them one by one. The search for the K triplets is described in step E, and the check is described in step F.

Step E, for a node N on a path, where leaf is a leaf identifier corresponding to node N, select, to the left of leaf, (Z+1) triplets whose leaf values are less than or equal to the current leaf value, and, to the right of leaf, (Z+1) triplets whose leaf values are greater than or equal to the current leaf value. In total K = (2Z+1) nearest-neighbor triplets are selected from the client-side buffer as the candidate triplets T for node N.

Step F, determine whether a triplet can be uploaded to the tree node by checking whether it satisfies equation (1) of the eviction principle, i.e., compare whether P(leaf, j) equals P(triplet.leaf, j), where j is the level number counted from the bottom level of the tree (the leaves) to the top level (the root), i.e., j ∈ [L, 1]; j is decreased by one after the nodes of the bottom level are filled. If the triplet cannot be uploaded to the tree node, repeat this step for the next triplet. Otherwise check whether pathNodes is full; if it is not full, store the triplet in pathNodes, and KNNEA deletes the stored triplet from the ordered array and the buffer. Repeat this step until all of the selected K triplets T have been checked.

Step G, encrypt pathNodes and upload it to the cloud. The encryption algorithm is private-key randomized symmetric encryption: the user holds a private key and, for each encryption, introduces a globally increasing counter into the plaintext to be encrypted, which ensures that every resulting ciphertext is unique.
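Steps A through F can be sketched in Python as below. This is a simplified illustration under several assumptions, not the patented algorithm itself: the buffer is reduced to a map from triplet key to leaf, pathNodes is indexed by node code instead of a path string, the candidate window takes at most K = 2Z+1 array entries around the insertion point of the leaf (the exact tie handling in step E is not specified in the text), and encryption and upload (step G) are omitted. All names are hypothetical.

```python
import bisect


def ancestor(leaf: int, j: int, L: int) -> int:
    """Level-j node on the path from leaf to the root (root = level 1, leaves = level L)."""
    node = leaf
    for _ in range(L - j):
        node = (node - 1) // 2
    return node


def knnea(stash: dict, leaves: list, Z: int, L: int) -> dict:
    """Evict triplets from stash (key -> leaf) into the paths of the given leaves.

    Returns pathNodes as {node code: [evicted keys]} and removes evicted keys
    from stash.
    """
    path_nodes = {}                                            # step A
    arr = sorted((leaf, key) for key, leaf in stash.items())   # step B
    for j in range(L, 0, -1):                                  # step C: bottom level first
        for x in leaves:
            node = ancestor(x, j, L)
            bucket = path_nodes.setdefault(node, [])
            pos = bisect.bisect_left(arr, (x,))                # step D
            window = arr[max(0, pos - Z): pos + Z + 1]         # step E: at most 2Z + 1 candidates
            for leaf, key in window:                           # step F
                if len(bucket) >= Z:
                    break                                      # this tree node is full
                if ancestor(leaf, j, L) == node:               # equation (1)
                    bucket.append(key)
                    arr.remove((leaf, key))
                    del stash[key]
    return path_nodes
```

With Z = 2, L = 4, and a buffer holding three triplets for leaf 8, the first two fill node 8 and the third moves up to the parent node 3, which shows why the algorithm proceeds from the bottom level upward.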

The partition-based eviction algorithm is also a multi-path eviction algorithm; it is suited to evicting a large number of triplets from the buffer to cloud paths and to inserting a large number of files. As shown in FIG. 3, the partition-based eviction algorithm proposed by the invention is described in detail below.

Step A, sort the triplets in the buffer and the input group of leaf identifiers leaves by leaf value.

Step B, divide the leaf identifier range into a group of fixed-size partitions. When the amount of data to be processed is large, the core idea of PBEA is to divide the leaf identifier range into a group of fixed-size partitions whose size is an integer power of 2, i.e., size = 2^c. The leaf identifier leaf ranges from (2^(L-1) - 1) to (2^L - 2), and this interval is divided into fixed-size partitions, typically with size = 2^16.

Step C, assign the triplets and leaf identifiers sorted in step A to the corresponding partitions according to their leaf values, obtaining a group of non-empty partitions and the group of paths of those non-empty partitions.

Step D, initialize a global hash table variable pathNodes for storing tree nodes, which need to be saved multiple times. As with step A of KNNEA, pathNodes temporarily holds the tree node information that is later uploaded to the cloud ORAM tree.

Step E, perform the following operation on each of the non-empty partitions. The KNNEA* algorithm is called to find the triplets to evict and save them in pathNodes. The difference between KNNEA* and KNNEA is that KNNEA* evicts the triplets of one partition into the paths of that partition, whereas KNNEA evicts the triplets of the entire buffer into all paths at once.

Step F, call the KNNEA algorithm to evict the remaining triplets.

Step G, encrypt pathNodes and upload it to the cloud.
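The partitioning in steps A through C can be illustrated with the Python sketch below. It assumes that partitions are aligned to the first leaf identifier 2^(L-1) - 1, so that the partition index of a leaf is (leaf - first_leaf) shifted right by c bits; the source only states that partitions have a fixed power-of-2 size. The buffer is again reduced to a map from triplet key to leaf, the names are hypothetical, and the per-partition KNNEA* calls and the final KNNEA pass are not shown.

```python
from collections import defaultdict


def partition_index(leaf: int, L: int, c: int) -> int:
    """Index of the size-2^c partition containing leaf (alignment is an assumption)."""
    first_leaf = 2 ** (L - 1) - 1
    return (leaf - first_leaf) >> c


def assign_partitions(stash: dict, leaves: list, L: int, c: int) -> dict:
    """Group buffered triplets (key -> leaf) and accessed leaves by partition.

    Only partitions that receive at least one triplet or leaf appear in the result.
    """
    parts = defaultdict(lambda: {"triplets": [], "leaves": []})
    # Step A: process triplets and leaves in order of leaf value.
    for key, leaf in sorted(stash.items(), key=lambda kv: kv[1]):
        parts[partition_index(leaf, L, c)]["triplets"].append(key)   # step C
    for x in sorted(leaves):
        parts[partition_index(x, L, c)]["leaves"].append(x)          # step C
    return dict(parts)
```

A partition that holds triplets but no accessed leaf has no path to evict into during step E, which is presumably why step F runs KNNEA over whatever remains in the buffer.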

3. The client retrieves data from the cloud ORAM tree through the communication protocol

The communication protocol of the invention is denoted READANDREPLACE({x_1, x_2, …, x_r}), where x_i (i ∈ [1, r]) is a leaf identifier (leaf) generated by the user with a pseudo-random function. In the secure data retrieval method, the protocol is used to initialize data and to read and write data in batches, and it comprises four steps: 1) the cloud reads the paths from a group of leaves of the ORAM tree to the root; 2) the user decrypts each leaf-to-root path, writes it into the buffer, and processes it further; this processing is performed locally, and the operations available to the user include inserting new data into the buffer, reading and writing data in batches, and updating data in batches; 3) the user evicts data with the two eviction algorithms KNNEA and PBEA; 4) the cloud replaces the originally accessed group of paths with the received nodes.

Inserting data into the oblivious ciphertext inverted index with this communication protocol takes four steps: 1) r random leaf-to-root paths are read, and all values read are written into the client-side buffer; 2) the triplets to be inserted are initialized and written into the client-side buffer; 3) the data in the client-side buffer is evicted to the cloud with the two eviction algorithms KNNEA and PBEA; 4) the cloud replaces the data in the original r paths with the received data. In this process the cloud server only reads and writes a group of root-to-leaf paths, while the user performs decryption, encryption, reading, writing, shuffling, and eviction of the data.

Searching for data with the secure data retrieval method takes five steps. A search for the keyword w is denoted Search(w); its algorithm flow is shown in FIG. 4 and described in detail below.

Step A, generate leaf identifiers. The user creates a group of empty blocks for storing the result data. The user generates a group of leaf identifiers leaves with the token(w) method and sends them to the cloud. The token(w) method generates the leaf identifiers from the keyword information in the keyword hash table using the leaf calculation method token(w, i, c) described above, where i ranges over the wLength keyword-file identifier pairs in DB(w) and c is sCounter, the number of times w has been searched.

Step B, reading the tree nodes. The READPATHS(tree, {x_1, …, x_r}) sub-method is called to read a group of tree nodes in the ORAM tree. The leaf identifiers leaves generated in the previous step are used as input, and the tree nodes read are returned to the user. The READPATHS algorithm is as follows: given a group of leaf identifiers {x_1, …, x_r}, all nodes on the paths from each leaf node to the root node are acquired in turn according to the leaf identifiers and stored in a node set nodes, finally yielding a group of tree nodes.
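A minimal sketch of the READPATHS sub-method is given below. It assumes the node encoding in which the root is 0 and the children of node x are 2x+1 and 2x+2, so the parent of node n is (n - 1) // 2; the tree is modeled as a dictionary from node code to bucket, and the function name read_paths is hypothetical.

```python
from typing import Dict, List


def read_paths(tree: Dict[int, list], leaves: List[int]) -> Dict[int, list]:
    """Collect every node on the leaf-to-root paths of the given leaves."""
    nodes: Dict[int, list] = {}
    for x in leaves:
        n = x
        while True:
            if n not in nodes:  # shared ancestors are fetched only once
                nodes[n] = tree.get(n, [])
            if n == 0:  # reached the root
                break
            n = (n - 1) // 2  # parent under the 2x+1 / 2x+2 encoding
    return nodes
```

Because paths overlap near the root, reading r paths returns fewer than r times the path length in distinct nodes.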

Step C, reconstructing the leaf identifiers. The user decrypts the tree nodes read from the cloud to obtain the triplets stored in them, and stores each triplet in the local buffer according to its key. The user then operates on the data in the local buffer. When the user performs a search operation that accesses w, the user executes ht[w].sCounter = ht[w].sCounter + 1, so the leaf identifier changes accordingly and is remapped to a new random leaf identifier. The user uses the key(w, i) method to generate a unique key for each data block corresponding to the keyword, computes a new leaf identifier with the token(w, i, c) method, and calls the READWRITESTASH(op, key, value, leaf) sub-method to read and write the data in the local buffer. Here key(w, i) = G(w||i), where G is a pseudo-random function and i ranges over [1, ht[w].wLength]. In the READWRITESTASH(op, key, value, leaf) algorithm, op indicates whether the operation is a read or a write, key is the value generated by the key(w, i) algorithm, and value is the value to be written or read. For a write operation, the triplet (key, value, leaf) is written into the buffer according to key; for a read operation, the value in the triplet is read according to key, and the leaf value of that triplet in the buffer is replaced with the new leaf.
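The buffer access in step C can be sketched as follows. The sketch assumes that the pseudo-random function G is instantiated with HMAC-SHA256 under a user-held key g_key (the text does not specify G), that the buffer is a dictionary keyed by key(w, i), and that op takes the strings "read" or "write"; the function name read_write_stash is hypothetical.

```python
import hashlib
import hmac
from typing import Dict, Optional, Tuple

Triplet = Tuple[bytes, Optional[str], int]  # (key, value, leaf)


def key(g_key: bytes, w: str, i: int) -> bytes:
    """key(w, i) = G(w||i); HMAC-SHA256 is an assumed stand-in for G."""
    return hmac.new(g_key, f"{w}||{i}".encode(), hashlib.sha256).digest()


def read_write_stash(stash: Dict[bytes, Triplet], op: str, k: bytes,
                     value: Optional[str], leaf: int) -> Optional[str]:
    if op == "write":
        # Write the whole triplet into the buffer under its key.
        stash[k] = (k, value, leaf)
        return value
    # Read: return the stored value and remap the triplet to the new leaf.
    _, stored, _ = stash[k]
    stash[k] = (k, stored, leaf)
    return stored
```

A read therefore both returns the data block and assigns the triplet a new leaf, so the next eviction places it on a different path.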

Step D, evicting the nodes. The user obtains a set of identifiers from the retrieved data blocks, which is the final result of the search. The user invokes the KNNEA algorithm to re-encrypt the data in the buffer and evict it to the cloud.

Step E, updating the nodes. The cloud replaces the nodes on the original paths with the new nodes.

In the above steps, the cloud only reads a group of random paths and writes a group of random paths. Eviction and encryption of the data are completed on the user side. The last step of the search protocol can be packaged into the next query to achieve a single-round communication protocol. The user only has to temporarily cache the tree nodes to be evicted.

Claims

1. A secure data retrieval method based on an oblivious ciphertext inverted index, characterized by comprising the following steps: step 1, constructing an oblivious ciphertext inverted index; step 2, writing data into the oblivious ciphertext inverted index; step 3, searching cloud data according to a communication protocol, wherein,

Step 1 comprises: constructing a cloud oblivious random access (ORAM) tree, constructing a user side buffer, and constructing a user side keyword hash table, wherein the cloud ORAM tree is used for storing the ciphertext inverted index, and the user side buffer and the user side keyword hash table are used for temporarily storing data; wherein: the user side keyword hash table is a data structure that maps a keyword w to keyword information KI and is denoted ht, the keyword information KI consisting of two parts, wLength and sCounter, namely KI = (wLength, sCounter), where wLength is the number of data blocks obtained by searching for the keyword w and sCounter is the number of times the keyword w has been searched and updated; sCounter = sCounter + 1 is executed every time the user searches for or updates the keyword w, and sCounter is used for generating different pseudo-random numbers;

Step 2 comprises: writing the data in the user side buffer into ORAM tree paths of the cloud through an eviction operation;

The user side buffer is a stateful hash table that stores triplets (key, value, leaf) according to key, where leaf is a leaf identifier; that is, given a triplet t, the buffer stores it under the key t.key; after an eviction operation, the state of the triplet is recorded in the buffer, namely whether the triplet lies on the path from t.leaf to the root of the ORAM tree or in the buffer; any triplet t exists either on the path from t.leaf to the root or in the user side buffer Stash;

The eviction operation in step 2 comprises:

P(x, j) = P(triplet.leaf, j) (1)

Leaf calculation is performed as follows: assume that the result set obtained for the keyword w from the inverted index is DB(w) = {id_1, id_2, …, id_r}, where each id is a file identifier, F is a keyed hash function, k is the user's key, and c is the sCounter value in the hash table entry ht[w] of the keyword, which records the number of times the keyword w has been searched and updated and is written c = ht[w].sCounter; the i-th file identifier id_i satisfies the following principle: the keyword-file identifier pair (w, id_i) is either uploaded to the path from leaf node x_i to the root node or remains in the buffer, and the leaf x_i is calculated by equation 2:

x_i = token(w, i, c) = (2^(L-1) - 1) + (F_k(w||i||c) % 2^(L-1)) (2)

where the computed result token(w, i, c) is the leaf identifier used to retrieve the keyword-file identifier pair (w, id_i), (2^(L-1) - 1) is the value of the first leaf identifier of the cloud ORAM tree, (F_k(w||i||c) % 2^(L-1)) is the integer obtained by taking the output of the keyed hash function F modulo 2^(L-1), and L is the height of the ORAM tree;

wherein the eviction operation in step 2 comprises a K nearest neighbor eviction:

step C, performing the following operations, from the bottom layer upward, on the paths corresponding to a group of leaf nodes, wherein the input of the K nearest neighbor eviction algorithm KNNEA is a group of leaf nodes leaves = {x_1, x_2, …, x_r}, each x_i being a leaf, together with the paths from the corresponding r leaf nodes to the root node;

step G, encrypting pathNodes and uploading to the cloud;

The eviction operation in step 2 comprises a partition-based eviction operation:

step F, calling KNNEA to evict the remaining triplets;

step G, encrypting pathNodes and uploading to the cloud;

Wherein step 3 comprises:

step B, reading tree nodes: the READPATHS(tree, {x_1, …, x_r}) sub-method is called to read a group of tree nodes in the ORAM tree, taking the group of leaf identifiers leaves = {x_1, …, x_r} generated in the previous step as input and returning the tree nodes read to the user; the READPATHS algorithm is as follows: given a group of leaf identifiers {x_1, …, x_r}, all nodes on the paths from each leaf node to the root node are obtained in turn according to the leaf identifiers and stored in a node set nodes, finally yielding a group of tree nodes;

2. The secure data retrieval method based on an oblivious ciphertext inverted index of claim 1, wherein: the ORAM tree is an encrypted full binary tree; an ORAM tree of height L has (2^L - 1) nodes in total, and each tree node stores Z triplets (key, value, leaf), where (key, value) is a fixed-size key-value pair whose data is derived from the encrypted inverted index, value is called a data block, and leaf is a leaf identifier identifying the leaf-to-root path on which the current triplet lies; in the ORAM tree, the path from the root node to any node is represented by a character string: starting from the root node, a left branch is denoted '0' and a right branch '1', and the resulting string of 0s and 1s represents the path; tree nodes in the ORAM tree are encoded according to the following rule: the root node is encoded as 0, and for a node encoded as x, its left child is encoded as (2x+1) and its right child as (2x+2); an ORAM tree of height L can store at most Z(2^L - 1) triplets, and according to the encoding rule the leaf values range from (2^(L-1) - 1) to (2^L - 2).
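The encoding rule of the ORAM tree can be checked with a short Python sketch. It only restates the arithmetic given above (node count, leaf range, path string to node code, and maximum capacity); the function names are hypothetical and chosen for illustration.

```python
from typing import Tuple


def node_count(L: int) -> int:
    """Total number of nodes in a full binary tree of height L."""
    return 2 ** L - 1


def leaf_range(L: int) -> Tuple[int, int]:
    """Codes of the first and last leaf under the 2x+1 / 2x+2 encoding."""
    return (2 ** (L - 1) - 1, 2 ** L - 2)


def node_code(path: str) -> int:
    """Map a root-to-node path string ('0' = left, '1' = right) to a node code."""
    x = 0  # the root is encoded as 0
    for bit in path:
        if bit not in "01":
            raise ValueError("path must consist of '0' and '1' only")
        x = 2 * x + 1 if bit == "0" else 2 * x + 2
    return x


def capacity(Z: int, L: int) -> int:
    """Maximum number of triplets the tree can hold with Z triplets per node."""
    return Z * node_count(L)
```

For example, with L = 3 the tree has 7 nodes, the leaves are coded 3 to 6, the path "01" (left, then right) leads to node 4, and with Z = 4 the tree holds at most 28 triplets.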