WO2019165880A1 - Efficient and verifiable multi-keyword ranked searchable encryption method supporting preference search and logical search - Google Patents


Info

Publication number
WO2019165880A1
Authority
WO
WIPO (PCT)
Prior art keywords
document
query
node
vector
nodes
Application number
PCT/CN2019/074061
Other languages
English (en)
French (fr)
Inventor
唐韶华
何志强
Original Assignee
华南理工大学 (South China University of Technology)
Application filed by 华南理工大学 (South China University of Technology)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H04L63/0435 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload wherein the sending and receiving network entities apply symmetric encryption, i.e. same key used for encryption and decryption
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2463/00 Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00
    • H04L2463/101 Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00 applying security measures for digital rights management

Definitions

  • The present invention relates to the field of information security technology, and in particular to an efficient and verifiable multi-keyword ranked searchable encryption method supporting preference search and logical search.
  • With the growing popularity of cloud computing, data owners outsource large amounts of data to cloud servers for storage or processing in order to reduce the overhead of data management, storage and computation. The data owner then loses control of the data, which may be accessed by the cloud server or by intruders, even though it may be confidential or privacy-sensitive, such as medical records and government documents.
  • Although cloud servers generally claim to be secure, users remain skeptical of the security mechanisms they provide, and this concern is an obstacle to the further development of cloud computing.
  • A common mechanism for protecting data privacy is to encrypt data before uploading it to the cloud server.
  • However, encryption greatly limits the usability of the data.
  • If a simple download, decrypt and process approach is adopted, the user incurs large bandwidth and computational costs, which runs counter to the concept of cloud computing.
  • For symmetric searchable encryption, many single-keyword and multi-keyword schemes and corresponding improvements have been proposed, but their functionality is relatively simple and many of them also have serious efficiency problems.
  • The functionality of symmetric searchable encryption therefore still lags far behind that of plaintext search.
  • Functions such as personalized search, logical search, semantic search, fuzzy search and dynamic update in symmetric searchable encryption still need further research.
  • The object of the present invention is to overcome the above drawbacks of the prior art by providing an efficient and verifiable multi-keyword ranked searchable encryption method supporting preference search and logical search.
  • The data owner, as the owner of the data, performs the preprocessing for searchable encryption, including key generation, document encryption, index generation and digest generation.
  • The data owner encrypts the document data it holds to obtain the ciphertext document collection E.
  • The data owner builds a secure index from the document collection FS.
  • The data owner generates message digests and then uploads the ciphertext document collection E, together with the documents' message digests and the ciphertext index, to the cloud server.
  • At the same time, the data owner authorizes data users to access its outsourced data, i.e., it shares the keys with them, including the symmetric key used to encrypt the documents and the secret key used to encrypt trapdoors.
  • The data user submits a query to the cloud server to perform a search.
  • The query is first transformed into a query trapdoor $T_Q$, which is submitted to the cloud server together with the number K of target documents the data user wants.
  • Once the cloud server receives $T_Q$, it performs the computation and returns the top-K most relevant documents, sorted by relevance, together with the related verification object. Finally, the data user runs a verification algorithm on the received documents and verification object to check the correctness and completeness of the search results, and then decrypts them.
  • The cloud server provides "pay-per-use" storage and computing services to the data owner and query services to data users.
  • The cloud server stores the ciphertext documents and the ciphertext index. Once it receives a query trapdoor $T_Q$ and a target number K from a data user, it performs a secure retrieval using the ciphertext index and $T_Q$ to obtain the top-K most relevant encrypted documents, generates a verification object according to the query relevance, and sends the top-K ciphertext documents and the verification object to the data user.
  • The data owner generates the symmetric key used to encrypt documents and the secret key used to encrypt document vectors, namely two invertible matrices and one random bit vector.
  • The data owner encrypts the document contents with the symmetric key, uses the vector space model and TF×IDF to represent each document's content and keyword weights, constructs a plaintext binary tree index, and then recursively encrypts the entire binary tree index.
  • During encryption, the random bit vector splits each document vector into two sub-vectors according to a fixed rule, and the two sub-vectors are encrypted with the transposes of the two invertible matrices respectively. The data owner also generates a message digest of each document from its content and the key.
  • These digests are used to build verification objects during the search phase, which makes the results verifiable.
  • The data owner organizes the index as a binary tree, built bottom-up as follows: each document first becomes a leaf node; the two most relevant nodes are selected and a parent node is constructed for them according to fixed rules; the two most relevant of the remaining nodes are then selected, and so on upward. The resulting plaintext binary tree index is then encrypted to obtain the ciphertext binary tree index.
  • The data user generates a vector representation of the submitted query. To support preference search, the query vector is constructed from the user's historical preference information together with the query; to support logical search, it is constructed from the logically connected keywords and a constructed numerical sequence.
  • The data user splits the query vector into two sub-vectors according to the random bit vector and encrypts them with the inverses of the two invertible matrices, obtaining the encrypted query trapdoor.
  • The query trapdoor and the number of target documents are sent to the cloud server, which returns the top-K most relevant ciphertext documents in order of relevance.
  • The integrity and correctness of the search results are checked with the verification algorithm, and the result documents are then obtained by decryption.
  • After receiving the ciphertext binary tree index, the ciphertext documents and the message digests from the data owner, and the encrypted query trapdoor and the target number K from the data user, the cloud server searches the ciphertext binary tree index using the relevance score between the query and the index (i.e., between the query and each document) to obtain the top-K most relevant documents. Once the search completes, these documents are sorted by relevance score, and the verification object is generated and sent to the data user.
  • The encryption method includes:
  • Key generation: the secret key SK, consisting of the random bit vector S, the invertible matrices $M_1$ and $M_2$ and the symmetric key $k_f$, is shared only between the data owner and the data user; the cloud server learns nothing about SK.
  • Each pruning vector u.PV stored in a node u is split into two sub-vectors u.PV' and u.PV'' under the control of S: if $S[j] = 0$, then $u.PV'[j] = u.PV''[j] = u.PV[j]$; if $S[j] = 1$, then $u.PV'[j]$ and $u.PV''[j]$ are random values with $u.PV'[j] + u.PV''[j] = u.PV[j]$.
  • The encrypted form of u.PV is $(u.P', u.P'') = (M_1^{T} u.PV', M_2^{T} u.PV'')$; for each node in the index tree, u.PV is replaced by its encrypted form.
  • The query vector Q is split with the roles of S reversed: if $S[j] = 0$, then $Q'[j]$ and $Q''[j]$ are random values with $Q'[j] + Q''[j] = Q[j]$; if $S[j] = 1$, then $Q'[j] = Q''[j] = Q[j]$. The encrypted form of Q is $(M_1^{-1} Q', M_2^{-1} Q'')$, and the trapdoor $T_Q$, comprising this encrypted query and the number K of target documents, is sent to the cloud server.
  • The cloud server runs a depth-first search over the ciphertext binary tree index to obtain the result set R, constructs a verification object VO, and returns R and VO to the data user.
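  • The relevance score between the encrypted forms of u.PV and Q, where $u.P' = M_1^{T} u.PV'$ and $u.P'' = M_2^{T} u.PV''$ are the encrypted pruning sub-vectors and $Q'$, $Q''$ are the two sub-vectors of the query, is computed as $u.P' \cdot (M_1^{-1} Q') + u.P'' \cdot (M_2^{-1} Q'') = u.PV' \cdot Q' + u.PV'' \cdot Q'' = u.PV \cdot Q$.
  • Here u.PV is the unencrypted pruning vector (for a leaf node, PV is the document vector) and Q is the unencrypted query vector, so the relevance score computed from the index and the trapdoor equals (or is proportional to) the relevance score between the plaintext document vector and the query.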
  • Verification: the data user decrypts the search results with the key $k_f$ and verifies their correctness and completeness.
  • Each leaf node of the ciphertext binary tree index contains the message digest of its document; the cloud server generates the verification object from the message digests of the top-K documents it obtained and sends it to the data user.
  • The data user decrypts each document, regenerates its message digest with the key $k_f$, and builds a new verification object VO' from these newly generated digests.
  • By comparing VO' with VO, the data user decides whether to accept the result of this query.
  • Each dimension of the document vector corresponds to one keyword of the dictionary and holds the TF×IDF weight of that keyword in the document.
  • To support preference search, a vector representation of the query is constructed from the user's historical preferences and the submitted query.
  • The keywords of the submitted query, indexed by $n_1, n_2, \ldots, n_l$ with $1 \le n_1 < n_2 < \cdots < n_l \le m$, are arranged in ascending order of importance, and the data user randomly generates a super-increasing sequence $d_1 > 0, d_2, \ldots, d_l$, where $d_i$ is the preference factor of the $i$-th keyword.
  • Each preference factor is used as the value of the corresponding keyword in the query vector, and the values at the redundant keyword positions of the query vector are set randomly.
  • The search results are as follows:
  • For logical search, the key generation, index construction and trapdoor generation phases are adjusted as follows:
  • Each dimension of the document vector only indicates whether the current document contains the corresponding dictionary keyword: 1 means it contains the keyword and 0 means it does not. The (n+1)-th dimension is set to 1, and the other splitting rules and the mechanism for constructing the binary tree index are unchanged.
  • The plaintext binary tree index is constructed first; its basic structure and construction process are as follows:
  • A node u of the plaintext binary tree index is a nine-tuple (P', P'', PV, CV, N, PL, PR, FD, sig), where u.PL and u.PR are pointers to the left and right child nodes.
  • u.FD is the unique descriptor of the document.
  • u.sig is the message digest generated from the document content.
  • u.CV is the cluster center vector of the cluster $G_u$.
  • u.N is the number of documents in the cluster $G_u$.
  • The cluster $G_u$ consists of the documents associated with all leaf nodes of the subtree rooted at u. Because u.CV, u.N and u.PV exist only while the plaintext binary tree index is being constructed, the encryption process must set the u.CV and u.PV fields of every node to NULL and the u.N field to 0.
  • There are two types of nodes u: leaf nodes and intermediate nodes.
  • u.FD
  • u.sig
  • u.PL and u.PR point to the left and right child nodes of u.
  • u.PV is generated from the pruning vectors PV of the two child nodes of u.
  • u.CV is the cluster center vector generated from the cluster center vectors CV of the two child nodes of u; it is used mainly while constructing the binary tree index, to find the most relevant nodes.
  • The generation rules are as follows:
  • The cluster center vector is used to compute the relevance score between nodes, in order to find the two closest nodes and construct their parent node while the binary tree index is being built.
  • The pruning vector is split into two sub-vectors, which are encrypted with the invertible matrices; they are used to compute the relevance score between the node and the query trapdoor during the depth-first retrieval over the binary tree, to decide whether to enter the current subtree.
  • The plaintext binary tree index is constructed as follows. During construction, the current processing node set CPNS holds the nodes processed in the current round, and the pending node set NGNS holds the nodes to be processed in the next round. First, each document is initialized as a node containing a five-tuple, and all document nodes are added to CPNS. While CPNS contains more than one node, the two most relevant nodes are repeatedly found, i.e., the two nodes whose cluster center vectors have the largest relevance score among all nodes (which can also be understood as the two most similar documents), and their parent node is constructed according to the rules given above.
  • The constructed parent node is added to NGNS and the two nodes just found are removed from CPNS, until CPNS contains at most one node (one node may remain because CPNS may hold an odd number of nodes). The nodes in NGNS are then added to CPNS for a new round, and the loop continues until, after the nodes in NGNS have been added, only one node remains in CPNS.
  • That remaining node is the root of the plaintext binary tree index and is returned to represent the binary tree.
  • The plaintext binary tree index contains all the information of the document collection, so it must be encrypted into a ciphertext binary tree index before being uploaded to the cloud server.
  • In the nine-tuple representing a node, u.P' and u.P'' are the encrypted forms of the two pruning sub-vectors obtained by splitting the pruning vector u.PV, with the vector S serving as the split indicator: entries with $S[j] = 0$ are copied into both sub-vectors, and entries with $S[j] = 1$ are split into two random shares that sum to $u.PV[j]$.
  • The u.P' and u.P'' vectors are used mainly in the search phase. For a leaf node, the relevance score between the document and the query vector is computed with the same mechanism as the Secure KNN algorithm; for an intermediate node, the relevance score between u.P', u.P'' and the query trapdoor determines whether to enter the subtree of the current node (i.e., pruning), as described in the search algorithm.
  • The plaintext binary tree index is encrypted into the ciphertext binary tree index recursively: if the current root node is empty, return; otherwise, split its pruning vector into the two pruning sub-vectors according to the rule above and encrypt them with the invertible matrices. To prevent information about the plaintext document collection from leaking to the cloud server, the u.CV and u.PV fields of the node are set to NULL and the u.N field to 0. Finally, if the left subtree of the current root is not empty, it is encrypted recursively, and likewise the right subtree. When all nodes have been encrypted, the root of the ciphertext binary tree index is returned.
  • R denotes the target result set.
  • The threshold is the minimum relevance score between the query and the nodes in the current result set.
  • K is the number of documents to be retrieved.
  • The search proceeds depth-first. For a leaf node: if R holds fewer than K nodes, the leaf is added; if R already holds K nodes and the relevance score between the current leaf and the query exceeds the threshold, the least relevant node is removed from R, the current leaf is added, and the threshold is updated. For an intermediate node: if the relevance score between its pruning vector and the query trapdoor is less than the threshold, the subtree rooted at this node is pruned without further retrieval; otherwise the subtree is searched. When the index tree has been traversed, the node set R is returned.
  • The method uses Secure KNN to realize symmetric multi-keyword ciphertext retrieval and additionally supports preference search and logical search. It ranks the search results by their relevance to the query and lets the user verify their correctness and completeness with the verification object. To reduce the time complexity of the search, the data owner pre-builds the ciphertext binary tree index, whose subtrees can be pruned to shrink the search space and thus improve search efficiency.
  • FIG. 1 is a schematic structural diagram of the efficient and verifiable multi-keyword ranked searchable encryption method supporting preference search and logical search disclosed in the present invention.
  • FIG. 2 is a tree diagram of the clustering process.
  • This embodiment discloses an efficient and verifiable multi-keyword ranked searchable encryption method supporting preference search and logical search, which involves the following three parties:
  • The data owner owns the data and is mainly responsible for the preprocessing of searchable encryption, including key generation, document encryption, index generation and digest generation.
  • The data owner encrypts the document data it holds to obtain the ciphertext document collection E.
  • It builds a secure index from the document collection FS.
  • The data owner also generates message digests so that data users can verify the integrity and correctness of the search results.
  • The data owner uploads the ciphertext document collection E, together with the documents' message digests and the secure index, to the cloud server.
  • The data owner can authorize data users to access its outsourced data, i.e., share the keys with them, including the symmetric key used to encrypt the documents and the secret key used to encrypt trapdoors.
  • The data owner generates the symmetric key for encrypting documents and the secret key for encrypting document vectors, namely two invertible matrices and a random bit vector.
  • The data owner encrypts the document contents with the symmetric key.
  • The vector space model and TF×IDF are used to represent the document contents and the associated weights.
  • The entire binary tree index is encrypted recursively.
  • In the encryption process, the random bit vector splits each document vector into two sub-vectors according to a fixed rule, and the two sub-vectors are encrypted with the transposes of the two invertible matrices respectively. The data owner also generates a message digest of each document from its content and the key, which is used in the search phase to construct the verification object and ensure verifiability.
  • The data owner organizes the index as a binary tree.
  • When constructing the binary tree, each document first becomes a leaf node; the two currently most relevant documents are selected and their parent node is constructed according to fixed rules; then the two most relevant of the remaining nodes are selected and parent nodes continue to be generated upward by the same rules, so that the plaintext binary tree index is built bottom-up. It is then encrypted to obtain the ciphertext binary tree index.
  • After generating the encrypted documents, the ciphertext binary tree index and the document message digests, the data owner outsources the data to the cloud server, which provides storage and search services.
  • A data user shares the keys with the data owner and can submit queries to the server for searching.
  • The query is transformed into a query trapdoor $T_Q$, which is submitted to the cloud server together with the number K of target documents. Once the cloud server receives $T_Q$, it performs the computation and returns the top-K most relevant documents, sorted, together with the related verification object. Finally, the data user runs the verification algorithm on the received documents and verification object to check the correctness and completeness of the search results, and then decrypts them.
  • The data user generates a vector representation of the submitted query; to support preference search, the query vector is constructed from the user's historical preference information combined with the query information.
  • To support logical search, the query vector is constructed from the logically connected keywords and a constructed numerical sequence. After the query vector is generated, the data user splits it into two sub-vectors according to the random bit vector and encrypts them with the inverses of the two invertible matrices, obtaining the encrypted query trapdoor.
  • The query trapdoor and the number of target documents are sent to the cloud server, which returns the top-K most relevant ciphertext documents in order of relevance, together with the verification object.
  • The integrity and correctness of the search results are checked with the verification algorithm, and the result documents are obtained by decryption.
  • The cloud server provides "pay-per-use" storage and computing services to the data owner and query services to data users. It stores the ciphertext documents and the ciphertext index. Once it receives a query trapdoor $T_Q$ and a target number K from a data user, it performs a secure retrieval with the ciphertext index and $T_Q$, obtains the top-K most relevant encrypted documents, generates the verification object according to query relevance, and sends the top-K ciphertext documents and the verification object to the data user.
  • After receiving the ciphertext binary tree index, the ciphertext documents and the message digests from the data owner, and the encrypted query trapdoor and the target number K from the data user, the cloud server searches the ciphertext binary tree index using the relevance score between the query and the index (i.e., each document) to obtain the top-K most relevant documents. Once the search completes, the documents are sorted by relevance score, and the verification object is generated and sent to the data user.
  • $k_f$ is a symmetric encryption key (e.g., for AES or DES).
  • The secret key SK is shared only between the data owner and the data user; the cloud server learns nothing about SK.
  • Each pruning vector u.PV stored in node u is split by S into two sub-vectors u.PV' and u.PV'' (copied where $S[j] = 0$, split into random shares summing to $u.PV[j]$ where $S[j] = 1$).
  • The encrypted form of u.PV is $(M_1^{T} u.PV', M_2^{T} u.PV'')$; for each node in the index tree, u.PV is replaced by its encrypted form.
  • The query vector Q is split with the roles of S reversed into Q' and Q'', and its encrypted form is $(M_1^{-1} Q', M_2^{-1} Q'')$. The trapdoor $T_Q$, comprising the encrypted query and the number K of target documents, is then sent to the cloud server.
  • The cloud server runs a depth-first search to obtain the result set R, constructs the verification object VO, and returns R and VO to the data user.
  • The relevance score between the encrypted forms of u.PV and Q, with u.PV', u.PV'' and Q', Q'' denoting their respective sub-vectors, is $(M_1^{T} u.PV') \cdot (M_1^{-1} Q') + (M_2^{T} u.PV'') \cdot (M_2^{-1} Q'') = u.PV' \cdot Q' + u.PV'' \cdot Q'' = u.PV \cdot Q$.
  • Here u.PV is the unencrypted document vector and Q is the unencrypted query vector, so the score between the index and the trapdoor equals (or is positively related to) the score between the plaintext document vector and the query.
  • Verify(R, VO, SK): the data user decrypts the search results with the key $k_f$ and verifies their correctness and completeness.
  • Each leaf node of the ciphertext binary tree index contains the message digest of its document; the cloud server builds the verification object from the message digests of the top-K documents it obtained and sends it to the data user.
  • After receiving the top-K ciphertext documents and the verification object, the data user decrypts each document, regenerates its message digest with the key $k_f$, and builds a new verification object VO' from these newly generated digests.
  • The data user then decides whether to accept the result of the query.
  • Each dimension of the document vector is the weight score of the keyword at that position in the sorted dictionary.
  • b. Trapdoor generation: a vector representation of the query is constructed from the user's historical preferences and the submitted query. First, the query keywords, indexed by $n_1, \ldots, n_l$ ($1 \le n_1 < n_2 < \cdots < n_l \le m$), are sorted in ascending order of importance; the data user then randomly generates a super-increasing sequence $d_1 > 0, d_2, \ldots, d_l$, where $d_i$ is the preference factor of the $i$-th keyword. The preference factor is the weight at the corresponding keyword position of the query vector, and the weights at the redundant keyword positions of the query vector are randomly set to 1.
  • The search keyword set, indexed by $n_1, \ldots, n_l$ ($1 \le n_1 < n_2 < \cdots < n_l \le m$), is arranged in ascending order of preference. If document $F_1$ contains a keyword with a higher degree of preference than the keywords contained in document $F_2$, then $F_1$ has a higher return priority than $F_2$.
  • Under the same ordering, if documents $F_1$ and $F_2$ contain keywords of the same degree of preference and $F_1$ contains the keyword with the higher weight value, then $F_1$ has a higher return priority than $F_2$.
  • S is an (n+1)-bit random vector.
  • $M_1$ and $M_2$ are two $(n+1) \times (n+1)$ invertible matrices.
  • n is the size of the generated dictionary.
  • l is the number of keywords used to construct the query trapdoor.
  • $k_f$ is a symmetric encryption key (e.g., for AES or DES).
  • For logical search, each dimension of the document vector is no longer a score; it only indicates whether the current document contains the corresponding dictionary keyword.
  • 1 indicates that the current document contains the keyword and 0 that it does not; the (n+1)-th dimension is set to 1, and the other splitting rules and the mechanism for constructing the binary tree index are unchanged, as described above.
  • The weights at the relevant positions of Q are set from the logically connected keywords and the constructed numerical sequence, and the values at all other positions are set to 0.
  • The (n+1)-th dimension of the query vector is set accordingly. In the query result, if $R_j > 0$ for document $F_j$, then $F_j$ satisfies the requirements of the logical search.
  • To speed up the search, a tree index is used: a plaintext binary tree index is constructed first and then encrypted to obtain the ciphertext binary tree index, in which the relevance score between a node and the trapdoor determines whether the related subtree can be pruned.
  • the speed of search, the basic structure and construction process of the plaintext binary tree index is as follows:
  • A node u of the plaintext binary tree index is a nine-tuple (P′, P″, PV, CV, N, PL, PR, FD, sig), where u.PL and u.PR are pointers to the left and right child nodes.
  • u.FD is the document's unique descriptor.
  • u.sig is the message digest generated from the document content.
  • u.CV is the cluster center vector of the cluster $G_u$.
  • u.N is the number of documents in the cluster $G_u$.
  • The cluster $G_u$ denotes the documents associated with all leaf nodes of the subtree rooted at u. Because u.CV, u.N and u.PV exist only while the plaintext binary tree index is being built, the u.CV and u.PV fields of every node are set to NULL and u.N is set to 0 when the index is encrypted.
  • There are two main types of node u: leaf nodes and intermediate nodes.
  • For an intermediate node, u.FD = φ.
  • For an intermediate node, u.sig = φ.
  • u.PL and u.PR point to the left and right child nodes of node u.
  • u.PV is generated from the respective pruning vectors PV of the two child nodes of node u.
  • u.CV is generated from the respective cluster center vectors CV of the two child nodes of node u.
  • The cluster center vector u.CV is mainly used while building the binary tree index, to find the most relevant nodes.
  • the generation rules are as follows:
  • The cluster center vector is used to compute the relevance score between nodes, in order to find the two closest nodes and build their parent node while constructing the binary tree index.
  • The pruning vector is split into two subvectors that are encrypted with the invertible matrices. During the depth-first search of the binary tree, these subvectors are used to compute the relevance score with the trapdoor (i.e., the query), which decides whether to descend into the current subtree.
  • The plaintext binary tree index is built as follows. The current processing node set CPNS holds the nodes processed in the current round, and the pending node set NGNS holds the nodes for the next round. First, each document is initialized as a leaf node and all document nodes are added to CPNS. While CPNS contains more than one node, the two most relevant nodes are repeatedly found: the two nodes whose cluster center vectors have the largest relevance score, which can be understood as the two most similar documents. Their parent node is then built according to the rules above.
  • The parent node is added to NGNS and the two nodes just found are removed from CPNS. This repeats until CPNS contains at most one node (CPNS may contain an odd number of nodes, so one node may remain). The nodes in NGNS are then added to CPNS for a new round, and the loop continues until, after the NGNS nodes have been added to CPNS, only one node remains in CPNS.
  • That single remaining node in CPNS is the root of the plaintext index binary tree, and it is returned as the root representing the binary tree.
  • The plaintext binary tree index is encrypted into the ciphertext binary tree index as follows:
  • The plaintext binary tree index contains all the information about the document set, so it must be encrypted into the ciphertext binary tree index before it can be uploaded to the cloud server.
  • In the nine-tuple representing a node, u.P′ and u.P″ are the encrypted forms of the two pruning subvectors obtained by splitting the pruning vector u.PV according to the following formula, with the S vector acting as the split indicator:
  • The u.P′ and u.P″ vectors of a node are mainly used in the search phase. For a leaf node, the relevance score between the document and the query vector is computed with the same mechanism as the Secure KNN algorithm. For an intermediate node, the relevance score between u.P′, u.P″ and the query trapdoor decides whether to search the subtree of that node, i.e., whether to prune it.
  • Encryption proceeds as follows: if the current root node is empty, return. Otherwise, first split the pruning vector into pruning subvectors according to the formula above, and then encrypt them with the transposes of the invertible matrices.
  • To prevent information about the plaintext document set from leaking to the cloud server, the u.CV and u.PV fields of the node are set to NULL and u.N is set to 0. Finally, if the left subtree of the current root is not empty, it is encrypted recursively, and likewise for the right subtree. Once all nodes have been encrypted, the root of the ciphertext binary tree index is returned.
  • The target result set is denoted R.
  • threshold is the minimum relevance score between the query and the nodes in the current result set.
  • K is the number of documents to retrieve.
  • After a leaf node is added, threshold is updated. If the current node is an intermediate node and the relevance score between its pruning vector and the query trapdoor is less than threshold, the subtree it represents is pruned without further search, because its relevance is already lower than that of the least relevant node in the result set R. Otherwise the search continues in the subtree. When the index tree has been fully traversed, the node set R is returned.
  • The invertible matrices used above to encrypt document vectors and trapdoors are (n+e)×(n+e) matrices, and inverting them takes a very long time. In the index construction phase, each split document subvector must also be multiplied by an (n+e)×(n+e) matrix, which costs $O(N^2)$ time, where N here denotes the vector dimension n + e. With m documents the total time complexity is $O(2mN^2)$, or $O(\log m \cdot m^2 \cdot N^2)$ when the construction of the binary index tree is included. If the two invertible full matrices are replaced with two invertible diagonal matrices, each matrix equals its own transpose.
  • Its inverse is also a diagonal matrix, and each diagonal entry is the reciprocal of the entry at the same position in the original matrix.
  • With diagonal matrices, a matrix-vector product costs $O(N)$ instead of $O(N^2)$. The time complexity of index construction therefore drops from $O(2mN^2)$ to $O(2mN)$, or from $O(\log m \cdot m^2 \cdot N^2)$ to $O(\log m \cdot m^2 \cdot N)$ when tree construction is included. Both time and space complexity drop by an order of magnitude, and under the semi-trusted (honest-but-curious) model the security can still be reduced to that of Secure KNN.
  • The following concrete example illustrates the multi-keyword preference search process.
  • The logical search scheme is largely similar.
  • A formal proof of the correctness of the search was given in the previous section and is not repeated here.
  • Diagonal matrices are mainly used to reduce the amount of computation. The computation steps are essentially the same and are not described again.
  • The document vector generated for each document is as follows. The weight values of the redundant keywords follow the uniform distribution U(−0.01, 0.01).
  • U denotes the uniform distribution.
  • The two most relevant nodes are first found using the document center vectors. There are 8 nodes in total, so the first round needs 4 iterations, which divide the 8 documents into 4 small clusters: (f3, f1), (f8, f2), (f4, f7) and (f5, f6).
  • Parent nodes are then built upward by extracting the center vectors and pruning vectors.
  • the second round needs to be iterated twice.
  • the result of clustering is (f3f1, f8f2), (f4f7, f5f6). There are only two nodes in the third round.
  • The submitted query is "java python go", and the number of target documents to retrieve is 2. The user's interest preference model, built from the user's search history, assigns different weights to different keywords: "c": 2, "cpp": 5, "javascript": 1, "python": 8, "java": 7, "go": 10, "scala": 6.
  • The query vector Q built from the user's query, and the encrypted forms Q′ and Q″ of the subvectors obtained by splitting Q, are as follows:
  • A depth-first algorithm searches recursively on the ciphertext binary tree index.
  • The relevance score between the root node and the trapdoor is 86.08. Traversing downward, the intermediate node f3f1f8f2 has a relevance score of 86.08 with the trapdoor, and one level further down the intermediate node f3f1 has a score of 86.07.
  • Continuing down, the first leaf node encountered is the node representing f3.txt. Its relevance score with the trapdoor is 85.42, and because it is the first leaf node, it is added directly to the result set. The second leaf node encountered is the node representing f1.txt. Its score is 11.32, and it is also added to the result set.
  • The result set now holds the nodes for f3.txt and f1.txt, and the threshold is 11.32. The search then backtracks to the intermediate node f8f2. Its relevance score with the trapdoor is 82.52, greater than the threshold of 11.32, so the search enters the left subtree of this intermediate node.
  • The third leaf node encountered is the node representing f8.txt. Its relevance score is 57.38, greater than the threshold of 11.32, so the node representing f1.txt is removed from the result set.
  • The result set now holds the nodes for f3.txt and f8.txt, and the threshold is updated to 57.38. The traversal continues downward to the fourth leaf node, f2.txt.
  • The relevance score between the f2.txt node and the trapdoor is 75.40, greater than the threshold of 57.38. The node representing f8.txt is therefore removed, leaving only the nodes for f3.txt and f2.txt in the result set.
  • The threshold is updated to 75.40. The search then backtracks to the intermediate node f4f7f5f6. Its relevance score with the trapdoor is 7.12, less than the threshold of 75.40, so this branch is pruned directly. The algorithm then terminates and returns the result set.
  • f3.txt contains "python go".
  • f2.txt contains "java go".
  • The weight of "go" is the same in both documents: 0.64.
  • The user's preference for the keyword "python" in f3.txt is higher than for "java" in f2.txt, so f3.txt has a higher return priority.
  • The message digests of the nodes representing f3.txt and f2.txt are used to generate the verification object.
  • After the verification object is sent to the data user, the data user decrypts the ciphertext documents and regenerates the message digest of each one.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

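The present invention discloses an efficient, verifiable, multi-keyword ranked searchable encryption method that supports preference search and logical search.

The data owner encrypts the documents and builds a ciphertext index from the document set. It also generates digest information for each document from the key and the document content. It then sends the ciphertext documents, the encrypted index and the digest information to the cloud server.

The data user shares the key information generated by the data owner and generates a query trapdoor from the query. It sends the encrypted query trapdoor, together with the number K of documents to retrieve, to the cloud server.

After receiving the ciphertext index and the query trapdoor, the cloud server performs secure inner-product operations. It finds the K documents most relevant to the user's query, sorts them by relevance score, and generates a verification object. It then returns the K most relevant documents and the verification object to the data user.

The data user runs a verification algorithm to check the correctness and completeness of the returned results.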

Description

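Efficient verifiable multi-keyword ranked searchable encryption method supporting preference search and logical search

Technical Field

The present invention relates to the field of information security, and in particular to an efficient, verifiable, multi-keyword ranked searchable encryption method that supports preference search and logical search.

Background

With the growing popularity of cloud computing, data owners outsource large amounts of data to cloud servers for storage or processing, to reduce the overhead of data management, storage and computation. In doing so, however, the data owner loses effective control over the data. The data may then be obtained or accessed by the cloud server or by intruders, even though it may be confidential or private, such as medical records and government documents. Cloud servers generally claim to be secure, but users are usually skeptical of the security mechanisms they provide. This concern is an obstacle to the further adoption of cloud computing.

A common mechanism for protecting data privacy is to encrypt the data before uploading it to the cloud server. Encryption, however, greatly limits the usability of the data. Simply downloading, decrypting and processing the data consumes a large amount of bandwidth and imposes heavy computation on the user, which runs counter to the idea of cloud computing.

Many schemes based on homomorphic encryption or on public-key searchable encryption have been proposed, but their huge computational overhead often makes them impractical. The focus therefore remains on symmetric searchable encryption. Many single-keyword and multi-keyword symmetric searchable encryption schemes and improvements have been proposed, but their functionality is relatively limited, and many of them also suffer from serious efficiency problems.

The functionality of symmetric searchable encryption still lags far behind that of plaintext retrieval. Capabilities such as personalized search, logical search, semantic search, fuzzy search and dynamic updates still need further research.

Summary of the Invention

The object of the present invention is to overcome the above drawbacks of the prior art by providing an efficient, verifiable, multi-keyword ranked searchable encryption method that supports preference search and logical search.

The object of the present invention can be achieved by the following technical solution:

An efficient, verifiable, multi-keyword ranked searchable encryption method supporting preference search and logical search, the method comprising the following steps: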
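The data owner, as the owner of the data, performs the preprocessing for searchable encryption: key generation, document encryption, index generation and digest generation.

The data owner encrypts the document data it holds to obtain the ciphertext document set E. It builds a secure index [Figure PCTCN2019074061-appb-000001] based on the document set FS and also generates message digests. It then uploads the encrypted document set E, together with the documents' message digests and the ciphertext index [Figure PCTCN2019074061-appb-000002], to the cloud server.

At the same time, the data owner authorizes data users to access its outsourced data by sharing keys with them. These are the symmetric key used to encrypt documents and the secret key used to encrypt trapdoors;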
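The data user is a user who shares keys with the data owner, and submits queries to the cloud server to perform searches.

When the data user wants to retrieve documents, it first converts the query into a query trapdoor $T_Q$. It then submits $T_Q$, together with the number of target documents, to the cloud server.

Once the cloud server receives $T_Q$, it performs the computation. When finished, it returns the top-K most relevant documents in ranked order, together with the associated verification object.

Finally, the data user receives the top-K most relevant documents and the verification object. It runs the verification algorithm to check the correctness and completeness of the search results, and then decrypts them to obtain the search results;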
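The cloud server provides pay-as-you-go storage and computation services to the data owner and query services to the data user. It stores the ciphertext documents and the ciphertext index.

Once the cloud server receives a query trapdoor $T_Q$ and a target number K from a data user, it performs a secure search using the ciphertext index [Figure PCTCN2019074061-appb-000003] and $T_Q$. This yields the top-K most relevant encrypted documents, which it sorts by relevance to the query. It then generates the verification object and sends the top-K most relevant ciphertext documents and the verification object to the data user.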
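Further, the data owner generates the symmetric key used to encrypt documents and the secret key used to encrypt document vectors, namely two invertible matrices and a random bit vector;

The data owner encrypts the document contents with the symmetric key. It uses the vector space model with TF×IDF to abstract the document contents and their weights, builds the plaintext binary tree index, and then recursively encrypts the whole tree. During encryption, each document vector is split into two subvectors according to the random bit vector under a fixed rule. The two subvectors are encrypted with the transposes of the two invertible matrices respectively. The data owner also generates each document's message digest from the document content and the key. These digests are used in the search phase to generate the verification object, which provides verifiability.

Further, the data owner organizes the index as a binary tree, built as follows:
1. Each document first becomes a leaf node.
2. The two currently most relevant nodes are selected, and a parent node is built for them according to fixed rules.
3. The two most relevant of the remaining nodes are then selected and merged upward into a parent, and so on.

The plaintext binary tree index is thus built bottom-up and then encrypted to obtain the ciphertext binary tree index.

Further, the data user generates a vector representation of the submitted query:
- To support preference search, the query vector is built from the user's historical preference information combined with the query.
- To support logical search, it is built from the logically connected keywords and a constructed numerical sequence.

After generating the query vector, the data user splits it into two subvectors according to the random bit vector under a fixed rule. It encrypts the two query subvectors with the inverses of the two invertible matrices respectively, obtaining the encrypted query trapdoor;

After generating the query trapdoor, the data user sends it, together with the number of target documents to retrieve, to the cloud server. The cloud server returns the top-K most relevant ciphertext documents, ranked by relevance, and the verification object. The data user checks the completeness and correctness of the search results with the verification algorithm, and then decrypts them to obtain the result document set.

Further, the cloud server receives the ciphertext binary tree index, ciphertext documents and message digests generated by the data owner, and the encrypted query trapdoor and target document number generated by the data user. It retrieves the top-K most relevant documents on the ciphertext binary tree index, according to the relevance score between the query and the index (i.e., the documents). Once the search finishes, it sorts the top-K documents by relevance score, generates the verification object, and sends them to the data user.

Further, the encryption method comprises:

Key generation phase GenKey($1^{l(n)}$):

In the initialization phase, the data owner generates a four-tuple secret key $SK = (S, M_1, M_2, k_f)$, where:
- S is a random (n+e)-bit vector;
- $M_1$ and $M_2$ are two (n+e)×(n+e) invertible matrices;
- n is the size of the generated dictionary;
- e is the number of redundant keywords introduced to make trapdoors unlinkable;
- $k_f$ is a symmetric encryption key.

SK is shared only between the data owner and data users; the cloud server learns nothing about SK;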
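Index construction phase BuildIndex(FS, SK):

Following the index binary-tree construction algorithm, each document vector u.PV stored in a node u is computed with the following formula:

[Figure PCTCN2019074061-appb-000004]

where [Figure PCTCN2019074061-appb-000005] denotes the TF value of $w_i$ in $F_d$, and [Figure PCTCN2019074061-appb-000006] is the frequency with which keyword $w_i$ occurs in document $F_d$.

After the plaintext binary tree index has been built, it is encrypted to obtain the ciphertext binary tree index. During encryption, the following split rule is applied to each pruning vector (in a leaf node, the pruning vector is the document vector) to obtain two random subvectors $\{P', P''\}$, with SK.S acting as the split indicator:

[Figure PCTCN2019074061-appb-000007]

The encrypted form of the u.PV vector is [Figure PCTCN2019074061-appb-000008]. For every node in the index tree, u.PV is replaced by its encrypted form [Figure PCTCN2019074061-appb-000009].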
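Trapdoor generation phase GenTrapdoor($S_q$, k, SK):

Assume $S_q = \{w_1, w_2, \ldots, w_t\}$ is the user's set of query keywords and Q is the vector form of $S_q$. Each dimension is computed with the following formula:

[Figure PCTCN2019074061-appb-000010]

Q is then normalized and split into two random subvectors $\{Q', Q''\}$, with SK.S acting as the split indicator. The split rule is:

[Figure PCTCN2019074061-appb-000011]

The encrypted form of Q is [Figure PCTCN2019074061-appb-000012].

The data user then sends the trapdoor $T_Q$ to the cloud server. $T_Q$ consists of [Figure PCTCN2019074061-appb-000013] and the number K of target documents to retrieve;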
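Search phase [Figure PCTCN2019074061-appb-000014]:

The cloud server runs a depth-first algorithm to obtain the result set R and constructs a verification object VO. It then returns R and VO to the data user. During the search on the ciphertext binary tree index, the relevance score between the encrypted forms of u.PV and Q is computed as follows:

[Figure PCTCN2019074061-appb-000015]

where u.PV is the unencrypted document vector (for a leaf node, PV is the document vector) and Q is the unencrypted query vector. This result shows that the relevance score between the index and the trapdoor is equal, or proportional, to the relevance score between the plaintext document vector and the query;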
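Verification phase Verify(R, VO, SK):

The data user uses the key $k_f$ to decrypt the search results and to verify their correctness and completeness:
1. Every leaf node of the ciphertext binary tree index contains the message digest of its document.
2. The cloud server generates the verification object from the message digests of the top-K documents it retrieved, and sends it to the data user.
3. After receiving the top-K ciphertext documents and the verification object, the data user decrypts each document and regenerates its message digest using the key $k_f$.
4. The data user builds a new verification object VO′ from these regenerated digests.
5. The data user checks whether VO′ equals the VO returned by the cloud server, and decides on that basis whether to accept the query result.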
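Further, the index construction phase and the trapdoor generation phase are adjusted as follows:

In the index construction phase, when the document vector is built from a document, each dimension corresponding to a keyword in the dictionary is computed with the following formula:

[Figure PCTCN2019074061-appb-000016]

In the formula:
- [Figure PCTCN2019074061-appb-000017] denotes the TF value of keyword $w_i$ in document $F_j$;
- [Figure PCTCN2019074061-appb-000018] denotes the number of documents that contain keyword $w_i$;
- N is the number of documents in the collection;
- $|F_j|$ is the length of document $F_j$, i.e., the number of keywords it contains.

The values of the dimensions corresponding to redundant keywords follow the uniform distribution $U(\theta-\sigma, \theta+\sigma)$. The mean θ and the half-width σ are determined from experimental data;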
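In the trapdoor generation phase, the vector representation of the query is built from the user's historical preferences and the submitted query. First, the keywords in the submitted query are arranged in ascending order of importance:

[Figure PCTCN2019074061-appb-000019]

where $1 \le n_1 < n_2 < \cdots < n_l \le m$. The data user then randomly generates a super-increasing sequence $d_1 > 0, d_2, \ldots, d_l$ satisfying

[Figure PCTCN2019074061-appb-000020]

Here $d_i$ is the preference factor of keyword [Figure PCTCN2019074061-appb-000021], i.e., the value of the query vector at that keyword's position. The positions of the redundant keywords in the query vector are randomly set to 1. The search result is then expressed as:

[Figure PCTCN2019074061-appb-000022]

where s is the perturbation introduced by the redundant keywords.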
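Further, the key generation phase, the index construction phase and the trapdoor generation phase are adjusted as follows:

In the key generation phase, the data owner generates at initialization a four-tuple secret key $SK = (S, M_1, M_2, k_f)$, where:
- S is a random (n+1)-bit vector;
- $M_1$ and $M_2$ are two (n+1)×(n+1) invertible matrices;
- n is the size of the generated dictionary, and the extra 1 dimension is needed to construct the query trapdoor;
- $k_f$ is a symmetric encryption key;

In the index construction phase, when a document is converted into a document vector, each dimension only indicates whether the current document contains the corresponding dictionary keyword: 1 means it contains the keyword and 0 means it does not. The (n+1)-th dimension is set to 1. The split rule and the binary tree index construction mechanism are unchanged;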
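In the trapdoor generation phase, assume that the sets of keywords associated with "OR", "AND" and "NO" in the query are

[Figure PCTCN2019074061-appb-000023]

respectively, and use the symbols

[Figure PCTCN2019074061-appb-000024]

to denote "OR", "AND" and "NO" in the mathematical sense. The matching rule is expressed as

[Figure PCTCN2019074061-appb-000025]
[Figure PCTCN2019074061-appb-000026]

For the "OR" operation, the data user constructs a super-increasing sequence $a_j$ ($j = 1, 2, \ldots, l_1$),

[Figure PCTCN2019074061-appb-000027]

which is assigned as the weights of the "OR" search keywords. To implement the "AND" and "NO" operations, the data user likewise constructs two super-increasing sequences, $b_j$ ($j = 1, 2, \ldots, l_2$) and $c_j$ ($j = 1, 2, \ldots, l_3$), satisfying the conditions

[Figure PCTCN2019074061-appb-000028]

and

[Figure PCTCN2019074061-appb-000029]
[Figure PCTCN2019074061-appb-000030]

Assume that

[Figure PCTCN2019074061-appb-000031]

are sorted in ascending order of importance. Then, for the search keyword set

[Figure PCTCN2019074061-appb-000032]

the weight values at the corresponding positions of Q are set to

[Figure PCTCN2019074061-appb-000033]

The values at all other positions are set to 0, and the (n+1)-th dimension of the query vector is set to

[Figure PCTCN2019074061-appb-000034]

In the query result, if $R_j > 0$ for document $F_j$, then $F_j$ satisfies the requirements of the logical search.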
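Further, the plaintext binary tree index is built first. Its basic structure and construction process are as follows:

(1) A node u of the plaintext binary tree index is a nine-tuple (P′, P″, PV, CV, N, PL, PR, FD, sig), where:
- u.PL and u.PR are pointers to the left and right child nodes;
- u.FD is the document's unique descriptor;
- u.sig is the message digest generated from the document content;
- u.CV is the cluster center vector of the cluster $G_u$;
- u.N is the number of documents in $G_u$;
- $G_u$ denotes the documents associated with all leaf nodes of the subtree rooted at u.

Note that u.CV, u.N and u.PV exist only while the plaintext binary tree index is being built. When the index is encrypted, the u.CV and u.PV fields of every node are set to NULL and u.N is set to 0.

There are two main types of node u: leaf nodes and intermediate nodes.

1. If u is a leaf node:
- u.PL = u.PR = φ;
- u.FD stores the document's file descriptor;
- u.CV and u.PV both store the current document vector;
- u.P′ and u.P″, which hold the encrypted forms of the subvectors obtained by splitting u.PV, are initially set to the default value NULL;
- u.N = 1;
- u.sig stores the message digest of the current document. After the search, the message digest is used to generate the verification object, with which the data user checks the completeness and correctness of the search results.

2. If u is an internal (intermediate) node:
- u.FD = φ and u.sig = φ;
- u.PL and u.PR point to the left and right children of u;
- u.N = u.PL.N + u.PR.N;
- u.PV is generated from the pruning vectors PV of u's two children;
- u.CV is generated from the cluster center vectors CV of u's two children. It is mainly used while building the binary tree index, to find the most relevant nodes.

The generation rules are as follows:

[Figure PCTCN2019074061-appb-000035]

The cluster center vector is used to compute the relevance score between nodes, so that the two closest nodes can be found and their parent built while the binary tree index is constructed. The pruning vector is split into two subvectors that are encrypted with the invertible matrices. During the depth-first search of the binary tree, these subvectors are used to compute the relevance score with the trapdoor (i.e., the query), which decides whether to descend into the current subtree;

(2) The plaintext binary tree index is built as follows. During construction, the current processing node set CPNS holds the nodes processed in the current round, and the pending node set NGNS holds the nodes to be processed in the next round.
1. Each document is initialized as a leaf node, and all document nodes are added to CPNS.
2. While CPNS contains more than one node, find the two most relevant nodes. These are the two nodes whose cluster center vectors have the largest relevance score among all nodes, which can be understood as the two most similar documents.
3. Build their parent node according to the rules above and add it to NGNS, then remove the two nodes from CPNS.
4. Repeat steps 2 and 3 until CPNS contains at most one node. Because the number of nodes in CPNS may be odd, one node may remain.
5. Add the nodes in NGNS to CPNS and start a new round.
6. Stop when, after the NGNS nodes have been added to CPNS, only one node remains in CPNS.

The single remaining node in CPNS is the root of the plaintext index binary tree, and this root is returned to represent the tree.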
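Further, the plaintext binary tree index is encrypted into the ciphertext binary tree index as follows:

The plaintext binary tree index contains all the information about the document set, so it must be encrypted into the ciphertext binary tree index before being uploaded to the cloud server. In the nine-tuple representing a node, u.P′ and u.P″ are the encrypted forms of the two pruning subvectors obtained by splitting the pruning vector u.PV according to the following formula. The S vector acts as the split indicator:

[Figure PCTCN2019074061-appb-000036]
[Figure PCTCN2019074061-appb-000037]

The u.P′ and u.P″ vectors of a node are mainly used in the search phase:
- For a leaf node, the relevance score between the document and the query vector is computed with the same mechanism as the Secure KNN algorithm.
- For an intermediate node, the relevance score between u.P′, u.P″ and the query trapdoor decides whether to search the subtree of that node, i.e., whether to prune it. See the description of the search algorithm for details.

The plaintext binary tree index is encrypted into the ciphertext binary tree index as follows:
1. If the current root node is empty, return.
2. Otherwise, split its pruning vector into pruning subvectors according to the formula above, and encrypt them with the transposes of the invertible matrices.
3. To avoid leaking information about the plaintext document set to the cloud server, set the u.CV and u.PV fields of the node to NULL and u.N to 0.
4. If the left subtree of the current root is not empty, encrypt it recursively; if the right subtree is not empty, encrypt it recursively.
5. Once all nodes have been encrypted, return the root of the ciphertext binary tree index.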
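Further, the binary tree index is used to accelerate queries as follows:

The target result set is denoted R, threshold is the minimum relevance score between the query and the nodes currently in the result set, and K is the number of documents to retrieve.

During the search, if the current node is a leaf node:
- If R contains fewer than K−1 nodes, the node is added to R.
- If R contains exactly K−1 nodes, the node is added to R and threshold is updated.
- If R contains K nodes and the relevance score between the current leaf and the query is greater than threshold, the node least relevant to the query is removed from R, the current leaf is added, and threshold is updated.

If the current node is an intermediate node and the relevance score between its pruning vector and the query trapdoor is less than threshold, the subtree it represents is pruned without further search. Otherwise the search continues into the subtree.

When the traversal of the index tree is complete, the node set R is returned.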
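Compared with the prior art, the present invention has the following advantages and effects:

(1) Secure KNN is used to implement symmetric multi-keyword search over ciphertext. The search supports both preference search and logical search, ranks results by relevance to the query, and allows the correctness and completeness of the results to be verified with the verification object. To reduce the time complexity of search, the data owner builds a ciphertext binary tree index in advance. This index allows subtrees to be pruned effectively to shrink the search space, which improves search efficiency.

(2) Diagonal matrices replace full matrices. This reduces storage and computation overhead by an order of magnitude and greatly reduces the time needed to invert the matrices, which greatly lowers the data owner's preprocessing cost. Moreover, under the semi-trusted (honest-but-curious) model, the scheme using diagonal matrices is no less secure. The invention therefore improves speed without reducing the security of the scheme.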
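Brief Description of the Drawings

Figure 1 is a schematic diagram of the architecture of the disclosed efficient, verifiable, multi-keyword ranked searchable encryption method supporting preference and logical search;

Figure 2 is a tree diagram of the clustering process.

Detailed Description of the Embodiments

To make the objectives, technical solutions and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.

Embodiment 1

This embodiment discloses an efficient, verifiable, multi-keyword ranked searchable encryption method supporting preference search and logical search. It comprises the following three parts: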
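a) Data owner

The data owner is the owner of the data. It mainly performs the preprocessing for searchable encryption, which consists of key generation, document encryption, index generation and digest generation.

To keep the data confidential, the data owner encrypts the document data it holds to obtain the ciphertext document set E. To make the encrypted documents searchable while keeping search efficient, it builds a secure index [Figure PCTCN2019074061-appb-000038] from the document set FS. It also generates message digests, so that data users can check the completeness and correctness of the search results.

The data owner then uploads the encrypted document set E, together with the documents' message digests and the secure index [Figure PCTCN2019074061-appb-000039], to the cloud server. At the same time, the data owner can authorize data users to access its outsourced data by sharing keys with them. These are the symmetric key used to encrypt documents and the secret key used to encrypt trapdoors.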
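The data owner generates the symmetric key used to encrypt documents and the secret key used to encrypt document vectors, namely two invertible matrices and a random bit vector.

The data owner encrypts the document contents with the symmetric key. It uses the vector space model with TF×IDF to abstract the document contents and their weights, builds the plaintext binary tree index, and then recursively encrypts the whole tree. During encryption, each document vector is split into two subvectors according to the random bit vector under a fixed rule. The two subvectors are encrypted with the transposes of the two invertible matrices respectively. The data owner also generates each document's message digest from the document content and the key. These digests are used in the search phase to construct the verification object, which provides verifiability.

To keep search efficient, the data owner organizes the index as a binary tree, built as follows:
1. Each document first becomes a leaf node.
2. The two currently most relevant nodes are selected, and a parent node is built for them according to fixed rules.
3. The two most relevant of the remaining nodes are then selected and merged upward into a parent, and so on.

The plaintext binary tree index is thus built bottom-up and then encrypted to obtain the ciphertext binary tree index.

After generating the encrypted documents, the ciphertext binary tree index and the document message digests, the data owner outsources them to the cloud server. The cloud server then provides storage and search services.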
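b) Data user

The data user is a user who shares keys with the data owner, and can submit queries to the server to perform searches. The search proceeds as follows:
1. When the data user wants to retrieve documents, it first converts the query into a query trapdoor $T_Q$.
2. It then submits $T_Q$, together with the number of target documents, to the cloud service provider.
3. Once the cloud server receives $T_Q$, it performs the computation.
4. When finished, the cloud server returns the top-K most relevant documents in ranked order, together with the associated verification object.
5. Finally, the data user receives the top-K most relevant documents and the verification object. It runs the verification algorithm to check the correctness and completeness of the search results, and then decrypts them to obtain the search results.

The data user generates a vector representation of the submitted query:
- To support preference search, the query vector must be built from the user's historical preference information combined with the query.
- To support logical search, it must be built from the logically connected keywords and a constructed numerical sequence.

After generating the query vector, the data user splits it into two subvectors according to the random bit vector under a fixed rule. It encrypts the two query subvectors with the inverses of the two invertible matrices respectively, obtaining the encrypted query trapdoor.

After generating the query trapdoor, the data user sends it, together with the number of target documents to retrieve, to the cloud server. The cloud server returns the top-K most relevant ciphertext documents, ranked by relevance, and the verification object. The data user checks the completeness and correctness of the search results with the verification algorithm, and then decrypts them to obtain the result document set.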
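c) Cloud server

The cloud server provides pay-as-you-go storage and computation services to the data owner and query services to the data user. It stores the ciphertext documents and the ciphertext index.

Once the cloud server receives a query trapdoor $T_Q$ and a target number K from a data user, it performs a secure search using the ciphertext index [Figure PCTCN2019074061-appb-000040] and $T_Q$. This yields the top-K most relevant encrypted documents, which it sorts by relevance to the query. It then generates the verification object and sends the top-K most relevant ciphertext documents and the verification object to the data user.

The cloud server receives the ciphertext binary tree index, ciphertext documents and message digests generated by the data owner, and the encrypted query trapdoor and target document number generated by the data user. It then retrieves the top-K most relevant documents on the ciphertext binary tree index, according to the relevance score between the query and the index, i.e., between the documents and the query. Once the search finishes, it sorts the top-K documents by relevance score, generates the verification object, and sends them to the data user.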
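The basic processes of the method are introduced below in terms of the data owner, the data user and the cloud server. Supporting preference search and logical search requires some adjustments to several of these phases. The basic scheme is therefore described first, followed by:
- the schemes for preference search and logical search;
- the construction of the binary tree index;
- the execution of the search.

(1) Key generation phase (data owner)

GenKey($1^{l(n)}$): In the initialization phase, the data owner generates a four-tuple secret key $SK = (S, M_1, M_2, k_f)$, where:
- S is a random (n+e)-bit vector;
- $M_1$ and $M_2$ are two (n+e)×(n+e) invertible matrices;
- n is the size of the generated dictionary;
- e is the number of redundant keywords introduced to make trapdoors unlinkable;
- $k_f$ is a symmetric encryption key (e.g., AES, DES).

SK is shared only between the data owner and data users; the cloud server learns nothing about SK.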
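(2) Index construction phase (data owner)

BuildIndex(FS, SK): Following the index binary-tree construction algorithm, each document vector u.PV stored in a node u is computed with the following formula:

[Figure PCTCN2019074061-appb-000041]

where [Figure PCTCN2019074061-appb-000042] denotes the TF value of $w_i$ in $F_d$, and [Figure PCTCN2019074061-appb-000043] is the frequency with which keyword $w_i$ occurs in document $F_d$.

After the plaintext binary tree index has been built, it is encrypted to obtain the ciphertext binary tree index. During encryption, the following split rule is applied to each pruning vector (in a leaf node, the pruning vector is the document vector) to obtain two random subvectors $\{P', P''\}$, with SK.S acting as the split indicator:

[Figure PCTCN2019074061-appb-000044]

The encrypted form of the u.PV vector is [Figure PCTCN2019074061-appb-000045]. For every node in the index tree, u.PV is replaced by its encrypted form [Figure PCTCN2019074061-appb-000046].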
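(3) Trapdoor generation phase (data user)

GenTrapdoor($S_q$, k, SK): Assume $S_q = \{w_1, w_2, \ldots, w_t\}$ is the user's set of query keywords and Q is the vector form of $S_q$. Each dimension is computed with the following formula:

[Figure PCTCN2019074061-appb-000047]

Q is then normalized and split into two random subvectors $\{Q', Q''\}$, with SK.S acting as the split indicator. The split rule is:

[Figure PCTCN2019074061-appb-000048]

The encrypted form of Q is [Figure PCTCN2019074061-appb-000049].

The data user then sends the trapdoor $T_Q$ to the cloud server. $T_Q$ consists of [Figure PCTCN2019074061-appb-000050] and the number K of target documents to retrieve.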
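(4) Search phase (cloud server)

[Figure PCTCN2019074061-appb-000051]: The cloud server runs a depth-first algorithm to obtain the result set R and constructs a verification object VO. It then returns R and VO to the data user. During the search on the ciphertext binary tree index, the relevance score between the encrypted forms of u.PV and Q is computed as follows:

[Figure PCTCN2019074061-appb-000052]

where u.PV is the unencrypted document vector and Q is the unencrypted query vector. This result shows that the relevance score between the index and the trapdoor is equal (or proportional) to the relevance score between the plaintext document vector and the query.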
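A short pure-Python sketch can check the inner-product identity behind the search phase. It assumes the split convention of the original Secure KNN construction, which may differ from the exact rule in the formula images:
- In the index vector, entry j is split randomly where S[j] = 1 and copied where S[j] = 0.
- In the query vector, the roles of the two bit values are reversed.

The names `gen_key`, `enc_index`, `enc_query` and `score` are hypothetical. With $I = \{M_1^T P', M_2^T P''\}$ and $T_Q = \{M_1^{-1} Q', M_2^{-1} Q''\}$, the score is $P'\cdot Q' + P''\cdot Q'' = P\cdot Q$.

```python
import random

def mat_vec(M, v):
    """Return the product M v for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def invert(M):
    """Gauss-Jordan inversion with partial pivoting; M must be invertible."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [x / pivot for x in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0.0:
                f = A[r][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]

def gen_key(dim, rng):
    """Hypothetical GenKey: split indicator S and two invertible matrices M1, M2."""
    S = [rng.randint(0, 1) for _ in range(dim)]

    def rand_matrix():
        # Adding `dim` on the diagonal makes the matrix strictly diagonally dominant, hence invertible.
        return [[rng.uniform(-1, 1) + (dim if i == j else 0) for j in range(dim)] for i in range(dim)]

    return S, rand_matrix(), rand_matrix()

def split(v, S, random_bit, rng):
    """Where S[j] == random_bit, split v[j] randomly into v1[j] + v2[j]; elsewhere copy it into both."""
    v1, v2 = [], []
    for x, s in zip(v, S):
        if s == random_bit:
            r = rng.uniform(-1, 1)
            v1.append(r)
            v2.append(x - r)
        else:
            v1.append(x)
            v2.append(x)
    return v1, v2

def enc_index(P, key, rng):
    S, M1, M2 = key
    P1, P2 = split(P, S, 1, rng)  # index vector: random split where S[j] == 1
    return mat_vec(transpose(M1), P1), mat_vec(transpose(M2), P2)

def enc_query(Q, key, rng):
    S, M1, M2 = key
    Q1, Q2 = split(Q, S, 0, rng)  # query vector: random split where S[j] == 0
    return mat_vec(invert(M1), Q1), mat_vec(invert(M2), Q2)

def score(I, T):
    """(M1^T P')·(M1^-1 Q') + (M2^T P'')·(M2^-1 Q'') = P'·Q' + P''·Q'' = P·Q."""
    return sum(a * b for a, b in zip(I[0], T[0])) + sum(a * b for a, b in zip(I[1], T[1]))

rng = random.Random(7)
key = gen_key(8, rng)
P = [0.0, 0.0, 0.649641, 0.0, 0.0, -0.008594, 0.549306, -0.004946]  # f3.txt, Embodiment 2
Q = [0.0, 0.0, 115.0593, 1.0, 0.0, 1.0, 19.450359, 0.0]             # query vector Q, Embodiment 2
print(round(score(enc_index(P, key, rng), enc_query(Q, key, rng)), 2))  # 85.42
```

With the f3.txt vector and the query vector Q of Embodiment 2, the sketch yields 85.42, the score reported for f3.txt in step (7). The random split and the key matrices change every ciphertext, but they do not change the score.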
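(5) Verification phase

Verify(R, VO, SK): The data user uses the key $k_f$ to decrypt the search results and to verify their correctness and completeness:
1. Every leaf node of the ciphertext binary tree index contains the message digest of its document.
2. The cloud server generates the verification object from the message digests of the top-K documents it retrieved, and sends it to the data user.
3. After receiving the top-K ciphertext documents and the verification object, the data user decrypts each document and regenerates its message digest using the key $k_f$.
4. The data user builds a new verification object VO′ from these regenerated digests.
5. The data user checks whether VO′ equals the VO returned by the server, and decides on that basis whether to accept the query result.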
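A minimal sketch of the verification phase follows. It assumes HMAC-SHA256 keyed with $k_f$ as the message digest, and a single SHA-256 hash over the digests of the ranked results as VO. The patent does not fix either primitive, and `doc_digest`, `build_vo` and `verify` are hypothetical names. The sketch checks two things: each returned document matches the digest stored for it, and the returned order matches the order from which VO was built.

```python
import hashlib
import hmac

def doc_digest(k_f: bytes, content: bytes) -> bytes:
    """Hypothetical u.sig: keyed digest of the plaintext document (HMAC-SHA256 is an assumption)."""
    return hmac.new(k_f, content, hashlib.sha256).digest()

def build_vo(digests) -> bytes:
    """Hypothetical VO: one SHA-256 hash over the top-K digests, in rank order."""
    h = hashlib.sha256()
    for d in digests:
        h.update(d)
    return h.digest()

def verify(k_f: bytes, decrypted_docs, vo: bytes) -> bool:
    """Data-user side: rebuild VO' from the decrypted documents and accept only if VO' == VO."""
    vo_prime = build_vo(doc_digest(k_f, doc) for doc in decrypted_docs)
    return hmac.compare_digest(vo_prime, vo)

k_f = b"shared-key"                                                  # placeholder key material
docs = {"f3.txt": b"python go", "f2.txt": b"java go"}
sig = {name: doc_digest(k_f, body) for name, body in docs.items()}  # stored in the leaf nodes

R = ["f3.txt", "f2.txt"]                 # ranked top-K returned by the cloud server
vo = build_vo(sig[name] for name in R)   # built by the cloud server from the stored digests

print(verify(k_f, [docs[n] for n in R], vo))         # True
print(verify(k_f, [b"python go", b"java go!"], vo))  # False: a returned document was altered
```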
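To support preference search, the index construction phase and the trapdoor generation phase need some adjustments; the other phases remain unchanged.

a. In the index construction phase, the document vector is built from a document. Each dimension represents the weight score of the keyword at that position in the sorted dictionary, and is computed with the following formula:

[Figure PCTCN2019074061-appb-000053]

In the formula:
- [Figure PCTCN2019074061-appb-000054] denotes the TF value of keyword $w_i$ in document $F_j$;
- [Figure PCTCN2019074061-appb-000055] denotes the number of documents that contain keyword $w_i$;
- N is the number of documents in the collection;
- $|F_j|$ is the length of document $F_j$, i.e., the number of keywords it contains.

The dimensions corresponding to redundant keywords simply take values drawn from the uniform distribution $U(\theta-\sigma, \theta+\sigma)$. The mean θ and the half-width σ are determined from experimental data; in our experiments, θ is set to 0 and σ to 0.01.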
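b. In the trapdoor generation phase, the vector representation of the query is built from the user's historical preferences and the submitted query. First, the keywords in the submitted query are arranged in ascending order of importance:

[Figure PCTCN2019074061-appb-000056]

($1 \le n_1 < n_2 < \cdots < n_l \le m$). The data user then randomly generates a super-increasing sequence ($d_1 > 0, d_2, \ldots, d_l$) satisfying

[Figure PCTCN2019074061-appb-000057]

Here $d_i$ is the preference factor of keyword [Figure PCTCN2019074061-appb-000058], i.e., the weight value of the query vector at that keyword. The weight values at the redundant keywords in the query vector are randomly set to 1.

The search result can then be expressed as:

[Figure PCTCN2019074061-appb-000059]

where s is the perturbation of the total score caused by the redundant keywords.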
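This construction guarantees the following two properties:

(1) The search keyword set [Figure PCTCN2019074061-appb-000060] ($1 \le n_1 < n_2 < \cdots < n_l \le m$) is arranged in ascending order of preference. If document $F_1$ contains a keyword with a higher degree of preference than those contained in document $F_2$, then $F_1$ has a higher return priority than $F_2$.

(2) The search keyword set [Figure PCTCN2019074061-appb-000061] ($1 \le n_1 < n_2 < \cdots < n_l \le m$) is arranged in ascending order of preference. If documents $F_1$ and $F_2$ contain keywords of the same degree of preference, and $F_1$ contains the keyword with the higher weight value, then $F_1$ has a higher return priority than $F_2$.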
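A sketch of the preference trapdoor follows. It omits normalization and the redundant-keyword perturbation s.

The exact condition on the super-increasing sequence is in formula image appb-000057. This sketch instead assumes $d_i > (w_{max}/w_{min}) \sum_{j<i} d_j$, where $w_{min}$ and $w_{max}$ are the smallest and largest nonzero document weights. That assumed condition is sufficient for property (1):
- A document containing the i-th keyword scores at least $d_i w_{min}$.
- A document containing only less-preferred keywords scores at most $w_{max} \sum_{j<i} d_j$.

Property (2) follows because the score is linear in the weights. The function names are hypothetical.

```python
import random

def super_increasing(length, ratio, rng):
    """d_1 > 0 and d_i > ratio * (d_1 + ... + d_{i-1}), each with a random margin in [1, 2]."""
    d, total = [], 0.0
    for _ in range(length):
        nxt = ratio * total + rng.uniform(1.0, 2.0)
        d.append(nxt)
        total += nxt
    return d

def preference_query(keywords_ascending, dictionary, ratio, rng):
    """The i-th least preferred query keyword gets d_i; every other position stays 0."""
    d = super_increasing(len(keywords_ascending), ratio, rng)
    q = dict.fromkeys(dictionary, 0.0)
    q.update(zip(keywords_ascending, d))
    return q

def score(doc, q):
    return sum(w * q[kw] for kw, w in doc.items())

# Keyword weights from Embodiment 2 (redundant keywords omitted).
DOCS = {
    "f1": {"python": 0.549306, "java": 0.649641},
    "f2": {"java": 0.649641, "go": 0.649641},
    "f3": {"python": 0.549306, "go": 0.649641},
    "f8": {"python": 0.366204, "java": 0.433094, "go": 0.433094},
}
DICTIONARY = ["c", "cpp", "go", "java", "javascript", "python"]
prefs = {"python": 8, "java": 7, "go": 10}    # from the user's preference model
ascending = sorted(prefs, key=prefs.get)       # ['java', 'python', 'go']
weights = [w for doc in DOCS.values() for w in doc.values()]
ratio = max(weights) / min(weights)            # w_max / w_min

q = preference_query(ascending, DICTIONARY, ratio, random.Random(0))
print(sorted(DOCS, key=lambda name: score(DOCS[name], q), reverse=True))  # ['f3', 'f2', 'f8', 'f1']
```

With the weights and preferences of Embodiment 2 (java 7 < python 8 < go 10), the ranking is f3, f2, f8, f1 for any random margins. This is consistent with the top-2 result (f3.txt, f2.txt) reported there.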
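To support logical search, the key generation, index construction and trapdoor generation phases need some adjustments; the other phases remain unchanged.

c. Key generation phase

In the initialization phase, the data owner generates a four-tuple secret key $SK = (S, M_1, M_2, k_f)$, where:
- S is a random (n+1)-bit vector;
- $M_1$ and $M_2$ are two (n+1)×(n+1) invertible matrices;
- n is the size of the generated dictionary, and the extra 1 dimension is needed to construct the query trapdoor;
- $k_f$ is a symmetric encryption key (e.g., AES, DES).

d. Index construction phase

In the index construction phase, a document is converted into a document vector whose dimensions are no longer scores. Each dimension only indicates whether the current document contains the corresponding dictionary keyword: 1 means it contains the keyword and 0 means it does not. The (n+1)-th dimension is set to 1. The split rule and the binary tree index construction mechanism are unchanged, as described above.

e. Trapdoor generation phase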
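Assume that the sets of keywords associated with "OR", "AND" and "NO" in the query are

[Figure PCTCN2019074061-appb-000062]
[Figure PCTCN2019074061-appb-000063]

respectively, and use the symbols

[Figure PCTCN2019074061-appb-000064]

to denote "OR", "AND" and "NO" in the mathematical sense. The matching rule can then be expressed as

[Figure PCTCN2019074061-appb-000065]
[Figure PCTCN2019074061-appb-000066]

For the "OR" operation, the data user constructs a super-increasing sequence

[Figure PCTCN2019074061-appb-000067]

which is assigned as the weights of the "OR" search keywords. To implement the "AND" and "NO" operations, the data user likewise constructs two super-increasing sequences, $b_j$ ($j = 1, 2, \ldots, l_2$) and $c_j$ ($j = 1, 2, \ldots, l_3$), satisfying the conditions

[Figure PCTCN2019074061-appb-000068]

and

[Figure PCTCN2019074061-appb-000069]
[Figure PCTCN2019074061-appb-000070]

Assume that

[Figure PCTCN2019074061-appb-000071]

are sorted in ascending order of importance. Then, for the search keyword set

[Figure PCTCN2019074061-appb-000072]

the weight values at the corresponding positions of Q can be set to

[Figure PCTCN2019074061-appb-000073]

The values at all other positions are set to 0, and the (n+1)-th dimension of the query vector is set to

[Figure PCTCN2019074061-appb-000074]

In the query result, if $R_j > 0$ for document $F_j$, then $F_j$ satisfies the requirements of the logical search.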
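The correctness of this conclusion is easy to prove.

"NO" condition. Each "NO"-connected keyword

[Figure PCTCN2019074061-appb-000075]

has the value $-c_i$ in the query vector Q, and

[Figure PCTCN2019074061-appb-000076]

Therefore, if any keyword

[Figure PCTCN2019074061-appb-000077]

occurs in the document, it follows that $P \cdot Q < 0$ and $R_j = P \cdot Q - s < 0$. So if $R_j > 0$, none of

[Figure PCTCN2019074061-appb-000078]

occurs in document $F_j$, i.e., $F_j$ satisfies the "NO" condition.

"AND" and "OR" conditions. If $R_j > 0$, then

[Figure PCTCN2019074061-appb-000079]

Because

[Figure PCTCN2019074061-appb-000080]

holds together with the conditions on the sequences above, all "AND"-connected keywords must be present and at least one "OR"-connected keyword must be present. In other words, in the document vector P, every position of an "AND"-connected keyword is 1 and at least one position of an "OR"-connected keyword is 1. Hence $F_j$ satisfies the "AND" and "OR" operations.

Therefore, if $R_j > 0$, the vector P satisfies the "OR", "AND" and "NO" operations. Conversely, when the super-increasing sequences satisfy the above inequalities, a document that meets all three conditions yields $R_j > 0$.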
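The argument above can be checked exhaustively for a small dictionary. This sketch uses one concrete choice of sequences that satisfies the inequalities the proof relies on; it does not reproduce the formulas in images appb-000067 to appb-000074. The choices are:
- A single super-increasing sequence is cut into $a$ (OR), $b$ (AND) and $c$ (NO). This gives $b_1 > \sum a$ and $c_1 > \sum a + \sum b$.
- The (n+1)-th query entry is $-\sum b$.

The sketch assumes at least one OR keyword and omits the perturbation s. The function names are hypothetical.

```python
import random

def super_increasing(length, rng):
    """Each element exceeds the sum of all earlier elements."""
    seq, total = [], 0
    for _ in range(length):
        nxt = total + rng.randint(1, 5)
        seq.append(nxt)
        total += nxt
    return seq

def logical_query(or_kws, and_kws, no_kws, dictionary, rng):
    """(n+1)-dimensional query for (any OR keyword) AND (all AND keywords) AND NOT (any NO keyword).

    Each keyword group is assumed to be listed in ascending order of importance.
    """
    if not or_kws:
        raise ValueError("this sketch assumes at least one OR keyword")
    l1, l2 = len(or_kws), len(and_kws)
    seq = super_increasing(l1 + l2 + len(no_kws), rng)
    a, b, c = seq[:l1], seq[l1:l1 + l2], seq[l1 + l2:]  # b_1 > sum(a), c_1 > sum(a) + sum(b)
    q = dict.fromkeys(dictionary, 0)
    q.update(zip(or_kws, a))
    q.update(zip(and_kws, b))
    q.update((kw, -ci) for kw, ci in zip(no_kws, c))
    return [q[kw] for kw in dictionary] + [-sum(b)]     # last entry cancels the AND weights

def doc_vector(keywords, dictionary):
    """Binary document vector; the (n+1)-th dimension is always 1."""
    return [1 if kw in keywords else 0 for kw in dictionary] + [1]

def matches(P, Q):
    """R_j > 0, with the perturbation s omitted."""
    return sum(p * q for p, q in zip(P, Q)) > 0

DICTIONARY = ["c", "cpp", "go", "java", "javascript", "python"]
DOCS = {"f1": {"python", "java"}, "f2": {"java", "go"}, "f3": {"python", "go"},
        "f7": {"python", "cpp", "c"}, "f8": {"python", "go", "java"}}
# (python OR go) AND java AND NOT c
Q = logical_query(["python", "go"], ["java"], ["c"], DICTIONARY, random.Random(0))
print([n for n, kws in DOCS.items() if matches(doc_vector(kws, DICTIONARY), Q)])  # ['f1', 'f2', 'f8']
```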
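A tree index is used to speed up search. A plaintext binary tree index is built first and then encrypted to obtain the ciphertext binary tree index. Using the relevance score between a node and the trapdoor, this index can decide whether to prune the related subtree, which increases search speed. The basic structure and construction process of the plaintext binary tree index are as follows:

(1) A node u of the plaintext binary tree index is a nine-tuple (P′, P″, PV, CV, N, PL, PR, FD, sig), where:
- u.PL and u.PR are pointers to the left and right child nodes;
- u.FD is the document's unique descriptor;
- u.sig is the message digest generated from the document content;
- u.CV is the cluster center vector of the cluster $G_u$;
- u.N is the number of documents in $G_u$;
- $G_u$ denotes the documents associated with all leaf nodes of the subtree rooted at u.

Note that u.CV, u.N and u.PV exist only while the plaintext binary tree index is being built. When the index is encrypted, the u.CV and u.PV fields of every node are set to NULL and u.N is set to 0.

There are two main types of node u: leaf nodes and intermediate nodes.

a. If u is a leaf node:
- u.PL = u.PR = φ;
- u.FD stores the document's file descriptor;
- u.CV and u.PV both store the current document vector;
- u.P′ and u.P″, which hold the encrypted forms of the subvectors obtained by splitting u.PV, are initially set to the default value NULL;
- u.N = 1;
- u.sig stores the message digest of the current document. After the search, the message digest is used to generate the verification object, with which the data user checks the completeness and correctness of the search results.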
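b. If u is an internal (intermediate) node:
- u.FD = φ and u.sig = φ;
- u.PL and u.PR point to the left and right children of u;
- u.N = u.PL.N + u.PR.N;
- u.PV is generated from the pruning vectors PV of u's two children;
- u.CV is generated from the cluster center vectors CV of u's two children. It is mainly used while building the binary tree index, to find the most relevant nodes.

The generation rules are as follows:

[Figure PCTCN2019074061-appb-000081]

The cluster center vector is used to compute the relevance score between nodes, so that the two closest nodes can be found and their parent built while the binary tree index is constructed. The pruning vector is split into two subvectors that are encrypted with the invertible matrices. During the depth-first search of the binary tree, these subvectors are used to compute the relevance score with the trapdoor (i.e., the query), which decides whether to descend into the current subtree.

(2) The plaintext binary tree index is built as follows. During construction, the current processing node set CPNS holds the nodes processed in the current round, and the pending node set NGNS holds the nodes to be processed in the next round.
1. Each document is initialized as a leaf node, and all document nodes are added to CPNS.
2. While CPNS contains more than one node, find the two most relevant nodes. These are the two nodes whose cluster center vectors have the largest relevance score among all nodes, which can be understood as the two most similar documents.
3. Build their parent node according to the rules above and add it to NGNS, then remove the two nodes from CPNS.
4. Repeat steps 2 and 3 until CPNS contains at most one node. Because the number of nodes in CPNS may be odd, one node may remain.
5. Add the nodes in NGNS to CPNS and start a new round.
6. Stop when, after the NGNS nodes have been added to CPNS, only one node remains in CPNS.

The single remaining node in CPNS is the root of the plaintext index binary tree, and this root is returned to represent the tree.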
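The sketch below applies the CPNS/NGNS construction to the document vectors of Embodiment 2. The rules in formula image appb-000081 are not reproduced here. Instead, the sketch assumes three rules:
- Node relevance is the inner product of center vectors.
- A parent's CV is the N-weighted mean of its children's CVs.
- A parent's PV is the element-wise maximum of its children's PVs.

The max rule is an inference: with it, the intermediate-node scores in step (7) of Embodiment 2 (86.08, 86.07, 82.52, 7.12) are reproduced when truncated to two decimals. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Document vectors of Embodiment 2 (dictionary: c, cpp, go, java, javascript, mugvnxze, python, pzfv).
VECS = {
    "f1": [0.0, 0.0, 0.0, 0.649641, 0.0, -0.007514, 0.549306, 0.003004],
    "f2": [0.0, 0.0, 0.649641, 0.649641, 0.0, 0.008282, 0.0, 0.003478],
    "f3": [0.0, 0.0, 0.649641, 0.0, 0.0, -0.008594, 0.549306, -0.004946],
    "f4": [0.0, 1.609438, 0.0, 0.0, 0.0, -0.006176, 0.0, -0.008033],
    "f5": [1.609438, 0.0, 0.0, 0.0, 0.0, 0.003996, 0.0, 0.007028],
    "f6": [0.0, 0.0, 0.0, 0.0, 2.197225, 0.002741, 0.0, 0.006191],
    "f7": [0.536479, 0.536479, 0.0, 0.0, 0.0, -0.004668, 0.366204, 0.000613],
    "f8": [0.0, 0.0, 0.433094, 0.433094, 0.0, -0.006085, 0.366204, -0.003783],
}

@dataclass
class Node:
    name: str
    cv: list                        # cluster center vector u.CV
    pv: list                        # pruning vector u.PV
    n: int                          # number of documents u.N
    left: Optional["Node"] = None   # u.PL
    right: Optional["Node"] = None  # u.PR

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def make_parent(u, v):
    """Assumed rules: CV is the N-weighted mean of the children, PV the element-wise max."""
    n = u.n + v.n
    cv = [(u.n * x + v.n * y) / n for x, y in zip(u.cv, v.cv)]
    pv = [max(x, y) for x, y in zip(u.pv, v.pv)]
    return Node(u.name + v.name, cv, pv, n, u, v)

def build_index(vectors):
    if not vectors:
        raise ValueError("need at least one document")
    cpns = [Node(name, vec, vec, 1) for name, vec in vectors.items()]  # leaf nodes
    while True:
        ngns = []
        while len(cpns) > 1:
            # the pair whose center vectors have the largest relevance score
            i, j = max(((i, j) for i in range(len(cpns)) for j in range(i + 1, len(cpns))),
                       key=lambda p: dot(cpns[p[0]].cv, cpns[p[1]].cv))
            ngns.append(make_parent(cpns[i], cpns[j]))
            cpns = [w for k, w in enumerate(cpns) if k not in (i, j)]
        cpns += ngns  # a leftover node (odd count) joins the next round
        if len(cpns) == 1:
            return cpns[0]

def shape(u):
    return u.name if u.left is None else (shape(u.left), shape(u.right))

root = build_index(VECS)
print(shape(root))  # ((('f2', 'f8'), ('f1', 'f3')), (('f4', 'f7'), ('f5', 'f6')))
```

The resulting clusters match step (4) of Embodiment 2: (f3, f1), (f8, f2), (f4, f7) and (f5, f6), followed by (f3f1, f8f2) and (f4f7, f5f6). Only the left/right order within each pair differs.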
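The plaintext binary tree index is encrypted into the ciphertext binary tree index as follows:

The plaintext binary tree index contains all the information about the document set, so it must be encrypted into the ciphertext binary tree index before being uploaded to the cloud server. In the nine-tuple representing a node, u.P′ and u.P″ are the encrypted forms of the two pruning subvectors obtained by splitting the pruning vector u.PV according to the following formula. The S vector acts as the split indicator:

[Figure PCTCN2019074061-appb-000082]
[Figure PCTCN2019074061-appb-000083]

The u.P′ and u.P″ vectors of a node are mainly used in the search phase:
- For a leaf node, the relevance score between the document and the query vector is computed with the same mechanism as the Secure KNN algorithm.
- For an intermediate node, the relevance score between u.P′, u.P″ and the query trapdoor decides whether to search the subtree of that node, i.e., whether to prune it.

The plaintext binary tree index is encrypted into the ciphertext binary tree index as follows:
1. If the current root node is empty, return.
2. Otherwise, split its pruning vector into pruning subvectors according to the formula above, and encrypt them with the transposes of the invertible matrices.
3. To avoid leaking information about the plaintext document set to the cloud server, set the u.CV and u.PV fields of the node to NULL and u.N to 0.
4. If the left subtree of the current root is not empty, encrypt it recursively; if the right subtree is not empty, encrypt it recursively.
5. Once all nodes have been encrypted, return the root of the ciphertext binary tree index.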
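The binary tree index is used to accelerate queries as follows:

The target result set is denoted R, threshold is the minimum relevance score between the query and the nodes currently in the result set, and K is the number of documents to retrieve.

During the search, if the current node is a leaf node:
- If R contains fewer than K−1 nodes, the node is added to R.
- If R contains exactly K−1 nodes, the node is added to R and threshold is updated.
- If R contains K nodes and the relevance score between the current leaf and the query is greater than threshold, the node least relevant to the query is removed from R, the current leaf is added, and threshold is updated.

If the current node is an intermediate node and the relevance score between its pruning vector and the query trapdoor is less than threshold, the subtree it represents is pruned without further search, because its relevance is already lower than that of the least relevant node in the result set R. Otherwise the search continues into the subtree.

When the traversal of the index tree is complete, the node set R is returned.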
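The sketch below runs the depth-first top-K search with threshold pruning on the tree of Embodiment 2 (Figure 2), with K = 2.

Scores are computed on plaintext vectors as a stand-in for the encrypted score $I \cdot T_Q$, which Secure KNN makes equal. Inner-node pruning vectors are assumed to be element-wise maxima of their children's vectors. Because all query weights here are nonnegative, a node's score is then an upper bound on the scores of the leaves below it.

The node layout and function names are hypothetical. The trace follows step (7):
- f1.txt enters the result set and is later evicted.
- f8.txt enters and is later evicted.
- The subtree f4f7f5f6 (score 7.12 < 75.40) is pruned.

```python
VECS = {
    "f1": [0.0, 0.0, 0.0, 0.649641, 0.0, -0.007514, 0.549306, 0.003004],
    "f2": [0.0, 0.0, 0.649641, 0.649641, 0.0, 0.008282, 0.0, 0.003478],
    "f3": [0.0, 0.0, 0.649641, 0.0, 0.0, -0.008594, 0.549306, -0.004946],
    "f4": [0.0, 1.609438, 0.0, 0.0, 0.0, -0.006176, 0.0, -0.008033],
    "f5": [1.609438, 0.0, 0.0, 0.0, 0.0, 0.003996, 0.0, 0.007028],
    "f6": [0.0, 0.0, 0.0, 0.0, 2.197225, 0.002741, 0.0, 0.006191],
    "f7": [0.536479, 0.536479, 0.0, 0.0, 0.0, -0.004668, 0.366204, 0.000613],
    "f8": [0.0, 0.0, 0.433094, 0.433094, 0.0, -0.006085, 0.366204, -0.003783],
}
Q = [0.0, 0.0, 115.0593, 1.0, 0.0, 1.0, 19.450359, 0.0]
TREE = ((("f3", "f1"), ("f8", "f2")), (("f4", "f7"), ("f5", "f6")))  # clustering of Figure 2

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def make(tree):
    """Nested tuples -> nodes; an inner node's PV is the element-wise max of its children (assumed)."""
    if isinstance(tree, str):
        return {"name": tree, "pv": VECS[tree], "kids": []}
    kids = [make(t) for t in tree]
    pv = [max(col) for col in zip(*(k["pv"] for k in kids))]
    return {"name": "".join(k["name"] for k in kids), "pv": pv, "kids": kids}

def search(root, q, k):
    """Depth-first top-K search with threshold pruning; returns (ranked results, visit trace)."""
    R, trace = [], []              # R holds (score, name) pairs
    threshold = float("-inf")

    def visit(u):
        nonlocal threshold
        s = dot(u["pv"], q)        # equals the encrypted score I·T_Q under Secure KNN
        trace.append((u["name"], s))
        if not u["kids"]:          # leaf node
            if len(R) < k - 1:
                R.append((s, u["name"]))
            elif len(R) == k - 1:
                R.append((s, u["name"]))
                threshold = min(R)[0]
            elif s > threshold:
                R.remove(min(R))   # drop the least relevant node
                R.append((s, u["name"]))
                threshold = min(R)[0]
        elif s >= threshold:       # below the threshold the whole subtree is pruned
            for child in u["kids"]:
                visit(child)

    visit(root)
    return sorted(R, reverse=True), trace

results, trace = search(make(TREE), Q, k=2)
print([name for _, name in results])  # ['f3', 'f2']
```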
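The invertible matrices described above for encrypting document vectors and trapdoors are (n+e)×(n+e) matrices, and inverting them takes a very long time. In the index construction phase, each split document subvector must also be multiplied by an (n+e)×(n+e) matrix, which costs $O(N^2)$ time, where N here denotes the vector dimension n + e. With m documents, the total time complexity is $O(2mN^2)$, or $O(\log m \cdot m^2 \cdot N^2)$ when the construction of the binary index tree is included.

Suppose the two invertible full matrices are replaced with two invertible diagonal matrices. Then each matrix equals its own transpose, and its inverse is also diagonal, with each diagonal entry equal to the reciprocal of the entry at the same position in the original matrix. As a result:
- The storage overhead drops from $O(N^2)$ to $O(N)$.
- A matrix-vector product costs $O(N)$ instead of $O(N^2)$.
- The time complexity of index construction drops from $O(2mN^2)$ to $O(2mN)$, or from $O(\log m \cdot m^2 \cdot N^2)$ to $O(\log m \cdot m^2 \cdot N)$ when tree construction is included.

Both time and space complexity therefore decrease by an order of magnitude. Under the semi-trusted (honest-but-curious) model, the security of the scheme can still be reduced to that of Secure KNN.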
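The sketch below shows the diagonal-matrix variant. Each key matrix is stored as its diagonal: $M^T = M$, and $M^{-1}$ is the diagonal of reciprocals. Encrypting a vector is therefore an element-wise product in O(N) time, and each key matrix takes O(N) space.

Several details are assumptions:
- The split convention: the index vector is split randomly where S[j] = 1, the query vector where S[j] = 0.
- The helper names.
- Drawing diagonal entries with magnitude in [0.5, 2], to keep them away from zero.

```python
import random

def gen_diag_key(dim, rng):
    """Hypothetical diagonal GenKey: S plus the diagonals of M1 and M2 (nonzero entries => invertible)."""
    S = [rng.randint(0, 1) for _ in range(dim)]

    def diag():
        return [rng.choice((-1.0, 1.0)) * rng.uniform(0.5, 2.0) for _ in range(dim)]

    return S, diag(), diag()

def split(v, S, random_bit, rng):
    """Where S[j] == random_bit, split v[j] randomly into v1[j] + v2[j]; elsewhere copy it into both."""
    v1, v2 = [], []
    for x, s in zip(v, S):
        if s == random_bit:
            r = rng.uniform(-1, 1)
            v1.append(r)
            v2.append(x - r)
        else:
            v1.append(x)
            v2.append(x)
    return v1, v2

def enc_index(P, key, rng):
    S, d1, d2 = key
    P1, P2 = split(P, S, 1, rng)
    # A diagonal M equals its transpose, so M^T P is an element-wise product: O(N) instead of O(N^2).
    return [a * x for a, x in zip(d1, P1)], [a * x for a, x in zip(d2, P2)]

def enc_query(Q, key, rng):
    S, d1, d2 = key
    Q1, Q2 = split(Q, S, 0, rng)
    # M^{-1} is diagonal with reciprocal entries, so no general matrix inversion is needed.
    return [x / a for a, x in zip(d1, Q1)], [x / a for a, x in zip(d2, Q2)]

def score(I, T):
    return sum(a * b for a, b in zip(I[0], T[0])) + sum(a * b for a, b in zip(I[1], T[1]))

rng = random.Random(11)
key = gen_diag_key(8, rng)
P = [0.0, 0.0, 0.649641, 0.649641, 0.0, 0.008282, 0.0, 0.003478]  # f2.txt, Embodiment 2
Q = [0.0, 0.0, 115.0593, 1.0, 0.0, 1.0, 19.450359, 0.0]
print(round(score(enc_index(P, key, rng), enc_query(Q, key, rng)), 4))  # 75.4052
```

For f2.txt, the encrypted score equals its plaintext score $P \cdot Q \approx 75.4052$, up to floating-point rounding.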
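Embodiment 2

The following concrete example illustrates the multi-keyword preference search process. The logical search scheme is largely similar, and a formal proof of its correctness was given above, so it is not repeated here. Diagonal matrices mainly reduce the amount of computation while the computation steps stay essentially the same, so they are not described again either.

(1) The contents of the documents in the document set FS are as follows. To keep the walkthrough simple, every document is very small. The dictionary has only 6 keywords, and 2 redundant keywords are added, so the generated dictionary has size 8.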
f1.txt:python java
f2.txt:java go
f3.txt:python go
f4.txt:cpp
f5.txt:c
f6.txt:javascript
f7.txt:python cpp c
f8.txt:python go java
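(2) The sorted generated dictionary is [c, cpp, go, java, javascript, mugvnxze, python, pzfv], where "mugvnxze" and "pzfv" are the redundant keywords.

(3) The document vector generated for each document is listed below. The weight values of the redundant keywords follow the uniform distribution U(−0.01, 0.01). When the ciphertext binary tree index is built, the center vector of each document's leaf node is set to that document vector.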
f1.txt
python:0.5493061443340549
java:0.6496414920651304
[0.000000,0.000000,0.000000,0.649641,0.000000,-0.007514,0.549306,0.003004]
f2.txt
java:0.6496414920651304
go:0.6496414920651304
[0.000000,0.000000,0.649641,0.649641,0.000000,0.008282,0.000000,0.003478]
f3.txt
python:0.5493061443340549
go:0.6496414920651304
[0.000000,0.000000,0.649641,0.000000,0.000000,-0.008594,0.549306,-0.004946]
f4.txt
cpp:1.6094379124341003
[0.000000,1.609438,0.000000,0.000000,0.000000,-0.006176,0.000000,-0.008033]
f5.txt
c:1.6094379124341003
[1.609438,0.000000,0.000000,0.000000,0.000000,0.003996,0.000000,0.007028]
f6.txt
javascript:2.1972245773362196
[0.000000,0.000000,0.000000,0.000000,2.197225,0.002741,0.000000,0.006191]
f7.txt
python:0.3662040962227032
cpp:0.5364793041447
c:0.5364793041447
[0.536479,0.536479,0.000000,0.000000,0.000000,-0.004668,0.366204,0.000613]
f8.txt
python:0.3662040962227032
java:0.4330943280434203
go:0.4330943280434203
[0.000000,0.000000,0.433094,0.433094,0.000000,-0.006085,0.366204,-0.003783]
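(4) To build the plaintext binary tree index, the two most relevant nodes are first found by computing the relevance between the documents' center vectors. Since there are 8 nodes in total, the first round requires 4 iterations and divides the 8 documents into 4 small clusters: (f3, f1), (f8, f2), (f4, f7) and (f5, f6). Parent nodes are then built upward by deriving their center vectors and pruning vectors. The second round requires two iterations and yields the clusters (f3f1, f8f2) and (f4f7, f5f6). In the third round only two nodes remain, and they are merged into the single node (f3f1f8f2, f4f7f5f6). The root node is built from these two nodes and returned. The plaintext binary tree index produced by this process is shown in Figure 2.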
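The bottom-up construction can be sketched as follows, using the document vectors from step (3). The patent's parent-generation rule survives only as an image, so the sketch assumes:

- node relevance is the inner product of center vectors;
- a parent's center vector is the size-weighted mean of its children's;
- a parent's pruning vector is the element-wise maximum of its children's. This last choice is consistent with the intermediate-node scores reported in step (7).

Under these assumptions the sketch reproduces the clusters listed above, up to the order within each pair. `TreeNode` and the other names are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet, List, Optional, Tuple

VECTORS = {  # document vectors from step (3), dictionary order as in step (2)
    "f1": [0.0, 0.0, 0.0, 0.649641, 0.0, -0.007514, 0.549306, 0.003004],
    "f2": [0.0, 0.0, 0.649641, 0.649641, 0.0, 0.008282, 0.0, 0.003478],
    "f3": [0.0, 0.0, 0.649641, 0.0, 0.0, -0.008594, 0.549306, -0.004946],
    "f4": [0.0, 1.609438, 0.0, 0.0, 0.0, -0.006176, 0.0, -0.008033],
    "f5": [1.609438, 0.0, 0.0, 0.0, 0.0, 0.003996, 0.0, 0.007028],
    "f6": [0.0, 0.0, 0.0, 0.0, 2.197225, 0.002741, 0.0, 0.006191],
    "f7": [0.536479, 0.536479, 0.0, 0.0, 0.0, -0.004668, 0.366204, 0.000613],
    "f8": [0.0, 0.0, 0.433094, 0.433094, 0.0, -0.006085, 0.366204, -0.003783],
}


@dataclass
class TreeNode:
    name: str
    cv: List[float]                    # cluster center vector
    pv: List[float]                    # pruning vector
    n: int = 1                         # number of documents in the cluster
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


def dot(x, y) -> float:
    return sum(a * b for a, b in zip(x, y))


def make_parent(a: TreeNode, b: TreeNode) -> TreeNode:
    n = a.n + b.n
    cv = [(a.n * x + b.n * y) / n for x, y in zip(a.cv, b.cv)]  # assumed: size-weighted mean
    pv = [max(x, y) for x, y in zip(a.pv, b.pv)]                 # assumed: element-wise maximum
    return TreeNode(a.name + b.name, cv, pv, n, a, b)


def build_tree(leaves: List[TreeNode]) -> Tuple[TreeNode, List[List[FrozenSet[str]]]]:
    cpns = list(leaves)              # CPNS: nodes processed in the current round
    rounds = []                      # pairs merged in each round, for inspection
    while len(cpns) > 1:
        ngns, pairs = [], []         # NGNS: nodes handed to the next round
        while len(cpns) > 1:
            a, b = max(combinations(cpns, 2), key=lambda ab: dot(ab[0].cv, ab[1].cv))
            ngns.append(make_parent(a, b))
            pairs.append(frozenset((a.name, b.name)))
            cpns = [u for u in cpns if u is not a and u is not b]
        rounds.append(pairs)
        cpns = cpns + ngns           # an odd leftover node carries over to the next round
    return cpns[0], rounds


root, rounds = build_tree([TreeNode(k, v, v) for k, v in VECTORS.items()])
```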
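(5) The plaintext binary tree index is encrypted to obtain the encrypted binary tree index. The pruning vector in each node is split according to the splitting rule into two sub-vectors P′ and P″. The two sub-vectors are encrypted with the transposes of the invertible matrices, and the related fields are set to NULL.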
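The split-then-encrypt step follows the secure-kNN pattern used by MRSE-style schemes. The pure-Python sketch below assumes the common convention that index vectors are split where S[j] = 1 and query vectors where S[j] = 0; the patent's exact rule survives only as an image. The key layout and all names are hypothetical.

The scheme is correct because (M_1^T P′)·(M_1^{-1} Q′) = P′·Q′. The two encrypted halves together therefore give P′·Q′ + P″·Q″ = PV·Q.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan elimination with partial pivoting; raises ValueError if singular."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("singular matrix")
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def dot(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(x, y))


def keygen(dim: int, rng: random.Random):
    """Hypothetical key layout (S, M1, M1^-1, M2, M2^-1); the document key k_f is omitted."""
    def invertible():
        while True:
            m = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
            try:
                return m, invert(m)
            except ValueError:
                continue
    s = [rng.randint(0, 1) for _ in range(dim)]
    (m1, m1_inv), (m2, m2_inv) = invertible(), invertible()
    return s, m1, m1_inv, m2, m2_inv


def split(v, s, split_bit, rng) -> Tuple[List[float], List[float]]:
    # where s[j] == split_bit, v[j] becomes two random shares summing to v[j]; elsewhere it is copied
    a, b = [], []
    for x, bit in zip(v, s):
        if bit == split_bit:
            r = rng.uniform(-1, 1)
            a.append(r)
            b.append(x - r)
        else:
            a.append(x)
            b.append(x)
    return a, b


def encrypt_pv(pv, key, rng):
    s, m1, _, m2, _ = key
    p1, p2 = split(pv, s, 1, rng)                  # assumed: index vectors split where S[j] = 1
    return matvec(transpose(m1), p1), matvec(transpose(m2), p2)


def gen_trapdoor(q, key, rng):
    s, _, m1_inv, _, m2_inv = key
    q1, q2 = split(q, s, 0, rng)                   # assumed: query vectors split where S[j] = 0
    return matvec(m1_inv, q1), matvec(m2_inv, q2)


def encrypted_score(enc_pv, trapdoor) -> float:
    return dot(enc_pv[0], trapdoor[0]) + dot(enc_pv[1], trapdoor[1])


rng = random.Random(7)
key = keygen(8, rng)
pv = [0.0, 0.0, 0.649641, 0.649641, 0.0, 0.008282, 0.0, 0.003478]   # f2's vector from step (3)
q = [0.0, 0.0, 115.0593, 1.0, 0.0, 1.0, 19.450359, 0.0]             # Q from step (6)
enc = encrypted_score(encrypt_pv(pv, key, rng), gen_trapdoor(q, key, rng))
```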
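(6) The submitted query is "java python go", and the number of target documents is 2. The user's interest-preference model, built from the user's search history, assigns different weights to different keywords: "c": 2, "cpp": 5, "javascript": 1, "python": 8, "java": 7, "go": 10, "scala": 6. The query vector Q constructed from the submitted query is shown below. It is followed by the encrypted forms Q′ and Q″ of the two sub-vectors obtained by splitting Q: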
Q: [0.000000,0.000000,115.059300,1.000000,0.000000,1.000000,19.450359,0.000000]
Q′: [174.797226,-190.718486,-16.424931,118.891982,-10.095257,58.659643,11.118955,-110.229204]
Q″: [2546.835577,-1077.082690,1838.242043,389.895225,-2904.909899,-1202.838724,1340.954562,-498.161811]
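One way to build a preference query vector of this shape is sketched below. Keywords are sorted by the user's preference weight, and each gets a factor larger than the sum of all earlier factors. The exact condition the patent imposes on this superincreasing sequence survives only as an image, so the growth rule below (a random multiple of the running sum) is a hypothetical stand-in. The example values 1, 19.450359 and 115.0593 for java, python and go also satisfy "each factor exceeds the sum of the earlier ones", but they were not necessarily generated this way. Dummy positions are set to 0 or 1 at random, which is also an assumption.

```python
import random
from typing import Dict, List, Sequence


def preference_query(query: Sequence[str], prefs: Dict[str, float], dictionary: List[str],
                     dummies: Sequence[str], rng: random.Random) -> List[float]:
    """Query vector with superincreasing preference factors (hypothetical growth rule)."""
    q = [0.0] * len(dictionary)
    total = 0.0
    # least-preferred keyword first, so more-preferred keywords get larger factors
    for i, w in enumerate(sorted(query, key=lambda kw: prefs[kw])):
        d = 1.0 if i == 0 else total * rng.uniform(1.5, 6.0)   # guarantees d_i > sum of earlier d_j
        q[dictionary.index(w)] = d
        total += d
    for w in dummies:
        q[dictionary.index(w)] = float(rng.randint(0, 1))      # dummy positions: random 0 or 1
    return q


DICTIONARY = ["c", "cpp", "go", "java", "javascript", "mugvnxze", "python", "pzfv"]
PREFS = {"c": 2, "cpp": 5, "javascript": 1, "python": 8, "java": 7, "go": 10, "scala": 6}
q = preference_query(["java", "python", "go"], PREFS, DICTIONARY, ["mugvnxze", "pzfv"], random.Random(3))
```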
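(7) During retrieval, a depth-first algorithm recursively searches down the encrypted binary tree index. The search proceeds as follows:

- The relevance score between the root node and the trapdoor is 86.08.
- Descending, the intermediate node f3f1f8f2 scores 86.08, and below it the intermediate node f3f1 scores 86.07.
- The first leaf node reached is the one for f3.txt, which scores 85.42. As the first leaf node, it is added to the result set directly.
- The second leaf node is the one for f1.txt, which scores 11.32. It is also added, so the result set now holds the nodes for f3.txt and f1.txt, and the threshold is set to 11.32.
- The search backtracks to the intermediate node f8f2. Its score of 82.52 exceeds the threshold 11.32, so the search enters its left subtree.
- The third leaf node is the one for f8.txt, which scores 57.38 and exceeds the threshold 11.32. The node for f1.txt is therefore removed from the result set, which now holds the nodes for f3.txt and f8.txt, and the threshold is updated to 57.38.
- The fourth leaf node is f2.txt, which scores 75.40 and exceeds the threshold 57.38. The node for f8.txt is removed, so the result set holds only the nodes for f3.txt and f2.txt, and the threshold is updated to 75.40.
- The search backtracks to the intermediate node f4f7f5f6. Its score of 7.12 is below the threshold 75.40, so this branch is pruned and the algorithm terminates.

The returned result set contains the nodes for f3.txt and f2.txt, sorted by score from high to low. The file descriptors are then obtained from the nodes; here they are assumed to be the node names.

The search results can be briefly analyzed as follows:

- f3.txt contains "python go" and f2.txt contains "java go". The weight of "go" is the same in both documents (0.6496), but the user's preference for "python" in f3.txt is higher than for "java" in f2.txt, so f3.txt is ranked higher.
- f8.txt contains "python go java", in which "go" has the weight 0.4331, whereas f2.txt contains "java go", in which "go" has the larger weight 0.6496. f2.txt is therefore ranked above f8.txt.

The experimental results thus agree with the analysis. In this example, the score computations for the four leaf nodes under f4f7f5f6 were pruned, but computations for several intermediate nodes were added. When searching for the top-K documents in a larger document collection, however, extensive pruning keeps the search efficient.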
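The scores in this trace can be checked in plaintext, since the encrypted score equals u.PV·Q. The sketch below recomputes the leaf scores from the vectors in step (3) and the query vector Q from step (6). It also computes the bound for the pruned node f4f7f5f6, assuming that node's pruning vector is the element-wise maximum of its leaves' vectors.

The recomputed values agree with the reported ones to within 0.01. The trace appears to truncate rather than round: 11.326, for example, is reported as 11.32.

```python
DOC_VECTORS = {  # document vectors from step (3)
    "f1": [0.0, 0.0, 0.0, 0.649641, 0.0, -0.007514, 0.549306, 0.003004],
    "f2": [0.0, 0.0, 0.649641, 0.649641, 0.0, 0.008282, 0.0, 0.003478],
    "f3": [0.0, 0.0, 0.649641, 0.0, 0.0, -0.008594, 0.549306, -0.004946],
    "f4": [0.0, 1.609438, 0.0, 0.0, 0.0, -0.006176, 0.0, -0.008033],
    "f5": [1.609438, 0.0, 0.0, 0.0, 0.0, 0.003996, 0.0, 0.007028],
    "f6": [0.0, 0.0, 0.0, 0.0, 2.197225, 0.002741, 0.0, 0.006191],
    "f7": [0.536479, 0.536479, 0.0, 0.0, 0.0, -0.004668, 0.366204, 0.000613],
    "f8": [0.0, 0.0, 0.433094, 0.433094, 0.0, -0.006085, 0.366204, -0.003783],
}
Q = [0.0, 0.0, 115.0593, 1.0, 0.0, 1.0, 19.450359, 0.0]   # query vector from step (6)


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


scores = {name: dot(v, Q) for name, v in DOC_VECTORS.items()}
ranking = sorted(scores, key=scores.get, reverse=True)     # f3, f2, f8, f1, ...
# assumed pruning vector of f4f7f5f6: element-wise max over its four leaves
pruned_pv = [max(col) for col in zip(*(DOC_VECTORS[f] for f in ("f4", "f7", "f5", "f6")))]
pruned_bound = dot(pruned_pv, Q)                            # about 7.13, below threshold 75.40
```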
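(8) To provide verifiability, a verification object is generated from the message digests of the nodes for f3.txt and f2.txt and sent to the data user. The data user then verifies the result as follows:

1. Decrypt the encrypted documents.
2. Recompute the message digest of each document.
3. Rebuild the verification object from these digests.
4. Accept the query result only if the newly generated verification object equals the one returned by the server.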
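The patent does not spell out how the verification object is built from the digests, so the sketch below rests on two assumptions:

- Each document digest is an HMAC-SHA256 keyed with k_f; the claims say digests are generated from the document content and the key.
- VO is a SHA-256 hash over the digests of the returned documents, taken in ranked order. Dropping, altering or reordering a result therefore changes VO.

The function names and the demo key are hypothetical.

```python
import hashlib
import hmac
from typing import List


def doc_digest(k_f: bytes, plaintext: bytes) -> bytes:
    # assumed digest: HMAC-SHA256 keyed with k_f
    return hmac.new(k_f, plaintext, hashlib.sha256).digest()


def build_vo(digests: List[bytes]) -> bytes:
    # assumed VO: one SHA-256 hash over the digests, in ranked order
    h = hashlib.sha256()
    for dg in digests:
        h.update(dg)
    return h.digest()


def verify(k_f: bytes, decrypted_docs: List[bytes], vo: bytes) -> bool:
    vo_prime = build_vo([doc_digest(k_f, doc) for doc in decrypted_docs])
    return hmac.compare_digest(vo_prime, vo)   # accept only if VO' == VO


K_F = b"demo-key"                               # hypothetical key, for illustration only
results = [b"python go", b"java go"]            # decrypted f3.txt and f2.txt, in ranked order
server_vo = build_vo([doc_digest(K_F, doc) for doc in results])   # server uses stored digests
```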
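The above embodiments are preferred implementations of the present invention, but implementations of the invention are not limited to them. Any other change, modification, substitution, combination or simplification made without departing from the spirit and principles of the invention shall be regarded as an equivalent replacement and is included within the scope of protection of the invention.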

Claims (10)

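  1. An efficient and verifiable multi-keyword ranked searchable encryption method supporting preference search and logic search, characterized in that the encryption method comprises the following steps: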
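    the data owner, as the owner of the data, performs the searchable-encryption preprocessing, including key generation, document encryption, index generation and digest generation; the data owner encrypts the document data it holds to obtain the encrypted document set E, builds a secure index based on the document set FS, and generates the message digests; the data owner then uploads the encrypted document set E, together with the documents' message digests and the encrypted index, to the cloud server; at the same time, the data owner authorizes data users to access its outsourced data, i.e., shares keys with the data users, including the symmetric key used to encrypt documents and the secret key used to encrypt trapdoors;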
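    the data user, as a user who shares keys with the data owner, submits queries to the cloud server to search; when the data user wants to retrieve documents, the query is first converted into a query trapdoor T_Q, and the query trapdoor T_Q and the data user's number of target documents are then submitted to the cloud server; once the cloud server receives the query trapdoor T_Q, it performs the computation task; when the computation completes, the cloud server returns the ranked top-K most relevant documents and the associated verification object; finally, the data user receives the top-K most relevant documents and the associated verification object, runs the verification algorithm to verify the correctness and completeness of the search results, and then decrypts them to obtain the search results;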
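    the cloud server provides pay-as-you-go storage and computation services to the data owner and query services to data users; the cloud server stores the encrypted documents and the encrypted index, and once it receives a query trapdoor T_Q and a target number K from a data user, it uses the encrypted index and the query trapdoor T_Q to perform a secure search and obtain the top-K most relevant encrypted documents, ranks them by relevance to the query, generates the verification object, and sends the top-K most relevant encrypted documents and the verification object to the data user.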
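  2. The efficient and verifiable multi-keyword ranked searchable encryption method supporting preference search and logic search according to claim 1, characterized in that the data owner generates the symmetric key for encrypting documents and the secret key for encrypting document vectors, the latter consisting of two invertible matrices and a random bit vector;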
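    the data owner encrypts the document contents with the symmetric key and uses the vector space model and TF×IDF to abstract the document contents and their weights; after building the plaintext binary tree index, the data owner recursively encrypts the whole binary tree index, wherein during encryption each document vector is split into two sub-vectors according to the random bit vector under a fixed rule, and the two sub-vectors are encrypted with the transposes of the two invertible matrices respectively; meanwhile, the data owner generates the documents' message digests from the document contents and the key, which are used in the search phase to build the verification object and thereby ensure verifiability.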
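  3. The efficient and verifiable multi-keyword ranked searchable encryption method supporting preference search and logic search according to claim 1, characterized in that the data owner uses a binary tree to organize the index, the binary tree being constructed as follows: first, each document becomes a leaf node; the two currently most relevant documents are selected and their parent node is built according to a fixed rule; the two most relevant of the remaining nodes are then selected and reduced upward into a parent node, and so on, so that the plaintext binary tree index is built bottom-up; the whole plaintext binary tree index is then recursively encrypted to obtain the encrypted binary tree index.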
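  4. The efficient and verifiable multi-keyword ranked searchable encryption method supporting preference search and logic search according to claim 1, characterized in that the data user generates the vector representation of the submitted query: to support preference search, the query vector is built from the user's historical preference information combined with the query; to support logic search, the query vector is built from the logically connected keywords and a constructed numeric sequence; after generating the query vector, the data user splits it into two sub-vectors according to the random bit vector under the rule, and encrypts the two query sub-vectors with the inverses of the two invertible matrices respectively to obtain the encrypted query trapdoor;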
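    after generating the query trapdoor, the data user sends the query trapdoor and the number of target documents to retrieve to the cloud server; after obtaining the top-K most relevant encrypted documents, ranked by relevance, and the verification object returned by the cloud server, the data user verifies the completeness and correctness of the search results with the verification algorithm and then decrypts them to obtain the result document set.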
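  5. The efficient and verifiable multi-keyword ranked searchable encryption method supporting preference search and logic search according to claim 1, characterized in that, after receiving the encrypted binary tree index, encrypted documents and message digests generated by the data owner, and the encrypted query trapdoor and number of target documents generated by the data user, the cloud server retrieves the top-K most relevant documents over the encrypted binary tree index according to the relevance score between the query and the index, i.e., between documents and the query; once the search ends and the top-K most relevant documents are obtained, they are sorted by relevance score, and the verification object is generated and sent to the data user.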
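  6. The efficient and verifiable multi-keyword ranked searchable encryption method supporting preference search and logic search according to claim 1, characterized in that the encryption method comprises: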
    Key generation stage GenKey($1^{l(n)}$):
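    In the initialization stage, the data owner generates a four-tuple secret key SK = (S, M_1, M_2, k_f), where S is a random (n+e)-bit vector, M_1 and M_2 are two (n+e)×(n+e) invertible matrices, n is the size of the generated dictionary, e is the number of dummy keywords introduced to ensure trapdoor unlinkability, and k_f is a symmetric encryption key; the secret key SK is shared only between the data owner and data users, and the cloud server knows nothing about SK;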
    Index construction stage BuildIndex(FS, SK):
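    According to the index binary tree construction algorithm, each document vector u.PV stored in a node u is computed with the following formula:
    [formula image appb-100004 not reproduced]
    where [symbol image appb-100005] denotes the TF value of w_i in F_d, and [symbol image appb-100006] is the frequency with which keyword w_i occurs in document F_d. After the plaintext binary tree index has been built, it is encrypted to obtain the encrypted binary tree index; during encryption, the following splitting rule is applied to each pruning vector to obtain two random sub-vectors {P′, P″}, with SK.S serving as the split indicator:
    [formula image appb-100007 not reproduced]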
    The encrypted form of the vector u.PV is $\{M_1^{T}P', M_2^{T}P''\}$; for every node in the index tree, u.PV is replaced by its encrypted form $\{M_1^{T}P', M_2^{T}P''\}$.
    Trapdoor generation stage GenTrapdoor(S_q, k, SK):
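    Suppose S_q = {w_1, w_2, ..., w_t} is the user's set of query keywords and Q is the vector form of S_q; each dimension is computed with the following formula:
    [formula image appb-100010 not reproduced]
    Normalization is then performed, after which Q is split into two random sub-vectors {Q′, Q″}, with SK.S serving as the split indicator, according to the following splitting rule:
    [formula image appb-100011 not reproduced]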
    The encrypted form of Q is $\{M_1^{-1}Q', M_2^{-1}Q''\}$. The data user then sends the trapdoor T_Q to the cloud server; T_Q comprises $\{M_1^{-1}Q', M_2^{-1}Q''\}$ and the number K of target documents to retrieve;
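    Search stage [signature image appb-100014 not reproduced]:
    the cloud server runs a depth-first algorithm to obtain the result set R and then builds a verification object VO; the cloud server then returns the result set R and VO to the data user. During execution of the search algorithm over the encrypted binary tree index, the relevance score between the encrypted forms of u.PV and Q is computed as follows:
    $$(M_1^{T}P')^{T}(M_1^{-1}Q') + (M_2^{T}P'')^{T}(M_2^{-1}Q'') = P'^{T}M_1M_1^{-1}Q' + P''^{T}M_2M_2^{-1}Q'' = P'\cdot Q' + P''\cdot Q'' = u.PV\cdot Q$$
    where u.PV is the unencrypted document vector and Q is the unencrypted query vector; the result shows that the relevance score between the index and the trapdoor is equal (or proportional) to the relevance score between the plaintext document vector and the query;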
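    Verification stage Verify(R, VO, SK):
    The data user uses the key k_f to decrypt the search results and verify their correctness and completeness. Each leaf node of the encrypted binary tree index contains the message digest of its document; the cloud server generates the verification object from the message digests of the top-K documents and sends it to the data user. After receiving the top-K encrypted documents and the verification object, the data user decrypts each document, generates each document's message digest with the key k_f, and generates a new verification object VO′ from these newly generated digests; by checking whether the verification object returned by the cloud server equals the one newly generated by the data user, i.e., whether VO′ equals VO, the data user decides whether to accept the result of the query.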
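  7. The efficient and verifiable multi-keyword ranked searchable encryption method supporting preference search and logic search according to claim 6, characterized in that the index construction stage and the trapdoor generation stage are respectively adjusted as follows: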
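    In the index construction stage, when a document is converted into a document vector, each dimension of the document vector that corresponds to a keyword in the dictionary is computed with the following formula:
    [formula image appb-100016 not reproduced]
    where [symbol image appb-100017] denotes the TF value of keyword w_i in document F_j, [symbol image appb-100018] denotes the number of documents containing keyword w_i, N is the number of documents in the document collection, and |F_j| is the length of document F_j, i.e., the number of keywords it contains; the values of the dimensions corresponding to dummy keywords follow the uniform distribution U(θ-σ, θ+σ), whose mean θ and half-width σ are determined from experimental data;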
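    In the trapdoor generation stage, the vector representation of the query is built from the user's historical preferences and the submitted query. First, the keywords in the user's query are arranged in increasing order of importance:
    [formula image appb-100019 not reproduced]
    where 1 ≤ n_1 < n_2 < … < n_l ≤ m. The data user then randomly generates a superincreasing sequence d_1 > 0, d_2, ..., d_l satisfying
    [formula image appb-100020 not reproduced]
    d_i is the preference factor of keyword [symbol image appb-100021], i.e., the value of the query vector at the corresponding keyword position, while the dummy-keyword positions of the query vector are randomly set to 1. The search result is then expressed as:
    [formula image appb-100022 not reproduced]
    where s is the perturbation introduced by the dummy keywords.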
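  8. The efficient and verifiable multi-keyword ranked searchable encryption method supporting preference search and logic search according to claim 6, characterized in that the key generation stage, the index construction stage and the trapdoor generation stage are respectively adjusted as follows: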
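    In the key generation stage, the data owner generates at initialization a four-tuple secret key SK = (S, M_1, M_2, k_f), where S is a random (n+1)-bit vector, M_1 and M_2 are two (n+1)×(n+1) invertible matrices, n is the size of the generated dictionary, the additional dimension is required to construct the query trapdoor, and k_f is a symmetric encryption key;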
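    In the index construction stage, when a document is converted into a document vector, each dimension only indicates whether the current document contains the corresponding dictionary keyword: 1 means that the document contains the keyword and 0 means that it does not; the (n+1)-th dimension is set to 1, and the splitting rule and the binary tree index construction mechanism are otherwise unchanged;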
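    In the trapdoor generation stage, suppose that the keyword sets associated with "OR", "AND" and "NOT" in the query are respectively
    [formula image appb-100023 not reproduced]
    and that the symbols ∨, ∧ and ¬ denote the mathematical "OR", "AND" and "NOT"; the matching rule is expressed as
    [formula image appb-100025 not reproduced]
    [formula image appb-100026 not reproduced]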
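    For the "OR" operation, the data user constructs a superincreasing sequence a_j (j = 1, 2, ..., l_1),
    [formula image appb-100027 not reproduced]
    which is assigned as the weights of the "OR" search keywords; likewise, to implement the "AND" and "NOT" operations, the data user constructs two superincreasing sequences b_j (j = 1, 2, ..., l_2) and c_j (j = 1, 2, ..., l_3) satisfying the condition
    [formula image appb-100028 not reproduced]
    and the condition
    [formula image appb-100029 not reproduced]
    [formula image appb-100030 not reproduced]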
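    Assuming that
    [formula image appb-100031 not reproduced]
    are sorted in increasing order of importance, then for the search keyword set
    [formula image appb-100032 not reproduced]
    the weights at the corresponding positions of Q are set to
    [formula image appb-100033 not reproduced]
    and the values at all other positions are set to 0, while the (n+1)-th dimension of the query vector is set to
    [formula image appb-100034 not reproduced]
    In the query result, if R_j > 0 for a document F_j, then F_j satisfies the logic search requirement.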
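  9. The efficient and verifiable multi-keyword ranked searchable encryption method supporting preference search and logic search according to claim 6, characterized in that a plaintext binary tree index is built first and then encrypted to obtain the encrypted binary tree index; the encrypted binary tree index can decide, based on the relevance score between a node and the trapdoor, whether to prune the corresponding subtree, thereby speeding up the search; the construction process and basic structure of the plaintext binary tree index are as follows: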
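    (1) A node u of the plaintext binary tree index is a nine-tuple (P′, P″, PV, CV, N, PL, PR, FD, sig), where u.PL and u.PR are pointers to the left and right child nodes; u.FD is the unique descriptor of the document; u.sig is the message digest generated from the document content; u.CV is the cluster center vector of cluster C_u; u.N is the number of documents in cluster C_u; and cluster C_u represents the documents associated with all leaf nodes of the subtree rooted at u. Note that u.CV, u.N and u.PV exist only while the plaintext binary tree index is being built; when the plaintext binary tree index is encrypted, the u.CV and u.PV fields of every node are set to NULL and the u.N field is set to 0;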
    There are two main types of node u, namely leaf nodes and intermediate nodes:
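    a. If u is a leaf node, then u.PL = u.PR = ∅; u.FD stores the document's file descriptor; u.CV and u.PV both store the current document vector; u.P′ and u.P″ denote the encrypted forms of the sub-vectors obtained by splitting u.PV and are set to the default value NULL at this point; u.N = 1; u.sig stores the message digest of the current document, which is mainly used to generate the verification object after the search ends, and the data user uses the received verification object to verify the completeness and correctness of the search results;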
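    b. If u is an internal intermediate node, then u.FD = ∅ and u.sig = ∅; u.PL and u.PR point to the left and right child nodes of u; u.N = u.PL.N + u.PR.N; u.PV is generated from the pruning vectors PV of u's two child nodes, and u.CV is generated from their cluster center vectors CV; the cluster center vector u.CV is used mainly while building the plaintext binary tree index, to find the most relevant nodes; the generation rules are as follows:
    [formula image appb-100035 not reproduced]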
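    The cluster center vector is used to compute node-to-node relevance scores, in order to find the two closest nodes and build their parent node while the plaintext binary tree index is being built; the pruning vector yields two sub-vectors that are encrypted with the invertible matrices and are used, during the depth-first search of the binary tree, to compute the relevance score against the trapdoor, i.e., the query, and thereby decide whether to enter the current subtree;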
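    (2) During construction of the index binary tree, the current processing node set CPNS holds the nodes processed in the current round, and the pending node set NGNS holds the nodes to be processed in the next round. First, each document is initialized as a node containing a five-tuple, and all document nodes are added to CPNS. While CPNS contains more than one node, the two most relevant nodes are repeatedly found, i.e., the two nodes whose cluster center vectors have the highest relevance score (which can also be understood as the two most similar documents); the parent of these two nodes is built according to the rules given above and added to NGNS, and the two nodes are removed from CPNS. This continues until CPNS contains at most one node; the nodes in NGNS are then added to CPNS and a new round begins. The loop repeats until, after the nodes in NGNS have been added to CPNS, only one node remains in CPNS; construction then terminates, the sole remaining node in CPNS is the root node of the plaintext index binary tree, and this root node is returned as the representative of the tree.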
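  10. The efficient and verifiable multi-keyword ranked searchable encryption method supporting preference search and logic search according to claim 6, characterized in that the binary tree index is used to accelerate queries as follows: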
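    the target result set is denoted R, threshold is the minimum relevance score between the query and the nodes in the current result set, and K is the number of documents to retrieve; in the retrieval phase, if the current node is a leaf node and R contains fewer than K-1 nodes, the current node is added to R; if R contains exactly K-1 nodes, the current node is added to R and threshold is updated; if R contains K nodes and the relevance score between the current leaf node and the query is greater than threshold, the node least relevant to the query is removed from R, the current leaf node is added, and threshold is updated; if the current node is an intermediate node and the relevance score between its pruning vector and the query trapdoor is less than threshold, the subtree represented by the current node is pruned without further search; otherwise, the search continues into the subtree; once the index tree has been traversed in this way, the node set R is returned.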
PCT/CN2019/074061 2018-02-28 2019-01-31 Efficient and verifiable multi-keyword ranked searchable encryption method supporting preference search and logic search WO2019165880A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810169347.7 2018-02-28
CN201810169347.7A CN108388807B (zh) 2018-02-28 2018-02-28 Efficient and verifiable multi-keyword ranked searchable encryption method supporting preference search and logic search

Publications (1)

Publication Number Publication Date
WO2019165880A1 (zh) 2019-09-06

Family

ID=63069587

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/074061 WO2019165880A1 (zh) 2018-02-28 2019-01-31 Efficient and verifiable multi-keyword ranked searchable encryption method supporting preference search and logic search

Country Status (2)

Country Link
CN (1) CN108388807B (zh)
WO (1) WO2019165880A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
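CN116127498A (zh) * 2022-11-28 2023-05-16 The Second Research Institute of Civil Aviation Administration of China Multi-keyword searchable encryption method with verifiable ciphertext retrieval results
CN116127498B (zh) * 2022-11-28 2024-06-07 The Second Research Institute of Civil Aviation Administration of China Multi-keyword searchable encryption method with verifiable ciphertext retrieval results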

Families Citing this family (27)

Publication number Priority date Publication date Assignee Title
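CN108388807B (zh) * 2018-02-28 2020-05-22 South China University of Technology Efficient and verifiable multi-keyword ranked searchable encryption method supporting preference search and logic search
CN110858251B (zh) * 2018-08-22 2020-07-21 Alibaba Group Holding Limited Data query method and apparatus
CN110162617B (zh) * 2018-09-29 2022-11-04 Tencent Technology (Shenzhen) Co., Ltd. Method and apparatus for extracting summary information, language processing engine and medium
CN109492410B (zh) * 2018-10-09 2020-09-01 South China Agricultural University Data searchable encryption and keyword search method, system, terminal and device
CN109471723B (zh) * 2018-10-23 2023-10-27 360 Technology Group Co., Ltd. Method and system for verifying the processing result of a task
CN109740362B (zh) * 2019-01-03 2021-02-26 Institute of Software, Chinese Academy of Sciences Entropy-coding-based ciphertext index generation and retrieval method and system
CN109885640B (zh) * 2019-01-08 2021-05-11 Nanjing University of Posts and Telecommunications Multi-keyword ciphertext ranked retrieval method based on an α-ary index tree
CN109885650B (zh) * 2019-01-08 2021-05-11 Nanjing University of Posts and Telecommunications Privacy-preserving ciphertext ranked retrieval method for outsourced cloud environments
CN109815723A (zh) * 2019-02-28 2019-05-28 Northeastern University Suffix-tree-based searchable encryption system and method
CN109992995B (zh) * 2019-03-05 2021-05-14 South China University of Technology Searchable encryption method supporting location protection and query privacy
US11048816B2 (en) * 2019-04-02 2021-06-29 Sap Se Secure database utilizing dictionary encoding
CN110069944A (zh) * 2019-04-03 2019-07-30 China Southern Power Grid Electric Power Research Institute Co., Ltd. Data retrieval method and system with searchable encryption
CN110120871B (zh) * 2019-05-23 2021-09-28 Fujian Normal University Broadcast encryption method and system with fixed-length private keys and ciphertexts
CN110908959A (zh) * 2019-10-30 2020-03-24 Xidian University Dynamic searchable encryption method supporting multiple keywords and result ranking
CN110928980B (zh) * 2019-11-15 2023-05-30 Sun Yat-sen University Ciphertext data storage and retrieval method for mobile cloud computing
CN111026754B (zh) * 2019-12-05 2022-12-02 Institute of Software, Chinese Academy of Sciences Secure and efficient circular-range data upload and query method, and corresponding storage medium and electronic device
CN113094573A (zh) * 2020-01-09 2021-07-09 China Mobile (Shanghai) Information and Communication Technology Co., Ltd. Multi-keyword ranked searchable encryption method, apparatus, device and storage medium
CN111274247B (zh) * 2020-01-17 2023-04-14 Xidian University Verifiable range query method based on encrypted spatio-temporal data
CN111404679B (zh) * 2020-03-10 2023-08-08 Shanghai Big Data Center Ciphertext retrieval method with secure authentication for big data
CN111400624A (zh) * 2020-03-17 2020-07-10 Guangdong Power Grid Co., Ltd. Multifunctional ranking system
CN112199420A (zh) * 2020-10-16 2021-01-08 Chengdu Fanglian Yunma Technology Co., Ltd. Fuzzy search method for private real-estate field information
CN112311781B (zh) * 2020-10-23 2021-11-12 Xidian University Forward- and backward-secure encryption method with recoverable keyword masking
CN112328733B (zh) * 2020-10-28 2022-10-04 Zhejiang Gongshang University MinHash-based Chinese multi-keyword fuzzy ranked searchable encryption method
CN112328606B (zh) * 2020-11-30 2023-02-21 Qilu University of Technology Blockchain-based keyword searchable encryption method
CN114676449B (zh) * 2022-05-26 2022-10-18 Nanjing Changyang Technology Co., Ltd. Searchable encryption method for IoT data based on a verifiable database
CN115622700B (zh) * 2022-11-28 2023-03-31 China Southern Power Grid Digital Grid Research Institute Co., Ltd. Method, apparatus, computer device and storage medium for encrypted search of electricity consumption data
CN117349898B (zh) * 2023-12-05 2024-03-08 The 10th Research Institute of China Electronics Technology Group Corporation Ciphertext k-nearest-neighbor query method and system with access-pattern hiding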

Citations (4)

Publication number Priority date Publication date Assignee Title
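CN104615692A (zh) * 2015-01-23 2015-05-13 Chongqing University of Posts and Telecommunications Searchable encryption method supporting dynamic updates and secure multi-keyword ranking
CN105812141A (zh) * 2016-03-07 2016-07-27 Northeastern University Verifiable set-intersection method and system for outsourced encrypted data
CN106326360A (zh) * 2016-08-10 2017-01-11 Wuhan University of Science and Technology Fuzzy multi-keyword retrieval method for encrypted data in a cloud environment
CN108388807A (zh) * 2018-02-28 2018-08-10 South China University of Technology Efficient and verifiable multi-keyword ranked searchable encryption method supporting preference search and logic search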

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
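CN107391502B (zh) * 2016-05-16 2020-08-04 Alibaba Group Holding Limited Time-interval data query method and apparatus, and index construction method and apparatus
CN106997384B (zh) * 2017-03-24 2020-01-14 Fuzhou University Semantic fuzzy searchable encryption method with verifiable ranking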

Non-Patent Citations (1)

Title
HONGWEI LI: "Enabling Fine-Grained Multi-Keyword Search Supporting Classified Sub-Dictionaries over Encrypted Cloud Data", IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, vol. 13, no. 3, 30 June 2016 (2016-06-30), pages 315 - 319, XP011610236 *

Also Published As

Publication number Publication date
CN108388807B (zh) 2020-05-22
CN108388807A (zh) 2018-08-10

Similar Documents

Publication Publication Date Title
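WO2019165880A1 (zh) Efficient and verifiable multi-keyword ranked searchable encryption method supporting preference search and logic search
CN110224986B (zh) Efficient searchable access control method based on hidden-policy CP-ABE
US11023477B2 (en) Method and system for fuzzy keyword search over encrypted data
CN106815350B (zh) Dynamic multi-keyword fuzzy search method over ciphertext in a cloud environment
US10235335B1 (en) Systems and methods for cryptographically-secure queries using filters generated by multiple parties
Chuah et al. Privacy-aware bedtree based solution for fuzzy multi-keyword search over encrypted data
CN111026788B (zh) Homomorphic-encryption-based multi-keyword ciphertext ranked retrieval method in a hybrid cloud
CN109145079B (zh) Cloud searchable encryption method based on a personal-interest user model
CN109992995B (zh) Searchable encryption method supporting location protection and query privacy
WO2022099495A1 (zh) Ciphertext search method, system and device in a cloud computing environment
US20180189511A1 (en) Method and System for Range Search on Encrypted Data
CN112332979B (zh) Ciphertext search method, system and device in a cloud computing environment
CN112328606B (zh) Blockchain-based keyword searchable encryption method
CN115314295B (zh) Blockchain-based searchable encryption method
CN109739945A (zh) Multi-keyword ciphertext ranked retrieval method based on a hybrid index
CN110727951B (zh) Lightweight privacy-preserving multi-keyword retrieval method and system for outsourced files
Yi et al. Private searching for single and conjunctive keywords on streaming data
CN106874379B (zh) Multi-dimensional range retrieval method and system for encrypted cloud storage
CN110990518A (zh) Security method for unstructured smart-grid data
Zhang et al. Efficient searchable symmetric encryption supporting dynamic multikeyword ranked search
Chi et al. Privacy-enhancing range query processing over encrypted cloud databases
CN113158245A (zh) Document search method, system, device and readable storage medium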
YueJuan et al. A Searchable Ciphertext Retrieval Method Based on Counting Bloom Filter over Cloud Encrypted Data
Kamini et al. Encrypted multi-keyword ranked search supporting gram based search technique
Etemad et al. Verifiable dynamic searchable encryption

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19760606

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17/12/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 19760606

Country of ref document: EP

Kind code of ref document: A1