CN108363689B - Privacy protection multi-keyword Top-k ciphertext retrieval method and system facing hybrid cloud - Google Patents


Info

Publication number
CN108363689B
Authority
CN
China
Prior art keywords
document
vector
retrieval
keyword
cloud server
Legal status
Active
Application number
CN201810122376.8A
Other languages
Chinese (zh)
Other versions
CN108363689A (en)
Inventor
戴华
朱向洋
杨庚
白双杰
史经启
孙彦珺
王敏
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN201810122376.8A
Publication of CN108363689A
Application granted
Publication of CN108363689B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/93 Document management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Security & Cryptography (AREA)
  • Databases & Information Systems (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Computer Hardware Design (AREA)
  • Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

The invention discloses a privacy-preserving multi-keyword Top-k ciphertext retrieval method and system for hybrid clouds, mainly addressing the problem of low retrieval efficiency. The scheme is as follows: the data providing end clusters the keywords by their correlation to generate a keyword dictionary sequence; for each document it generates a high-dimensional document vector and a low-dimensional document filter vector; the encrypted documents and encrypted document vectors are then outsourced to an untrusted public cloud server, while the plaintext document filter vectors are stored on a trusted private cloud server. During retrieval, the private cloud server computes a candidate document set, and the public cloud server then computes the Top-k documents of the retrieval result. Because related keywords are grouped together in the keyword dictionary sequence, the filtering effect of the private cloud server improves and the candidate document set shrinks. The method has a simple flow, offers high security, is easy to implement, and achieves efficient multi-keyword ciphertext retrieval in a hybrid cloud environment with low computational overhead.

Description

Privacy protection multi-keyword Top-k ciphertext retrieval method and system facing hybrid cloud
Technical Field
The invention relates to user data privacy protection, in particular to a hybrid cloud-oriented privacy protection multi-keyword Top-k ciphertext retrieval method and system.
Background
The idea of providing IT resources as services is increasingly popular, and the resulting trend toward "everything as a service" (XaaS) has become a core concept of cloud computing. However, as cloud computing develops rapidly, cloud security has become a widespread concern. Because users cannot directly control data placed on a remote Cloud Server (CS), they fear that their outsourced data may be illegally accessed or abused by the cloud service provider, especially sensitive data with high privacy requirements such as electronic medical records, bank transaction data, and user e-mail. Although cloud service providers claim to offer countermeasures against privacy disclosure, such as access control, firewalls, and intrusion detection, user concerns about data security remain a major obstacle to the further development of cloud computing.
A common way to protect data privacy is to encrypt data before outsourcing it to a public cloud server, but this severely restricts how the outsourced data can be used. In information retrieval research, conventional multi-keyword retrieval targets plaintext data and cannot be applied directly to ciphertext retrieval. Downloading all encrypted data from the cloud and decrypting it locally is clearly impractical and wasteful. Designing a privacy-preserving ciphertext retrieval mechanism for the cloud environment is therefore a challenging problem, and it has become one of the hot topics in cloud computing research in recent years.
Most prior-art methods assume a public cloud service by default and, assuming that the public cloud follows a semi-honest (honest-but-curious) model, propose a series of multi-keyword ciphertext retrieval methods for encrypted cloud environments; however, these methods suffer from one or more problems such as low retrieval efficiency, inaccurate retrieval results, and complex index-tree construction.
To address these problems, Chinese patent application No. 201710181664.6 discloses a fast multi-keyword semantic ranked search method that protects data privacy in cloud computing. It adds a private cloud server and creates, for each document, a document vector together with a corresponding identification vector; the encrypted document vectors are outsourced to the public cloud server, and the plaintext identification vectors are stored on the private cloud server. The private cloud server performs a preliminary filtering of the document set, which reduces the number of document vectors that must be scored against the retrieval vector and thus lowers the retrieval cost. Consequently, improving the filtering effect of the private cloud server is key to improving the efficiency of privacy-preserving multi-keyword ciphertext retrieval in a hybrid cloud.
Disclosure of Invention
Purpose of the invention: to address the problems in the prior art, the invention provides a privacy-preserving multi-keyword Top-k ciphertext retrieval method and system for hybrid clouds.
Technical scheme: the hybrid-cloud-oriented privacy-preserving multi-keyword Top-k ciphertext retrieval method comprises the following steps:
(1) The data providing end extracts a keyword set from the provided document set and generates a keyword dictionary sequence by clustering the keywords and partitioning them into blocks; it generates a plaintext document vector for each document according to the keyword dictionary sequence, and partitions each plaintext document vector with the same block boundaries as the keyword dictionary sequence to form a document filter vector; it encrypts the plaintext document vectors to form encrypted document vectors, and encrypts each document in the document set to form an encrypted document set; finally, it transmits the document filter vectors to the private cloud server, and the encrypted document vectors and the encrypted document set to the public cloud server;
(2) The data retrieval end generates a retrieval vector from the keywords provided by the user, normalizes it, generates a retrieval trapdoor using a secure algorithm, and transmits the retrieval trapdoor and the number k of documents the user wants to retrieve to the public cloud server; it also generates a retrieval filter vector for the user's keywords according to the blocks of the keyword dictionary sequence in which those keywords lie, and transmits the retrieval filter vector to the private cloud server;
(3) The private cloud server performs a bitwise AND between the received retrieval filter vector and the document filter vector of each document; if the resulting vector is not all zeros, the corresponding document number is added to the candidate document set, which is then transmitted to the public cloud server;
(4) Given the received candidate document set, retrieval trapdoor, and number k, the public cloud server computes the secure inner product between the retrieval trapdoor and the encrypted document vector of each document in the candidate document set, selects the k ciphertext documents in the candidate set that are most relevant to the user's keywords according to these secure inner products, and returns them to the data retrieval end;
(5) The data retrieval end decrypts the k received ciphertext documents to obtain the k most relevant plaintext documents.
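Once every candidate has a score, step (4) reduces to a Top-k selection over the candidate set. The sketch below assumes the secure inner products have already been computed and are passed in as a plain dictionary mapping document number to score; the function name `select_top_k` and the sample scores are hypothetical.

```python
import heapq
from typing import Dict, List


def select_top_k(scores: Dict[int, float], k: int) -> List[int]:
    """Return the k candidate document numbers with the highest scores.

    `scores` maps each document number in CDS to its relevance score; in the
    scheme this is the secure inner product between the encrypted document
    vector and the retrieval trapdoor.
    """
    # heapq.nlargest avoids sorting the whole candidate set when k is small.
    return heapq.nlargest(k, scores, key=scores.__getitem__)


candidate_scores = {3: 0.42, 7: 0.91, 12: 0.05, 15: 0.67}
print(select_top_k(candidate_scores, 2))  # [7, 15]
```

Only the candidates forwarded by the private cloud are scored, so the cost of this step grows with the size of CDS rather than with the size of the whole document set.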
Further, step (1) specifically comprises:
(1-1) The data providing end extracts keywords from the provided document set DS to obtain the keyword set $\{w_1, w_2, \ldots, w_n\}$;
(1-2) The keywords in the keyword set are clustered according to their correlation to obtain sub-clusters $\{c_1, c_2, \ldots, c_t\}$;
(1-3) Each sub-cluster is taken as a block, giving t blocks $b_1, b_2, \ldots, b_t$, and the keyword dictionary sequence $W = \{w(b_1,1), w(b_1,2), \ldots, w(b_2,1), w(b_2,2), \ldots, w(b_t,1), w(b_t,2), \ldots\}$ is generated from these blocks, where $w(b_j, x)$ denotes the x-th keyword belonging to block $b_j$; keywords within a block are unordered, and $b_j = \{w(b_j, x) \mid 0 < x \le |b_j|\}$;
(1-4) Using the TF-IDF algorithm and the vector space model, a plaintext document vector $V_i$ is generated for each document $D_i$ in the document set $DS = \{D_1, D_2, \ldots, D_m\}$ according to the positions of the keywords in the keyword dictionary sequence, and is then normalized; $V_i$ has length n, and each element is the term frequency (TF) of the corresponding keyword in $D_i$;
(1-5) According to the blocks of the keyword dictionary sequence, each plaintext document vector $V_i$ is divided into t blocks with the same boundaries as the dictionary sequence, giving the document filter vector $DF_i = \{b_1, b_2, \ldots, b_t\}$ of document $D_i$: if all elements of $V_i$ at the positions of the keywords in block $b_j$ are 0, then $b_j = 0$, otherwise $b_j = 1$; thus $DF_i$ is a t-dimensional 0/1 vector;
(1-6) The encryption key $SK = (S, M_1, M_2, k_f)$ is generated, where S is an n-dimensional random 0/1 vector, $M_1$ and $M_2$ are two $n \times n$ invertible matrices, n is the length of the keyword dictionary sequence, and $k_f$ is the document encryption key;
(1-7) Using the secure KNN technique and the generated key, each plaintext document vector $V_i$ is encrypted into the corresponding encrypted document vector $\tilde{V}_i = \{M_1^{T} V_i', \; M_2^{T} V_i''\}$,
where, for the j-th element of the random vector S, $V_i'[j] + V_i''[j] = V_i[j]$ when $S[j] = 0$, and $V_i'[j] = V_i''[j] = V_i[j]$ when $S[j] = 1$;
(1-8) Each document in the document set DS is encrypted with a symmetric encryption algorithm to obtain the encrypted document set $ES = \{e_1, e_2, \ldots, e_m\}$;
(1-9) The document filter vectors are transmitted to the private cloud server for storage, and the encrypted document vectors and the encrypted document set are transmitted to the public cloud server for storage.
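Sub-steps (1-3) to (1-5) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the sub-clusters are taken as already given lists of keywords, the TF value is taken as a raw term count, and the normalization is assumed to be L2 because the text does not specify it. All function names and sample keywords are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Block = Tuple[int, int]  # half-open [start, end) range of positions in W


def build_dictionary(clusters: Sequence[Sequence[str]]) -> Tuple[List[str], List[Block]]:
    """(1-3): concatenate the sub-clusters into W and record each block's range."""
    W: List[str] = []
    blocks: List[Block] = []
    for cluster in clusters:
        start = len(W)
        W.extend(cluster)  # order inside a block is irrelevant
        blocks.append((start, len(W)))
    return W, blocks


def document_vector(tokens: Sequence[str], W: Sequence[str]) -> List[float]:
    """(1-4): term count of each dictionary keyword, then L2-normalised (assumed)."""
    tf = [float(tokens.count(w)) for w in W]
    norm = math.sqrt(sum(x * x for x in tf))
    return [x / norm for x in tf] if norm > 0 else tf


def filter_vector(vec: Sequence[float], blocks: Sequence[Block]) -> List[int]:
    """(1-5): bit j is 0 iff every element of vec inside block b_j is 0."""
    return [int(any(vec[p] != 0 for p in range(s, e))) for s, e in blocks]


clusters = [["cloud", "server", "storage"], ["cipher", "key"], ["medical", "record"]]
W, blocks = build_dictionary(clusters)
V = document_vector("cloud key key storage".split(), W)
DF = filter_vector(V, blocks)
print(blocks)  # [(0, 3), (3, 5), (5, 7)]
print(DF)      # [1, 1, 0]
```

Storing each block as a contiguous position range makes $DF_i$ a single pass over $V_i$, and the filter vector has only t bits instead of n elements. Applying `filter_vector` to a retrieval vector Q in the same way gives the retrieval filter vector QF of (2-3).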
Further, step (2) specifically comprises:
(2-1) From the keywords $\{w_1, w_2, \ldots, w_x\}$ provided by the user, the data retrieval end generates a retrieval vector Q using the TF-IDF algorithm and the vector space model, and normalizes it; the j-th element $Q[j]$ is the inverse document frequency (IDF) of the j-th keyword $w_j$ in the document set DS provided by the data providing end;
(2-2) Based on the retrieval vector Q, a retrieval trapdoor is generated with the secure KNN algorithm: $T = \{M_1^{-1} Q', \; M_2^{-1} Q''\}$,
where, for the j-th element of the random vector S in the key generated by the data providing end, $Q'[j] = Q''[j] = Q[j]$ when $S[j] = 0$, and $Q'[j] + Q''[j] = Q[j]$ when $S[j] = 1$;
(2-3) A retrieval filter vector $QF = \{b_1, b_2, \ldots, b_t\}$ is generated according to the blocks of the keyword dictionary sequence; QF is a t-dimensional 0/1 vector: if the elements of Q for all keywords in block $b_j$ are 0, then $QF[j] = 0$, otherwise $QF[j] = 1$;
(2-4) The retrieval trapdoor and the number k of documents the user wants to retrieve are transmitted to the public cloud server, and the retrieval filter vector is transmitted to the private cloud server.
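Sub-steps (2-1) and (2-3) can be sketched as below. The IDF formula ln(m/df) is an assumption (the text only says that the IDF value is used), as are the L2 normalization, the function names, and the sample document frequencies. QF is computed here from which blocks contain a query keyword; this matches the Q-based rule of (2-3) whenever every queried keyword has a nonzero IDF.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def retrieval_vector(query: Set[str], W: Sequence[str], df: Dict[str, int], m: int) -> List[float]:
    """(2-1): Q[j] = IDF of W[j] if W[j] is queried, else 0; then L2-normalised.

    IDF is taken as ln(m / df) here, which is an assumption.
    """
    q = [math.log(m / df[w]) if w in query and df.get(w, 0) > 0 else 0.0 for w in W]
    norm = math.sqrt(sum(x * x for x in q))
    return [x / norm for x in q] if norm > 0 else q


def retrieval_filter_vector(query: Set[str], W: Sequence[str],
                            blocks: Sequence[Tuple[int, int]]) -> List[int]:
    """(2-3): QF[j] = 1 iff some query keyword lies in block b_j = [start, end)."""
    return [int(any(W[p] in query for p in range(s, e))) for s, e in blocks]


W = ["cloud", "server", "storage", "cipher", "key", "medical", "record"]
blocks = [(0, 3), (3, 5), (5, 7)]
df = {"cloud": 40, "server": 25, "storage": 30, "cipher": 10,
      "key": 20, "medical": 5, "record": 8}
query = {"storage", "cipher"}
Q = retrieval_vector(query, W, df, m=100)
QF = retrieval_filter_vector(query, W, blocks)
print(QF)  # [1, 1, 0]
```

Under this IDF formula a keyword that appears in all m documents gets weight 0; the membership-based QF still selects its block, which can only add candidates, never drop one.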
Further, step (3) specifically comprises:
(3-1) The private cloud server performs a bitwise AND between the received retrieval filter vector QF and the document filter vector $DF_i$ of each document; if $QF \,\&\, DF_i$ is not all zeros, the number $d_i$ of the document corresponding to $DF_i$ is added to the candidate document set, yielding $CDS = \{d_1, d_2, \ldots\}$;
(3-2) The candidate document set CDS is sent to the public cloud server.
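The private-cloud filter of (3-1) costs one AND per document. In this sketch the t-bit filter vectors are packed into Python integers so that each test is a single integer AND; the packing and the function names are illustrative choices, not part of the described method.

```python
from typing import Dict, List, Sequence


def to_mask(bits: Sequence[int]) -> int:
    """Pack a 0/1 filter vector (bit j = block b_j) into a Python int."""
    mask = 0
    for j, b in enumerate(bits):
        if b:
            mask |= 1 << j
    return mask


def candidate_set(qf: Sequence[int], doc_filters: Dict[int, Sequence[int]]) -> List[int]:
    """(3-1): keep document d_i iff QF & DF_i is not the all-zero vector."""
    q = to_mask(qf)
    return [d for d, df in doc_filters.items() if q & to_mask(df)]


DF = {1: [1, 0, 0], 2: [0, 1, 1], 3: [0, 0, 1], 4: [1, 1, 0]}
print(candidate_set([0, 1, 0], DF))  # [2, 4]
```

In a real deployment the masks for $DF_i$ would be packed once when they are uploaded, leaving only the AND at query time.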
The hybrid-cloud-oriented privacy-preserving multi-keyword Top-k ciphertext retrieval system comprises a data providing end, a data retrieval end, a private cloud server, and a public cloud server, wherein:
the data providing end is configured to extract a keyword set from the provided document set and generate a keyword dictionary sequence by clustering the keywords and partitioning them into blocks; to generate a plaintext document vector for each document according to the keyword dictionary sequence, and partition each plaintext document vector with the same block boundaries as the dictionary sequence to form a document filter vector; to encrypt the plaintext document vectors to form encrypted document vectors, and encrypt each document to form an encrypted document set; and to transmit the document filter vectors to the private cloud server, and the encrypted document vectors and the encrypted document set to the public cloud server;
the data retrieval end is configured to generate a retrieval vector from the keywords provided by the user, normalize it, generate a retrieval trapdoor using a secure algorithm, and transmit the retrieval trapdoor and the number k of documents to be retrieved to the public cloud server; and to generate a retrieval filter vector for the user's keywords according to the blocks of the keyword dictionary sequence in which they lie, and transmit it to the private cloud server;
the private cloud server is configured to perform a bitwise AND between the received retrieval filter vector and the document filter vector of each document, add the corresponding document number to the candidate document set whenever the result is not all zeros, and transmit the candidate document set to the public cloud server;
the public cloud server is configured to compute, from the received candidate document set, retrieval trapdoor, and number k, the secure inner product between the retrieval trapdoor and the encrypted document vector of each document in the candidate set, select the k ciphertext documents most relevant to the user's keywords according to these secure inner products, and return them to the data retrieval end;
and the data retrieval end is further configured to decrypt the k received ciphertext documents to obtain the k most relevant plaintext documents.
Further, the data providing end specifically includes:
a keyword extraction module, configured to extract keywords from the provided document set DS to obtain the keyword set $\{w_1, w_2, \ldots, w_n\}$;
a clustering module, configured to cluster the keywords in the keyword set according to their correlation to obtain sub-clusters $\{c_1, c_2, \ldots, c_t\}$;
a keyword dictionary generation module, configured to take each sub-cluster as a block, giving t blocks $b_1, b_2, \ldots, b_t$, and to generate the keyword dictionary sequence $W = \{w(b_1,1), w(b_1,2), \ldots, w(b_2,1), w(b_2,2), \ldots, w(b_t,1), w(b_t,2), \ldots\}$ from these blocks, where $w(b_j, x)$ denotes the x-th keyword belonging to block $b_j$; keywords within a block are unordered, and $b_j = \{w(b_j, x) \mid 0 < x \le |b_j|\}$;
a plaintext document vector generation module, configured to generate, using the TF-IDF algorithm and the vector space model, a plaintext document vector $V_i$ for each document $D_i$ in the document set $DS = \{D_1, D_2, \ldots, D_m\}$ according to the positions of the keywords in the keyword dictionary sequence, and to normalize it; $V_i$ has length n, and each element is the term frequency (TF) of the corresponding keyword in $D_i$;
a document filter vector generation module, configured to divide each plaintext document vector $V_i$ into t blocks with the same boundaries as the keyword dictionary sequence, giving the document filter vector $DF_i = \{b_1, b_2, \ldots, b_t\}$ of document $D_i$: if all elements of $V_i$ at the positions of the keywords in block $b_j$ are 0, then $b_j = 0$, otherwise $b_j = 1$; thus $DF_i$ is a t-dimensional 0/1 vector;
a key generation module, configured to generate the encryption key $SK = (S, M_1, M_2, k_f)$, where S is a random 0/1 vector, $M_1$ and $M_2$ are two $n \times n$ invertible matrices, n is the length of the keyword dictionary sequence, and $k_f$ is the document encryption key;
a document vector encryption module, configured to encrypt each plaintext document vector $V_i$ with the generated key using the secure KNN technique, obtaining the corresponding encrypted document vector $\tilde{V}_i = \{M_1^{T} V_i', \; M_2^{T} V_i''\}$,
where, for the j-th element of the random vector S, $V_i'[j] + V_i''[j] = V_i[j]$ when $S[j] = 0$, and $V_i'[j] = V_i''[j] = V_i[j]$ when $S[j] = 1$;
a document encryption module, configured to encrypt each document in the document set DS with a symmetric encryption algorithm to obtain the encrypted document set $ES = \{e_1, e_2, \ldots, e_m\}$;
and a transmission module, configured to transmit the document filter vectors to the private cloud server for storage, and the encrypted document vectors and the encrypted document set to the public cloud server for storage.
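The key generation module can be sketched as follows. The sketch assumes a CSPRNG (`random.SystemRandom`) for S and the matrix entries, and a 32-byte value for $k_f$; the text does not specify the symmetric cipher or its key length. Invertibility is checked by Gaussian elimination, and a matrix is redrawn if a pivot vanishes. The helper names are hypothetical.

```python
import random
import secrets
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def is_invertible(M: Matrix, eps: float = 1e-9) -> bool:
    """Forward elimination with partial pivoting; False if any pivot is ~0."""
    A = [list(row) for row in M]
    n = len(A)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < eps:
            return False
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
    return True


def random_invertible(n: int, rng: random.Random) -> Matrix:
    """Draw n x n matrices with entries in [-1, 1] until one is invertible."""
    while True:
        M = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
        if is_invertible(M):
            return M


def keygen(n: int, rng: Optional[random.Random] = None) -> Tuple[List[int], Matrix, Matrix, bytes]:
    """Return SK = (S, M1, M2, k_f) for a keyword dictionary of length n."""
    if rng is None:
        rng = random.SystemRandom()
    S = [rng.randint(0, 1) for _ in range(n)]  # n-dimensional 0/1 split indicator
    M1 = random_invertible(n, rng)
    M2 = random_invertible(n, rng)
    k_f = secrets.token_bytes(32)              # document key (256-bit, assumed)
    return S, M1, M2, k_f


S, M1, M2, k_f = keygen(4)
print(len(S), len(M1), len(M1[0]), len(k_f))  # 4 4 4 32
```

SK never leaves the data owner and authorized data users; the cloud servers only ever see vectors that have already been transformed by $M_1$ and $M_2$.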
Further, the data retrieval end specifically includes:
a retrieval vector generation module, configured to generate a retrieval vector Q from the keywords $\{w_1, w_2, \ldots, w_x\}$ provided by the user using the TF-IDF algorithm and the vector space model, and to normalize it; the j-th element $Q[j]$ is the inverse document frequency (IDF) of the j-th keyword $w_j$ in the document set DS provided by the data providing end;
a retrieval trapdoor generation module, configured to generate the retrieval trapdoor from the retrieval vector Q using the secure KNN algorithm: $T = \{M_1^{-1} Q', \; M_2^{-1} Q''\}$,
where, for the j-th element of the random vector S in the key generated by the data providing end, $Q'[j] = Q''[j] = Q[j]$ when $S[j] = 0$, and $Q'[j] + Q''[j] = Q[j]$ when $S[j] = 1$;
a retrieval filter vector generation module, configured to generate the retrieval filter vector $QF = \{b_1, b_2, \ldots, b_t\}$ according to the blocks of the keyword dictionary sequence; QF is a t-dimensional 0/1 vector: if the elements of Q for all keywords in block $b_j$ are 0, then $QF[j] = 0$, otherwise $QF[j] = 1$;
and a transmission module, configured to transmit the retrieval trapdoor and the number k of documents to be retrieved to the public cloud server, and the retrieval filter vector to the private cloud server.
Further, the private cloud server specifically includes:
an operation module, configured to perform a bitwise AND between the received retrieval filter vector QF and the document filter vector $DF_i$ of each document and, if $QF \,\&\, DF_i$ is not all zeros, to add the number $d_i$ of the document corresponding to $DF_i$ to the candidate document set, yielding $CDS = \{d_1, d_2, \ldots\}$;
and a transmission module, configured to send the candidate document set CDS to the public cloud server.
Beneficial effects: compared with the prior art, the invention has the following significant advantages:
1. High security
The invention performs multi-keyword ciphertext retrieval in an untrusted public cloud environment and computes secure inner products with the secure KNN technique, under which the inner product between an encrypted document vector and a retrieval trapdoor equals the inner product between the corresponding plaintext vectors. The public cloud therefore never needs to decrypt the retrieval trapdoor, the encrypted document vectors, or the encrypted documents: its entire process runs on ciphertext and still yields the Top-k result. The secure KNN technique thus computes Top-k results for multi-keyword queries while protecting the data owner's privacy, and it has been widely applied to multi-keyword ciphertext retrieval.
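The inner-product-preserving property can be checked numerically. The sketch below applies the splitting rules of (1-7) and (2-2) with the standard secure KNN forms $\{M_1^T V', M_2^T V''\}$ for documents and $\{M_1^{-1} Q', M_2^{-1} Q''\}$ for trapdoors. It uses pure Python with a Gauss-Jordan inverse and a seeded `random.Random` for reproducibility (a real deployment would need a CSPRNG); all function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def transpose(M: Matrix) -> Matrix:
    return [list(col) for col in zip(*M)]


def mat_vec(M: Matrix, v: Sequence[float]) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def inverse(M: Matrix) -> Matrix:
    """Gauss-Jordan elimination with partial pivoting on [M | I]."""
    n = len(M)
    A = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]


def split(vec: Sequence[float], S: Sequence[int], split_when: int,
          rng: random.Random) -> Tuple[Vector, Vector]:
    """Where S[j] == split_when, write vec[j] as a random sum a[j] + b[j];
    elsewhere copy vec[j] into both halves."""
    a: Vector = []
    b: Vector = []
    for x, s in zip(vec, S):
        if s == split_when:
            r = rng.uniform(-1.0, 1.0)
            a.append(r)
            b.append(x - r)
        else:
            a.append(x)
            b.append(x)
    return a, b


def encrypt_document(V: Sequence[float], S: Sequence[int], M1: Matrix, M2: Matrix,
                     rng: random.Random) -> Tuple[Vector, Vector]:
    """(1-7): split where S[j] = 0, then return (M1^T V', M2^T V'')."""
    v1, v2 = split(V, S, 0, rng)
    return mat_vec(transpose(M1), v1), mat_vec(transpose(M2), v2)


def make_trapdoor(Q: Sequence[float], S: Sequence[int], M1: Matrix, M2: Matrix,
                  rng: random.Random) -> Tuple[Vector, Vector]:
    """(2-2): split where S[j] = 1, then return (M1^-1 Q', M2^-1 Q'')."""
    q1, q2 = split(Q, S, 1, rng)
    return mat_vec(inverse(M1), q1), mat_vec(inverse(M2), q2)


def secure_score(enc_doc: Tuple[Vector, Vector], trap: Tuple[Vector, Vector]) -> float:
    """Public-cloud relevance score, computed on ciphertext only."""
    return dot(enc_doc[0], trap[0]) + dot(enc_doc[1], trap[1])


rng = random.Random(7)
n = 5
S = [rng.randint(0, 1) for _ in range(n)]
M1 = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
M2 = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
V = [0.5, 0.0, 0.5, 0.7071, 0.0]
Q = [0.6, 0.0, 0.0, 0.8, 0.0]
score = secure_score(encrypt_document(V, S, M1, M2, rng), make_trapdoor(Q, S, M1, M2, rng))
print(score, dot(V, Q))  # equal up to floating-point rounding
```

Because $(M_1^T V')^T (M_1^{-1} Q') = V'^T Q'$, each half of the score reduces to a plaintext inner product. The complementary split rules (documents split where $S[j] = 0$, queries where $S[j] = 1$) then make the two halves sum to $V \cdot Q$ element by element.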
2. High accuracy
When the data retrieval end submits its keywords of interest, the hybrid-cloud-oriented method retrieves in two stages: the private cloud server first generates the candidate document set CDS, and the public cloud server then finds, within the candidate set, the Top-k results most relevant to those keywords. When generating the candidate set, the private cloud adds every document $D_i$ in DS that contains one or more of the user's keywords, so no document that qualifies for the Top-k result is missing from the candidate set. The public cloud server then ranks the candidate documents strictly by the inner product between each encrypted document vector and the retrieval trapdoor. In this way the private and public cloud servers cooperate to rank the retrieval results exactly and return the Top-k documents to the data retrieval end.
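The no-missed-document argument can be exercised with a randomized check. The sketch assumes that documents and queries are given as sets of dictionary positions and that blocks are contiguous position ranges; the names, sizes, and seed are hypothetical. It raises an error if a document sharing a query keyword is filtered out, and counts the extra candidates that pass only because they share a block with a query keyword. Those extras cost public-cloud computation but cannot displace a relevant document, because they contain no query keyword and therefore score 0 against Q.

```python
import random
from typing import List, Sequence, Set, Tuple

Blocks = Sequence[Tuple[int, int]]


def filter_bits(positions: Set[int], blocks: Blocks) -> List[int]:
    """Bit j is 1 iff some keyword position lies in block b_j = [start, end)."""
    return [int(any(s <= p < e for p in positions)) for s, e in blocks]


def passes_filter(doc: Set[int], query: Set[int], blocks: Blocks) -> bool:
    """Private-cloud test: DF_i & QF is not all zeros."""
    df, qf = filter_bits(doc, blocks), filter_bits(query, blocks)
    return any(d & q for d, q in zip(df, qf))


def count_extra_candidates(trials: int = 1000, n: int = 30, t: int = 6, seed: int = 1) -> int:
    """Fail if a document sharing a query keyword is dropped; return how many
    documents sharing no query keyword still passed (block-level false positives)."""
    rng = random.Random(seed)
    cuts = sorted(rng.sample(range(1, n), t - 1))  # t non-empty contiguous blocks
    bounds = [0] + cuts + [n]
    blocks = list(zip(bounds[:-1], bounds[1:]))
    extra = 0
    for _ in range(trials):
        doc = set(rng.sample(range(n), rng.randint(1, 8)))
        query = set(rng.sample(range(n), rng.randint(1, 3)))
        if doc & query:
            if not passes_filter(doc, query, blocks):
                raise AssertionError("relevant document was dropped")
        elif passes_filter(doc, query, blocks):
            extra += 1
    return extra


print(count_extra_candidates())
```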
3. High retrieval efficiency
Existing searchable encryption methods built mainly on secure KNN computation, TF-IDF, and the vector space model are not very efficient. To address this, the method adds a trusted private cloud server and derives a document filter vector from each document vector by blocks, storing the filter vectors on the private cloud server. Because the filter vectors have low dimension, the private cloud server can compute the candidate document set from the user's retrieval filter vector with little computation, quickly filtering out a large number of irrelevant documents (a filtered-out document cannot be in the final Top-k result). The candidate set is much smaller than the original document set, so the public cloud server only has to compute inner products for a small number of encrypted vectors, which greatly reduces its computational overhead. Moreover, since the keywords a user enters are often related, keyword positions in the keyword dictionary sequence are not assigned randomly: the keywords are clustered by correlation into sub-clusters, and the keywords of each sub-cluster occupy the same block of the dictionary sequence. This improves the filtering effect of the private cloud server and further compresses the candidate document set.
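The effect of clustering on the candidate set can be illustrated on a toy corpus. The text does not name a clustering algorithm, so this sketch substitutes a simple one: single-link grouping of keywords whose document sets have a Jaccard similarity of at least 0.5. It then compares the candidate-set size for one query under the clustered block layout and under an arbitrary alphabetical layout with blocks of three keywords. The corpus, the threshold, and the function names are all hypothetical, and the counts describe only this toy case.

```python
from itertools import combinations
from typing import Dict, List, Set


def cluster_keywords(postings: Dict[str, Set[int]], threshold: float = 0.5) -> List[List[str]]:
    """Single-link grouping (union-find): join two keywords when the Jaccard
    similarity of their document sets reaches `threshold`."""
    parent = {w: w for w in postings}

    def find(w: str) -> str:
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for a, b in combinations(sorted(postings), 2):
        inter = len(postings[a] & postings[b])
        union = len(postings[a] | postings[b])
        if union and inter / union >= threshold:
            parent[find(a)] = find(b)
    groups: Dict[str, List[str]] = {}
    for w in sorted(postings):
        groups.setdefault(find(w), []).append(w)
    return list(groups.values())


def candidate_count(docs: List[Set[str]], query: Set[str], blocks: List[List[str]]) -> int:
    """Number of documents whose filter vector shares a 1-bit with the query's."""
    qf = {j for j, blk in enumerate(blocks) if query & set(blk)}
    return sum(1 for d in docs if any(d & set(blocks[j]) for j in qf))


docs = [{"cloud", "server", "storage"}, {"cloud", "storage"},
        {"cipher", "key", "encrypt"}, {"key", "encrypt"},
        {"medical", "record", "patient"}, {"patient", "record"}]
postings: Dict[str, Set[int]] = {}
for i, d in enumerate(docs):
    for w in d:
        postings.setdefault(w, set()).add(i)

clustered = cluster_keywords(postings)
vocab = sorted(postings)
arbitrary = [vocab[i:i + 3] for i in range(0, len(vocab), 3)]
query = {"cloud", "storage"}
print(candidate_count(docs, query, clustered), candidate_count(docs, query, arbitrary))  # 2 6
```

With the clustered layout the query touches a single block, so only the two documents about cloud storage survive the filter; the alphabetical layout spreads the query keywords across blocks shared with unrelated keywords, and every document becomes a candidate.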
Drawings
FIG. 1 is an architecture diagram of a hybrid cloud-oriented privacy protection multi-keyword Top-k ciphertext retrieval method provided by the present invention;
FIG. 2 is a schematic flow chart of a hybrid cloud-oriented privacy protection multi-keyword Top-k ciphertext retrieval method provided by the invention;
FIG. 3 is a schematic diagram of a keyword dictionary sequence constructed by clustering the keywords into 10 small clusters; the corresponding keyword dictionary sequence has 10 blocks, each containing a variable number of keywords equal to the number of keywords in the corresponding cluster;
FIG. 4 is a schematic diagram of a document vector and its document filter vector, and of a retrieval vector and its retrieval filter vector, before normalization;
FIG. 5 is a schematic diagram of the retrieval process: the private cloud server first obtains the candidate document set through an AND operation between the document filter vectors and the retrieval filter vector, and sends it to the public cloud server; the public cloud server then obtains the Top-k documents by computing the relevance scores between the retrieval vector and the document vectors in the candidate set. For simplicity of drawing, the document vectors and the retrieval vector are shown without normalization or encryption.
Detailed Description
Example 1
The embodiment provides a hybrid cloud-oriented privacy protection multi-keyword Top-k ciphertext retrieval method, as shown in fig. 1 and fig. 2, including the following steps:
(1) The data providing end extracts a keyword set from the provided document set and generates a keyword dictionary sequence by clustering the keywords and partitioning them into blocks; it generates a plaintext document vector for each document according to the keyword dictionary sequence, and partitions each plaintext document vector with the same block boundaries as the keyword dictionary sequence to form a document filter vector; it encrypts the plaintext document vectors to form encrypted document vectors, and encrypts each document in the document set to form an encrypted document set; finally, it transmits the document filter vectors to the private cloud server, and the encrypted document vectors and the encrypted document set to the public cloud server.
The method specifically comprises the following steps:
(1-1) extracting keywords from the provided document set DS by the data providing terminal to obtain a keyword set { w1,w2,…,wn};
(1-2) clustering the keywords in the keyword set according to the correlation relationship to obtain a plurality of clustering sub-clusters { c1,c2,…,ct};
(1-3) taking each sub-cluster as a block, thereby obtaining t blocks, b1,b2,…,btAnd generating a keyword dictionary sequence W ═ { W (b) according to the blocks1,1),w(b1,2),…,w(b2,1),w(b2,2),…,w(bt,1),w(bt2), … }, wherein w (b)jX) denotes belonging to the partition bjThe xth keyword in each block is unordered; block bj={w(bj,x)|0<x≤|bjL }; because of the clustering property of keywords, keywords having strong correlation in a dictionary sequence of keywords are clustered in the same block. For example, in fig. 3, the keyword sets are clustered together into 10 small natural clusters, and the number of keywords in each cluster is not fixed, so that the keyword dictionary includes 10 keyword blocks, the number of keywords included in each keyword block is the same as the number of keywords included in the corresponding cluster, and then a keyword dictionary sequence is generated according to the blocks;
(1-4) Using the TF-IDF algorithm and the vector space model, a plaintext document vector $V_i$ is generated for each document $D_i$ in the document set $DS = \{D_1, D_2, \ldots, D_m\}$ according to the positions of the keywords in the keyword dictionary sequence, and is then normalized. Here m is the number of documents in DS, the dimension of $V_i$ is n, and each bit holds the term frequency (TF) value in $D_i$ of the keyword at that position;
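A minimal sketch of step (1-4), assuming the document has already been tokenized. The method does not specify a TF formula, so this sketch uses raw occurrence counts followed by L2 normalization. With L2 normalization, dividing the counts by document length first would make no difference. `document_vector` is a hypothetical name.

```python
import math
from collections import Counter
from typing import List


def document_vector(tokens: List[str], dictionary: List[str]) -> List[float]:
    """n-dimensional TF vector of one document, L2-normalized.

    Bit j holds the term frequency of dictionary keyword j in the document
    (raw occurrence count here, which is an assumed TF definition).
    """
    counts = Counter(tokens)
    vec = [float(counts[w]) for w in dictionary]   # 0.0 for absent keywords
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec
```

Tokens that are not in the dictionary are ignored, because only dictionary positions are read from the counter.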
(1-5) Using the blocks of the keyword dictionary sequence, the plaintext document vector $V_i$ is divided into t blocks with the same boundaries as the dictionary sequence. This yields the document filter vector $DF_i = \{b_1, b_2, \ldots, b_t\}$ of each document $D_i$: if every position of $V_i$ corresponding to a keyword of block $b_j$ is 0, then $b_j = 0$; otherwise $b_j = 1$. $DF_i$ is thus a t-dimensional vector whose bits are 0 or 1. For example, fig. 4 gives a concrete document vector and its document filter vector. For simplicity of drawing, the document vector is not normalized. The document filter vector $DF_i$ of document $D_i$ is formed from its document vector $V_i$ according to the positions of the block boundaries in the keyword dictionary sequence, as shown in fig. 4;
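The rule in (1-5) reduces to an "any nonzero entry in this block" test. Below is a short Python sketch, assuming block boundaries are given as half-open `(start, end)` index ranges into the dictionary sequence (a hypothetical representation). The example values are made up and do not reproduce fig. 4.

```python
from typing import List, Sequence, Tuple


def filter_vector(vec: Sequence[float],
                  boundaries: Sequence[Tuple[int, int]]) -> List[int]:
    """t-bit filter vector: bit j is 0 iff every entry of block b_j is 0."""
    return [
        int(any(vec[k] != 0 for k in range(start, end)))
        for start, end in boundaries
    ]


if __name__ == "__main__":
    V = [0.0, 0.6, 0.0, 0.0, 0.0, 0.8]
    print(filter_vector(V, [(0, 2), (2, 5), (5, 6)]))  # [1, 0, 1]
```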
(1-6) An encryption key $SK = (S, M_1, M_2, k_f)$ is generated. S is an n-dimensional random column vector whose bits are 0 or 1, $M_1$ and $M_2$ are two n×n invertible matrices, n is the length of the keyword dictionary sequence, and $k_f$ is the document encryption key. SK is available only to the data owner (DO) and the data user (DU) and is kept secret from the cloud servers (CS);
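Key generation in (1-6) can be sketched as follows, assuming NumPy is available. Uniform random real matrices are almost surely invertible, but the sketch still redraws any poorly conditioned matrix. The conditioning threshold, the 256-bit length of k_f, and the names `SecretKey` and `keygen` are illustrative assumptions.

```python
import secrets
from typing import NamedTuple

import numpy as np


class SecretKey(NamedTuple):
    S: np.ndarray    # n-bit 0/1 splitting indicator
    M1: np.ndarray   # n x n invertible matrix
    M2: np.ndarray   # n x n invertible matrix
    k_f: bytes       # symmetric document-encryption key


def _random_invertible(n: int, rng: np.random.Generator) -> np.ndarray:
    while True:
        M = rng.random((n, n))
        if np.linalg.cond(M) < 1e8:   # reject (nearly) singular draws
            return M


def keygen(n: int, rng: np.random.Generator) -> SecretKey:
    """Generate SK = (S, M1, M2, k_f) for a dictionary of length n."""
    S = rng.integers(0, 2, size=n)
    return SecretKey(S, _random_invertible(n, rng), _random_invertible(n, rng),
                     secrets.token_bytes(32))  # assumed 256-bit key


if __name__ == "__main__":
    sk = keygen(6, np.random.default_rng(0))
    print(sk.S, np.allclose(sk.M1 @ np.linalg.inv(sk.M1), np.eye(6)))
```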
(1-7) Using the secure KNN technique and the generated encryption key, each plaintext document vector $V_i$ is encrypted to obtain the corresponding encrypted document vector $E_i = \{M_1^{T}V_i', M_2^{T}V_i''\}$. Here $V_i'$ and $V_i''$ are obtained by splitting $V_i$ according to the random vector S: when the j-th element $S[j] = 0$, $V_i'[j] + V_i''[j] = V_i[j]$; when $S[j] = 1$, $V_i'[j] = V_i''[j] = V_i[j]$;
(1-8) Each document in the document set DS is encrypted with a symmetric encryption algorithm to obtain the encrypted document set $ES = \{e_1, e_2, \ldots, e_m\}$;
(1-9) The document filter vectors are sent to the private cloud server for storage, and the encrypted document vectors and the encrypted document set are sent to the public cloud server for storage.
(2) The data retrieval end generates a retrieval vector from the keywords provided by the user, normalizes it, and generates a retrieval trapdoor with a secure algorithm. It then sends the trapdoor, together with the number k of documents the user wants to retrieve, to the public cloud server. It also generates a retrieval filter vector for the user's keywords according to how those keywords are partitioned into blocks in the keyword dictionary sequence, and sends the retrieval filter vector to the private cloud server.
The method specifically comprises the following steps:
(2-1) From the keywords $\{w_1, w_2, \ldots, w_x\}$ provided by the user, the data retrieval end generates a retrieval vector Q using the TF-IDF algorithm and the vector space model, and normalizes it. The j-th element Q[j] of Q is the inverse document frequency (IDF) value of the j-th keyword $w_j$ in the document set DS provided by the data providing end;
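A minimal sketch of step (2-1). Two choices here are assumptions rather than part of the method. First, positions of dictionary keywords that the user did not query are set to 0. Second, the IDF variant is ln(1 + m / df), which stays positive even for a keyword present in every document; that matters because step (2-3) treats a zero entry as "not queried". Document keyword sets are passed in directly for simplicity, and `query_vector` is a hypothetical name.

```python
import math
from typing import Iterable, List, Sequence, Set


def query_vector(query: Iterable[str], dictionary: Sequence[str],
                 doc_keyword_sets: Sequence[Set[str]]) -> List[float]:
    """Normalized n-dimensional retrieval vector Q.

    Q[j] is an IDF weight for dictionary keyword j if it is queried, else 0.
    The IDF variant ln(1 + m / df) is an assumed choice.
    """
    wanted = set(query)
    m = len(doc_keyword_sets)
    q: List[float] = []
    for w in dictionary:
        if w in wanted:
            df = sum(1 for doc in doc_keyword_sets if w in doc)
            q.append(math.log(1 + m / max(df, 1)))
        else:
            q.append(0.0)
    norm = math.sqrt(sum(x * x for x in q))
    return [x / norm for x in q] if norm > 0 else q
```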
(2-2) Based on the retrieval vector Q, the retrieval trapdoor $T = \{M_1^{-1}Q', M_2^{-1}Q''\}$ is generated with the secure KNN algorithm. For the j-th bit $S[j]$ of the random vector in the encryption key generated by the data providing end: when $S[j] = 0$, $Q'[j] = Q''[j] = Q[j]$; when $S[j] = 1$, $Q'[j] + Q''[j] = Q[j]$;
(2-3) A retrieval filter vector $QF = \{b_1, b_2, \ldots, b_t\}$ is generated according to how the keywords are partitioned into blocks in the keyword dictionary sequence. QF is a t-dimensional vector whose bits are 0 or 1: if every keyword of block $b_j$ has the value 0 in the retrieval vector Q, then QF[j] = 0; otherwise QF[j] = 1;
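Step (2-3) can also be computed without building Q, by looking up the block of each queried keyword. This shortcut is an assumption. It gives the same QF as the rule above only when every queried dictionary keyword receives a nonzero weight in Q. The `block_of` map (keyword to block index) and the function name are hypothetical.

```python
from typing import Dict, Iterable, List


def query_filter_vector(query: Iterable[str], block_of: Dict[str, int],
                        t: int) -> List[int]:
    """QF[j] = 1 iff at least one queried keyword lies in block b_j.

    Keywords outside the dictionary are ignored.
    """
    qf = [0] * t
    for w in query:
        j = block_of.get(w)
        if j is not None:
            qf[j] = 1
    return qf
```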
(2-4) The retrieval trapdoor and the number k of documents to be retrieved are sent to the public cloud server, and the retrieval filter vector is sent to the private cloud server.
(3) The private cloud server ANDs the received retrieval filter vector with the document filter vector of each document. If the resulting vector is not all zeros, the corresponding document number is added to the candidate document set, which is then sent to the public cloud server.
The method specifically comprises the following steps:
(3-1) The private cloud server ANDs the received retrieval filter vector QF with the document filter vector $DF_i$ of each document. If not all bits of $QF \,\&\, DF_i$ are 0, the document number $Did_i$ corresponding to $DF_i$ is added to the candidate document set, yielding $CDS = \{d_1, d_2, \ldots\}$;
(3-2) The candidate document set CDS is sent to the public cloud server. Fig. 5 shows a concrete query. The private cloud server ANDs the retrieval filter vector with each document filter vector and keeps the document numbers whose results are not all 0. This gives the candidate document set $CDS = \{Did_1, Did_5, Did_6\}$, which is then sent to the public cloud server.
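The private cloud's filtering in (3-1) and (3-2) is a bitwise AND per document. A Python sketch, assuming filter vectors are packed into integers so that each AND is a single operation. The packing, the string document numbers, and the sample data are hypothetical, and the data does not reproduce fig. 5.

```python
from typing import Dict, List, Sequence


def to_mask(bits: Sequence[int]) -> int:
    """Pack a 0/1 filter vector into an int (bit j <- bits[j])."""
    mask = 0
    for j, b in enumerate(bits):
        if b:
            mask |= 1 << j
    return mask


def candidate_set(qf: Sequence[int],
                  doc_filters: Dict[str, Sequence[int]]) -> List[str]:
    """Document numbers whose DF_i shares at least one 1-bit with QF."""
    q = to_mask(qf)
    return [did for did, df in doc_filters.items() if to_mask(df) & q]
```

In a deployment, the document masks would presumably be packed once at upload time, not per query.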
(4) Using the received candidate document set, retrieval trapdoor, and number k, the public cloud server computes the secure inner product between the retrieval trapdoor and the encrypted document vector of each document in the candidate set. Based on these secure inner products, it selects the k ciphertext documents in the candidate set most relevant to the user's keywords and returns them to the data retrieval end.
For example, in fig. 5 the public cloud server receives the candidate document set $CDS = \{Did_1, Did_5, Did_6\}$ from the private cloud server. The search space for the Top-k documents is then no longer the whole collection $DS = \{D_1, D_2, \ldots, D_{10}\}$ but the candidate set $\{D_1, D_5, D_6\}$, which shrinks the search from 10 documents to 3. Only 3 inner products therefore need to be computed. The server computes, one by one, the inner product between each candidate's encrypted document vector and the encrypted retrieval vector, obtains 3 relevance scores, and returns the encrypted documents with the k largest scores to the data retrieval end.
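The public cloud's Top-k step in (4) can be sketched in pure Python. The encrypted index layout (a dict from document number to the pair $(M_1^{T}V_i', M_2^{T}V_i'')$) is hypothetical. The toy vectors in the test are plain numbers standing in for ciphertext vectors. `heapq.nlargest` selects the k best without sorting every candidate.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Vec = Sequence[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_k(candidates: Sequence[str],
          enc_index: Dict[str, Tuple[Vec, Vec]],
          trapdoor: Tuple[Vec, Vec], k: int) -> List[str]:
    """Score only the candidate documents and keep the k best.

    The score of document i is E_i' . T' + E_i'' . T'' (the secure inner product).
    """
    T1, T2 = trapdoor

    def score(did: str) -> float:
        E1, E2 = enc_index[did]
        return dot(E1, T1) + dot(E2, T2)

    return heapq.nlargest(k, candidates, key=score)
```

Documents outside the candidate set are never scored, even if they would score highly. That is the point of the private-cloud pre-filter.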
(5) The data retrieval end decrypts the k received ciphertext documents with the symmetric key to obtain the k most relevant plaintext documents.
Example 2
This embodiment provides a hybrid-cloud-oriented privacy protection multi-keyword Top-k ciphertext retrieval system comprising a data providing end, a data retrieval end, a private cloud server, and a public cloud server, wherein:
the data providing end is used for extracting a keyword set from the provided document set and generating a keyword dictionary sequence through clustering and partitioning; generating a corresponding plaintext document vector for each document in the document set according to the keyword dictionary sequence, and blocking the plaintext document vectors according to the blocking condition of the keyword dictionary sequence to form document filtering vectors; encrypting the plaintext document vector to form an encrypted document vector, and encrypting each document in the document set to form an encrypted document set; transmitting the document filtering vector to a private cloud server, and transmitting the encrypted document vector and the encrypted document set to a public cloud server;
the data retrieval end is used for generating retrieval vectors according to a plurality of keywords provided by a user, generating a retrieval trapdoor by adopting a security algorithm after normalization, and transmitting the retrieval trapdoor and the number k of documents to be retrieved by the user to the public cloud server; generating a retrieval filtering vector for a plurality of keywords provided by a user according to the blocking condition of the keywords in the keyword dictionary sequence, and transmitting the retrieval filtering vector to the private cloud server;
the private cloud server is used for respectively carrying out AND operation on the received retrieval filtering vector and the document filtering vector of each document, if all bits of the vector obtained by the operation are not all 0, adding the corresponding document number to the candidate document set, and transmitting the candidate document set to the public cloud server;
the public cloud server is used for respectively calculating a security inner product between an encrypted document vector corresponding to each document in the candidate document set and the retrieval trapdoor according to the received candidate document set, the retrieval trapdoor and the number k of the retrieved documents, selecting k ciphertext documents most relevant to the keywords provided by the user in the candidate document set according to the security inner product, and returning the k ciphertext documents to the data retrieval end;
and the data retrieval end is also used for decrypting the received k ciphertext documents to obtain the most relevant k plaintext documents.
Further, the data providing end specifically includes:
a keyword extraction module for extracting keywords from the provided document set DS to obtain a keyword set $\{w_1, w_2, \ldots, w_n\}$;
a clustering module for clustering the keywords in the keyword set according to their correlation to obtain sub-clusters $\{c_1, c_2, \ldots, c_t\}$;
a keyword dictionary generation module for taking each sub-cluster as a block, giving t blocks $b_1, b_2, \ldots, b_t$, and generating from the blocks the keyword dictionary sequence $W = \{w(b_1,1), w(b_1,2), \ldots, w(b_2,1), w(b_2,2), \ldots, w(b_t,1), w(b_t,2), \ldots\}$, where $w(b_j, x)$ denotes the x-th keyword belonging to block $b_j$, the keywords within a block are unordered, and block $b_j = \{w(b_j, x) \mid 0 < x \le |b_j|\}$;
a plaintext document vector generation module for generating, with the TF-IDF algorithm and the vector space model and according to the positions of the keywords in the keyword dictionary sequence, a plaintext document vector $V_i$ for each document $D_i$ in the document set $DS = \{D_1, D_2, \ldots, D_m\}$ and normalizing it, where the dimension of $V_i$ is n and each bit holds the term frequency (TF) value in $D_i$ of the keyword at that position;
a document filter vector generation module for dividing each plaintext document vector $V_i$ into t blocks with the same boundaries as the keyword dictionary sequence, to obtain the document filter vector $DF_i = \{b_1, b_2, \ldots, b_t\}$ of each document $D_i$, where $b_j = 0$ if every position of $V_i$ corresponding to a keyword of block $b_j$ is 0 and $b_j = 1$ otherwise, so that $DF_i$ is a t-dimensional 0/1 vector;
a key generation module for generating the encryption key $SK = (S, M_1, M_2, k_f)$, where S is a random vector whose bits are 0 or 1, $M_1$ and $M_2$ are two n×n invertible matrices, n is the length of the keyword dictionary sequence, and $k_f$ is the document encryption key;
a document vector encryption module for encrypting each plaintext document vector $V_i$ with the generated encryption key through the secure KNN technique to obtain the corresponding encrypted document vector $E_i = \{M_1^{T}V_i', M_2^{T}V_i''\}$, where $V_i'[j] + V_i''[j] = V_i[j]$ when the j-th element $S[j]$ of the random vector S is 0, and $V_i'[j] = V_i''[j] = V_i[j]$ when $S[j] = 1$;
a document encryption module for encrypting each document in the document set DS with a symmetric encryption algorithm to obtain the encrypted document set $ES = \{e_1, e_2, \ldots, e_m\}$;
and a transmission module for sending the document filter vectors to the private cloud server for storage, and the encrypted document vectors and the encrypted document set to the public cloud server for storage.
Further, the data retrieval end specifically includes:
a retrieval vector generation module for generating, from the keywords $\{w_1, w_2, \ldots, w_x\}$ provided by the user, a retrieval vector Q with the TF-IDF algorithm and the vector space model and normalizing it, where the j-th element Q[j] of Q is the inverse document frequency (IDF) value of the j-th keyword $w_j$ in the document set DS provided by the data providing end;
a retrieval trapdoor generation module for generating, from the retrieval vector Q, the retrieval trapdoor $T = \{M_1^{-1}Q', M_2^{-1}Q''\}$ with the secure KNN algorithm, where, for the j-th bit $S[j]$ of the random vector in the encryption key generated by the data providing end, $Q'[j] = Q''[j] = Q[j]$ when $S[j] = 0$, and $Q'[j] + Q''[j] = Q[j]$ when $S[j] = 1$;
a retrieval filter vector generation module for generating the retrieval filter vector $QF = \{b_1, b_2, \ldots, b_t\}$ according to how the keywords are partitioned into blocks in the keyword dictionary sequence, where QF is a t-dimensional 0/1 vector: QF[j] = 0 if every keyword of block $b_j$ has the value 0 in the retrieval vector Q, and QF[j] = 1 otherwise;
and a transmission module for sending the retrieval trapdoor and the number k of documents to be retrieved to the public cloud server, and the retrieval filter vector to the private cloud server.
Further, the private cloud server specifically includes:
an operation module for ANDing the received retrieval filter vector QF with the document filter vector $DF_i$ of each document and, if not all bits of $QF \,\&\, DF_i$ are 0, adding the document number $Did_i$ corresponding to $DF_i$ to the candidate document set, yielding $CDS = \{d_1, d_2, \ldots\}$;
and a transmission module for sending the candidate document set CDS to the public cloud server.
The system corresponds one-to-one to the method of Example 1. The remaining parts are not described again; refer to Example 1.
The above discloses only preferred embodiments of the present invention and does not limit its scope of rights; equivalent changes made according to the claims of the present invention therefore still fall within the scope of the invention.

Claims (8)

1. A hybrid-cloud-oriented privacy protection multi-keyword Top-k ciphertext retrieval method, characterized by comprising the following steps:
(1) the data providing end extracts a keyword set from the provided document set and generates a keyword dictionary sequence through clustering and partitioning; generating a corresponding plaintext document vector for each document in the document set according to the keyword dictionary sequence, and blocking the plaintext document vectors according to the blocking condition of the keyword dictionary sequence to form document filtering vectors; encrypting the plaintext document vector to form an encrypted document vector, and encrypting each document in the document set to form an encrypted document set; finally, the document filtering vector is transmitted to a private cloud server, and the encrypted document vector and the encrypted document set are transmitted to a public cloud server;
(2) the data retrieval end generates retrieval vectors according to a plurality of keywords provided by a user, generates a retrieval trapdoor by adopting a security algorithm after normalization, and transmits the retrieval trapdoor and the number k of documents to be retrieved by the user to a public cloud server; generating a retrieval filtering vector for a plurality of keywords provided by a user according to the blocking condition of the keywords in the keyword dictionary sequence, and transmitting the retrieval filtering vector to a private cloud server;
(3) the private cloud server respectively performs AND operation on the received retrieval filtering vector and the document filtering vector of each document, if all bits of the vector obtained through operation are not all 0, the corresponding document number is added to the candidate document set, and the candidate document set is transmitted to the public cloud server;
(4) the public cloud server respectively calculates a security inner product between an encrypted document vector corresponding to each document in the candidate document set and the retrieval trapdoor according to the received candidate document set, the retrieval trapdoor and the number k of the retrieved documents, selects k ciphertext documents most relevant to the keywords provided by the user in the candidate document set according to the security inner product, and returns the k ciphertext documents to the data retrieval end;
(5) and the data retrieval end decrypts the received k ciphertext documents to obtain the most relevant k plaintext documents.
2. The hybrid cloud-oriented privacy protection multi-keyword Top-k ciphertext retrieval method according to claim 1, wherein: the step (1) specifically comprises the following steps:
(1-1) the data providing end extracts keywords from the provided document set DS to obtain a keyword set $\{w_1, w_2, \ldots, w_n\}$, where n is the number of keywords;
(1-2) the keywords in the keyword set are clustered according to their correlation to obtain sub-clusters $\{c_1, c_2, \ldots, c_t\}$;
(1-3) each sub-cluster is taken as a block, giving t blocks $b_1, b_2, \ldots, b_t$, and the keyword dictionary sequence $W = \{w(b_1,1), w(b_1,2), \ldots, w(b_2,1), w(b_2,2), \ldots, w(b_t,1), w(b_t,2), \ldots\}$ is generated from the blocks, where $w(b_j, x)$ denotes the x-th keyword belonging to block $b_j$, the keywords within a block are unordered, and block $b_j = \{w(b_j, x) \mid 0 < x \le |b_j|\}$;
(1-4) using the TF-IDF algorithm and the vector space model, a plaintext document vector $V_i$ is generated for each document $D_i$ in the document set $DS = \{D_1, D_2, \ldots, D_m\}$ according to the positions of the keywords in the keyword dictionary sequence, and is normalized, where m is the number of documents in DS, the dimension of $V_i$ is n, and each bit holds the term frequency (TF) value in $D_i$ of the keyword at that position;
(1-5) each plaintext document vector $V_i$ is divided into t blocks with the same boundaries as the keyword dictionary sequence, to obtain the document filter vector $DF_i = \{b_1, b_2, \ldots, b_t\}$ of each document $D_i$, where $b_j = 0$ if every position of $V_i$ corresponding to a keyword of block $b_j$ is 0 and $b_j = 1$ otherwise, so that $DF_i$ is a t-dimensional 0/1 vector;
(1-6) an encryption key $SK = (S, M_1, M_2, k_f)$ is generated, where S is a random vector whose bits are 0 or 1, $M_1$ and $M_2$ are two n×n invertible matrices, n is the length of the keyword dictionary sequence, and $k_f$ is the document encryption key;
(1-7) each plaintext document vector $V_i$ is encrypted with the generated encryption key through the secure KNN technique to obtain the corresponding encrypted document vector $E_i = \{M_1^{T}V_i', M_2^{T}V_i''\}$, where $V_i'[j] + V_i''[j] = V_i[j]$ when the j-th element $S[j]$ of the random vector S is 0, and $V_i'[j] = V_i''[j] = V_i[j]$ when $S[j] = 1$;
(1-8) each document in the document set DS is encrypted with a symmetric encryption algorithm to obtain the encrypted document set $ES = \{e_1, e_2, \ldots, e_m\}$;
(1-9) the document filter vectors are sent to the private cloud server for storage, and the encrypted document vectors and the encrypted document set are sent to the public cloud server for storage.
3. The hybrid cloud-oriented privacy protection multi-keyword Top-k ciphertext retrieval method according to claim 1, wherein: the step (2) specifically comprises the following steps:
(2-1) from the keywords $\{w_1, w_2, \ldots, w_x\}$ provided by the user, the data retrieval end generates a retrieval vector Q using the TF-IDF algorithm and the vector space model and normalizes it, where the j-th element Q[j] of Q is the inverse document frequency (IDF) value of the j-th keyword $w_j$ in the document set DS provided by the data providing end;
(2-2) based on the retrieval vector Q, the retrieval trapdoor $T = \{M_1^{-1}Q', M_2^{-1}Q''\}$ is generated with the secure KNN algorithm, where, for the j-th bit $S[j]$ of the random vector in the encryption key generated by the data providing end, $Q'[j] = Q''[j] = Q[j]$ when $S[j] = 0$, and $Q'[j] + Q''[j] = Q[j]$ when $S[j] = 1$;
(2-3) a retrieval filter vector $QF = \{b_1, b_2, \ldots, b_t\}$ is generated according to how the keywords are partitioned into blocks in the keyword dictionary sequence, where QF is a t-dimensional 0/1 vector: QF[j] = 0 if every keyword of block $b_j$ has the value 0 in the retrieval vector Q, and QF[j] = 1 otherwise;
(2-4) the retrieval trapdoor and the number k of documents to be retrieved are sent to the public cloud server, and the retrieval filter vector is sent to the private cloud server.
4. The hybrid cloud-oriented privacy protection multi-keyword Top-k ciphertext retrieval method according to claim 1, wherein: the step (3) specifically comprises the following steps:
(3-1) the private cloud server ANDs the received retrieval filter vector QF with the document filter vector $DF_i$ of each document and, if not all bits of $QF \,\&\, DF_i$ are 0, adds the document number $Did_i$ corresponding to $DF_i$ to the candidate document set, yielding $CDS = \{d_1, d_2, \ldots\}$;
(3-2) the candidate document set CDS is sent to the public cloud server.
5. A hybrid-cloud-oriented privacy protection multi-keyword Top-k ciphertext retrieval system, characterized by comprising a data providing end, a data retrieval end, a private cloud server, and a public cloud server, wherein:
the data providing end is used for extracting a keyword set from the provided document set and generating a keyword dictionary sequence through clustering and partitioning; generating a corresponding plaintext document vector for each document in the document set according to the keyword dictionary sequence, and blocking the plaintext document vectors according to the blocking condition of the keyword dictionary sequence to form document filtering vectors; encrypting the plaintext document vector to form an encrypted document vector, and encrypting each document in the document set to form an encrypted document set; transmitting the document filtering vector to a private cloud server, and transmitting the encrypted document vector and the encrypted document set to a public cloud server;
the data retrieval end is used for generating retrieval vectors according to a plurality of keywords provided by a user, generating a retrieval trapdoor by adopting a security algorithm after normalization, and transmitting the retrieval trapdoor and the number k of documents to be retrieved by the user to the public cloud server; generating a retrieval filtering vector for a plurality of keywords provided by a user according to the blocking condition of the keywords in the keyword dictionary sequence, and transmitting the retrieval filtering vector to the private cloud server;
the private cloud server is used for respectively carrying out AND operation on the received retrieval filtering vector and the document filtering vector of each document, if all bits of the vector obtained by the operation are not all 0, adding the corresponding document number to the candidate document set, and transmitting the candidate document set to the public cloud server;
the public cloud server is used for respectively calculating a security inner product between an encrypted document vector corresponding to each document in the candidate document set and the retrieval trapdoor according to the received candidate document set, the retrieval trapdoor and the number k of the retrieved documents, selecting k ciphertext documents most relevant to the keywords provided by the user in the candidate document set according to the security inner product, and returning the k ciphertext documents to the data retrieval end;
and the data retrieval end is also used for decrypting the received k ciphertext documents to obtain the most relevant k plaintext documents.
6. The hybrid cloud-oriented privacy protection multi-keyword Top-k ciphertext retrieval system according to claim 5, wherein: the data providing end specifically comprises:
a keyword extraction module for extracting keywords from the provided document set DS to obtain a keyword set $\{w_1, w_2, \ldots, w_n\}$, where n is the number of keywords;
a clustering module for clustering the keywords in the keyword set according to their correlation to obtain sub-clusters $\{c_1, c_2, \ldots, c_t\}$;
a keyword dictionary generation module for taking each sub-cluster as a block, giving t blocks $b_1, b_2, \ldots, b_t$, and generating from the blocks the keyword dictionary sequence $W = \{w(b_1,1), w(b_1,2), \ldots, w(b_2,1), w(b_2,2), \ldots, w(b_t,1), w(b_t,2), \ldots\}$, where $w(b_j, x)$ denotes the x-th keyword belonging to block $b_j$, the keywords within a block are unordered, and block $b_j = \{w(b_j, x) \mid 0 < x \le |b_j|\}$;
a plaintext document vector generation module for generating, with the TF-IDF algorithm and the vector space model and according to the positions of the keywords in the keyword dictionary sequence, a plaintext document vector $V_i$ for each document $D_i$ in the document set $DS = \{D_1, D_2, \ldots, D_m\}$ and normalizing it, where m is the number of documents in DS, the dimension of $V_i$ is n, and each bit holds the term frequency (TF) value in $D_i$ of the keyword at that position;
a document filter vector generation module for dividing each plaintext document vector $V_i$ into t blocks with the same boundaries as the keyword dictionary sequence, to obtain the document filter vector $DF_i = \{b_1, b_2, \ldots, b_t\}$ of each document $D_i$, where $b_j = 0$ if every position of $V_i$ corresponding to a keyword of block $b_j$ is 0 and $b_j = 1$ otherwise, so that $DF_i$ is a t-dimensional 0/1 vector;
a key generation module for generating the encryption key $SK = (S, M_1, M_2, k_f)$, where S is a random vector whose bits are 0 or 1, $M_1$ and $M_2$ are two n×n invertible matrices, n is the length of the keyword dictionary sequence, and $k_f$ is the document encryption key;
a document vector encryption module for encrypting each plaintext document vector $V_i$ with the generated encryption key through the secure KNN technique to obtain the corresponding encrypted document vector $E_i = \{M_1^{T}V_i', M_2^{T}V_i''\}$, where $V_i'[j] + V_i''[j] = V_i[j]$ when the j-th element $S[j]$ of the random vector S is 0, and $V_i'[j] = V_i''[j] = V_i[j]$ when $S[j] = 1$;
a document encryption module for encrypting each document in the document set DS with a symmetric encryption algorithm to obtain the encrypted document set $ES = \{e_1, e_2, \ldots, e_m\}$;
and a transmission module for sending the document filter vectors to the private cloud server for storage, and the encrypted document vectors and the encrypted document set to the public cloud server for storage.
7. The hybrid cloud-oriented privacy protection multi-keyword Top-k ciphertext retrieval system according to claim 5, wherein: the data retrieval end specifically comprises:
a retrieval vector generation module for generating, from the keywords $\{w_1, w_2, \ldots, w_x\}$ provided by the user, a retrieval vector Q with the TF-IDF algorithm and the vector space model and normalizing it, where the j-th element Q[j] of Q is the inverse document frequency (IDF) value of the j-th keyword $w_j$ in the document set DS provided by the data providing end;
a retrieval trapdoor generation module for generating, from the retrieval vector Q, the retrieval trapdoor $T = \{M_1^{-1}Q', M_2^{-1}Q''\}$ with the secure KNN algorithm, where, for the j-th bit $S[j]$ of the random vector in the encryption key generated by the data providing end, $Q'[j] = Q''[j] = Q[j]$ when $S[j] = 0$, and $Q'[j] + Q''[j] = Q[j]$ when $S[j] = 1$;
a retrieval filter vector generation module for generating the retrieval filter vector $QF = \{b_1, b_2, \ldots, b_t\}$ according to how the keywords are partitioned into blocks in the keyword dictionary sequence, where QF is a t-dimensional 0/1 vector: QF[j] = 0 if every keyword of block $b_j$ has the value 0 in the retrieval vector Q, and QF[j] = 1 otherwise;
and a transmission module for sending the retrieval trapdoor and the number k of documents to be retrieved to the public cloud server, and the retrieval filter vector to the private cloud server.
8. The hybrid cloud-oriented privacy protection multi-keyword Top-k ciphertext retrieval system according to claim 5, wherein: the private cloud server specifically includes:
an operation module for ANDing the received retrieval filter vector QF with the document filter vector $DF_i$ of each document and, if not all bits of $QF \,\&\, DF_i$ are 0, adding the document number $Did_i$ corresponding to $DF_i$ to the candidate document set, yielding $CDS = \{d_1, d_2, \ldots\}$;
and a transmission module for sending the candidate document set CDS to the public cloud server.
CN201810122376.8A 2018-02-07 2018-02-07 Privacy protection multi-keyword Top-k ciphertext retrieval method and system facing hybrid cloud Active CN108363689B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810122376.8A CN108363689B (en) 2018-02-07 2018-02-07 Privacy protection multi-keyword Top-k ciphertext retrieval method and system facing hybrid cloud

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810122376.8A CN108363689B (en) 2018-02-07 2018-02-07 Privacy protection multi-keyword Top-k ciphertext retrieval method and system facing hybrid cloud

Publications (2)

Publication Number Publication Date
CN108363689A CN108363689A (en) 2018-08-03
CN108363689B CN108363689B (en) 2021-03-19

Family

ID=63005057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810122376.8A Active CN108363689B (en) 2018-02-07 2018-02-07 Privacy protection multi-keyword Top-k ciphertext retrieval method and system facing hybrid cloud

Country Status (1)

Country Link
CN (1) CN108363689B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109194666B (en) * 2018-09-18 2021-06-01 Northeastern University LBS-based secure kNN query method
CN109271485B (en) * 2018-09-19 2022-03-08 Nanjing University of Posts and Telecommunications Semantics-supporting ranked search method for encrypted documents in a cloud environment
CN109739945B (en) * 2018-12-13 2022-11-08 Nanjing University of Posts and Telecommunications Multi-keyword ranked ciphertext search method based on a hybrid index
CN110727951B (en) * 2019-10-14 2021-08-27 Guilin University of Electronic Technology Lightweight privacy-preserving multi-keyword retrieval method and system for outsourced files
CN110895611B (en) * 2019-11-26 2021-04-02 Alipay (Hangzhou) Information Technology Co., Ltd. Data query method, device, equipment and system based on privacy information protection
CN112597268B (en) * 2020-12-22 2022-09-20 Nanjing University of Posts and Telecommunications Retrieval filtering threshold selection method for optimizing ciphertext retrieval efficiency in a cloud environment
CN114189391B (en) * 2022-02-14 2022-04-29 Zhejiang Yitian Yunwang Information Technology Co., Ltd. Privacy data control and management method suitable for hybrid cloud
CN116521743A (en) * 2023-06-27 2023-08-01 Beijing Zhongke Jiangnan Information Technology Co., Ltd. Ciphertext retrieval method and device, storage medium and electronic equipment

Citations (4)

Publication number Priority date Publication date Assignee Title
CN104765848A (en) * 2015-04-17 2015-07-08 PLA Air Force Aviation University Symmetric searchable encryption method supporting efficient result ranking in hybrid cloud storage
CN105681280A (en) * 2015-12-29 2016-06-15 Xidian University Searchable encryption method for Chinese text in a cloud environment
CN106815350A (en) * 2017-01-19 2017-06-09 Anhui University Dynamic multi-keyword fuzzy search method over ciphertext in a cloud environment
CN107634829A (en) * 2017-09-12 2018-01-26 Nanjing University of Science and Technology Attribute-based searchable encrypted electronic medical record system and encryption method

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
JP2012049679A (en) * 2010-08-25 2012-03-08 Sony Corp Terminal apparatus, server, data processing system, data processing method and program

Non-Patent Citations (2)

Title
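A survey of data security and privacy protection technologies for public cloud storage services; Li Hui et al.; Journal of Computer Research and Development; 2014-07-31; full text *
A survey of research on multi-keyword ranked search over ciphertext in cloud environments; Dai Hua et al.; Computer Science; 2019-01-31; full text *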

Also Published As

Publication number Publication date
CN108363689A (en) 2018-08-03

Similar Documents

Publication Publication Date Title
CN108363689B (en) Privacy protection multi-keyword Top-k ciphertext retrieval method and system facing hybrid cloud
US11567950B2 (en) System and method for confidentiality-preserving rank-ordered search
Yuan et al. SEISA: Secure and efficient encrypted image search with access control
Zhang et al. SE-PPFM: A searchable encryption scheme supporting privacy-preserving fuzzy multikeyword in cloud systems
US8819408B2 (en) Document processing method and system
US9197613B2 (en) Document processing method and system
CN108959567B (en) Secure retrieval method suitable for large-scale images in a cloud environment
Dai et al. A privacy-preserving multi-keyword ranked search over encrypted data in hybrid clouds
CN115314295B (en) Blockchain-based searchable encryption method
Al Sibahee et al. Efficient encrypted image retrieval in IoT-cloud with multi-user authentication
Boucenna et al. Secure inverted index based search over encrypted cloud data with user access rights management
Gong et al. A privacy-preserving image retrieval method based on improved bovw model in cloud environment
CN113779597B (en) Method, device, equipment and medium for storing and similarity searching of encrypted documents
CN109740378B (en) Secure pair index structure resisting keyword privacy leakage and retrieval method thereof
Ren et al. Privacy-preserving ranked multi-keyword search leveraging polynomial function in cloud computing
EP2775420A1 (en) Semantic search over encrypted data
Zhang et al. A verifiable and dynamic multi-keyword ranked search scheme over encrypted cloud data with accuracy improvement
Li et al. Paillier-based fuzzy multi-keyword searchable encryption scheme with order-preserving
CN111966778B (en) Multi-keyword ranked ciphertext search method based on a keyword-grouped inverted index
CN113158245A (en) Method, system, equipment and readable storage medium for searching document
Manasrah et al. A privacy-preserving multi-keyword search approach in cloud computing
Huang et al. Efficient privacy-preserving content-based image retrieval in the cloud
Salmani et al. Leakless privacy-preserving multi-keyword ranked search over encrypted cloud data
CN113158209A (en) Privacy-preserving method for processing why-not questions on Top-k queries
Xu et al. Achieving fine-grained multi-keyword ranked search over encrypted cloud data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 2018-08-03

Assignee: NUPT INSTITUTE OF BIG DATA RESEARCH AT YANCHENG

Assignor: Nanjing University of Posts and Telecommunications

Contract record no.: X2021980013920

Denomination of invention: Hybrid cloud oriented privacy protection multi keyword Top-k ciphertext retrieval method and system

Granted publication date: 2021-03-19

License type: Common License

Record date: 2021-12-02