CN109063509A - A searchable encryption method based on keyword semantic ranking - Google Patents

A searchable encryption method based on keyword semantic ranking

Info

Publication number
CN109063509A
Authority
CN
China
Prior art keywords
document
vector
keyword
cloud server
owned cloud
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810890114.6A
Other languages
Chinese (zh)
Inventor
肖婷婷
韩德志
王军
毕坤
段乐天
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Maritime University
Original Assignee
Shanghai Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Maritime University
Priority to CN201810890114.6A
Publication of CN109063509A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6227 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2107 File encryption
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2141 Access rights, e.g. capability lists, access control lists, access tables, access matrices

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

The invention discloses a searchable encryption method based on keyword semantic ranking, comprising the following steps: the data owner generates an encryption key and sends it to the authorized user, extracts keywords from the plaintext document set, constructs document tag vectors from the keywords and sends them to the private cloud server, and creates a secure index tree and sends it to the public cloud server; the authorized user sends the query tag vector generated from the search keywords to the private cloud server and sends the generated query trapdoor to the public cloud server; the private cloud server matches the query tag vector against the document tag vectors and sends the set of secure index identifiers that may satisfy the user's query to the public cloud server; the public cloud server computes the document similarities, ranks the results, and returns the ciphertext documents with the highest similarity to the authorized user. By realizing multi-keyword semantic ranking, the invention effectively improves retrieval efficiency while keeping data private and secure.

Description

A searchable encryption method based on keyword semantic ranking
Technical field
The present invention relates to the field of cloud security, and in particular to a searchable encryption method based on keyword semantic ranking.
Background
With the rapid development of cloud computing, more and more users outsource their data to cloud servers. While a cloud server provides high-quality data storage services, it also poses a serious threat to the privacy of user data. To keep sensitive data from being leaked, data owners usually encrypt their data before uploading it to the cloud server, and search techniques for encrypted data have emerged as a result. Once data is stored on the cloud server in encrypted form, neither the cloud server administrator nor an external attacker can obtain its true content, so user privacy is protected. However, encryption makes information retrieval over ciphertext very challenging; in particular, how to provide an efficient and secure search scheme over ciphertext in an untrusted environment has attracted widespread attention.
When searchable encryption was first proposed, each word in a document was encrypted with a symmetric key using a two-layer structure, but this full-text search approach was too inefficient. Keyword-based public-key searchable encryption was proposed later, allowing any user who holds the public key to store data on the server while only the user who holds the private key can search for keywords. Because this construction relies on bilinear pairings, its retrieval efficiency is very low, and it is not suitable for cloud computing environments with large-scale data.
To better satisfy search requests, fuzzy keyword search schemes were proposed. By constructing fuzzy sets of keywords, they tolerate spelling mistakes and format inconsistencies in the user's input, but they cannot find documents that are semantically related to the keywords.
Although some encryption methods in existing research support semantic fuzzy search, multi-keyword search, parallel search, similarity retrieval, and other search methods, they do not take semantic similarity into account when scoring documents.
Summary of the invention
The object of the present invention is to provide a searchable encryption method based on keyword semantic ranking. Building on semantic fuzzy keyword search, the method extracts keywords, constructs index vectors, and uses keyword semantics as a document feature to realize multi-keyword ranked ciphertext search, so that users can quickly find the most relevant data; this reduces network traffic overhead and improves search accuracy. In addition, the method adopts a hybrid cloud model, that is, the system model contains one public cloud server and one private cloud server, which makes maximum use of server resources, reduces the computational overhead of the terminal, and gives the model a high level of security.
To achieve the above object, the present invention provides a searchable encryption method based on keyword semantic ranking, the method comprising the following steps:
S1. The data owner extracts keywords from the plaintext document set to obtain a keyword set, generates a key SK for encrypting the plaintext document set, and sends the key SK to the authorized user;
S2. The data owner constructs document tag vectors from the keyword set and creates a secure index tree, then sends the document tag vectors to the private cloud server and the secure index tree to the public cloud server;
S3. The data owner encrypts the plaintext document set with the key SK to obtain the ciphertext document set, and sends the ciphertext document set to the public cloud server;
S4. The authorized user inputs the keyword set to be searched and derives a query tag vector and a trapdoor from it, then sends the query tag vector to the private cloud server and the trapdoor to the public cloud server;
S5. The private cloud server matches the query tag vector sent by the authorized user against the document tag vectors sent by the data owner to obtain the set of candidate index identifiers that may contain the keywords, then sends the candidate index identifier set to the public cloud server;
S6. The public cloud server receives the candidate index identifier set sent by the private cloud server and, according to the candidate index identifier set and the secure index tree sent by the data owner, uses the corresponding encrypted document index vectors and the trapdoor sent by the authorized user to compute and rank the similarity scores of the documents, and returns the top-k ciphertext documents to the authorized user; each leaf node of the secure index tree corresponds to one document index vector, and the value of each dimension of the document index vector stored in a leaf node is the TF value of the keyword corresponding to that dimension; k is the number of ciphertext documents required by the authorized user, and the TF value denotes the frequency with which a given keyword occurs in a document;
S7. The authorized user uses the key SK to decrypt the top-k ciphertext documents returned by the public cloud server and obtains the corresponding plaintext documents.
Preferably, step S1 further comprises the following steps:
S1.1. The data owner extracts keywords from the plaintext document set F = (f_1, f_2, …, f_m) to obtain the keyword set W = (w_1, w_2, …, w_n), where w_1, w_2, …, w_n are the extracted keywords;
S1.2. The data owner randomly generates an n-dimensional vector S; every component of S is generated at random and takes only the value 1 or 0;
S1.3. The data owner randomly generates two n × n invertible matrices M_1 and M_2; every entry of the two matrices is generated at random, and n is the number of keywords in the keyword set;
S1.4. The data owner randomly selects two keys sk_1 and sk_2; the key SK for encrypting the plaintext document set F is a five-tuple, written {S, M_1, M_2, sk_1, sk_2}.
Preferably, step S2 further comprises the following steps:
S2.1. Construct the document tag vector B: for each document f_i in the plaintext document set F, first generate an n-dimensional document vector D = (D_1, D_2, …, D_i, …, D_n), where i ∈ {1, 2, …, n} and n is the number of keywords in the keyword set. Each dimension of the document vector corresponds to one keyword in the keyword set: D_i is set to the TF value of the corresponding keyword in the current plaintext document, or to 0 if the document does not contain that keyword. The document vector D is then divided into u blocks; if every entry of a block is 0, its tag value is bb_i = 0, otherwise bb_i = 1, which yields the document tag vector B = (bb_1, bb_2, …, bb_i, …, bb_u), where i ∈ {1, 2, …, u};
S2.2. Construct the secure index tree I: each leaf node of I corresponds to an n-dimensional document index vector V = (V_1, V_2, …, V_i, …, V_n). One leaf node is generated for each document f_i in the plaintext document set F; since F contains m documents, I has m leaf nodes. Each leaf node stores the document index vector V of its document: if the document f_i corresponding to the leaf node contains the keyword, the value of V_i in V is 1, otherwise it is 0. Each intermediate node v of I stores an n-dimensional document index vector V = (V_v[1], V_v[2], …, V_v[i], …, V_v[n]): if the document index vector stored in the left or right child of v has a non-zero value in dimension i, then V_v[i] = 1, otherwise V_v[i] = 0, where i is the position of keyword w_i in the keyword set W. V_v[i] = 1 means that there is at least one path from intermediate node v to some leaf node that contains keyword w_i;
S2.3. The data owner encrypts the document index vector V in each leaf node of the secure index tree I: V_i in the document index vector V = (V_1, V_2, …, V_i, …, V_n) of each leaf node is split into two random vectors {V_i', V_i''}. The splitting method is as follows: the n-dimensional vector S randomly generated by the data owner serves as the split indicator; if the j-th component of S is 0, V_i'[j] and V_i''[j] are both set equal to V_i[j]; if the j-th component of S is 1, V_i'[j] and V_i''[j] are set at random such that their sum equals V_i[j]. V_i' and V_i'' are then encrypted with the key SK to obtain the encrypted document index vector Ṽ, which is stored in the leaf node in place of the document index vector V, and the corresponding V_i is deleted. Because V is split into two random vectors, encryption under SK multiplies V_i' by one matrix factor and V_i'' by another (the factors appear as formula images and are not reproduced in this text), where M_1 and M_2 are the two n × n invertible matrices randomly generated by the data owner;
S2.4. The data owner sends the secure index tree I to the public cloud server and the document tag vector B to the private cloud server.
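As a non-limiting illustration, the construction of the document vector D and the document tag vector B in step S2.1 could be sketched in Python as follows. The function names, the normalization of the TF value by document length, and the requirement that u divide n evenly are assumptions of this sketch; the method itself does not fix them.

```python
from collections import Counter

def tf_vector(doc_tokens, keywords):
    """Document vector D: one TF value per keyword, 0 for absent keywords.
    Normalizing the count by document length is an assumption of this sketch."""
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    return [counts[w] / total if total else 0.0 for w in keywords]

def tag_vector(doc_vector, u):
    """Document tag vector B: split D into u equal blocks and mark a block
    with 1 if it holds any non-zero entry, otherwise with 0."""
    n = len(doc_vector)
    if n % u != 0:
        raise ValueError("this sketch assumes that u divides n")
    size = n // u
    return [int(any(doc_vector[i:i + size])) for i in range(0, n, size)]

keywords = ["cloud", "cipher", "index", "query", "rank", "tree"]  # keyword set W
doc = "cloud cipher cloud tree".split()                            # one document f_i

D = tf_vector(doc, keywords)
B = tag_vector(D, u=3)
print(D)  # [0.5, 0.25, 0.0, 0.0, 0.0, 0.25]
print(B)  # [1, 0, 1]
```

The middle block of D is all zero, so the private cloud server can tell from B alone that this document contains neither "index" nor "query", without seeing the TF values themselves.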
Preferably, step S4 further comprises the following steps:
S4.1. The authorized user inputs the keyword set to be searched W' = (w'_1, w'_2, …, w'_n) and generates an n-dimensional query tag vector Q = (Q_1, Q_2, …, Q_i, …, Q_n) for the search request, where each dimension of Q corresponds to one keyword in the keyword set, that is, Q_i corresponds to keyword w_i in W, with i ∈ {1, 2, …, n}. If keyword w_i is in the keyword set W' to be searched, Q_i is set to the IDF value of that keyword in the document set; otherwise Q_i is set to 0. The IDF value is obtained by dividing the number of documents that contain a given word by the total number of documents in the document set;
S4.2. Generate the trapdoor: the query tag vector Q is split into two random vectors {Q', Q''}. If the j-th component of S is 0, Q'_j and Q''_j are set at random such that their sum equals Q_j; if the j-th component of S is 1, Q'_j and Q''_j are both set equal to Q_j. The query tag vector Q is encrypted with the secret key SK to generate the trapdoor T. Because Q has been split into the two random vectors Q' and Q'', encrypting Q means encrypting each of the two vectors separately under SK; the results are a matrix factor applied to Q' and a matrix factor applied to Q'' (the factors appear as formula images and are not reproduced in this text), where M_1 and M_2 are the two n × n invertible matrices randomly generated by the data owner. The authorized user sends the trapdoor T to the public cloud server and the query tag vector Q to the private cloud server.
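As a non-limiting illustration, the query tag vector Q of step S4.1 could be built as sketched below. The sketch follows the ratio stated in the text (documents containing the word divided by all documents); note that the conventional inverse document frequency is instead the logarithm of the reciprocal of that ratio. The function names are hypothetical, and the sketch assumes the ratios are computed where the plaintext documents are available, for example by the data owner in advance.

```python
def idf_as_described(keyword, documents):
    """Ratio described in step S4.1: documents containing the keyword
    divided by the total number of documents."""
    containing = sum(1 for doc in documents if keyword in doc)
    return containing / len(documents)

def query_tag_vector(search_keywords, dictionary, documents):
    """Q_i is the ratio above if w_i is being searched, otherwise 0."""
    wanted = set(search_keywords)
    return [idf_as_described(w, documents) if w in wanted else 0.0
            for w in dictionary]

dictionary = ["cloud", "cipher", "index", "query"]           # keyword set W
documents = [{"cloud", "index"}, {"cloud", "cipher"},
             {"query"}, {"cloud"}]                           # keywords per document

Q = query_tag_vector(["cloud", "query"], dictionary, documents)
print(Q)  # [0.75, 0.0, 0.0, 0.25]
```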
Preferably, step S5 further comprises the following steps:
S5.1. After receiving the query tag vector Q sent by the authorized user, the private cloud server matches Q in turn against each document tag vector B sent by the data owner and checks whether the corresponding value is 0. If it is 0, the keyword set provided by the data owner does not contain the keyword being searched; if it is 1, the corresponding document index vector is recorded;
S5.2. By recording the corresponding document index vectors, the private cloud server obtains the candidate index identifier set SID = {…, sid_i, …, sid_j, …, sid_z, …} that contains the keywords being searched, where i, j, z ∈ {1, 2, …, n}, and then sends the candidate index identifier set SID to the public cloud server.
Preferably, step S6 further comprises:
According to the candidate index identifier set SID = {…, sid_i, …, sid_j, …, sid_z, …} sent by the private cloud server, the public cloud server finds the corresponding encrypted document index vectors Ṽ sent by the data owner and, together with the trapdoor T sent by the authorized user, computes the document similarity score SC. The formula for SC appears as an image and is not reproduced in this text;
In that formula, i ∈ (1, 2, …, n);
The public cloud server ranks the computed document similarity scores and returns the top-k ciphertext documents to the authorized user, where k is the number of ciphertext documents required by the authorized user.
Compared with the prior art, the beneficial effects of the present invention are as follows. The invention extracts the keyword set to be searched by the authorized user, constructs an encrypted searchable index tree, and introduces semantic similarity into the scoring of documents. Thus, when an authorized user wishes to find documents that are semantically related to the search keywords but cannot come up with the exact keywords, the invention can match the documents with the highest semantic relevance and return them to the authorized user; this realizes keyword semantic ranked search and effectively improves the retrieval efficiency of the authorized user. Because the invention matches the document tag vectors against the query tag vector of the authorized user, a large number of irrelevant documents are filtered out. On the premise of guaranteed security, this searchable encryption method based on keyword semantic ranking reduces the time needed to create the index, effectively improves the retrieval efficiency of the user, and more accurately returns the semantically ranked search results that best match the authorized user's request. The invention also adopts a hybrid cloud model: because the private cloud offers higher security, the method can make maximum use of server resources and, while achieving efficient retrieval, ensure the security of the data without revealing any keyword-related information.
Brief description of the drawings
Fig. 1 is a flow chart of the searchable encryption method based on keyword semantic ranking of the present invention.
Detailed description of the embodiments
The present invention is further described below through a detailed description of a preferred embodiment with reference to the accompanying drawing.
As shown in Fig. 1, the searchable encryption method based on keyword semantic ranking of the present invention constructs a searchable index tree, which significantly improves performance in the semantic fuzzy ranking of keywords and greatly improves the retrieval efficiency of the authorized user. The method comprises the following steps:
Step S1, system initialization: the data owner extracts keywords from the plaintext document set F = (f_1, f_2, …, f_m) to obtain the keyword set W = (w_1, w_2, …, w_n), generates a key SK for encrypting the plaintext document set, and sends the key SK to the authorized user, where w_1, w_2, …, w_n are the extracted keywords;
Step S2: the data owner constructs the document tag vector B from the keyword set W = (w_1, w_2, …, w_n) and creates the secure index tree I, then sends the document tag vector B to the private cloud server and the secure index tree I to the public cloud server;
Step S3: the data owner encrypts the plaintext document set F = (f_1, f_2, …, f_m) with the key SK to obtain the ciphertext document set C = (c_1, c_2, …, c_m), and sends the ciphertext document set to the public cloud server;
Step S4: the authorized user inputs the keyword set to be searched W' = (w'_1, w'_2, …, w'_n) and derives the query tag vector Q and the trapdoor T from it, then sends the query tag vector Q to the private cloud server and the trapdoor T to the public cloud server, where w'_1, w'_2, …, w'_n are the keywords to be searched;
Step S5: the private cloud server matches the query tag vector Q sent by the authorized user against the document tag vectors B sent by the data owner to obtain the candidate index identifier set SID that may contain the keywords, then sends the candidate index identifier set SID to the public cloud server;
Step S6: the public cloud server receives the candidate index identifier set SID = {…, sid_i, …, sid_j, …, sid_z, …} sent by the private cloud server and, according to SID and the secure index tree I sent by the data owner, uses the corresponding encrypted document index vectors Ṽ and the trapdoor T sent by the authorized user to compute and rank the similarity scores of the documents, and returns the top-k ciphertext documents to the authorized user. Each leaf node of the secure index tree I corresponds to one document index vector Ṽ; the value of each dimension of the document index vector stored in a leaf node is the TF value of the keyword corresponding to that dimension, and k is the number of ciphertext documents required by the authorized user;
The TF value denotes the frequency with which a given keyword occurs in a document.
Step S7: the authorized user uses the key SK to decrypt the top-k ciphertext documents returned by the public cloud server and obtains the corresponding plaintext documents.
Illustratively, in step S1 the data owner generates the key SK for encrypting the plaintext document set as follows:
Step S1.1: the data owner extracts keywords from the plaintext document set F = (f_1, f_2, …, f_m) to obtain the keyword set W = (w_1, w_2, …, w_n);
Step S1.2: the data owner randomly generates an n-dimensional vector S; every component of S is generated at random and takes only the value 1 or 0;
Step S1.3: the data owner randomly generates two n × n invertible matrices M_1 and M_2; every entry of the two matrices is also generated at random;
Step S1.4: the data owner randomly selects two keys sk_1 and sk_2.
The key SK for encrypting the plaintext document set is therefore a five-tuple, written {S, M_1, M_2, sk_1, sk_2}, where n is the number of keywords in the keyword set.
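As a non-limiting illustration, the generation of the five-tuple SK in steps S1.2 to S1.4 could be sketched as follows. The text does not say how invertibility of M_1 and M_2 is ensured or what form sk_1 and sk_2 take; this sketch assumes rejection sampling of small integer matrices with an exact determinant check, and uses 16 random bytes as a placeholder for each of sk_1 and sk_2. All function names are hypothetical.

```python
import secrets
from fractions import Fraction

def random_bits(n):
    """Split indicator S: n independent random bits."""
    return [secrets.randbelow(2) for _ in range(n)]

def determinant(matrix):
    """Exact determinant by Gaussian elimination over fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n, det = len(m), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)          # no pivot in this column: singular
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det                  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return det

def random_invertible_matrix(n, bound=10):
    """Draw integer matrices with entries in [-bound, bound] until one is invertible."""
    while True:
        m = [[secrets.randbelow(2 * bound + 1) - bound for _ in range(n)]
             for _ in range(n)]
        if determinant(m) != 0:
            return m

def generate_key(n):
    """SK = {S, M1, M2, sk1, sk2}; the byte length of sk1 and sk2 is a placeholder."""
    return {
        "S": random_bits(n),
        "M1": random_invertible_matrix(n),
        "M2": random_invertible_matrix(n),
        "sk1": secrets.token_bytes(16),
        "sk2": secrets.token_bytes(16),
    }

SK = generate_key(4)  # n = 4 keywords in the keyword set
```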
Illustratively, step S2 specifically comprises the following process:
Step S2.1, construct the document tag vector B: for each document f_i in the plaintext document set, first generate an n-dimensional document vector D = (D_1, D_2, …, D_i, …, D_n), where i ∈ {1, 2, …, n} and n is the number of keywords in the keyword set. Each dimension of the document vector corresponds to one keyword in the keyword set: D_i is set to the TF value of the corresponding keyword in the current plaintext document, or to 0 if the document does not contain that keyword. The document vector D is then divided into u blocks; if every entry of a block is 0, its tag value is bb_i = 0, otherwise bb_i = 1, which yields the document tag vector B = (bb_1, bb_2, …, bb_i, …, bb_u), where i ∈ {1, 2, …, u};
Step S2.2, construct the secure index tree I: each leaf node of I corresponds to an n-dimensional document index vector V = (V_1, V_2, …, V_i, …, V_n). One leaf node is generated for each document f_i in the plaintext document set F; because F contains m documents, I has m leaf nodes. Each leaf node stores the document index vector V of its document: if the document f_i corresponding to the leaf node contains the keyword, the value of V_i in V is 1, otherwise it is 0. Each intermediate node v of the secure index tree stores an n-dimensional document index vector V = (V_v[1], V_v[2], …, V_v[i], …, V_v[n]): if the document index vector stored in the left or right child of v has a non-zero value in dimension i, then V_v[i] = 1, otherwise V_v[i] = 0, where i is the position of keyword w_i in the keyword set W. V_v[i] = 1 means that there is at least one path from intermediate node v to some leaf node that contains keyword w_i;
Step S2.3: the data owner encrypts the document index vector V in each leaf node of the secure index tree I: V_i in the document index vector V = (V_1, V_2, …, V_i, …, V_n) of each leaf node is split into two random vectors {V_i', V_i''}. The splitting procedure is as follows: the n-dimensional vector S randomly generated by the data owner serves as the split indicator; if the j-th component of S is 0, V_i'[j] and V_i''[j] are both set equal to V_i[j]; if the j-th component of S is 1, V_i'[j] and V_i''[j] are set at random, subject to their sum being equal to V_i[j]. V is then encrypted with the key SK. Because V has been split into two random vectors, encryption under SK multiplies V_i' by one matrix factor and V_i'' by another (the factors appear as formula images and are not reproduced in this text), which yields the encrypted document index vector Ṽ, where M_1 and M_2 are the two n × n invertible matrices randomly generated by the data owner. The encrypted document index vector Ṽ is stored in the leaf node in place of the document index vector V, and the corresponding V_i is deleted;
Step S2.4: the data owner sends the secure index tree I to the public cloud server and the document tag vector B to the private cloud server.
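As a non-limiting illustration, the unencrypted structure of the secure index tree in step S2.2 could be sketched as follows. The text does not specify the shape of the tree; this sketch assumes a binary tree built bottom-up by pairing neighbouring nodes, with each intermediate node storing the component-wise OR of its children. The `Node` class and function names are hypothetical, and the leaf vectors are the 1/0 vectors of step S2.2.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    vector: List[int]                 # V for a leaf, V_v for an intermediate node
    doc_id: Optional[str] = None      # set only on leaf nodes
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build_tree(leaf_vectors):
    """Pair nodes level by level; a parent marks keyword i if either child does."""
    level = [Node(vector=v, doc_id=d) for d, v in leaf_vectors.items()]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            left = level[i]
            right = level[i + 1] if i + 1 < len(level) else None
            if right is None:
                merged = list(left.vector)
            else:
                merged = [int(a or b) for a, b in zip(left.vector, right.vector)]
            parents.append(Node(vector=merged, left=left, right=right))
        level = parents
    return level[0]

def leaves_with_keyword(node, i):
    """Follow only the paths on which dimension i is marked 1."""
    if node is None or node.vector[i] == 0:
        return []
    if node.doc_id is not None:
        return [node.doc_id]
    return leaves_with_keyword(node.left, i) + leaves_with_keyword(node.right, i)

leaves = {"f1": [1, 0, 1], "f2": [0, 0, 1], "f3": [0, 1, 0]}
root = build_tree(leaves)
print(root.vector)                   # [1, 1, 1]
print(leaves_with_keyword(root, 2))  # ['f1', 'f2']
```

A 0 in an intermediate node lets the search skip the whole subtree, which is the pruning that V_v[i] = 1 or 0 is meant to provide.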
Illustratively, step S4 specifically comprises the following process:
Step S4.1: the authorized user inputs the keyword set to be searched W' = (w'_1, w'_2, …, w'_n) and generates an n-dimensional query tag vector Q = (Q_1, Q_2, …, Q_i, …, Q_n) for the search request, where each dimension of Q corresponds to one keyword in the keyword set, that is, Q_i corresponds to keyword w_i in W, with i ∈ {1, 2, …, n}. If keyword w_i is in the keyword set W' to be searched, Q_i is set to the IDF value of that keyword in the document set; otherwise Q_i is set to 0. The IDF value is obtained by dividing the number of documents that contain a given word by the total number of documents in the document set;
Step S4.2, generate the trapdoor: the query tag vector Q is split into two random vectors {Q', Q''}. If the j-th component of S is 0, Q'_j and Q''_j are set at random such that their sum equals Q_j; if the j-th component of S is 1, Q'_j and Q''_j are both set equal to Q_j. The query tag vector Q is encrypted with the secret key SK. Because Q has been split into the two random vectors Q' and Q'', encrypting Q means encrypting each of the two vectors separately under SK, and the two results together form the trapdoor T (the encryption results and the expression for T appear as formula images and are not reproduced in this text), where M_1 and M_2 are the two n × n invertible matrices randomly generated by the data owner. The authorized user sends the trapdoor T to the public cloud server and the query tag vector Q to the private cloud server.
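As a non-limiting illustration, the two complementary splitting rules of steps S2.3 and S4.2 could be sketched as follows, before any matrix encryption is applied. The sketch shows why the rules are paired: wherever the index vector is split at random the query vector is copied, and vice versa, so that V'·Q' + V''·Q'' always equals V·Q. Integer values and the range of the random shares are assumptions chosen to keep the arithmetic exact; the function names are hypothetical.

```python
import random

def split_index(V, S, rng):
    """Step S2.3: copy V[j] where S[j] == 0, split it at random where S[j] == 1."""
    V1, V2 = [], []
    for v, s in zip(V, S):
        if s == 0:
            V1.append(v)
            V2.append(v)
        else:
            share = rng.randint(-5, 5)
            V1.append(share)
            V2.append(v - share)
    return V1, V2

def split_query(Q, S, rng):
    """Step S4.2: split Q[j] at random where S[j] == 0, copy it where S[j] == 1."""
    Q1, Q2 = [], []
    for q, s in zip(Q, S):
        if s == 0:
            share = rng.randint(-5, 5)
            Q1.append(share)
            Q2.append(q - share)
        else:
            Q1.append(q)
            Q2.append(q)
    return Q1, Q2

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

rng = random.Random()
S = [0, 1, 1, 0]      # split indicator from the key SK
V = [1, 0, 1, 1]      # document index vector
Q = [2, 0, 0, 3]      # query tag vector (integers for exact arithmetic)

V1, V2 = split_index(V, S, rng)
Q1, Q2 = split_query(Q, S, rng)
print(dot(V, Q), dot(V1, Q1) + dot(V2, Q2))  # both values are 5
```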
Illustratively, step S5 specifically comprises the following process:
Step S5.1: after receiving the query tag vector Q sent by the authorized user, the private cloud server matches Q in turn against each document tag vector B sent by the data owner and checks whether the corresponding value is 0. If it is 0, the keyword set provided by the data owner does not contain the keyword being searched; if it is 1, the corresponding document index vector is recorded;
Step S5.2: by recording the corresponding document index vectors, the private cloud server obtains the candidate index identifier set SID = {…, sid_i, …, sid_j, …, sid_z, …} that contains the keywords being searched, where i, j, z ∈ {1, 2, …, n}, and then sends the candidate index identifier set SID to the public cloud server.
Illustratively, step S6 specifically comprises the following process:
According to the candidate index identifier set SID = {…, sid_i, …, sid_j, …, sid_z, …} sent by the private cloud server, the public cloud server finds the corresponding encrypted document index vectors Ṽ sent by the data owner and, together with the trapdoor T sent by the authorized user, computes the document similarity score SC. The formula for SC appears as an image and is not reproduced in this text;
In that formula, i ∈ (1, 2, …, n).
The public cloud server ranks the computed document similarity scores and returns the top-k ciphertext documents to the authorized user, where k is the number of ciphertext documents required by the authorized user.
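As a non-limiting illustration, the scoring and top-k ranking of step S6 could be sketched as follows. Because the encryption and score formulas are not reproduced in this text, the sketch substitutes the widely used secure inner-product construction as an assumption: the index shares are multiplied by the transposes of M_1 and M_2, the query shares by their inverses, and the score is the inner product of the two encrypted pairs, which equals V·Q. This may differ from the formula in the original publication. The split shares are written out by hand for S = [0, 1], and all names are hypothetical.

```python
from fractions import Fraction

def transpose(m):
    return [list(col) for col in zip(*m)]

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def inverse(m):
    """Gauss-Jordan inverse over exact fractions; fails on a singular matrix."""
    n = len(m)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def encrypt_index(V1, V2, M1, M2):
    """Assumed form of the encrypted index: (M1^T V', M2^T V'')."""
    return mat_vec(transpose(M1), V1), mat_vec(transpose(M2), V2)

def make_trapdoor(Q1, Q2, M1, M2):
    """Assumed form of the trapdoor: (M1^-1 Q', M2^-1 Q'')."""
    return mat_vec(inverse(M1), Q1), mat_vec(inverse(M2), Q2)

def score(enc_index, trapdoor):
    return dot(enc_index[0], trapdoor[0]) + dot(enc_index[1], trapdoor[1])

def top_k(enc_indexes, trapdoor, k):
    ranked = sorted(enc_indexes, key=lambda sid: score(enc_indexes[sid], trapdoor),
                    reverse=True)
    return ranked[:k]

M1, M2 = [[2, 1], [1, 1]], [[1, 2], [3, 5]]          # invertible key matrices
# Shares of V = [3, 1], [0, 2], [1, 0] and of Q = [1, 2], split with S = [0, 1].
shares = {"sid1": ([3, 4], [3, -3]), "sid2": ([0, 7], [0, -5]), "sid3": ([1, 1], [1, -1])}
enc = {sid: encrypt_index(v1, v2, M1, M2) for sid, (v1, v2) in shares.items()}
T = make_trapdoor([5, 2], [-4, 2], M1, M2)

print([int(score(enc[sid], T)) for sid in enc])  # [5, 4, 1], equal to V.Q per document
print(top_k(enc, T, k=2))                        # ['sid1', 'sid2']
```

Under this assumed construction the public cloud server ranks documents using only encrypted vectors, since the matrix factors cancel inside the inner product.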
To evaluate the performance of the searchable encryption model based on keyword semantic ranking, the method proposed by the present invention was compared with several traditional public-key encryption methods that support multi-keyword retrieval. The comparison between the encryption method proposed by the invention and the traditional public-key encryption methods, obtained from experimental results, is shown in Table 1 below:
Table 1. Comparison of the characteristics of several encryption methods
As can be seen from Table 1, if higher accuracy of the authorized user's search results is required, additional storage overhead is hard to avoid. When the authorized user wants more accurate search results, the searchable encryption method based on keyword semantic ranking proposed by the invention, compared with traditional methods that support multi-keyword search, not only realizes multi-keyword semantic ranked search but also, by building a searchable index tree, further improves the search efficiency of the authorized user, considerably reduces the computation and storage overhead of the index, and ensures the accuracy of the authorized user's search results.
Although the content of the present invention has been described in detail through the above preferred embodiment, it should be understood that the above description is not to be regarded as limiting the invention. Various modifications and substitutions will be apparent to those skilled in the art after reading the above. The scope of protection of the present invention should therefore be defined by the appended claims.

Claims (6)

1. A searchable encryption method based on keyword semantic ranking, characterized in that the method comprises the following steps:
S1. The data owner extracts keywords from the plaintext document set to obtain a keyword set, generates a key SK for encrypting the plaintext document set, and sends the key SK to the authorized user;
S2. The data owner constructs document marker vectors from the keyword set and creates a secure index tree, then sends the document marker vectors to the private cloud server and the secure index tree to the public cloud server;
S3. The data owner encrypts the plaintext document set with the key SK to obtain a ciphertext document set and sends the ciphertext document set to the public cloud server;
S4. The authorized user inputs the keyword set to be searched, derives a query marker vector and a trapdoor from that keyword set, then sends the query marker vector to the private cloud server and the trapdoor to the public cloud server;
S5. The private cloud server matches the query marker vector sent by the authorized user against the document marker vectors sent by the data owner to obtain a candidate index identifier set of documents that may contain the keywords, then sends the candidate index identifier set to the public cloud server;
S6. The public cloud server receives the candidate index identifier set sent by the private cloud server; using the candidate index identifier set and the secure index tree sent by the data owner, it locates the corresponding encrypted document index vectors, computes the document similarity scores from these vectors and the trapdoor sent by the authorized user, ranks the scores, and returns the top-k ciphertext documents to the authorized user; each leaf node of the secure index tree corresponds to one document index vector, and each dimension of the document index vector stored in a leaf node holds the TF value of the keyword corresponding to that dimension; k is the number of ciphertext documents that meets the authorized user's needs, and the TF value is the frequency with which a given keyword occurs in a document;
S7. The authorized user uses the key SK to decrypt the top-k ciphertext documents returned by the public cloud server and obtains the corresponding plaintext documents.
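To make the TF and IDF quantities used in the claims concrete, the following Python sketch computes them on plaintext and scores a document as the inner product of its TF vector with an IDF-weighted query vector. The inner-product form of the score, the normalisation of TF by document length, and all function names (`tf_vector`, `idf_vector`, `query_vector`, `score`) are assumptions made for illustration; the weight follows the wording of claim 4 (documents containing the keyword divided by all documents) rather than the more common logarithmic IDF.

```python
from collections import Counter


def tf_vector(doc_tokens, keywords):
    """Frequency of each dictionary keyword in one document (assumed: count / length)."""
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    return [counts[w] / total if total else 0.0 for w in keywords]


def idf_vector(corpus_tokens, keywords):
    """Per-keyword weight as worded in the claims:
    documents containing the keyword / total number of documents."""
    m = len(corpus_tokens)
    return [sum(1 for doc in corpus_tokens if w in doc) / m for w in keywords]


def query_vector(query_keywords, keywords, idf):
    """Q_i is the keyword's weight if it is being searched, otherwise 0."""
    wanted = set(query_keywords)
    return [idf[i] if w in wanted else 0.0 for i, w in enumerate(keywords)]


def score(doc_vec, query_vec):
    """Assumed plaintext relevance score: inner product of the two vectors."""
    return sum(d * q for d, q in zip(doc_vec, query_vec))
```

In the claimed method this score is never computed on plaintext by the public cloud server; the sketch only shows the quantity that the encrypted computation is meant to rank by.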
2. The searchable encryption method based on keyword semantic ranking according to claim 1, characterized in that
step S1 further comprises the following steps:
S1.1. The data owner extracts keywords from the plaintext document set F = (f1, f2, …, fm) to obtain the keyword set W = (w1, w2, …, wn), where w1, w2, …, wn are the extracted keywords;
S1.2. The data owner randomly generates an n-dimensional vector S; the value of each dimension of S is generated at random and can only be 1 or 0;
S1.3. The data owner randomly generates two n × n invertible matrices M1 and M2; the entries of both matrices are also generated at random, and n is the number of keywords in the keyword set;
S1.4. The data owner randomly selects two keys sk1 and sk2; the key SK for encrypting the plaintext document set F is the five-tuple {S, M1, M2, sk1, sk2}.
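Steps S1.2 to S1.4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, the 16-byte length of `sk1` and `sk2`, and the way invertibility is guaranteed (multiplying a unit lower-triangular by a unit upper-triangular integer matrix, which gives determinant 1 but is not a uniform sample over all invertible matrices) are all choices made here.

```python
import secrets
from fractions import Fraction


def random_bits(n):
    """Split indicator S: n independent random bits (step S1.2)."""
    return [secrets.randbelow(2) for _ in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def random_invertible_matrix(n, bound=5):
    """Random integer matrix with determinant 1, hence always invertible."""
    def entry():
        return secrets.randbelow(2 * bound + 1) - bound

    lower = [[1 if i == j else (entry() if j < i else 0) for j in range(n)]
             for i in range(n)]
    upper = [[1 if i == j else (entry() if j > i else 0) for j in range(n)]
             for i in range(n)]
    return matmul(lower, upper)


def determinant(m):
    """Exact determinant by Gaussian elimination over fractions."""
    a = [[Fraction(x) for x in row] for row in m]
    n, det = len(a), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return det


def keygen(n):
    """Five-tuple SK = {S, M1, M2, sk1, sk2} for a dictionary of n keywords."""
    return {
        "S": random_bits(n),
        "M1": random_invertible_matrix(n),
        "M2": random_invertible_matrix(n),
        "sk1": secrets.token_bytes(16),  # assumed key length
        "sk2": secrets.token_bytes(16),
    }
```

The claims do not say what sk1 and sk2 are used for; a plausible reading is that they protect the document contents, but that is an assumption.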
3. The searchable encryption method based on keyword semantic ranking according to claim 2, characterized in that
step S2 further comprises the following steps:
S2.1. Constructing the document marker vector B: for each document fi in the plaintext document set F, first generate an n-dimensional document vector D = (D1, D2, …, Di, …, Dn), where i ∈ {1, 2, …, n} and n is the number of keywords in the keyword set; each dimension of the vector corresponds to one keyword in the keyword set; Di is set to the TF value of the corresponding keyword in the current plaintext document, or to 0 if the document does not contain that keyword; the document vector D is then divided into u blocks; if every entry of a block is 0, its marker value bbi = 0, otherwise bbi = 1, which gives the document marker vector B = (bb1, bb2, …, bbi, …, bbu), where i ∈ {1, 2, …, u};
S2.2. Constructing the secure index tree I: each leaf node of the secure index tree I corresponds to an n-dimensional document index vector V = (V1, V2, …, Vi, …, Vn); one leaf node is generated for each document fi in the plaintext document set F; since F contains m documents, the secure index tree I has m leaf nodes, and each leaf node stores the document index vector V of its document; if the document fi corresponding to the leaf node contains the keyword, the value of Vi in V is 1, otherwise 0; each intermediate node v of the secure index tree I stores an n-dimensional document index vector V = (Vv[1], Vv[2], …, Vv[i], …, Vv[n]); if the i-th entry of the document index vector stored in the left or right child of v is nonzero, Vv[i] = 1, otherwise Vv[i] = 0, where i is the position of keyword wi in the keyword set W; Vv[i] = 1 means that there is at least one path from the intermediate node v to a leaf node containing keyword wi;
S2.3. The data owner encrypts the document index vectors V in the leaf nodes of the secure index tree I: the document index vector Vi of each leaf node is split into two random vectors {Vi', Vi''}; the splitting method uses the n-dimensional vector S randomly generated by the data owner as the split indicator: if the j-th dimension of S is 0, both Vi'[j] and Vi''[j] are set equal to Vi[j]; if the j-th dimension of S is 1, Vi'[j] and Vi''[j] are set at random such that their sum equals Vi[j]; Vi' and Vi'' are then encrypted with the key SK to obtain the encrypted document index vector, which is stored in the leaf node of the document index vector V, and the corresponding Vi is deleted; because V is split into two random vectors, Vi' and Vi'' are encrypted separately, using M1 and M2, the two n × n invertible matrices randomly generated by the data owner;
S2.4. The data owner sends the secure index tree I to the public cloud server and the document marker vector B to the private cloud server.
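Steps S2.1 and S2.2 can be sketched in Python as follows, on plaintext vectors only. The contiguous equal-size blocking, the pairwise bottom-up tree construction, and every name (`Node`, `block_markers`, `build_index_tree`, `documents_with_keyword`) are assumptions for illustration; the claims do not fix how blocks are formed or how leaves are paired.

```python
class Node:
    def __init__(self, vector, left=None, right=None, doc_id=None):
        self.vector = vector
        self.left = left
        self.right = right
        self.doc_id = doc_id  # set only on leaf nodes


def block_markers(doc_vector, u):
    """Document marker vector B: 1 for each block that has a nonzero entry."""
    size = -(-len(doc_vector) // u)  # ceiling division: entries per block
    return [1 if any(doc_vector[b * size:(b + 1) * size]) else 0
            for b in range(u)]


def build_index_tree(leaves):
    """leaves: list of (doc_id, 0/1 keyword vector). Returns the root node.

    Each intermediate node stores the element-wise OR of its children, so
    entry i is 1 exactly when some leaf below it contains keyword i.
    """
    if not leaves:
        raise ValueError("at least one document is required")
    level = [Node(list(vec), doc_id=doc_id) for doc_id, vec in leaves]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            left = level[i]
            right = level[i + 1] if i + 1 < len(level) else None
            vector = [
                1 if left.vector[j] or (right is not None and right.vector[j]) else 0
                for j in range(len(left.vector))
            ]
            parents.append(Node(vector, left, right))
        level = parents
    return level[0]


def documents_with_keyword(node, i):
    """Visit only subtrees whose entry i is 1; collect matching leaf ids."""
    if node is None or not node.vector[i]:
        return []
    if node.doc_id is not None:
        return [node.doc_id]
    return (documents_with_keyword(node.left, i)
            + documents_with_keyword(node.right, i))
```

The pruning in `documents_with_keyword` illustrates why the intermediate-node vectors save work: a subtree whose entry for a keyword is 0 is never descended into. In the claimed method the leaf vectors are stored encrypted, which this sketch does not model.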
4. The searchable encryption method based on keyword semantic ranking according to claim 3, characterized in that
step S4 further comprises the following steps:
S4.1. The authorized user inputs the keyword set to be searched, W' = (w'1, w'2, …, w'n), and generates for this search request an n-dimensional query marker vector Q = (Q1, Q2, …, Qi, …, Qn); each dimension of Q corresponds to one keyword in the keyword set, that is, Qi corresponds to keyword wi in W, where i ∈ {1, 2, …, n}; if keyword wi is in the keyword set W' to be searched, Qi is set to the IDF value of that keyword in the document set, otherwise Qi is set to 0; the IDF value is obtained by dividing the number of documents containing a given word by the total number of documents in the document set;
S4.2. Generating the trapdoor: the query marker vector Q is split into two random vectors {Q', Q''}; if the j-th dimension of S is 0, Q'j and Q''j are set at random such that their sum equals Qj; if the j-th dimension of S is 1, both Q'j and Q''j are set equal to Qj; the query marker vector Q is encrypted with the key SK to generate the trapdoor T; because Q is split into the two random vectors Q' and Q'', encrypting Q means encrypting each of the two vectors separately with the key SK, using M1 and M2, the two n × n invertible matrices randomly generated by the data owner; the authorized user sends the trapdoor T to the public cloud server and the query marker vector Q to the private cloud server.
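The splitting rules in steps S2.3 and S4.2 mirror each other, which is what lets an encrypted index vector and a trapdoor reproduce the plaintext inner product. The Python sketch below shows this under one loudly stated assumption: the encryption expressions are not reproduced in this text, so the sketch uses the well-known secure kNN construction (index shares multiplied by the transposes of M1 and M2, query shares by their inverses). All function names and the range of the random shares are hypothetical.

```python
import random
from fractions import Fraction


def transpose(m):
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def inverse(m):
    """Exact Gauss-Jordan inverse; raises StopIteration if m is singular."""
    n = len(m)
    a = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if a[r][c] != 0)
        a[c], a[pivot] = a[pivot], a[c]
        pv = a[c][c]
        a[c] = [x / pv for x in a[c]]
        for r in range(n):
            if r != c and a[r][c] != 0:
                f = a[r][c]
                a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return [row[n:] for row in a]


def split(vec, S, copy_when, rng):
    """Where S[j] == copy_when both shares equal vec[j]; elsewhere the two
    shares are random and sum to vec[j]."""
    first, second = [], []
    for x, s in zip(vec, S):
        x = Fraction(x)
        if s == copy_when:
            first.append(x)
            second.append(x)
        else:
            r = Fraction(rng.randint(-100, 100), 7)
            first.append(r)
            second.append(x - r)
    return first, second


def encrypt_index(V, S, M1, M2, rng):
    v1, v2 = split(V, S, copy_when=0, rng=rng)      # rule of step S2.3
    return matvec(transpose(M1), v1), matvec(transpose(M2), v2)  # assumed form


def make_trapdoor(Q, S, M1, M2, rng):
    q1, q2 = split(Q, S, copy_when=1, rng=rng)      # rule of step S4.2
    return matvec(inverse(M1), q1), matvec(inverse(M2), q2)      # assumed form


def encrypted_score(enc_index, trapdoor):
    return sum(a * b for part_i, part_t in zip(enc_index, trapdoor)
               for a, b in zip(part_i, part_t))
```

Under this assumption the transposed and inverted matrices cancel in the inner product, and in every dimension exactly one side is copied while the other is split into shares, so `encrypted_score` equals the plaintext inner product of V and Q even though the server sees neither vector.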
5. The searchable encryption method based on keyword semantic ranking according to claim 4, characterized in that
step S5 further comprises the following steps:
S5.1. After receiving the query marker vector Q sent by the authorized user, the private cloud server takes each entry of Q in turn and checks whether the corresponding value in the document marker vector B sent by the data owner is 0; if it is 0, the keyword set provided by the data owner does not contain the keyword being searched; if it is 1, the corresponding document index vector is recorded;
S5.2. By recording the corresponding document index vectors, the private cloud server obtains the candidate index identifier set SID = {…, sidi, …, sidj, …, sidz, …} of documents that may contain the keywords being searched, where i, j, z ∈ {1, 2, …, n}, and then sends the candidate index identifier set SID to the public cloud server.
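The candidate filtering of steps S5.1 and S5.2 can be sketched in Python as follows. The claims describe Q as n-dimensional and B as u-dimensional without stating how entries are aligned, so this sketch assumes the query is reduced to block markers in the same way as a document vector and that a document becomes a candidate when at least one block touched by the query is marked 1. The names `block_markers` and `candidate_ids`, and the dictionary layout of the stored markers, are hypothetical.

```python
def block_markers(vector, u):
    """1 for each of the u contiguous blocks that has a nonzero entry."""
    size = -(-len(vector) // u)  # ceiling division: entries per block
    return [1 if any(vector[b * size:(b + 1) * size]) else 0 for b in range(u)]


def candidate_ids(query_vector, doc_markers, u):
    """doc_markers maps an index identifier to its document marker vector B.

    Returns the identifiers whose documents may contain a searched keyword.
    """
    query_marks = block_markers(query_vector, u)
    return [sid for sid, marks in doc_markers.items()
            if any(q and b for q, b in zip(query_marks, marks))]
```

Because a marker only says that some keyword in a block is present, the result can include documents that do not contain the searched keyword; the filter narrows the set that the public cloud server has to score, it does not replace the scoring.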
6. The searchable encryption method based on keyword semantic ranking according to claim 5, characterized in that
step S6 further comprises:
according to the candidate index identifier set SID = {…, sidi, …, sidj, …, sidz, …} sent by the private cloud server, the public cloud server finds the corresponding encrypted document index vectors sent by the data owner and, together with the trapdoor T sent by the authorized user, computes the document similarity score SC of each candidate document,
where the index i in the score formula ranges over {1, 2, …, n};
the public cloud server ranks the computed document similarity scores and returns the top-k ciphertext documents to the authorized user, where k is the number of ciphertext documents that meets the authorized user's needs.
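The final ranking step can be sketched in Python as follows, assuming the similarity scores of the candidate documents have already been computed and are held in a dictionary keyed by document identifier. The function name `top_k` and the dictionary layout are illustrative choices.

```python
import heapq


def top_k(scores, k):
    """Return the k (doc_id, score) pairs with the highest scores, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])
```

`heapq.nlargest` avoids sorting the whole candidate set when k is small relative to the number of candidates.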
Application number: CN201810890114.6A
Title: A searchable encryption method based on keyword semantic ranking
Priority date: 2018-08-07
Filing date: 2018-08-07
Status: Pending
Publication: CN109063509A (en), published 2018-12-21
Family ID: 64832170
Country: CN

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106951411A (en) * 2017-03-24 2017-07-14 福州大学 The quick multi-key word Semantic Ranking searching method of data-privacy is protected in a kind of cloud computing
CN106997384A (en) * 2017-03-24 2017-08-01 福州大学 A kind of semantic ambiguity that can verify that sorts can search for encryption method
CN108171071A (en) * 2017-12-01 2018-06-15 南京邮电大学 A kind of multiple key towards cloud computing can sort cipher text retrieval method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
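Yang Yang et al.: "Fast multi-keyword semantic ranked search scheme protecting data privacy in cloud computing", Chinese Journal of Computers (计算机学报) *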

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181221