CN106951411B - Fast multi-keyword semantic ranking search method protecting data privacy in cloud computing - Google Patents

Fast multi-keyword semantic ranking search method protecting data privacy in cloud computing

Info

Publication number
CN106951411B
CN106951411B (application CN201710181664.6A)
Authority
CN
China
Prior art keywords
vector
document
cloud server
sid
owned cloud
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710181664.6A
Other languages
Chinese (zh)
Other versions
CN106951411A (en)
Inventor
杨旸
刘佳
叶少珍
蔡圣暐
杨书略
邹剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University
Priority to CN201710181664.6A
Publication of CN106951411A
Application granted
Publication of CN106951411B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis

Abstract

The present invention relates to a fast multi-keyword semantic ranking search method that protects data privacy in cloud computing. The concept of domain-weighted scoring is introduced into document scoring: keywords appearing in different domains of a document, such as the title and the abstract, are assigned different weights. Search keywords are semantically extended and their semantic similarity is computed; semantic similarity, domain-weighted score and relevance score are combined into a three-factor ranking method, so that the cloud server can rank search results accurately before returning them to the search user. To address the low search efficiency of existing searchable encryption schemes, a vector block tagging and matching algorithm is designed: the document vectors created from the vector space model are divided into blocks to generate tag vectors of much smaller dimension. The present invention improves search efficiency, reduces index creation time, and realizes semantic ciphertext keyword search.

Description

Fast multi-keyword semantic ranking search method protecting data privacy in cloud computing
Technical field
The present invention relates to the field of cloud computing, and in particular to a fast multi-keyword semantic ranking search method that protects data privacy in cloud computing.
Background art
With the rapid development of cloud computing, more and more sensitive data are stored in the cloud, such as e-mails, personal health records, personal videos and photos, corporate financial data and public documents. Cloud servers provide high-quality storage services, and outsourcing data to the cloud reduces the storage and maintenance costs of users. However, the data owner and the cloud server are not in the same trust domain, so outsourced data may be at risk. To protect user privacy, a common solution is to encrypt the data before storing it on the cloud server. Encrypted data, however, no longer retains its original properties: when a user needs certain data, the required items cannot be identified directly in the ciphertext. When the amount of data is small, the user can download all ciphertexts, decrypt them locally and search the plaintext. As the scale of cloud data grows sharply, this approach wastes a great deal of time and bandwidth and can no longer meet users' practical needs; how to find the required documents within a large volume of ciphertext therefore becomes a problem.
After Song et al. proposed a searchable encryption scheme that encrypts data with a key stream and enables search over it, Wang et al. proposed a single-keyword ranked search scheme that encrypts the tf-idf (term frequency-inverse document frequency) values of documents with order-preserving encryption, realizing accurate ranking of search results. Cao et al. proposed a multi-keyword ranked searchable encryption scheme that introduces the vector space model and the secure kNN (secure k-nearest neighbor) method: index vectors are encrypted with matrices, and the inner-product similarity between index vectors and query vectors is computed to realize multi-keyword ranked search. Li et al. proposed a searchable encryption scheme supporting Boolean queries: the weights of the query words combined by "OR", "AND" and "NOT" are set to three groups of increasing positive sequences a_i, b_j, c_k that are super-increasing between groups, i.e. Σa_i < b_1 and Σa_i + Σb_j < c_1; the weight c_k of each "NOT" query word is then replaced by its opposite number -c_k, so that a document whose relevance score is greater than 0 is a relevant document satisfying the Boolean query. However, these schemes do not take into account the high sparsity of document vectors: a document usually contains only a small number of keywords, so its document vector contains a large number of zeros. During retrieval the cloud server cannot tell which documents are relevant, so it computes and ranks similarity scores for all documents, wasting a large amount of time and lowering the search efficiency of the scheme.
The search requests entered by users often contain spelling mistakes or format mismatches, and the above schemes cannot meet users' practical needs in such cases. Li et al. therefore realized fuzzy keyword search, constructing keyword fuzzy sets with wildcards and using edit distance as the similarity measure. Liu et al. proposed a fuzzy search scheme that builds the fuzzy set with a dictionary to save storage space, but its search accuracy decreases. Chai et al. first proposed the "semi-honest-but-curious" cloud server model, in which the service provider may execute only part of the search operations and return only part of the search results in order to save computation and bandwidth; accordingly, they proposed a verifiable searchable encryption scheme based on a search-tree index structure. Wang et al. used wildcards and an index tree to realize an efficient fuzzy search scheme. Chuah et al. improved search efficiency with a BedTree-based multi-keyword fuzzy search scheme. Wang et al. combined locality-sensitive hashing with the secure kNN method (secure k-nearest neighbor) to realize a new multi-keyword fuzzy search scheme. Wang et al. further proposed a verifiable fuzzy keyword search scheme by building a symbol index tree on top of fuzzy keyword search. These fuzzy search schemes, however, only consider similarity between keyword characters and ignore similarity in keyword semantics. Fu et al. therefore expanded document keywords with synonyms and, by computing inner products, realized a multi-keyword ranked search scheme supporting synonym queries. Xia et al. created an inverted index for the document set, expanded query keywords with a semantic base, and encrypted relevance scores with a one-to-many order-preserving encryption function, realizing a multi-keyword semantic ranked search scheme. However, these semantic fuzzy search schemes do not involve semantic similarity in document scoring and ignore the weight differences of keywords located in different domains of a document.
To address the problems of low search efficiency, long index creation time and inaccurate ranking results in existing ciphertext keyword search schemes, the present invention proposes a scheme that improves search efficiency, reduces the index creation time and realizes semantic ciphertext keyword search.
Summary of the invention
In view of this, the purpose of the present invention is to propose a fast multi-keyword semantic ranking search method that protects data privacy in cloud computing, which improves search efficiency, reduces the index creation time and realizes semantic ciphertext keyword search.
The present invention is realized with the following scheme: a fast multi-keyword semantic ranking search method protecting data privacy in cloud computing, involving a data owner, authorized users, a private cloud server and a public cloud server, and specifically comprising the following steps:
Step S1: the data owner extracts keywords from the plaintext document set F=(f1,f2,…,fm), obtaining the keyword set W=(w1,w2,…,wn);
Step S2: the data owner randomly generates an (n+2)-bit vector S and two (n+2)×(n+2) invertible matrices {M1,M2}; the key SK consists of the four-tuple {S,M1,M2,u}, where u is a positive integer and u | n; the data owner then generates a key sk for document encryption and sends the keys SK and sk to the authorized users;
Step S3: create the index: the data owner generates a document vector D_i of dimension n for every document f_i; if document f_i contains keyword w_j, the j-th entry of D_i is set to 1, otherwise to 0; the document vector D_i is then divided into u blocks, giving the document tag vector b_i=(bb_1,bb_2,…,bb_u) and e_i=(b_i,sid_i), where sid_i is the index identifier of f_i; after the term-frequency weight wf_{t,f} of keyword w_j in document f_i, the inverse document frequency idf_t and the domain weight score Z_ij are computed, every 1 in D_i is replaced by Z_ij·(wf_{t,f}×idf_t); D_i is then extended in dimension, position (n+1) being set to a random number ε and position (n+2) to 1, so that D_i is expressed as (D_i, ε, 1); the extended document vector D_i is encrypted with the secure kNN algorithm, which splits it into the two vectors D_i' and D_i'' and yields V_i={M_1^T D_i', M_2^T D_i''} and I_i=(sid_i,V_i); finally, the data owner sends E=(e_1,e_2,…,e_m) to the private cloud server and uploads the index set I=(I_1,I_2,…,I_m) to the public cloud server;
Step S4: the data owner encrypts the document set F=(f1,f2,…,fm) with a symmetric encryption algorithm, obtains the ciphertext set C=(c1,c2,…,cm) and uploads it to the public cloud server;
Step S5: build the trapdoor: when an authorized user searches, the user first inputs η search keywords Γ=(q1,q2,…,qs,…,qη); semantic extension is then performed: the semantic similarity between each original word qs and its expansion words is computed and ranked, and the σ most related expansion words are chosen as the final expansion words, giving the semantic expansion set Q=(q1,q2,…,qη,δ1,…,δσ) and the corresponding semantic similarity scores SC=(sc1,sc2,…,scη,scη+1,…,scη+σ); a query vector Q_v of dimension n is created from the semantic expansion set Q: if wj ∈ Q the j-th entry of Q_v is set to 1, otherwise to 0; Q_v is divided into u blocks, giving the query tag vector b_Q; every 1 in Q_v is then replaced by the corresponding semantic similarity score sc_j; Q_v is extended to (n+1) dimensions with position (n+1) set to 1, scaled by a random number r>0, and extended to (n+2) dimensions with position (n+2) set to η+σ, so that Q_v is expressed as (rQ_v, r, η+σ); the query vector Q_v is encrypted with the secure kNN algorithm, which splits it into Q_v' and Q_v'' and yields the trapdoor T_Q={M_1^{-1} Q_v', M_2^{-1} Q_v''}; finally, the authorized user sends the query tag vector b_Q to the private cloud server and uploads the trapdoor T_Q to the public cloud server;
Step S6: after the private cloud server receives the query tag vector b_Q sent by the authorized user, it matches b_Q against the document tag vectors b_i=(bb_1,bb_2,…,bb_u) and obtains the candidate index identifier set S_ID=(…,sid_i,…,sid_j,…,sid_z,…) of documents that may contain the search keywords;
the private cloud server uploads S_ID to the public cloud server; according to each index identifier sid_i, the public cloud server finds the corresponding secure index I_i, computes the similarity score of the document from the corresponding V_i and the trapdoor T_Q, ranks all scores, and returns the top-k documents to the user;
Step S7: the authorized user decrypts the returned top-k ciphertexts with the key sk distributed by the data owner and obtains the required plaintext documents.
Further, the method also comprises dividing the document vectors and the query vector into blocks respectively to generate document tag vectors and a query tag vector of much smaller dimension; by matching the document tag vectors with the query tag vector, a large number of irrelevant documents are quickly filtered out.
Further, dividing a document vector into blocks specifically comprises the following steps:
Step S11: the document vector D_i is divided into u blocks; if all entries of a block are 0, its tag value bb_s=0, otherwise bb_s=1, giving the document tag vector b_i=(bb_1,bb_2,…,bb_u);
Step S12: all documents are traversed, giving the document tag vector set B=(b_1,b_2,...,b_m).
Further, matching the document tag vectors with the query tag vector specifically comprises the following steps:
Step S21: after the private cloud server receives the query tag vector b_Q sent by the authorized user, it uses every 1 in b_Q in turn to check the corresponding block in e_i, i.e. whether the tag value bb_s of that block is 0;
Step S22: if the tag value bb_s of the block is 0, the corresponding block of that document does not contain the searched keywords; if it is 1, the corresponding index identifier sid_i is recorded, giving the candidate index identifier set S_ID=(…,sid_i,…,sid_j,…,sid_z,…) of documents that may contain the search keywords;
Step S23: the private cloud server uploads S_ID to the public cloud server; according to each index identifier sid_i, the public cloud server finds the corresponding secure index I_i, computes the similarity score of the document from the corresponding V_i and the trapdoor T_Q, ranks all scores, and returns the top-k documents to the user.
Further, the method also comprises segmenting the document vector and replacing the full matrix multiplication with matrix multiplications of reduced dimension for each segment, specifically comprising the following steps:
Step S31: the data owner randomly generates an (n+2)-bit vector S and two (n+2)×(n+2) invertible matrices {M1,M2}; the key SK consists of the four-tuple {S,M1,M2,u}, where u is a positive integer and u | n;
Step S32: when constructing the index, the data owner splits the document vector D_i into D_i' and D_i'' according to the secure kNN algorithm, then divides D_i' and D_i'' into h segments each, obtaining (D_{i,1}',…,D_{i,h}') and (D_{i,1}'',…,D_{i,h}''); each segment is then encrypted with the key SK, giving V_i={M_{1k}^T D_{i,k}', M_{2k}^T D_{i,k}''} (k=1,…,h), where M_{1k} and M_{2k} are invertible matrices of dimension ((n+2)/h)×((n+2)/h), and the corresponding index I_i=(sid_i,V_i); finally, the data owner sends E=(e_1,e_2,…,e_m) to the private cloud server and uploads the index set I=(I_1,I_2,…,I_m) to the public cloud server;
Step S33: when generating the trapdoor, the authorized user performs the same operations as in step S32 when constructing the index, obtaining the trapdoor T_Q={M_{1k}^{-1} Q_{v,k}', M_{2k}^{-1} Q_{v,k}''} (k=1,…,h); finally, the authorized user sends the query tag vector b_Q to the private cloud server and uploads the trapdoor T_Q to the public cloud server;
Step S34: when computing the relevance score, the public cloud server uses the following formula:
Score(f_i, Q) = Σ_{k=1}^{h} ( (M_{1k}^T D_{i,k}') · (M_{1k}^{-1} Q_{v,k}') + (M_{2k}^T D_{i,k}'') · (M_{2k}^{-1} Q_{v,k}'') ) = D_i · Q_v.
Further, the ranking uses a three-factor ranking method, in which keywords in different domains of a document are assigned different weights and semantic similarity, domain-weighted score and relevance score are combined, specifically comprising the following steps:
Step S41: compute semantic similarity using the information-content-based Resnik algorithm;
Step S42: compute the domain-weighted score: given a series of documents, suppose every document has l domains with corresponding weight coefficients g_1,…,g_l ∈ [0,1] satisfying
Σ_{i=1}^{l} g_i = 1;
let s_i be the matching score of the query in the i-th domain of the document, where 1 and 0 respectively indicate whether they match; the domain-weighted score is then defined as
Z = Σ_{i=1}^{l} g_i · s_i;
Step S43: compute the relevance score based on the tf-idf weighting method and the sublinear scaling of tf.
Compared with the prior art, the present invention has the following beneficial effects:
1. The present invention achieves high search efficiency. To address the low efficiency of existing searchable encryption schemes, a vector block tagging and matching algorithm is designed: the document vectors and the query vector are divided into blocks to generate document tag vectors and a query tag vector of much smaller dimension. By matching the document tag vectors with the query tag vector, a large number of irrelevant documents are quickly filtered out, reducing the time spent computing and ranking document similarity scores and improving retrieval efficiency.
2. The present invention reduces the index creation time. The dimension of the document vectors created by searchable encryption schemes is usually very large, so most of the index construction time is spent multiplying document vectors by matrices. The present invention designs a vector segmented encryption algorithm that divides each document vector into segments and multiplies each segment by a matrix of greatly reduced dimension, which substantially reduces the index construction time of the scheme.
3. The present invention returns accurate ranking results. The concept of domain-weighted scoring is introduced into document scoring for the first time: keywords in different domains of a document are assigned different weights, overcoming the defect that existing schemes ignore keyword location information, while also realizing semantic fuzzy retrieval. Semantic similarity, domain-weighted score and relevance score are combined into a three-factor ranking method, so that the cloud server can rank search results accurately before returning them to the search user.
Detailed description of the invention
Fig. 1 is the system framework diagram of the embodiment of the present invention.
Fig. 2 shows the index creation process of the present embodiment.
Fig. 3 shows the trapdoor construction process in the embodiment of the present invention.
Fig. 4 shows the process of the vector block tagging algorithm in the embodiment of the present invention when the number of documents m=10, the number of keywords n=100 and the tag vector dimension u=10.
Fig. 5 shows the process of the matching algorithm between document tag vectors and the query tag vector in the embodiment of the present invention when the number of documents m=10.
Fig. 6 shows the main process of the vector segmented encryption algorithm in the embodiment of the present invention.
Specific embodiment
The present invention will be further described below with reference to the accompanying drawings and embodiments.
Fig. 1 shows the system framework of the embodiment of the present invention, which comprises four entities: the data owner, the authorized user, the private cloud server and the public cloud server. The index and trapdoor of the present embodiment are encrypted with the secure kNN algorithm (see Wong W K, Cheung W L, Kao B, et al. Secure kNN computation on encrypted databases // Proceedings of the ACM SIGMOD International Conference on Management of Data. New York, USA, 2009: 139-152). The basic procedure of the present embodiment is as follows:
(1) Setup: the data owner extracts keywords from the plaintext document set F=(f1,f2,…,fm), obtaining the keyword set W=(w1,w2,…,wn).
(2) KeyGen(n): the data owner randomly generates an (n+2)-bit vector S and two (n+2)×(n+2) invertible matrices {M1,M2}; the key SK consists of the four-tuple {S,M1,M2,u}, where u is a positive integer and u | n. The data owner then generates a key sk for document encryption and sends the keys SK and sk to the authorized users.
(3) BuildIndex(F, W, SK): Fig. 2 shows the index creation process of the present embodiment. Based on the vector space model, the data owner generates a document vector D_i of dimension n for every document f_i: if document f_i contains keyword w_j, the j-th entry of D_i is set to 1, otherwise to 0. The document vector D_i is then divided into u blocks, giving the document tag vector b_i=(bb_1,bb_2,…,bb_u) and e_i=(b_i,sid_i). After the term-frequency weight wf_{t,f} of keyword w_j in document f_i, the inverse document frequency idf_t and the domain weight score Z_ij are computed, every 1 in D_i is replaced by Z_ij·(wf_{t,f}×idf_t). D_i is then extended in dimension: position (n+1) is set to a random number ε and position (n+2) is set to 1, so that D_i is expressed as (D_i, ε, 1). The extended document vector D_i is encrypted with the secure kNN algorithm, which splits it into D_i' and D_i'' and yields V_i={M_1^T D_i', M_2^T D_i''} and I_i=(sid_i,V_i). Finally, the data owner sends E=(e_1,e_2,…,e_m) to the private cloud server and uploads the index set I=(I_1,I_2,…,I_m) to the public cloud server.
(4) Encrypt(F, sk): the data owner encrypts the document set F=(f1,f2,…,fm) with a symmetric encryption algorithm, obtains the ciphertext set C=(c1,c2,…,cm) and uploads it to the public cloud server.
(5) Trapdoor(Γ, SK): Fig. 3 shows the trapdoor construction process in the embodiment of the present invention. When an authorized user searches, the user first inputs η search keywords Γ=(q1,q2,…,qs,…,qη). Semantic extension is then performed: the semantic similarity between each original word qs and its expansion words is computed and ranked, and the σ most related expansion words are chosen as the final expansion words, giving the semantic expansion set Q=(q1,q2,…,qη,δ1,…,δσ) and the corresponding semantic similarity scores SC=(sc1,sc2,…,scη,scη+1,…,scη+σ). A query vector Q_v of dimension n is created from the semantic expansion set Q: if wj ∈ Q the j-th entry of Q_v is set to 1, otherwise to 0. Q_v is divided into u blocks, giving the query tag vector b_Q. Every 1 in Q_v is then replaced by the corresponding semantic similarity score sc_j; Q_v is extended to (n+1) dimensions with position (n+1) set to 1, scaled by a random number r>0, and extended to (n+2) dimensions with position (n+2) set to (η+σ), so that Q_v is expressed as (rQ_v, r, η+σ). The query vector Q_v is encrypted with the secure kNN algorithm, which splits it into Q_v' and Q_v'' and yields the trapdoor T_Q={M_1^{-1} Q_v', M_2^{-1} Q_v''}. Finally, the authorized user sends the query tag vector b_Q to the private cloud server and uploads the trapdoor T_Q to the public cloud server.
(6) Query: after the private cloud server receives the query tag vector b_Q sent by the authorized user, it matches b_Q against the document tag vectors b_i=(bb_1,bb_2,…,bb_u) and obtains the candidate index identifier set S_ID=(…,sid_i,…,sid_j,…,sid_z,…) of documents that may contain the search keywords.
The private cloud server uploads S_ID to the public cloud server. According to each index identifier sid_i, the public cloud server finds the corresponding secure index I_i, computes the similarity score of the document from the corresponding V_i and the trapdoor T_Q, ranks all scores, and returns the top-k documents to the user. (An illustrative numerical sketch of steps (3), (5) and (6) follows step (7) below.)
(7) Decrypt(C, sk): the authorized user decrypts the returned top-k ciphertexts with the key sk distributed by the data owner and obtains the required plaintext documents.
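For illustration only, the following is a minimal numpy sketch of steps (3), (5) and (6) on toy data; it is not part of the original patent text. The dictionary size, the random matrices, the scored vector entries and all variable names are invented assumptions, and semantic extension is replaced by fixed scores. The sketch only demonstrates the property the scheme relies on: the inner product of the encrypted index V_i with the trapdoor T_Q equals the inner product of the extended plaintext vectors, so the public cloud server can rank documents without seeing D_i or Q_v.

```python
import numpy as np

rng = np.random.default_rng(0)
n, u = 8, 4                                   # toy dictionary size and number of tag blocks

# KeyGen: splitting bit-vector S and two invertible (n+2)x(n+2) matrices M1, M2
S = rng.integers(0, 2, n + 2)
M1, M2 = rng.random((n + 2, n + 2)), rng.random((n + 2, n + 2))

def split(vec, is_query):
    """Secure kNN split: at each position one of the pair is copied, the other is shared."""
    a, b = vec.copy(), vec.copy()
    for j, s in enumerate(S):
        if (s == 1) != is_query:              # documents split where S=1, queries where S=0
            r = rng.random()
            a[j], b[j] = vec[j] - r, r
    return a, b

def encrypt_doc(d_scored, eps):
    d_ext = np.concatenate([d_scored, [eps, 1.0]])           # (D_i, eps, 1)
    d1, d2 = split(d_ext, is_query=False)
    return (M1.T @ d1, M2.T @ d2)                             # V_i

def trapdoor(q_scored, r, t):
    q_ext = np.concatenate([r * q_scored, [r, t]])            # (r*Q_v, r, eta+sigma)
    q1, q2 = split(q_ext, is_query=True)
    return (np.linalg.inv(M1) @ q1, np.linalg.inv(M2) @ q2)   # T_Q

def score(V, T):
    return V[0] @ T[0] + V[1] @ T[1]                          # V_i . T_Q

# BuildIndex for one toy document: entries already hold Z_ij * (wf_{t,f} x idf_t)
d = np.array([0.0, 1.3, 0.0, 0.0, 0.7, 0.0, 0.0, 2.1])        # scored document vector D_i
b_i = np.array([1 if blk.any() else 0 for blk in np.split(d, u)])   # tag vector b_i
eps = rng.random()
V_i = encrypt_doc(d, eps)

# Trapdoor for a toy query: entries hold the semantic-similarity scores sc_j
q = np.array([0.0, 1.0, 0.0, 0.0, 0.8, 0.0, 0.0, 0.0])        # scored query vector Q_v
r, t = 2.5, 3.0
T_Q = trapdoor(q, r, t)

print(b_i)                                                    # [1 0 1 1]
print(np.isclose(score(V_i, T_Q), r * (d @ q) + r * eps + t)) # True
```

Because r and t are the same for every document while ε only adds a small controlled per-document perturbation, sorting by the encrypted scores preserves the ranking that D_i·Q_v would give.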
Preferably, the present embodiment also uses the vector block tagging and matching algorithm, which is as follows:
The dictionary created by a searchable encryption scheme is generally very large, so the dimension of the document vectors created in the MRSE scheme (Cao N, Wang C, Li M, et al. Privacy-preserving multi-keyword ranked search over encrypted cloud data. IEEE Transactions on Parallel and Distributed Systems, 2014, 25(1): 829-837) is usually very large. During retrieval the cloud server cannot tell which documents are relevant, so it computes and ranks similarity scores for all documents, wasting a large amount of time.
The present embodiment divides the document vectors and the query vector into blocks respectively, generating document tag vectors and a query tag vector of much smaller dimension. By matching the document tag vectors with the query tag vector, a large number of irrelevant documents are quickly filtered out, reducing the time spent computing and ranking document similarity scores and improving retrieval efficiency.
(1) Vector block tagging algorithm:
The document vector D_i is divided into u blocks; if all entries of a block are 0, its tag value bb_s=0, otherwise bb_s=1, giving the document tag vector b_i=(bb_1,bb_2,…,bb_u);
All documents are traversed, giving the document tag vector set B=(b_1,b_2,...,b_m).
Fig. 4 shows the process of the vector block tagging algorithm when the number of documents m=10, the number of keywords n=100 and the tag vector dimension u=10.
With n=100 keywords, the document vector D_1 has 100 dimensions. D_1 is divided into u=10 blocks of 10 elements each; if all 10 elements of a block are 0, the block is tagged bb_s=0, and as soon as one of the 10 elements is 1 the block is tagged 1, giving the document tag vector b_1=(1,0,0,0,0,1,0,0,1,0);
Applying the same operation to the 10 documents in turn gives the document tag vectors b_1 to b_10.
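As an illustration of the Fig. 4 step, the sketch below tags a randomly generated toy collection with the same parameters (m=10 documents, n=100 keywords, u=10 blocks); the 0/1 document vectors are invented rather than extracted from real documents, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, u = 10, 100, 10                         # parameters of the Fig. 4 example

# Sparse 0/1 document vectors D_1..D_m: each document contains only a few keywords
D = (rng.random((m, n)) < 0.05).astype(int)

def tag_vector(d, u):
    """b_i = (bb_1, ..., bb_u): bb_s = 1 iff block s of the document vector contains a 1."""
    return np.array([1 if block.any() else 0 for block in np.split(d, u)])

B = np.array([tag_vector(d, u) for d in D])   # document tag vector set B = (b_1, ..., b_m)
print(D.shape, "->", B.shape)                 # (10, 100) -> (10, 10)
print(B[0])                                   # e.g. a 10-bit tag such as [1 0 0 1 0 1 0 0 1 0]
```

Each 100-dimensional document vector is summarized by a 10-bit tag, which is all the private cloud server needs for the filtering step.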
(2) Matching algorithm between the query tag vector and the document tag vectors:
After the private cloud server receives the query tag vector b_Q sent by the authorized user, it uses every 1 in b_Q in turn to check the corresponding block in e_i, i.e. whether the tag value bb_s of that block is 0;
If the tag value bb_s of the block is 0, the corresponding block of that document does not contain the searched keywords; if it is 1, the corresponding index identifier sid_i is recorded, giving the candidate index identifier set S_ID=(…,sid_i,…,sid_j,…,sid_z,…) of documents that may contain the search keywords;
The private cloud server uploads S_ID to the public cloud server; according to each index identifier sid_i, the public cloud server finds the corresponding secure index I_i, computes the similarity score of the document from the corresponding V_i and the trapdoor T_Q, ranks all scores, and returns the top-k documents to the user.
Fig. 5 shows the process of the matching algorithm between document tag vectors and the query tag vector when the number of documents m=10.
When the query tag vector of the search keywords is b_Q, the second 1 in b_Q is compared with the value at the corresponding position of the tag vectors of the 10 documents; the document corresponding to b_3 may contain the queried keyword, so its index identifier sid_3 is recorded.
Comparing every 1 in b_Q with the values at the corresponding positions of the tag vectors of the 10 documents in turn gives the candidate index identifier set S_ID=(sid_3,sid_8,sid_5);
The private cloud server sends the candidate index identifier set S_ID to the public cloud server, which performs the similarity score computation and ranking.
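The filtering step of Fig. 5 can be sketched as follows, continuing the kind of toy tag matrix produced above; the tag values, identifiers and the block positions of the query keywords are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
m, u = 10, 10
B = (rng.random((m, u)) < 0.3).astype(int)    # toy document tag vectors b_1..b_10
sid = [f"sid_{i + 1}" for i in range(m)]      # index identifiers

def match_candidates(b_q, B, sid):
    """Return S_ID: identifiers of documents whose tag vector has a 1 in at least one
    block where the query tag vector b_q has a 1."""
    s_id = []
    for s in np.flatnonzero(b_q):             # take each 1 of the query tag vector in turn
        for i in range(B.shape[0]):
            if B[i, s] == 1 and sid[i] not in s_id:
                s_id.append(sid[i])           # block s of document i is non-empty: candidate
    return s_id

b_q = np.zeros(u, dtype=int)
b_q[[1, 6]] = 1                               # the query keywords fall into blocks 1 and 6
S_ID = match_candidates(b_q, B, sid)
print(S_ID)                                   # only these candidates are scored by the public cloud
```

Documents whose tag vectors share no non-zero block with the query are never forwarded to the public cloud server, so no similarity score is computed for them.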
Preferably, the present embodiment also uses the vector segmented encryption algorithm, which is as follows:
The dimension of the document vectors created by the MRSE scheme (Cao N, Wang C, Li M, et al. Privacy-preserving multi-keyword ranked search over encrypted cloud data. IEEE Transactions on Parallel and Distributed Systems, 2014, 25(1): 829-837) is usually very large, so most of the index construction time is spent multiplying document vectors by matrices. The present embodiment divides each document vector into segments and multiplies each segment by a matrix of greatly reduced dimension, which substantially reduces the index construction time of the scheme.
Fig. 6 shows the main process of the vector segmented encryption algorithm; the steps are as follows:
1) The data owner randomly generates an (n+2)-bit vector S and two (n+2)×(n+2) invertible matrices {M1,M2}; the key SK consists of the four-tuple {S,M1,M2,u}, where u is a positive integer and u | n.
2) When constructing the index, the data owner splits the document vector D_i into D_i' and D_i'' according to the secure kNN algorithm, then divides D_i' and D_i'' into h segments each, obtaining (D_{i,1}',…,D_{i,h}') and (D_{i,1}'',…,D_{i,h}''). Each segment is then encrypted with the key SK, giving V_i={M_{1k}^T D_{i,k}', M_{2k}^T D_{i,k}''} (k=1,…,h), where M_{1k} and M_{2k} are invertible matrices of dimension ((n+2)/h)×((n+2)/h), and the corresponding index I_i=(sid_i,V_i). Finally, the data owner sends E=(e_1,e_2,…,e_m) to the private cloud server and uploads the index set I=(I_1,I_2,…,I_m) to the public cloud server.
3) When generating the trapdoor, the authorized user performs the same segmentation as when constructing the index, obtaining the trapdoor T_Q={M_{1k}^{-1} Q_{v,k}', M_{2k}^{-1} Q_{v,k}''} (k=1,…,h). Finally, the authorized user sends the query tag vector b_Q to the private cloud server and uploads the trapdoor T_Q to the public cloud server.
4) When computing the relevance score, the public cloud server uses the following formula:
Score(f_i, Q) = Σ_{k=1}^{h} ( (M_{1k}^T D_{i,k}') · (M_{1k}^{-1} Q_{v,k}') + (M_{2k}^T D_{i,k}'') · (M_{2k}^{-1} Q_{v,k}'') ) = Σ_{k=1}^{h} ( D_{i,k}' · Q_{v,k}' + D_{i,k}'' · Q_{v,k}'' ) = D_i · Q_v
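Under the same toy assumptions as in the earlier sketch, the following illustrates the segmented variant of steps 2) to 4): each of the h segments of the split vectors is encrypted with a small ((n+2)/h)-dimensional matrix, and the segment-wise inner products sum to the same plaintext score. How the key SK is partitioned into the small matrices M_{1k}, M_{2k} is an assumption here (one independent pair per segment).

```python
import numpy as np

rng = np.random.default_rng(2)
n, h = 6, 4                                   # toy sizes: n + 2 = 8 is divisible by h
seg = (n + 2) // h

S = rng.integers(0, 2, n + 2)
M1 = [rng.random((seg, seg)) for _ in range(h)]   # one small invertible pair per segment
M2 = [rng.random((seg, seg)) for _ in range(h)]   # (assumed partitioning of the key SK)

def split(vec, is_query):
    a, b = vec.copy(), vec.copy()
    for j, s in enumerate(S):
        if (s == 1) != is_query:              # documents split where S=1, queries where S=0
            r = rng.random()
            a[j], b[j] = vec[j] - r, r
    return a, b

def encrypt_doc(d_ext):
    d1, d2 = split(d_ext, False)
    return ([M1[k].T @ blk for k, blk in enumerate(np.split(d1, h))],
            [M2[k].T @ blk for k, blk in enumerate(np.split(d2, h))])

def trapdoor(q_ext):
    q1, q2 = split(q_ext, True)
    return ([np.linalg.inv(M1[k]) @ blk for k, blk in enumerate(np.split(q1, h))],
            [np.linalg.inv(M2[k]) @ blk for k, blk in enumerate(np.split(q2, h))])

def score(V, T):                              # step 4): sum of segment-wise inner products
    return sum(v @ t for v, t in zip(V[0], T[0])) + sum(v @ t for v, t in zip(V[1], T[1]))

d_ext = np.concatenate([rng.random(n), [rng.random(), 1.0]])   # (D_i, eps, 1)
q_ext = np.concatenate([2.5 * rng.random(n), [2.5, 3.0]])      # (r*Q_v, r, eta+sigma)
V_i, T_Q = encrypt_doc(d_ext), trapdoor(q_ext)
print(np.isclose(score(V_i, T_Q), d_ext @ q_ext))              # True: same score, smaller matrices
```

Multiplying h vectors of length (n+2)/h by ((n+2)/h)×((n+2)/h) matrices costs about (n+2)²/h operations instead of (n+2)², which is where the reduction in index construction time comes from.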
Preferably, the present embodiment also uses a three-factor ranking method, which is as follows:
The present embodiment realizes semantic fuzzy retrieval: when an authorized user wants to find documents semantically related to the search keywords, or cannot input accurate keywords for various reasons, semantically related documents can still be matched and returned to the authorized user, satisfying the user's search needs.
The present embodiment introduces the concept of domain-weighted scoring into document scoring for the first time, assigning different weights to keywords in different domains of a document, and combines semantic similarity, domain-weighted score and relevance score into a three-factor ranking method, making the ranking results more accurate.
(1) Semantic similarity
The present embodiment computes semantic similarity with the information-content-based Resnik algorithm (see Resnik P. Using Information Content to Evaluate Semantic Similarity in a Taxonomy // Proceedings of the 14th International Joint Conference on Artificial Intelligence. Montreal, Canada, 1995: 448-453); the steps are as follows:
1) The similarity of two concepts c_1 and c_2 is computed with the Resnik algorithm:
sim(c_1, c_2) = -log p(lso(c_1, c_2)) = IC(lso(c_1, c_2))
where lso(c_1, c_2) denotes the lowest common ancestor of concepts c_1 and c_2 in the WordNet is-a tree.
p(c) is the probability that a noun belonging to concept c occurs in the WordNet corpus, computed as
p(c) = freq(c) / N
where N is the number of nouns in the WordNet corpus and freq(c) is the number of words in the corpus subsumed by concept c:
freq(c) = Σ_{w ∈ words(c)} count(w)
where words(c) is the set of words subsumed by concept c.
2) The Resnik similarity of two words w_1 and w_2 is
sim(w_1, w_2) = max_{c_1 ∈ s(w_1), c_2 ∈ s(w_2)} sim(c_1, c_2)
where s(w_1) and s(w_2) denote the sets of concepts to which words w_1 and w_2 belong; a word may belong to several concepts.
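The Resnik computation can be illustrated with the self-contained sketch below. The is-a taxonomy, the corpus counts and the word-to-concept mapping are all invented for illustration; the patent itself uses WordNet and its corpus statistics. The word-level similarity sim(w_1, w_2) would simply take the maximum of resnik(c_1, c_2) over the concept sets of the two words.

```python
import math

# Toy is-a taxonomy (child -> parent) and corpus counts, invented for illustration
parent = {"cat": "feline", "feline": "mammal", "dog": "canine", "canine": "mammal",
          "mammal": "animal", "animal": None}
count = {"cat": 10, "dog": 14, "feline": 2, "canine": 1, "mammal": 3, "animal": 4}
words_of = {c: [c] for c in parent}           # words(c): here every concept has a single word
N = sum(count.values())                       # number of noun occurrences in the toy corpus

def ancestors_or_self(c):
    out = []
    while c is not None:
        out.append(c)
        c = parent[c]
    return out

def subsumed(c):
    """All concepts at or below c in the is-a tree."""
    return [x for x in parent if c in ancestors_or_self(x)]

def freq(c):                                  # freq(c) = sum of counts of words under c
    return sum(count[w] for x in subsumed(c) for w in words_of[x])

def IC(c):                                    # IC(c) = -log p(c), with p(c) = freq(c) / N
    return -math.log(freq(c) / N)

def lso(c1, c2):                              # lowest common ancestor in the is-a tree
    anc2 = ancestors_or_self(c2)
    return next(c for c in ancestors_or_self(c1) if c in anc2)

def resnik(c1, c2):                           # sim(c1, c2) = IC(lso(c1, c2))
    return IC(lso(c1, c2))

print(round(resnik("cat", "dog"), 3))         # common ancestor "mammal": low information content
print(round(resnik("cat", "feline"), 3))      # closer pair: higher similarity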
(2) Domain-weighted scoring
The present embodiment introduces the concept of domain-weighted (weighted zone) scoring (see Manning C D, Raghavan P, Schütze H. Introduction to Information Retrieval. Cambridge: Cambridge University Press, 2008) into document scoring for the first time. Given a series of documents, suppose every document has l domains with corresponding weight coefficients g_1,…,g_l ∈ [0,1] satisfying
Σ_{i=1}^{l} g_i = 1.
Let s_i be the matching score of the query in the i-th domain of the document (1 and 0 respectively indicate whether they match); the domain-weighted score is then defined as
Z = Σ_{i=1}^{l} g_i · s_i.
(3) Relevance score
The relevance score is based on the tf-idf weighting method and the sublinear scale transformation of tf (see Jin L I, Chen X. Efficient multi-user keyword search over encrypted data in cloud computing. Computing & Informatics, 2013, 32(4): 723-738); the steps are as follows:
1) Compute the term-frequency weight wf_{t,f}:
wf_{t,f} = 1 + log(tf_{t,f}) if tf_{t,f} > 0, and wf_{t,f} = 0 otherwise,
where tf_{t,f} is the number of occurrences of term t in document f.
2) Compute the inverse document frequency idf_t:
idf_t = log(m / df_t),
where m is the number of documents in the collection and df_t is the number of documents that contain term t.
The present embodiment combines semantic similarity, domain-weighted score and relevance score into a three-factor ranking method, making the ranking results more accurate. The formula used by the public cloud server to compute the score of a document is
Score(f_i, Q) = Σ_{w_j ∈ Q} Z_ij · (wf_{t,f} × idf_t) · sc_j
where Z_ij, (wf_{t,f} × idf_t) and sc_j are respectively the domain-weighted score, the relevance score and the semantic similarity of keyword w_j in document f_i.
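A small end-to-end sketch of the three-factor score on invented inputs: for each query keyword, the domain-weighted score Z_ij, the sublinear tf-idf relevance weight wf_{t,f}×idf_t and the semantic-similarity score sc_j are multiplied and summed, which is exactly what the inner product D_i·Q_v computes once the document and query vectors are filled as described above. The zone weights, term statistics and similarity scores below are all assumed toy values.

```python
import math

g = {"title": 0.5, "abstract": 0.3, "body": 0.2}        # domain weights g_1..g_l, summing to 1
docs = [  # per document: keyword -> (term frequency, domains in which the keyword appears)
    {"cloud": (3, {"title", "body"}), "privacy": (1, {"body"})},
    {"cloud": (1, {"body"})},
    {"privacy": (4, {"abstract", "body"}), "ranking": (2, {"body"})},
]
m = len(docs)
query_sc = {"cloud": 1.0, "privacy": 1.0, "confidentiality": 0.62}  # originals + one expansion word

def wf(tf):                                   # sublinear tf scaling: wf = 1 + log(tf) for tf > 0
    return 1 + math.log(tf) if tf > 0 else 0.0

def idf(term):                                # idf_t = log(m / df_t)
    df = sum(1 for d in docs if term in d)
    return math.log(m / df) if df else 0.0

def zone_score(domains):                      # Z_ij: sum of the weights of the matched domains
    return sum(g[z] for z in domains)

def three_factor_score(doc):                  # Score(f_i, Q) = sum_j Z_ij * (wf x idf) * sc_j
    total = 0.0
    for term, sc in query_sc.items():
        if term in doc:
            tf, domains = doc[term]
            total += zone_score(domains) * wf(tf) * idf(term) * sc
    return total

ranking = sorted(range(m), key=lambda i: three_factor_score(docs[i]), reverse=True)
print([(i, round(three_factor_score(docs[i]), 3)) for i in ranking])  # top-k order for the query
```

An expansion word that matches no document simply contributes nothing, while matching documents are ordered by the combined three factors.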
The foregoing are merely preferred embodiments of the present invention; all equivalent changes and modifications made within the scope of the patent claims of the present invention shall be covered by the present invention.

Claims (6)

1. A fast multi-keyword semantic ranking search method protecting data privacy in cloud computing, characterized in that it involves a data owner, authorized users, a private cloud server and a public cloud server, and specifically comprises the following steps:
Step S1: the data owner extracts keywords from the plaintext document set F=(f1,f2,…,fm), obtaining the keyword set W=(w1,w2,…,wn);
Step S2: the data owner randomly generates an (n+2)-bit vector S and two (n+2)×(n+2) invertible matrices {M1,M2}; the key SK consists of the four-tuple {S,M1,M2,u}, where u is a positive integer and u | n; the data owner then generates a key sk for document encryption and sends the keys SK and sk to the authorized users;
Step S3: create the index: the data owner generates a document vector D_i of dimension n for every document f_i; if document f_i contains keyword w_j, the j-th entry of D_i is set to 1, otherwise to 0; the document vector D_i is then divided into u blocks, giving the document tag vector b_i=(bb_1,bb_2,…,bb_u) and e_i=(b_i,sid_i), where sid_i denotes the index identifier; after the term-frequency weight wf_{t,f} of keyword w_j in document f_i, the inverse document frequency idf_t and the domain weight score Z_ij are computed, every 1 in D_i is replaced by Z_ij·(wf_{t,f}×idf_t); D_i is then extended in dimension, position (n+1) being set to a random number ε and position (n+2) to 1, so that D_i is expressed as (D_i, ε, 1);
the extended document vector D_i is encrypted with the secure kNN algorithm, which splits it into the two vectors D_i' and D_i'' and yields V_i={M_1^T D_i', M_2^T D_i''} and I_i=(sid_i,V_i); finally, the data owner sends E=(e_1,e_2,…,e_m) to the private cloud server and uploads the index set I=(I_1,I_2,…,I_m) to the public cloud server;
Step S4: the data owner encrypts the document set F=(f1,f2,…,fm) with a symmetric encryption algorithm, obtains the ciphertext set C=(c1,c2,…,cm) and uploads it to the public cloud server;
Step S5: build the trapdoor: when an authorized user searches, the user first inputs η search keywords Γ=(q1,q2,…,qs,…,qη); semantic extension is then performed: the semantic similarity between each original word qs and its expansion words is computed and ranked, and the σ most related expansion words are chosen as the final expansion words, giving the semantic expansion set Q=(q1,q2,…,qη,δ1,…,δσ) and the corresponding semantic similarity scores SC=(sc1,sc2,…,scη,scη+1,…,scη+σ); a query vector Q_v of dimension n is created from the semantic expansion set Q: if wj ∈ Q the j-th entry of Q_v is set to 1, otherwise to 0; Q_v is divided into u blocks, giving the query tag vector b_Q; every 1 in Q_v is then replaced by the corresponding semantic similarity score sc_j; Q_v is extended to (n+1) dimensions with position (n+1) set to 1, scaled by a random number r>0, and extended to (n+2) dimensions with position (n+2) set to η+σ, so that Q_v is expressed as (rQ_v, r, η+σ); the query vector Q_v is encrypted with the secure kNN algorithm, which splits it into the two vectors Q_v' and Q_v'' and yields the trapdoor T_Q={M_1^{-1} Q_v', M_2^{-1} Q_v''}; finally, the authorized user sends the query tag vector b_Q to the private cloud server and uploads the trapdoor T_Q to the public cloud server;
Step S6: after the private cloud server receives the query tag vector b_Q sent by the authorized user, it matches b_Q against the document tag vectors b_i=(bb_1,bb_2,…,bb_u) and obtains the candidate index identifier set S_ID=(…,sid_i,…,sid_j,…,sid_z,…) of documents that may contain the search keywords;
the private cloud server uploads S_ID to the public cloud server; according to each index identifier sid_i, the public cloud server finds the corresponding secure index I_i, computes the similarity score of the document from the corresponding V_i and the trapdoor T_Q, ranks all scores, and returns the top-k documents to the user;
Step S7: the authorized user decrypts the returned top-k ciphertexts with the key sk distributed by the data owner and obtains the required plaintext documents.
2. The fast multi-keyword semantic ranking search method protecting data privacy in cloud computing according to claim 1, characterized in that the method further comprises dividing the document vectors and the query vector into blocks respectively to generate document tag vectors and a query tag vector of much smaller dimension, and quickly filtering out a large number of irrelevant documents by matching the document tag vectors with the query tag vector.
3. The fast multi-keyword semantic ranking search method protecting data privacy in cloud computing according to claim 2, characterized in that dividing a document vector into blocks specifically comprises the following steps:
Step S11: the document vector D_i is divided into u blocks; if all entries of a block are 0, its tag value bb_s=0, otherwise bb_s=1, giving the document tag vector b_i=(bb_1,bb_2,…,bb_u);
Step S12: all documents are traversed, giving the document tag vector set B=(b_1,b_2,...,b_m).
4. The fast multi-keyword semantic ranking search method protecting data privacy in cloud computing according to claim 2, characterized in that matching the document tag vectors with the query tag vector specifically comprises the following steps:
Step S21: after the private cloud server receives the query tag vector b_Q sent by the authorized user, it uses every 1 in b_Q in turn to check the corresponding block in e_i, i.e. whether the tag value bb_s of that block is 0;
Step S22: if the tag value bb_s of the block is 0, the corresponding block of that document does not contain the searched keywords; if it is 1, the corresponding index identifier sid_i is recorded, giving the candidate index identifier set S_ID=(…,sid_i,…,sid_j,…,sid_z,…) of documents that may contain the search keywords;
Step S23: the private cloud server uploads S_ID to the public cloud server; according to each index identifier sid_i, the public cloud server finds the corresponding secure index I_i, computes the similarity score of the document from the corresponding V_i and the trapdoor T_Q, ranks all scores, and returns the top-k documents to the user.
5. The fast multi-keyword semantic ranking search method protecting data privacy in cloud computing according to claim 1, characterized in that the method further comprises segmenting the document vector and replacing the full matrix multiplication with matrix multiplications of reduced dimension for each segment, specifically comprising the following steps:
Step S31: the data owner randomly generates an (n+2)-bit vector S and two (n+2)×(n+2) invertible matrices {M1,M2}; the key SK consists of the four-tuple {S,M1,M2,u}, where u is a positive integer and u | n;
Step S32: when constructing the index, the data owner splits the document vector D_i into D_i' and D_i'' according to the secure kNN algorithm, then divides D_i' and D_i'' into h segments each, obtaining (D_{i,1}',…,D_{i,h}') and (D_{i,1}'',…,D_{i,h}''); each segment is then encrypted with the key SK, giving V_i={M_{1k}^T D_{i,k}', M_{2k}^T D_{i,k}''} (k=1,…,h) and the corresponding index I_i=(sid_i,V_i); finally, the data owner sends E=(e_1,e_2,…,e_m) to the private cloud server and uploads the index set I=(I_1,I_2,…,I_m) to the public cloud server; the matrices M_{1k} and M_{2k} denote invertible matrices of dimension ((n+2)/h)×((n+2)/h);
Step S33: when generating the trapdoor, the authorized user performs the same operations as in step S32 when constructing the index, obtaining the trapdoor T_Q={M_{1k}^{-1} Q_{v,k}', M_{2k}^{-1} Q_{v,k}''} (k=1,…,h); finally, the authorized user sends the query tag vector b_Q to the private cloud server and uploads the trapdoor T_Q to the public cloud server;
Step S34: when computing the relevance score, the public cloud server uses the following formula:
Score(f_i, Q) = Σ_{k=1}^{h} ( (M_{1k}^T D_{i,k}') · (M_{1k}^{-1} Q_{v,k}') + (M_{2k}^T D_{i,k}'') · (M_{2k}^{-1} Q_{v,k}'') ) = D_i · Q_v.
6. The fast multi-keyword semantic ranking search method protecting data privacy in cloud computing according to claim 1, characterized in that the ranking uses a three-factor ranking method, in which keywords in different domains of a document are assigned different weights and semantic similarity, domain-weighted score and relevance score are combined, specifically comprising the following steps:
Step S41: compute semantic similarity using the information-content-based Resnik algorithm;
Step S42: compute the domain-weighted score: given a series of documents, suppose every document has l domains with corresponding weight coefficients g_1,…,g_l ∈ [0,1] satisfying
Σ_{i=1}^{l} g_i = 1;
let s_i be the matching score of the query in the i-th domain of the document, where 1 and 0 respectively indicate whether they match; the domain-weighted score is then defined as
Z = Σ_{i=1}^{l} g_i · s_i;
Step S43: compute the relevance score based on the tf-idf weighting method and the sublinear scaling of tf.
CN201710181664.6A 2017-03-24 2017-03-24 Fast multi-keyword semantic ranking search method protecting data privacy in cloud computing Active CN106951411B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710181664.6A CN106951411B (en) Fast multi-keyword semantic ranking search method protecting data privacy in cloud computing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710181664.6A CN106951411B (en) Fast multi-keyword semantic ranking search method protecting data privacy in cloud computing

Publications (2)

Publication Number Publication Date
CN106951411A CN106951411A (en) 2017-07-14
CN106951411B true CN106951411B (en) 2019-10-15

Family

ID=59472824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710181664.6A Active CN106951411B (en) Fast multi-keyword semantic ranking search method protecting data privacy in cloud computing

Country Status (1)

Country Link
CN (1) CN106951411B (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107734054A (en) * 2017-11-06 2018-02-23 福州大学 Encryption data searching system in safe cloud storage
CN108390855A (en) * 2018-01-11 2018-08-10 中国人民解放军战略支援部队信息工程大学 A kind of attribute base keyword search encryption system and method towards cloud storage
CN108549701A (en) * 2018-04-17 2018-09-18 上海海事大学 Cloud environment encrypts outsourcing data semantic extended search method and system
CN108647529A (en) * 2018-05-09 2018-10-12 上海海事大学 A kind of semantic-based multi-key word sorted search intimacy protection system and method
CN110555198B (en) * 2018-05-31 2023-05-23 北京百度网讯科技有限公司 Method, apparatus, device and computer readable storage medium for generating articles
CN108959567B (en) * 2018-07-04 2021-05-04 武汉大学 Safe retrieval method suitable for large-scale images in cloud environment
CN109165520B (en) * 2018-07-16 2020-08-11 哈尔滨工业大学(深圳) Data encryption method and device and data encryption retrieval system
CN109063509A (en) * 2018-08-07 2018-12-21 上海海事大学 It is a kind of that encryption method can search for based on keywords semantics sequence
CN110858251B (en) * 2018-08-22 2020-07-21 阿里巴巴集团控股有限公司 Data query method and device
CN109271485B (en) * 2018-09-19 2022-03-08 南京邮电大学 Cloud environment encrypted document sequencing and searching method supporting semantics
CN109376288B (en) * 2018-09-28 2021-04-23 邦道科技有限公司 Cloud computing platform for realizing semantic search and balancing method thereof
CN109308264B (en) * 2018-10-22 2021-11-16 北京天融信网络安全技术有限公司 Method for evaluating data desensitization effect, corresponding device and storage medium
CN109471964B (en) * 2018-10-23 2021-10-01 三亚哈尔滨工程大学南海创新发展基地 Synonym set-based fuzzy multi-keyword searchable encryption method
CN109815730B (en) * 2018-12-29 2020-11-20 中国科学院软件研究所 Searchable encryption method and system supporting skyline query
CN109766314A (en) * 2019-01-07 2019-05-17 西安电子科技大学 Ciphertext data multi-key word searching method based on probability trapdoor
CN109960757A (en) * 2019-02-27 2019-07-02 北京搜狗科技发展有限公司 Web search method and device
CN110175328B (en) * 2019-05-16 2023-02-17 镇江市高等专科学校 Method and system for measuring document similarity based on keyword sequence structure
CN110795544B (en) * 2019-09-10 2023-07-21 腾讯科技(深圳)有限公司 Content searching method, device, equipment and storage medium
CN110727951B (en) * 2019-10-14 2021-08-27 桂林电子科技大学 Lightweight outsourcing file multi-keyword retrieval method and system with privacy protection function
CN111966778B (en) * 2020-07-08 2022-07-29 南京邮电大学 Multi-keyword ciphertext sorting and searching method based on keyword grouping reverse index
CN112134872B (en) * 2020-09-16 2022-07-26 江苏省未来网络创新研究院 Network system with multi-application-layer cloud computing function
CN112966018B (en) * 2021-03-03 2022-04-01 北京邮电大学 Fuzzy retrieval method based on mixed cloud
CN113139106B (en) * 2021-05-07 2022-03-15 杭州世平信息科技有限公司 Event auditing method and device for security check
CN113987593B (en) * 2021-12-28 2022-03-15 北京妙医佳健康科技集团有限公司 Data processing method
CN115630154B (en) * 2022-12-19 2023-05-05 竞速信息技术(廊坊)有限公司 Big data environment-oriented dynamic abstract information construction method and system
CN117278216B (en) * 2023-11-23 2024-02-13 三亚学院 Encryption system based on cloud computing virtualization and network storage files

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104023051A (en) * 2014-05-22 2014-09-03 西安理工大学 Multi-user multi-keyword searchable encryption method in cloud storage
CN104765848A (en) * 2015-04-17 2015-07-08 中国人民解放军空军航空大学 Symmetrical searchable encryption method for supporting result high-efficiency sequencing in hybrid cloud storage
CN105335510A (en) * 2015-10-30 2016-02-17 成都博睿德科技有限公司 Text data efficient searching method
CN106407447A (en) * 2016-09-30 2017-02-15 福州大学 Simhash-based fuzzy sequencing searching method for encrypted cloud data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Ning Cao et al. Privacy-preserving multi-keyword ranked search over encrypted cloud data. 2011 Proceedings IEEE INFOCOM, 2011-04-15: 829-837 *
Jun Xu et al. Two-Step-Ranking Secure Multi-Keyword Search over Encrypted Cloud Data. 2012 International Conference on Cloud and Service Computing, 2012-11-24: 124-130 *

Also Published As

Publication number Publication date
CN106951411A (en) 2017-07-14

Similar Documents

Publication Publication Date Title
CN106951411B (en) Fast multi-keyword semantic ranking search method protecting data privacy in cloud computing
CN104765848B (en) Symmetric searchable encryption method supporting efficient result ranking in hybrid cloud storage
CN106997384B (en) Semantic fuzzy searchable encryption method with verifiable ranking
Chen et al. Collabseer: a search engine for collaboration discovery
Zhang et al. SE-PPFM: A searchable encryption scheme supporting privacy-preserving fuzzy multikeyword in cloud systems
CN108647529A (en) Semantic-based multi-keyword ranked search privacy protection system and method
CN109063509A (en) Searchable encryption method based on keyword semantic ranking
Zhong et al. Efficient dynamic multi-keyword fuzzy search over encrypted cloud data
Xia et al. An efficient and privacy-preserving semantic multi-keyword ranked search over encrypted cloud data
CN108171071A (en) Multi-keyword rankable ciphertext retrieval method for cloud computing
Liu et al. FSSE: An effective fuzzy semantic searchable encryption scheme over encrypted cloud data
CN109255244A (en) Data encryption method and device, and encrypted data search system
CN108549701A (en) Semantic extended search method and system for encrypted outsourced data in a cloud environment
Mittal et al. Privacy preserving synonym based fuzzy multi-keyword ranked search over encrypted cloud data
Van Zwol et al. Ranking entity facets based on user click feedback
Hu et al. Topical authority propagation on microblogs
CN108768651A (en) Searchable public-key encryption method supporting multi-dimensional range queries
Yang et al. A keyword-based scholar recommendation framework for biomedical literature
Er et al. Set of t-uples expansion by example
Raghavendra et al. Split keyword fuzzy and synonym search over encrypted cloud data
Xia et al. A multi-keyword ranked search over encrypted cloud data supporting semantic extension
Onal et al. Utilizing word embeddings for result diversification in tweet search
Hassanpour et al. An adaptive meta-search engine considering the user’s field of interest
Ma et al. Joining user profiles across online social networks: From the perspective of an adversary
CN106777233A (en) Privacy-preserving personalized search system based on cloud computing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant