CN106951411B - Fast multi-keyword semantic ranked search method protecting data privacy in cloud computing
- Publication number: CN106951411B (application CN201710181664.6A)
- Authority: CN (China)
- Prior art keywords: vector, document, cloud server, sid, owned cloud
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
Abstract
The invention relates to a fast multi-keyword semantic ranked search method that protects data privacy in cloud computing. The concept of domain-weighted scoring is introduced into document scoring: keywords in different domains of a document, such as the title and the abstract, are assigned different weights. The search keywords are semantically expanded and their semantic similarity is computed. Semantic similarity, the domain-weighted score and the relevance score are combined in a three-factor ranking method, so that the cloud server can rank the search results accurately and return them to the search user. To address the low search efficiency of searchable encryption schemes, a vector block-marking matching algorithm is designed: the document vectors created under the vector space model are divided into blocks to generate label vectors of lower dimension. The invention improves search efficiency, reduces index creation time, and realizes semantic keyword search over ciphertext.
Description
Technical field
The invention relates to the field of cloud computing, and in particular to a fast multi-keyword semantic ranked search method that protects data privacy in cloud computing.
Background art
With the rapid development of cloud computing, more and more sensitive data is stored in the cloud, such as e-mail, personal health records, personal videos and photos, corporate financial data and government documents. Cloud servers provide high-quality data storage services, and storing data in the cloud reduces users' storage and maintenance costs. However, the data owner and the cloud server are not in the same trust domain, which puts outsourced data at risk. A common solution for protecting user privacy is to encrypt the data before storing it on the cloud server. Encrypted data, however, loses its original properties: when a user needs certain data, it cannot be identified directly in the ciphertext. When the data volume is very small, the user can download all the ciphertext, decrypt it locally and search the plaintext. With the sharp growth of cloud data, this approach wastes a great deal of time and bandwidth and no longer meets users' practical needs. How to search for the required documents among a large amount of ciphertext has therefore become a problem.
After Song et al. proposed encrypting data with a stream cipher and realized a searchable encryption scheme, Wang et al. proposed a single-keyword ranked search scheme that encrypts the tf-idf (term frequency-inverse document frequency) values of documents with order-preserving encryption in order to rank search results accurately. Cao et al. proposed a multi-keyword ranked searchable encryption scheme that introduces the vector space model and the secure kNN (secure k-nearest neighbor) method: index vectors are encrypted with matrices, and the inner-product similarity between index vector and query vector is computed to rank multi-keyword search results. Li et al. proposed a searchable encryption scheme supporting Boolean queries. The weights of the "OR", "AND" and "NOT" query words are set to three increasing sequences of positive numbers $a_i$, $b_j$, $c_k$ that are super-increasing across sequences, i.e. $\sum a_i < b_1$ and $\sum a_i + \sum b_j < c_1$. The weights $c_k$ of the "NOT" query words are then replaced by their negatives $-c_k$, so that the documents with relevance score greater than 0 are the documents satisfying the Boolean query. None of these schemes, however, takes the high sparsity of document vectors into account. A document usually contains only a few of the keywords, so its vector contains a large number of zeros. During retrieval the cloud server cannot tell which documents are relevant and must compute and sort similarity scores for all documents, which wastes a great deal of time and lowers search efficiency.
Search requests entered by users often contain spelling errors or format mismatches, which the above schemes cannot handle. Li et al. therefore realized fuzzy keyword search, constructing keyword fuzzy sets with wildcards and using edit distance as the similarity measure. Liu et al. proposed a fuzzy search scheme that saves storage space by constructing the fuzzy sets from a dictionary, but search accuracy decreases. Chai et al. were the first to propose the "semi-honest-but-curious" cloud server model, in which the service provider may perform only part of the search operations and return only part of the results in order to save computation and bandwidth; their work therefore proposes a verifiable searchable encryption scheme based on a search-tree index structure. Wang et al. used wildcards and an index tree to realize an efficient fuzzy search scheme. To improve search efficiency, Chuah et al. realized a multi-keyword fuzzy search scheme based on BedTree. Wang et al. combined locality-sensitive hashing with the secure kNN (secure k-nearest neighbor) method to realize a new multi-keyword fuzzy search scheme. Building on fuzzy keyword search, Wang et al. proposed a verifiable fuzzy keyword search scheme by constructing a symbol-based index tree. These fuzzy search schemes, however, consider only similarity between the characters of keywords, not semantic similarity. Fu et al. therefore expanded document keywords with synonyms and, by computing inner products, realized a multi-keyword ranked search scheme supporting synonym queries. Xia et al. created an inverted index for the document set, expanded the query keywords using a semantic library, and encrypted the relevance scores with a one-to-many order-preserving encryption function to realize a multi-keyword semantic ranked search scheme. However, these semantic fuzzy search schemes do not include semantic similarity in the document score, and they ignore the weight differences between keywords in different domains.
To address the low search efficiency, long index creation time and imprecise ranking of current ciphertext keyword search schemes, the invention proposes a scheme that improves search efficiency, reduces index creation time, and realizes semantic keyword search over ciphertext.
Summary of the invention
In view of this, the purpose of the invention is to propose a fast multi-keyword semantic ranked search method that protects data privacy in cloud computing, which improves search efficiency, reduces index creation time, and realizes semantic keyword search over ciphertext.
The invention is realized by the following scheme: a fast multi-keyword semantic ranked search method protecting data privacy in cloud computing, involving a data owner, an authorized user, a private cloud server and a public cloud server, and specifically comprising the following steps:
Step S1: the data owner extracts keywords from the plaintext document set $F = (f_1, f_2, \ldots, f_m)$ to obtain the keyword set $W = (w_1, w_2, \ldots, w_n)$;
Step S2: the data owner randomly generates an $(n+2)$-bit vector $S$ and two invertible matrices $\{M_1, M_2\}$ of dimension $(n+2) \times (n+2)$. The key $SK$ consists of the four-tuple $\{S, M_1, M_2, u\}$, where $u$ is a positive integer and $u \mid n$. The data owner then generates a document encryption key $sk$ and sends the keys $SK$ and $sk$ to the authorized user;
Step S3: create the index. The data owner generates a document vector $\vec{D}_i$ for every document $f_i$: if $f_i$ contains keyword $w_j$, then $\vec{D}_i[j] = 1$, otherwise $\vec{D}_i[j] = 0$. The document vector $\vec{D}_i$ is then divided into $u$ blocks to obtain the document label vector $b_i = (bb_1, bb_2, \ldots, bb_u)$ and $e_i = (b_i, sid_i)$. After the term-frequency weight $wf_{t,f}$ of keyword $w_j$ in document $f_i$, the inverse document frequency $idf_t$ and the domain weight score $Z_{ij}$ have been computed, each 1 in $\vec{D}_i$ is set to $Z_{ij} \cdot (wf_{t,f} \times idf_t)$. $\vec{D}_i$ is then extended: position $(n+1)$ is set to a random number $\varepsilon$ and position $(n+2)$ is set to 1, so that $\vec{D}_i$ is expressed as $(\vec{D}_i, \varepsilon, 1)$.
The document vector $\vec{D}_i$ is encrypted with the secure kNN algorithm to obtain the encrypted vector $V_i$ and the index $I_i = (sid_i, V_i)$, where $V_i$ is computed from $\vec{D}_i'$ and $\vec{D}_i''$, the split vectors of $\vec{D}_i$. Finally, the data owner sends $E = (e_1, e_2, \ldots, e_m)$ to the private cloud server and uploads the index set $I = (I_1, I_2, \ldots, I_m)$ to the public cloud server;
Step S4: the data owner encrypts the document set $F = (f_1, f_2, \ldots, f_m)$ with a symmetric encryption algorithm to obtain the ciphertext set $C = (c_1, c_2, \ldots, c_m)$ and uploads it to the public cloud server;
Step S5: build the trapdoor. When searching, the authorized user first inputs $\eta$ search keywords $\Gamma = (q_1, q_2, \ldots, q_s, \ldots, q_\eta)$. Semantic expansion is then performed: the semantic similarity between each original word $q_s$ and its expansion words is computed and sorted, and the $\sigma$ most relevant expansion words are chosen as the final expansion words, giving the semantic expansion set $Q = (q_1, q_2, \ldots, q_\eta, \delta_1, \ldots, \delta_\sigma)$ and the corresponding semantic similarity scores $SC = (sc_1, sc_2, \ldots, sc_\eta, sc_{\eta+1}, \ldots, sc_{\eta+\sigma})$. A query vector $\vec{Q}$ is created from the semantic expansion set $Q$: if $w_j \in Q$, then $\vec{Q}[j] = 1$, otherwise $\vec{Q}[j] = 0$. $\vec{Q}$ is divided into $u$ blocks to obtain the query label vector. Each 1 in $\vec{Q}$ is then set to the corresponding semantic similarity score $sc_j$; $\vec{Q}$ is extended to $(n+1)$ dimensions with position $(n+1)$ set to 1, scaled by a random number $r > 0$, and extended to $(n+2)$ dimensions with position $(n+2)$ set to $\eta + \sigma$, so that $\vec{Q}$ is expressed as $(r\vec{Q}, r, \eta + \sigma)$.
The query vector $\vec{Q}$ is encrypted with the secure kNN algorithm to obtain the trapdoor $T_Q$. Finally, the authorized user sends the query label vector to the private cloud server and uploads the trapdoor $T_Q$ to the public cloud server;
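The plaintext part of the trapdoor construction in step S5 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_query_vector`, the dictionary-based similarity input and the range of the random scaling factor are assumptions made for the example, and the secure kNN encryption that follows is omitted.

```python
import random


def build_query_vector(keywords, expanded, similarity, r=None):
    """Build the extended plaintext query vector (r*Q, r, eta + sigma).

    keywords   -- the n dictionary keywords w_1..w_n (hypothetical input)
    expanded   -- the semantic expansion set Q (original plus expansion words)
    similarity -- maps each word of `expanded` to its similarity score sc_j
    r          -- positive random scaling factor; drawn at random if omitted
    """
    if r is None:
        r = random.uniform(0.1, 10.0)  # assumed range, any r > 0 works
    # Position j holds sc_j if w_j is in Q, otherwise 0.
    q = [similarity[w] if w in expanded else 0.0 for w in keywords]
    # Scale by r, append r itself (position n+1) and eta + sigma (position n+2).
    return [r * x for x in q] + [r, float(len(expanded))]
```

With a four-word dictionary, the query word "cloud" (similarity 1.0) and one expansion word "search" (similarity 0.8), the call `build_query_vector(["cloud", "server", "privacy", "search"], ["cloud", "search"], {"cloud": 1.0, "search": 0.8}, r=2.0)` yields a six-element vector whose last two entries are `r` and `η + σ = 2`.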
Step S6: after receiving the query label vector sent by the authorized user, the private cloud server matches it against the document label vectors $b_i = (bb_1, bb_2, \ldots, bb_u)$ to obtain the set of candidate index identifiers that may contain the search keywords, $SID_\in = (\ldots, sid_i, \ldots, sid_j, \ldots, sid_z, \ldots)$;
The private cloud server uploads $SID_\in$ to the public cloud server. The public cloud server finds the secure index $I_i$ corresponding to each index identifier $sid_i$, computes the similarity score of the document from the corresponding $V_i$ and the trapdoor $T_Q$, sorts all scores, and returns the top $k$ documents to the user;
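The score computation in step S6 relies on the secure kNN property that the inner product of an encrypted index and an encrypted trapdoor equals the inner product of the plaintext vectors. The sketch below follows the textbook construction of Wong et al. (random splitting by $S$, then multiplication by $M^T$ and $M^{-1}$); it is an assumption that the patent uses exactly this form. All function names are hypothetical, and exact rationals are used only so that the equality can be checked without rounding error.

```python
from fractions import Fraction
import random


def invert(m):
    """Gauss-Jordan inverse over exact rationals; raises ValueError if singular."""
    n = len(m)
    a = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for c in range(n):
        p = next((r for r in range(c, n) if a[r][c] != 0), None)
        if p is None:
            raise ValueError("singular matrix")
        a[c], a[p] = a[p], a[c]
        piv = a[c][c]
        a[c] = [x / piv for x in a[c]]
        for r in range(n):
            if r != c and a[r][c] != 0:
                f = a[r][c]
                a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return [row[n:] for row in a]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def keygen(dim, rng):
    """Return (S, M1, M2, M1^-1, M2^-1) for vectors of length dim."""
    def random_invertible():
        while True:
            m = [[rng.randint(-5, 5) for _ in range(dim)] for _ in range(dim)]
            try:
                return m, invert(m)
            except ValueError:
                continue  # singular draw, try again
    s = [rng.randint(0, 1) for _ in range(dim)]
    m1, m1_inv = random_invertible()
    m2, m2_inv = random_invertible()
    return s, m1, m2, m1_inv, m2_inv


def split(vec, s, rng, is_query):
    """Index vectors are split at random where S[j] = 1, queries where S[j] = 0."""
    a, b = [], []
    for x, bit in zip(vec, s):
        if bit == (0 if is_query else 1):
            t = Fraction(rng.randint(-9, 9))
            a.append(t)
            b.append(x - t)
        else:
            a.append(x)
            b.append(x)
    return a, b


def encrypt_index(d, key, rng):
    s, m1, m2, _, _ = key
    d1, d2 = split(d, s, rng, is_query=False)
    return matvec(transpose(m1), d1), matvec(transpose(m2), d2)


def make_trapdoor(q, key, rng):
    s, _, _, m1_inv, m2_inv = key
    q1, q2 = split(q, s, rng, is_query=True)
    return matvec(m1_inv, q1), matvec(m2_inv, q2)


def score(index, trapdoor):
    """Inner product of encrypted index and trapdoor."""
    return sum(a * b for pi, pt in zip(index, trapdoor) for a, b in zip(pi, pt))
```

Because $(M^T d')\cdot(M^{-1} q') = d' \cdot q'$ and the two split halves recombine to the original product, `score(encrypt_index(d, key, rng), make_trapdoor(q, key, rng))` equals the plaintext inner product of `d` and `q`, while the server sees neither vector.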
Step S7: the authorized user decrypts the returned top-$k$ ciphertexts with the key $sk$ distributed by the data owner to obtain the required plaintext documents.
Further, the method also includes dividing the document vector and the query vector into blocks to generate a document label vector and a query label vector of lower dimension; matching the query label vector against the document label vectors quickly filters out a large number of irrelevant documents.
Further, dividing the document vector into blocks specifically comprises the following steps:
Step S11: the document vector $\vec{D}_i$ is divided into $u$ blocks; if a block consists entirely of zeros, its label value is $bb_s = 0$, otherwise $bb_s = 1$, which gives the document label vector $b_i = (bb_1, bb_2, \ldots, bb_u)$;
Step S12: all documents are traversed to obtain the set of document label vectors $B = (b_1, b_2, \ldots, b_m)$.
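Steps S11 and S12 can be sketched in a few lines. The function name `block_labels` is hypothetical, and the sketch assumes equal-sized blocks, which follows from the requirement $u \mid n$.

```python
def block_labels(doc_vector, u):
    """Return the label vector (bb_1..bb_u): 1 if a block has a non-zero entry."""
    n = len(doc_vector)
    if n % u != 0:
        raise ValueError("u must divide the vector length n")
    size = n // u
    return [int(any(doc_vector[k * size:(k + 1) * size])) for k in range(u)]


def label_all(doc_vectors, u):
    """Step S12: label every document vector in turn."""
    return [block_labels(v, u) for v in doc_vectors]
```

For a 100-dimensional vector with non-zero entries only at indices 3, 57 and 85 and `u = 10`, only blocks 1, 6 and 9 are marked, so the label vector has 10 entries instead of 100.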
Further, matching the query label vector against the document label vectors specifically comprises the following steps:
Step S21: after receiving the query label vector sent by the authorized user, the private cloud server takes each 1 in the query label vector in turn and checks the corresponding block in $e_i$, i.e. whether the label value $bb_s$ of that block is 0;
Step S22: if the label value $bb_s$ is 0, the corresponding block of the document contains none of the searched keywords; if it is 1, the corresponding index identifier $sid_i$ is recorded. This gives the set of candidate index identifiers that may contain the search keywords, $SID_\in = (\ldots, sid_i, \ldots, sid_j, \ldots, sid_z, \ldots)$;
Step S23: the private cloud server uploads $SID_\in$ to the public cloud server. The public cloud server finds the secure index $I_i$ corresponding to each index identifier $sid_i$, computes the similarity score of the document from the corresponding $V_i$ and the trapdoor $T_Q$, sorts all scores, and returns the top $k$ documents to the user.
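The filtering performed by the private cloud server in steps S21 and S22 amounts to checking whether a document's label vector shares at least one marked block with the query label vector. A minimal sketch, with the hypothetical name `candidate_ids` and entries given as `(label_vector, sid)` pairs:

```python
def candidate_ids(query_labels, entries):
    """Return the sid of every entry whose labels overlap the query labels.

    entries -- iterable of (doc_labels, sid) pairs, i.e. the e_i = (b_i, sid_i)
    """
    hits = []
    for doc_labels, sid in entries:
        # A document is a candidate if some block is marked in both vectors.
        if any(q and d for q, d in zip(query_labels, doc_labels)):
            hits.append(sid)
    return hits
```

The sketch returns candidates in document order rather than in the order the query blocks are scanned; the order does not matter because the public cloud server ranks the candidates afterwards. A marked block only indicates that the document may contain a searched keyword, so the filter can admit false positives but never drops a matching document.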
Further, the method also includes segmenting the document vector and multiplying each segment by a matrix of reduced dimension, specifically comprising the following steps:
Step S31: the data owner randomly generates an $(n+2)$-bit vector $S$ and two invertible matrices $\{M_1, M_2\}$ of dimension $(n+2) \times (n+2)$. The key $SK$ consists of the four-tuple $\{S, M_1, M_2, u\}$, where $u$ is a positive integer and $u \mid n$;
Step S32: when constructing the index, the data owner splits the document vector $\vec{D}_i$ into $\vec{D}_i'$ and $\vec{D}_i''$ according to the secure kNN algorithm, divides each of $\vec{D}_i'$ and $\vec{D}_i''$ into $h$ segments, and encrypts the segments with the key $SK$ to obtain $V_i$ and the corresponding index $I_i = (sid_i, V_i)$. Finally, the data owner sends $E = (e_1, e_2, \ldots, e_m)$ to the private cloud server and uploads the index set $I = (I_1, I_2, \ldots, I_m)$ to the public cloud server;
Step S33: when generating the trapdoor, the authorized user performs operations similar to those used for index construction in step S32 and obtains the trapdoor $T_Q$. Finally, the authorized user sends the query label vector to the private cloud server and uploads the trapdoor $T_Q$ to the public cloud server;
Step S34: when computing the relevance score, the public cloud server uses the following formula:
Further, the ranking uses a three-factor ranking method that assigns different weights to keywords in different domains of a document and combines semantic similarity, the domain-weighted score and the relevance score, specifically comprising the following steps:
Step S41: compute the semantic similarity, using the information-content-based Resnik algorithm;
Step S42: compute the domain-weighted score. Given a collection of documents, assume every document has $l$ domains with weight coefficients $g_1, \ldots, g_l \in [0, 1]$ that satisfy $\sum_{i=1}^{l} g_i = 1$.
Let $s_i$ be the matching score between the query and the $i$-th domain of the document, where 1 and 0 indicate a match and no match respectively. The domain-weighted score is then defined as $\sum_{i=1}^{l} g_i s_i$.
Step S43: compute the relevance score, based on the tf-idf weighting method and the sublinear scaling of tf.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention achieves high search efficiency. To address the inefficiency of searchable encryption schemes, a vector block-marking matching algorithm is designed: the document vector and the query vector are each divided into blocks to generate a document label vector and a query label vector of lower dimension. Matching the query label vector against the document label vectors quickly filters out a large number of irrelevant documents, which reduces the time spent computing and sorting document similarity scores and improves retrieval efficiency.
2. The invention reduces index creation time. The document vectors created by searchable encryption schemes usually have very high dimension, so the time to build the index is dominated by multiplying document vectors by matrices. The invention designs a vector segmented-encryption algorithm that segments the document vector and multiplies each segment by a matrix of greatly reduced dimension, which greatly reduces the index construction time.
3. The invention returns accurately ranked results. The invention introduces the concept of domain-weighted scoring into document scoring for the first time, assigning different weights to keywords in different domains of a document; this remedies the failure of existing schemes to consider keyword position and at the same time realizes semantic fuzzy retrieval. The invention combines semantic similarity, the domain-weighted score and the relevance score in a three-factor ranking method, so that the cloud server can rank the search results accurately and return them to the search user.
Brief description of the drawings
Fig. 1 is the system framework of the embodiment of the invention.
Fig. 2 is the index creation process of the embodiment.
Fig. 3 is the trapdoor construction process of the embodiment.
Fig. 4 is the process of the vector block-marking algorithm of the embodiment for number of documents $m = 10$, number of keywords $n = 100$ and label vector dimension $u = 10$.
Fig. 5 is the process of the algorithm matching document label vectors against the query label vector in the embodiment for number of documents $m = 10$.
Fig. 6 is the main process of the vector segmented-encryption algorithm of the embodiment.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and an embodiment.
Fig. 1 is the system framework of the embodiment, which comprises four entities: the data owner, the authorized user, the private cloud server and the public cloud server. Indexes and trapdoors in this embodiment are encrypted with the secure kNN algorithm (see Wong W K, Cheung W L, Kao B, et al. Secure kNN computation on encrypted databases // Proceedings of the ACM SIGMOD International Conference on Management of Data. New York, USA, 2009: 139-152). The basic flow of the embodiment is as follows:
(1) Setup: the data owner extracts keywords from the plaintext document set $F = (f_1, f_2, \ldots, f_m)$ to obtain the keyword set $W = (w_1, w_2, \ldots, w_n)$.
(2) KeyGen(n): the data owner randomly generates an $(n+2)$-bit vector $S$ and two invertible matrices $\{M_1, M_2\}$ of dimension $(n+2) \times (n+2)$. The key $SK$ consists of the four-tuple $\{S, M_1, M_2, u\}$, where $u$ is a positive integer and $u \mid n$. The data owner then generates a document encryption key $sk$ and sends the keys $SK$ and $sk$ to the authorized user.
(3) BuildIndex(F, W, SK): Fig. 2 shows the index creation process of this embodiment. Based on the vector space model, the data owner generates a document vector $\vec{D}_i$ for every document $f_i$: if $f_i$ contains keyword $w_j$, then $\vec{D}_i[j] = 1$, otherwise $\vec{D}_i[j] = 0$. The document vector $\vec{D}_i$ is then divided into $u$ blocks to obtain the document label vector $b_i = (bb_1, bb_2, \ldots, bb_u)$ and $e_i = (b_i, sid_i)$. After the term-frequency weight $wf_{t,f}$ of keyword $w_j$ in document $f_i$, the inverse document frequency $idf_t$ and the domain weight score $Z_{ij}$ have been computed, each 1 in $\vec{D}_i$ is set to $Z_{ij} \cdot (wf_{t,f} \times idf_t)$. $\vec{D}_i$ is then extended: position $(n+1)$ is set to a random number $\varepsilon$ and position $(n+2)$ is set to 1, so that $\vec{D}_i$ is expressed as $(\vec{D}_i, \varepsilon, 1)$.
The document vector $\vec{D}_i$ is encrypted with the secure kNN algorithm to obtain the encrypted vector $V_i$ and the index $I_i = (sid_i, V_i)$. Finally, the data owner sends $E = (e_1, e_2, \ldots, e_m)$ to the private cloud server and uploads the index set $I = (I_1, I_2, \ldots, I_m)$ to the public cloud server.
(4) Encrypt(F, sk): the data owner encrypts the document set $F = (f_1, f_2, \ldots, f_m)$ with a symmetric encryption algorithm to obtain the ciphertext set $C = (c_1, c_2, \ldots, c_m)$ and uploads it to the public cloud server.
(5) Trapdoor(Γ, SK): Fig. 3 shows the trapdoor construction process of this embodiment. When searching, the authorized user first inputs $\eta$ search keywords $\Gamma = (q_1, q_2, \ldots, q_s, \ldots, q_\eta)$. Semantic expansion is then performed: the semantic similarity between each original word $q_s$ and its expansion words is computed and sorted, and the $\sigma$ most relevant expansion words are chosen as the final expansion words, giving the semantic expansion set $Q = (q_1, q_2, \ldots, q_\eta, \delta_1, \ldots, \delta_\sigma)$ and the corresponding semantic similarity scores $SC = (sc_1, sc_2, \ldots, sc_\eta, sc_{\eta+1}, \ldots, sc_{\eta+\sigma})$. A query vector $\vec{Q}$ is created from the semantic expansion set $Q$: if $w_j \in Q$, then $\vec{Q}[j] = 1$, otherwise $\vec{Q}[j] = 0$. $\vec{Q}$ is divided into $u$ blocks to obtain the query label vector. Each 1 in $\vec{Q}$ is then set to the corresponding semantic similarity score $sc_j$; $\vec{Q}$ is extended to $(n+1)$ dimensions with position $(n+1)$ set to 1, scaled by a random number $r > 0$, and extended to $(n+2)$ dimensions with position $(n+2)$ set to $\eta + \sigma$, so that $\vec{Q}$ is expressed as $(r\vec{Q}, r, \eta + \sigma)$.
The query vector $\vec{Q}$ is encrypted with the secure kNN algorithm to obtain the trapdoor $T_Q$. Finally, the authorized user sends the query label vector to the private cloud server and uploads the trapdoor $T_Q$ to the public cloud server.
(6) Query: after receiving the query label vector sent by the authorized user, the private cloud server matches it against the document label vectors $b_i = (bb_1, bb_2, \ldots, bb_u)$ to obtain the set of candidate index identifiers that may contain the search keywords, $SID_\in = (\ldots, sid_i, \ldots, sid_j, \ldots, sid_z, \ldots)$.
The private cloud server uploads $SID_\in$ to the public cloud server. The public cloud server finds the secure index $I_i$ corresponding to each index identifier $sid_i$, computes the similarity score of the document from the corresponding $V_i$ and the trapdoor $T_Q$, sorts all scores, and returns the top $k$ documents to the user.
(7) Decrypt(C, sk): the authorized user decrypts the returned top-$k$ ciphertexts with the key $sk$ distributed by the data owner to obtain the required plaintext documents.
Preferably, this embodiment also uses a vector block-marking matching algorithm, as follows:
The dictionary created by a searchable encryption scheme is generally very large, so the document vectors created in the MRSE scheme (Cao N, Wang C, Li M, et al. Privacy-preserving multi-keyword ranked search over encrypted cloud data. IEEE Transactions on Parallel and Distributed Systems, 2014, 25(1): 829-837) usually have very high dimension. During retrieval the cloud server cannot tell which documents are relevant and therefore computes and sorts similarity scores for all documents, which wastes a great deal of time.
This embodiment divides the document vector and the query vector into blocks to generate a document label vector and a query label vector of lower dimension. Matching the query label vector against the document label vectors quickly filters out a large number of irrelevant documents, which reduces the time spent computing and sorting document similarity scores and improves retrieval efficiency.
(1) Vector block-marking algorithm:
The document vector $\vec{D}_i$ is divided into $u$ blocks; if a block consists entirely of zeros, its label value is $bb_s = 0$, otherwise $bb_s = 1$, which gives the document label vector $b_i = (bb_1, bb_2, \ldots, bb_u)$;
All documents are traversed to obtain the set of document label vectors $B = (b_1, b_2, \ldots, b_m)$.
Fig. 4 shows the process of the vector block-marking algorithm for number of documents $m = 10$, number of keywords $n = 100$ and label vector dimension $u = 10$.
With $n = 100$ keywords, the document vector has 100 dimensions. It is divided into $u = 10$ blocks of 10 elements each. If all 10 elements of a block are 0, the block is labeled $bb_s = 0$; as soon as one of the 10 elements is 1, the block is labeled 1. This gives the document label vector $b_1 = (1,0,0,0,0,1,0,0,1,0)$;
Applying this operation to the 10 documents in turn gives the document label vectors $b_1$ to $b_{10}$.
(2) Algorithm matching the query label vector against the document label vectors:
After receiving the query label vector sent by the authorized user, the private cloud server takes each 1 in the query label vector in turn and checks the corresponding block in $e_i$, i.e. whether the label value $bb_s$ of that block is 0;
If the label value $bb_s$ is 0, the corresponding block of the document contains none of the searched keywords; if it is 1, the corresponding index identifier $sid_i$ is recorded. This gives the set of candidate index identifiers that may contain the search keywords, $SID_\in = (\ldots, sid_i, \ldots, sid_j, \ldots, sid_z, \ldots)$;
The private cloud server uploads $SID_\in$ to the public cloud server. The public cloud server finds the secure index $I_i$ corresponding to each index identifier $sid_i$, computes the similarity score of the document from the corresponding $V_i$ and the trapdoor $T_Q$, sorts all scores, and returns the top $k$ documents to the user.
Fig. 5 shows the process of the algorithm matching document label vectors against the query label vector for number of documents $m = 10$.
For the query label vector of the search keywords in this example, its second 1 is compared with the value at the corresponding position of the label vectors of the 10 documents. This shows that the document corresponding to $b_3$ may contain the queried keyword, and its index identifier $sid_3$ is recorded;
Comparing each 1 of the query label vector in turn with the values at the corresponding positions of the label vectors of the 10 documents gives the set of candidate index identifiers $SID_\in = (sid_3, sid_8, sid_5)$;
The private cloud server sends the set of candidate index identifiers $SID_\in$ to the public cloud server, which computes and sorts the similarity scores.
Preferably, this embodiment also uses a vector segmented-encryption algorithm, as follows:
The document vectors created by the MRSE scheme (Cao N, Wang C, Li M, et al. Privacy-preserving multi-keyword ranked search over encrypted cloud data. IEEE Transactions on Parallel and Distributed Systems, 2014, 25(1): 829-837) usually have very high dimension, so the time to build the index is dominated by multiplying document vectors by matrices. This embodiment segments the document vector and multiplies each segment by a matrix of greatly reduced dimension, which greatly reduces the index construction time of the scheme.
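A rough cost estimate illustrates why segmentation helps. It assumes, since the text does not fix these details, that each split vector of length $N = n + 2$ is cut into $h$ equal segments and that each segment is multiplied by its own square matrix of dimension $N/h$; only scalar multiplications are counted.

```latex
\begin{align*}
\text{unsegmented cost per split vector} &= N^2 \\
\text{segmented cost per split vector}   &= h \cdot \left(\frac{N}{h}\right)^2 = \frac{N^2}{h} \\
N = 1000,\ h = 10:\quad & 10^6 \text{ multiplications versus } 10^5
\end{align*}
```

Under these assumptions the multiplication count per document falls by a factor of $h$, and the key matrices shrink accordingly.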
Fig. 6 shows the main process of the vector segmented-encryption algorithm. The steps are as follows:
1) The data owner randomly generates an $(n+2)$-bit vector $S$ and two invertible matrices $\{M_1, M_2\}$ of dimension $(n+2) \times (n+2)$. The key $SK$ consists of the four-tuple $\{S, M_1, M_2, u\}$, where $u$ is a positive integer and $u \mid n$.
2) When constructing the index, the data owner splits the document vector $\vec{D}_i$ into $\vec{D}_i'$ and $\vec{D}_i''$ according to the secure kNN algorithm, divides each of $\vec{D}_i'$ and $\vec{D}_i''$ into $h$ segments, and encrypts the segments with the key $SK$ to obtain $V_i$ and the corresponding index $I_i = (sid_i, V_i)$. Finally, the data owner sends $E = (e_1, e_2, \ldots, e_m)$ to the private cloud server and uploads the index set $I = (I_1, I_2, \ldots, I_m)$ to the public cloud server.
3) When generating the trapdoor, the authorized user performs operations similar to those used for index construction and obtains the trapdoor $T_Q$. Finally, the authorized user sends the query label vector to the private cloud server and uploads the trapdoor $T_Q$ to the public cloud server.
4) When computing the relevance score, the public cloud server uses the following formula:
Preferably, this embodiment also uses a three-factor ranking method, as follows:
This embodiment realizes semantic fuzzy retrieval: when an authorized user wants documents semantically related to the query keywords, or cannot enter the exact keywords for some reason, semantically related documents can still be matched and returned, which meets the user's search needs.
This embodiment introduces the concept of domain-weighted scoring into document scoring for the first time, assigning different weights to keywords in different domains of a document. Semantic similarity, the domain-weighted score and the relevance score are combined in a three-factor ranking method, which makes the ranking more accurate.
(1) Semantic similarity
This embodiment computes semantic similarity with the information-content-based Resnik algorithm (see Resnik P. Using Information Content to Evaluate Semantic Similarity in a Taxonomy // Proceedings of the 14th International Joint Conference on Artificial Intelligence. Montreal, Canada, 1995: 448-453). The steps are as follows:
1) The similarity of two concepts $c_1$ and $c_2$ is computed with the Resnik algorithm:
$sim(c_1, c_2) = -\log p(lso(c_1, c_2)) = IC(lso(c_1, c_2))$
where $lso(c_1, c_2)$ denotes the deepest common parent node of concepts $c_1$ and $c_2$ in the WordNet is-a tree.
$p(c)$ is the probability that a noun belonging to concept $c$ occurs in the WordNet corpus, computed as:
$p(c) = \frac{freq(c)}{N}$
where $N$ is the number of nouns in the WordNet corpus and $freq(c)$ is the number of occurrences in the corpus of words belonging to concept $c$, computed as:
$freq(c) = \sum_{n \in words(c)} count(n)$
where $words(c)$ denotes the set of words belonging to concept $c$.
2) The similarity of two words $w_1$ and $w_2$ is computed with the Resnik algorithm:
$sim(w_1, w_2) = \max_{c_1 \in s(w_1),\, c_2 \in s(w_2)} sim(c_1, c_2)$
where $s(w_1)$ and $s(w_2)$ denote the sets of concepts of words $w_1$ and $w_2$; a word may have several concepts.
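The Resnik computation can be illustrated on a toy taxonomy. The sketch below does not use WordNet; the `parent` map, the concept probabilities and the function names are made-up inputs for illustration only.

```python
import math


def resnik(c1, c2, parent, prob):
    """Information content of the deepest common ancestor of c1 and c2."""
    def ancestors(c):
        chain = []
        while c is not None:
            chain.append(c)          # ordered from the concept up to the root
            c = parent.get(c)
        return chain

    above_c2 = set(ancestors(c2))
    # The first ancestor of c1 that also lies above c2 is the deepest one.
    lso = next(c for c in ancestors(c1) if c in above_c2)
    return -math.log(prob[lso])


def word_similarity(w1, w2, senses, parent, prob):
    """Maximum concept similarity over all concept pairs of the two words."""
    return max(resnik(c1, c2, parent, prob)
               for c1 in senses[w1] for c2 in senses[w2])
```

In a taxonomy where "dog" and "cat" share the parent "animal" with probability 0.5, their similarity is $-\log 0.5$, while concepts that meet only at the root (probability 1) have similarity 0.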
(2) Domain-weighted score
This embodiment introduces the concept of domain-weighted scoring (see Manning C D, Raghavan P, Schütze H. Introduction to Information Retrieval. Cambridge: Cambridge University Press, 2008) into document scoring for the first time. Given a collection of documents, assume every document has $l$ domains with weight coefficients $g_1, \ldots, g_l \in [0, 1]$ that satisfy:
$\sum_{i=1}^{l} g_i = 1$
Let $s_i$ be the matching score between the query and the $i$-th domain of the document (1 and 0 indicate a match and no match respectively). The domain-weighted score is then defined as:
$\sum_{i=1}^{l} g_i s_i$
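As a small worked example of the domain-weighted score, the sketch below assumes three domains (title, abstract, body) with illustrative weights; neither the domains nor the weight values come from the patent.

```python
def domain_weighted_score(weights, matches):
    """Sum of g_i * s_i, where s_i is 1 if the query matches domain i, else 0."""
    if len(weights) != len(matches):
        raise ValueError("one match flag is needed per domain")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("domain weights must sum to 1")
    return sum(g * s for g, s in zip(weights, matches))
```

With weights `[0.5, 0.3, 0.2]` for title, abstract and body, a keyword found in the title and the body but not in the abstract scores 0.5 + 0.2 = 0.7, whereas a keyword found only in the body scores 0.2.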
(3) relevance score
Based on tf-idf weight calculation method, and with reference to the sublinear scale transformation method of tf (Jin L I can be referred to,
Chen X.Efficient multi-user keyword search over encrypted data in cloud
Computing.Computing&Informatics, 2013,32 (4): 723-738), steps are as follows:
1) word frequency weight wf is calculatedt,f:
This embodiment combines semantic similarity, domain weighted score and relevance score into a three-factor ranking method so that the ranking results are more accurate. The public cloud server computes the score of a document as follows:
$$\mathrm{Score}(f_i, Q) = \sum_{w_j \in Q} Z_{ij} \cdot (wf_{t,f} \times idf_t) \cdot sc_j$$
where Z_ij, (wf_{t,f} × idf_t) and sc_j are respectively the domain weighted score, the relevance score and the semantic similarity of keyword w_j in document f_i.
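The three-factor combination can be sketched in Python as the sum, over the query keywords found in a document, of the product of the three factors. This is only an illustration under stated assumptions: the natural logarithm is used, `idf` takes the common form log(N/df) (the text here does not spell out its idf formula), and the function names and input layout are hypothetical.

```python
import math


def wf(tf):
    """Sublinear tf scaling: 1 + log(tf) for tf > 0, otherwise 0."""
    return 1.0 + math.log(tf) if tf > 0 else 0.0


def idf(n_docs, df):
    """Assumed inverse document frequency: log(N / df)."""
    return math.log(n_docs / df)


def three_factor_score(doc_terms, query_sims, n_docs):
    """Sum Z * (wf * idf) * sc over the query keywords present in the document.

    doc_terms:  keyword -> (Z, tf, df) for one document, where Z is the
                domain weighted score, tf the term frequency in the document
                and df the number of documents containing the keyword.
    query_sims: keyword -> semantic similarity score sc of that keyword
                (original and expansion words alike).
    """
    total = 0.0
    for keyword, sc in query_sims.items():
        if keyword in doc_terms:
            z, tf, df = doc_terms[keyword]
            total += z * wf(tf) * idf(n_docs, df) * sc
    return total
```

Expansion words typically carry a similarity score below that of the original query word, so a document matching only an expansion word ranks lower than one matching the original word with the same domain and tf-idf weights.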
The foregoing is merely a preferred embodiment of the present invention. All equivalent changes and modifications made within the scope of the claims of the present invention shall fall within the scope of the present invention.
Claims (6)
1. A fast multi-keyword semantic ranked search method that protects data privacy in cloud computing, characterized in that it involves a data owner, an authorized user, a private cloud server and a public cloud server, and specifically comprises the following steps:
Step S1: the data owner extracts keywords from the plaintext document set F=(f1,f2,…,fm) to obtain the keyword set W=(w1,w2,…,wn);
Step S2: the data owner randomly generates an (n+2)-bit vector S and two invertible matrices {M1,M2} of dimension (n+2)×(n+2); the key SK consists of the four-tuple {S,M1,M2,u}, where u is a positive integer and u | n; the data owner then generates a key sk for encrypting documents and sends the keys SK and sk to the authorized user;
Step S3: index creation: the data owner generates a document vector for every document fi; if document fi contains keyword wj, the j-th element of the document vector is set to 1, otherwise it is set to 0; the document vector is then divided into u blocks to obtain the document mark vector bi=(bb1,bb2,…,bbu) and ei=(bi,sidi), where sidi denotes the index identifier; next, after the term frequency weight wf_{t,f}, the inverse document frequency idf_t and the domain weighted score Zij of keyword wj in document fi have been computed, each 1 in the document vector is replaced by Zij·(wf_{t,f}×idf_t); the document vector is then extended in dimension, with position (n+1) set to a random number ε and position (n+2) set to 1;
the extended document vector is encrypted with the secure KNN algorithm to obtain Vi and the index Ii=(sidi,Vi), where Vi is computed from the two vectors into which the document vector is split; finally, the data owner sends E=(e1,e2,…,em) to the private cloud server and uploads the index set to the public cloud server;
Step S4: the data owner encrypts the document set F=(f1,f2,…,fm) with a symmetric encryption algorithm to obtain the ciphertext set C=(c1,c2,…,cm) and uploads it to the public cloud server;
Step S5: trapdoor construction: when searching, the authorized user first inputs η search keywords Γ=(q1,q2,…,qs,…,qη) and then performs semantic expansion: the semantic similarity between each original word qs and its expansion words is computed and sorted, and the σ most relevant expansion words are selected as the final expansion words, giving the semantic expansion set Q=(q1,q2,…,qη,δ1,…,δσ) and the corresponding semantic similarity scores SC=(sc1,sc2,…,scη,scη+1,…,scη+σ); a query vector is created from the semantic expansion set Q: if wj∈Q, the j-th element of the query vector is set to 1, otherwise it is set to 0; the query vector is divided into u blocks to obtain the query mark vector; each 1 in the query vector is then replaced by the corresponding semantic similarity score scj; the query vector is extended to (n+1) dimensions with position (n+1) set to 1, scaled by a random number r greater than 0, and then extended to (n+2) dimensions with position (n+2) set to η+σ;
the extended query vector is encrypted with the secure KNN algorithm to obtain the trapdoor TQ, which is computed from the two vectors into which the query vector is split; finally, the authorized user sends the query mark vector to the private cloud server and uploads the trapdoor TQ to the public cloud server;
Step S6: after receiving the query mark vector sent by the authorized user, the private cloud server matches it against the document mark vectors bi=(bb1,bb2,…,bbu) to obtain the set of candidate index identifiers that may contain the search keywords, SID∈=(…,sidi,…,sidj,…,sidz,…);
the private cloud server uploads SID∈ to the public cloud server; the public cloud server finds the corresponding secure index Ii according to each index identifier sidi, computes the similarity score of each document from the corresponding Vi and the trapdoor TQ, sorts all scores, and returns the top k documents to the user;
Step S7: the authorized user decrypts the returned top-k ciphertexts with the key sk distributed by the data owner to obtain the required plaintext documents.
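The vector extension and encryption of steps S3 and S5 can be illustrated with the following Python sketch. It rests on assumptions: the claim text here does not spell out the splitting rule or the matrix products, so the sketch uses the construction commonly described for secure KNN (split by the bits of S, encrypt the index with the transposed matrices and the trapdoor with the inverse matrices). All function names are hypothetical, and exact fractions are used so that the score identity can be checked without rounding.

```python
import random
from fractions import Fraction


def invert(m):
    """Gauss-Jordan inverse over exact fractions; ValueError if singular."""
    n = len(m)
    a = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular matrix")
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [x / pv for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def random_invertible(n, rng):
    """Draw random integer matrices until one is invertible."""
    while True:
        m = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
        try:
            return m, invert(m)
        except ValueError:
            continue


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def split(v, s, share_on, rng):
    """Where s[j] == share_on use random shares summing to v[j], else copy."""
    a, b = [], []
    for x, bit in zip(v, s):
        if bit == share_on:
            t = Fraction(rng.randint(-9, 9))
            a.append(t)
            b.append(x - t)
        else:
            a.append(x)
            b.append(x)
    return a, b


def extend_doc(p, eps):
    """Position n+1 holds the random number eps, position n+2 holds 1."""
    return list(p) + [eps, 1]


def extend_query(q, r, k):
    """Append 1, scale by r > 0, then append k (eta + sigma in step S5)."""
    return [r * x for x in q] + [r, k]


def encrypt_index(p, s, m1, m2, rng):
    p1, p2 = split(p, s, 1, rng)
    return mat_vec(transpose(m1), p1), mat_vec(transpose(m2), p2)


def make_trapdoor(q, s, m1_inv, m2_inv, rng):
    q1, q2 = split(q, s, 0, rng)
    return mat_vec(m1_inv, q1), mat_vec(m2_inv, q2)


def score(v, t):
    """Inner product of encrypted index and trapdoor, as the server would compute."""
    return sum(a * b for half_v, half_t in zip(v, t)
               for a, b in zip(half_v, half_t))
```

Under these assumptions the score computed on ciphertexts equals r·(p·q + ε) + (η+σ), where p and q are the weighted document and query vectors, so the server can rank documents without seeing either vector in the clear.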
2. The fast multi-keyword semantic ranked search method that protects data privacy in cloud computing according to claim 1, characterized in that it further comprises dividing the document vector and the query vector into blocks respectively to generate a document mark vector and a query mark vector of smaller dimension, and matching the document mark vector against the query mark vector to quickly filter out a large number of irrelevant documents.
3. The fast multi-keyword semantic ranked search method that protects data privacy in cloud computing according to claim 2, characterized in that dividing the document vector into blocks specifically comprises the following steps:
Step S11: the document vector is divided into u blocks; if a block consists entirely of 0s, its mark value is bbs=0, otherwise bbs=1, which gives the document mark vector bi=(bb1,bb2,…,bbu);
Step S12: all documents are traversed to obtain the set of document mark vectors B=(b1,b2,...,bm).
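Steps S11 and S12 can be sketched in a few lines of Python. The function names are hypothetical; the sketch relies on u dividing the vector length, as required of the key parameter u, so that all blocks have equal size.

```python
def mark_vector(vec, u):
    """Split vec into u equal blocks; a block's mark is 0 if it is all zeros, else 1."""
    n = len(vec)
    if u <= 0 or n % u != 0:
        raise ValueError("u must be a positive divisor of the vector length")
    size = n // u
    return [int(any(vec[k * size:(k + 1) * size])) for k in range(u)]


def mark_vector_set(doc_vectors, u):
    """Traverse all document vectors and collect their mark vectors."""
    return [mark_vector(vec, u) for vec in doc_vectors]


# Assumed example with n = 8 keywords and u = 4 blocks of size 2.
example = mark_vector([0, 1, 0, 0, 0, 0, 1, 1], 4)
```

Each mark vector has only u entries instead of n, which is what makes the pre-filtering on the private cloud server cheap compared with working on the full vectors.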
4. The fast multi-keyword semantic ranked search method that protects data privacy in cloud computing according to claim 2, characterized in that matching the document mark vector against the query mark vector specifically comprises the following steps:
Step S21: after receiving the query mark vector sent by the authorized user, the private cloud server uses each 1 in the query mark vector in turn to match the corresponding block in ei, that is, it checks whether the mark value bbs of that block is 0;
Step S22: if the mark value bbs of the block is 0, the document corresponding to that block does not contain the searched keyword; if it is 1, the corresponding index identifier sidi is recorded, giving the set of candidate index identifiers that may contain the search keywords, SID∈=(…,sidi,…,sidj,…,sidz,…);
Step S23: the private cloud server uploads SID∈ to the public cloud server; the public cloud server finds the corresponding secure index Ii according to each index identifier sidi, computes the similarity score of each document from the corresponding Vi and the trapdoor TQ, sorts all scores, and returns the top k documents to the user.
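The filtering in steps S21 and S22 can be sketched as follows in Python. One interpretive assumption is made: a document is kept as a candidate when at least one block marked 1 in the query is also marked 1 in the document. The function name and the sample identifiers are hypothetical.

```python
def candidate_ids(query_mark, entries):
    """Return the index identifiers of documents that may contain a search keyword.

    query_mark: query mark vector, one 0/1 value per block.
    entries:    list of (document mark vector, sid) pairs, i.e. the e_i values.
    """
    selected = []
    for doc_mark, sid in entries:
        # Check only the blocks that the query marks with 1.
        if any(q == 1 and b == 1 for q, b in zip(query_mark, doc_mark)):
            selected.append(sid)
    return selected


# Assumed example with u = 4 blocks and three indexed documents.
entries = [([1, 0, 0, 1], "sid1"), ([0, 1, 0, 0], "sid2"), ([0, 0, 1, 1], "sid3")]
hits = candidate_ids([0, 0, 0, 1], entries)
```

Because a block marked 1 only says that some keyword in the block is present, the candidate set may contain false positives; those are resolved by the score computed on the public cloud server.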
5. The fast multi-keyword semantic ranked search method that protects data privacy in cloud computing according to claim 1, characterized in that it further comprises segmenting the document vector and multiplying each segment by a matrix of reduced dimension, specifically comprising the following steps:
Step S31: the data owner randomly generates an (n+2)-bit vector S and two invertible matrices {M1,M2} of dimension (n+2)×(n+2); the key SK consists of the four-tuple {S,M1,M2,u}, where u is a positive integer and u | n;
Step S32: when constructing the index, the data owner splits the document vector into two vectors according to the secure KNN algorithm and then divides each of the two vectors into h segments; the segments are encrypted with the key SK to obtain Vi and the corresponding index Ii=(sidi,Vi); finally, the data owner sends E=(e1,e2,…,em) to the private cloud server and uploads the index set to the public cloud server; here the matrices M1h, M2h denote invertible matrices of dimension ((n+2)/h)×((n+2)/h);
Step S33: when generating the trapdoor, the authorized user performs an operation similar to the index construction in step S32 to obtain the trapdoor TQ; finally, the authorized user sends the query mark vector to the private cloud server and uploads the trapdoor TQ to the public cloud server;
Step S34: when computing the relevance score, the public cloud server uses the following formula:
6. The fast multi-keyword semantic ranked search method that protects data privacy in cloud computing according to claim 1, characterized in that the ranking uses a three-factor ranking method which assigns different weights to keywords in different domains of a document and combines semantic similarity, domain weighted score and relevance score, specifically comprising the following steps:
Step S41: computing the semantic similarity: the information-content-based Resnik algorithm is used;
Step S42: computing the domain weighted score: given a collection of documents, assume that every document has l domains whose weight coefficients are g1, …, gl ∈ [0,1] respectively, satisfying:
$$\sum_{i=1}^{l} g_i = 1$$
Let si be the matching score between the query and the i-th domain of the document, where 1 indicates a match and 0 indicates no match; the domain weighted score is then defined as:
$$Z = \sum_{i=1}^{l} g_i s_i$$
Step S43: computing the relevance score: the relevance score is computed with the tf-idf weighting method and the sublinear scaling of tf.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710181664.6A CN106951411B (en) | 2017-03-24 | 2017-03-24 | The quick multi-key word Semantic Ranking searching method of data-privacy is protected in a kind of cloud computing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106951411A CN106951411A (en) | 2017-07-14 |
CN106951411B true CN106951411B (en) | 2019-10-15 |
Family
ID=59472824
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710181664.6A Active CN106951411B (en) | 2017-03-24 | 2017-03-24 | The quick multi-key word Semantic Ranking searching method of data-privacy is protected in a kind of cloud computing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106951411B (en) |
Families Citing this family (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107734054A (en) * | 2017-11-06 | 2018-02-23 | 福州大学 | Encryption data searching system in safe cloud storage |
CN108390855A (en) * | 2018-01-11 | 2018-08-10 | 中国人民解放军战略支援部队信息工程大学 | A kind of attribute base keyword search encryption system and method towards cloud storage |
CN108549701A (en) * | 2018-04-17 | 2018-09-18 | 上海海事大学 | Cloud environment encrypts outsourcing data semantic extended search method and system |
CN108647529A (en) * | 2018-05-09 | 2018-10-12 | 上海海事大学 | A kind of semantic-based multi-key word sorted search intimacy protection system and method |
CN110555198B (en) * | 2018-05-31 | 2023-05-23 | 北京百度网讯科技有限公司 | Method, apparatus, device and computer readable storage medium for generating articles |
CN108959567B (en) * | 2018-07-04 | 2021-05-04 | 武汉大学 | Safe retrieval method suitable for large-scale images in cloud environment |
CN109165520B (en) * | 2018-07-16 | 2020-08-11 | 哈尔滨工业大学(深圳) | Data encryption method and device and data encryption retrieval system |
CN109063509A (en) * | 2018-08-07 | 2018-12-21 | 上海海事大学 | It is a kind of that encryption method can search for based on keywords semantics sequence |
CN110858251B (en) * | 2018-08-22 | 2020-07-21 | 阿里巴巴集团控股有限公司 | Data query method and device |
CN109271485B (en) * | 2018-09-19 | 2022-03-08 | 南京邮电大学 | Cloud environment encrypted document sequencing and searching method supporting semantics |
CN109376288B (en) * | 2018-09-28 | 2021-04-23 | 邦道科技有限公司 | Cloud computing platform for realizing semantic search and balancing method thereof |
CN109308264B (en) * | 2018-10-22 | 2021-11-16 | 北京天融信网络安全技术有限公司 | Method for evaluating data desensitization effect, corresponding device and storage medium |
CN109471964B (en) * | 2018-10-23 | 2021-10-01 | 三亚哈尔滨工程大学南海创新发展基地 | Synonym set-based fuzzy multi-keyword searchable encryption method |
CN109815730B (en) * | 2018-12-29 | 2020-11-20 | 中国科学院软件研究所 | Searchable encryption method and system supporting skyline query |
CN109766314A (en) * | 2019-01-07 | 2019-05-17 | 西安电子科技大学 | Ciphertext data multi-key word searching method based on probability trapdoor |
CN109960757A (en) * | 2019-02-27 | 2019-07-02 | 北京搜狗科技发展有限公司 | Web search method and device |
CN110175328B (en) * | 2019-05-16 | 2023-02-17 | 镇江市高等专科学校 | Method and system for measuring document similarity based on keyword sequence structure |
CN110795544B (en) * | 2019-09-10 | 2023-07-21 | 腾讯科技(深圳)有限公司 | Content searching method, device, equipment and storage medium |
CN110727951B (en) * | 2019-10-14 | 2021-08-27 | 桂林电子科技大学 | Lightweight outsourcing file multi-keyword retrieval method and system with privacy protection function |
CN111966778B (en) * | 2020-07-08 | 2022-07-29 | 南京邮电大学 | Multi-keyword ciphertext sorting and searching method based on keyword grouping reverse index |
CN112134872B (en) * | 2020-09-16 | 2022-07-26 | 江苏省未来网络创新研究院 | Network system with multi-application-layer cloud computing function |
CN112966018B (en) * | 2021-03-03 | 2022-04-01 | 北京邮电大学 | Fuzzy retrieval method based on mixed cloud |
CN113139106B (en) * | 2021-05-07 | 2022-03-15 | 杭州世平信息科技有限公司 | Event auditing method and device for security check |
CN113987593B (en) * | 2021-12-28 | 2022-03-15 | 北京妙医佳健康科技集团有限公司 | Data processing method |
CN115630154B (en) * | 2022-12-19 | 2023-05-05 | 竞速信息技术(廊坊)有限公司 | Big data environment-oriented dynamic abstract information construction method and system |
CN117278216B (en) * | 2023-11-23 | 2024-02-13 | 三亚学院 | Encryption system based on cloud computing virtualization and network storage files |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104023051A (en) * | 2014-05-22 | 2014-09-03 | 西安理工大学 | Multi-user multi-keyword searchable encryption method in cloud storage |
CN104765848A (en) * | 2015-04-17 | 2015-07-08 | 中国人民解放军空军航空大学 | Symmetrical searchable encryption method for supporting result high-efficiency sequencing in hybrid cloud storage |
CN105335510A (en) * | 2015-10-30 | 2016-02-17 | 成都博睿德科技有限公司 | Text data efficient searching method |
CN106407447A (en) * | 2016-09-30 | 2017-02-15 | 福州大学 | Simhash-based fuzzy sequencing searching method for encrypted cloud data |
Non-Patent Citations (2)
Title |
---|
Privacy-preserving multi-keyword ranked search over encrypted cloud data;Ning Cao et al.;《2011 Proceedings IEEE INFOCOM》;20110415;829-837 * |
Two-Step-Ranking Secure Multi-Keyword Search over Encrypted Cloud Data;Jun Xu et al.;《2012 International Conference on Cloud and Service Computing》;20121124;124-130 * |
Also Published As
Publication number | Publication date |
---|---|
CN106951411A (en) | 2017-07-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106951411B (en) | The quick multi-key word Semantic Ranking searching method of data-privacy is protected in a kind of cloud computing | |
CN104765848B (en) | What support result efficiently sorted in mixing cloud storage symmetrically can search for encryption method | |
CN106997384B (en) | Semantic fuzzy searchable encryption method capable of verifying sequencing | |
Chen et al. | Collabseer: a search engine for collaboration discovery | |
Zhang et al. | SE-PPFM: A searchable encryption scheme supporting privacy-preserving fuzzy multikeyword in cloud systems | |
CN108647529A (en) | A kind of semantic-based multi-key word sorted search intimacy protection system and method | |
CN109063509A (en) | It is a kind of that encryption method can search for based on keywords semantics sequence | |
Zhong et al. | Efficient dynamic multi-keyword fuzzy search over encrypted cloud data | |
Xia et al. | An efficient and privacy-preserving semantic multi-keyword ranked search over encrypted cloud data | |
CN108171071A (en) | A kind of multiple key towards cloud computing can sort cipher text retrieval method | |
Liu et al. | FSSE: An effective fuzzy semantic searchable encryption scheme over encrypted cloud data | |
CN109255244A (en) | Data ciphering method and its device, data encryption searching system | |
CN108549701A (en) | Cloud environment encrypts outsourcing data semantic extended search method and system | |
Mittal et al. | Privacy preserving synonym based fuzzy multi-keyword ranked search over encrypted cloud data | |
Van Zwol et al. | Ranking entity facets based on user click feedback | |
Hu et al. | Topical authority propagation on microblogs | |
CN108768651A (en) | That supports the inquiry of multidimensional interval range can search for public key encryption method | |
Yang et al. | A keyword-based scholar recommendation framework for biomedical literature | |
Er et al. | Set of t-uples expansion by example | |
Raghavendra et al. | Split keyword fuzzy and synonym search over encrypted cloud data | |
Xia et al. | A multi-keyword ranked search over encrypted cloud data supporting semantic extension | |
Onal et al. | Utilizing word embeddings for result diversification in tweet search | |
Hassanpour et al. | An adaptive meta-search engine considering the user’s field of interest | |
Ma et al. | Joining user profiles across online social networks: From the perspective of an adversary | |
CN106777233A (en) | The personalized search system of the support secret protection based on cloud computing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||