CN106951411B - Fast multi-keyword semantic ranking search method protecting data privacy in cloud computing - Google Patents

Fast multi-keyword semantic ranking search method protecting data privacy in cloud computing

Info

Publication number
CN106951411B
CN106951411B (application CN201710181664.6A)
Authority
CN
China
Prior art keywords
vector
document
cloud server
sid
owned cloud
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710181664.6A
Other languages
Chinese (zh)
Other versions
CN106951411A (en)
Inventor
杨旸
刘佳
叶少珍
蔡圣暐
杨书略
邹剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University
Priority to CN201710181664.6A
Publication of CN106951411A
Application granted
Publication of CN106951411B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis

Abstract

The present invention relates to a fast multi-keyword semantic ranking search method that protects data privacy in cloud computing. The concept of domain-weighted scoring is introduced into document scoring: keywords appearing in different domains of a document, such as the title and the abstract, are assigned different weights. Search keywords are semantically extended and their semantic similarity is computed; semantic similarity, domain-weighted score and relevance score are combined into a three-factor ranking method, so that the cloud server can rank search results accurately before returning them to the search user. To address the low search efficiency of existing searchable encryption schemes, a vector block tagging and matching algorithm is designed: the document vectors created from the vector space model are divided into blocks to generate tag vectors of much smaller dimension. The present invention improves search efficiency, reduces index creation time, and realizes semantic ciphertext keyword search.

Description

Fast multi-keyword semantic ranking search method protecting data privacy in cloud computing
Technical field
The present invention relates to the field of cloud computing, and in particular to a fast multi-keyword semantic ranking search method that protects data privacy in cloud computing.
Background art
With the rapid development of cloud computing, more and more sensitive data are stored in the cloud, such as e-mails, personal health records, personal videos and photos, corporate financial data and public documents. Cloud servers provide high-quality storage services, and outsourcing data to the cloud reduces the storage and maintenance costs of users. However, the data owner and the cloud server are not in the same trust domain, so outsourced data may be at risk. To protect user privacy, a common solution is to encrypt the data before storing it on the cloud server. Encrypted data, however, no longer retains its original properties: when a user needs certain data, the required items cannot be identified directly in the ciphertext. When the amount of data is small, the user can download all ciphertexts, decrypt them locally and search the plaintext. As the scale of cloud data grows sharply, this approach wastes a great deal of time and bandwidth and can no longer meet users' practical needs; how to find the required documents within a large volume of ciphertext therefore becomes a problem.
After Song et al. proposed a searchable encryption scheme that encrypts data with a key stream and enables search over it, Wang et al. proposed a single-keyword ranked search scheme that encrypts the tf-idf (term frequency-inverse document frequency) values of documents with order-preserving encryption, realizing accurate ranking of search results. Cao et al. proposed a multi-keyword ranked searchable encryption scheme that introduces the vector space model and the secure kNN (secure k-nearest neighbor) method: index vectors are encrypted with matrices, and the inner-product similarity between index vectors and query vectors is computed to realize multi-keyword ranked search. Li et al. proposed a searchable encryption scheme supporting Boolean queries: the weights of the query words combined by "OR", "AND" and "NOT" are set to three groups of increasing positive sequences a_i, b_j, c_k that are super-increasing between groups, i.e. Σa_i < b_1 and Σa_i + Σb_j < c_1; the weight c_k of each "NOT" query word is then replaced by its opposite number -c_k, so that a document whose relevance score is greater than 0 is a relevant document satisfying the Boolean query. However, these schemes do not take into account the high sparsity of document vectors: a document usually contains only a small number of keywords, so its document vector contains a large number of zeros. During retrieval the cloud server cannot tell which documents are relevant, so it computes and ranks similarity scores for all documents, wasting a large amount of time and lowering the search efficiency of the scheme.
The search requests entered by users often contain spelling mistakes or format mismatches, and the above schemes cannot meet users' practical needs in such cases. Li et al. therefore realized fuzzy keyword search, constructing keyword fuzzy sets with wildcards and using edit distance as the similarity measure. Liu et al. proposed a fuzzy search scheme that builds the fuzzy set with a dictionary to save storage space, but its search accuracy decreases. Chai et al. first proposed the "semi-honest-but-curious" cloud server model, in which the service provider may execute only part of the search operations and return only part of the search results in order to save computation and bandwidth; accordingly, they proposed a verifiable searchable encryption scheme based on a search-tree index structure. Wang et al. used wildcards and an index tree to realize an efficient fuzzy search scheme. Chuah et al. improved search efficiency with a BedTree-based multi-keyword fuzzy search scheme. Wang et al. combined locality-sensitive hashing with the secure kNN method (secure k-nearest neighbor) to realize a new multi-keyword fuzzy search scheme. Wang et al. further proposed a verifiable fuzzy keyword search scheme by building a symbol index tree on top of fuzzy keyword search. These fuzzy search schemes, however, only consider similarity between keyword characters and ignore similarity in keyword semantics. Fu et al. therefore expanded document keywords with synonyms and, by computing inner products, realized a multi-keyword ranked search scheme supporting synonym queries. Xia et al. created an inverted index for the document set, expanded query keywords with a semantic base, and encrypted relevance scores with a one-to-many order-preserving encryption function, realizing a multi-keyword semantic ranked search scheme. However, these semantic fuzzy search schemes do not involve semantic similarity in document scoring and ignore the weight differences of keywords located in different domains of a document.
To address the problems of low search efficiency, long index creation time and inaccurate ranking results in existing ciphertext keyword search schemes, the present invention proposes a scheme that improves search efficiency, reduces the index creation time and realizes semantic ciphertext keyword search.
Summary of the invention
In view of this, the purpose of the present invention is to propose a fast multi-keyword semantic ranking search method that protects data privacy in cloud computing, which improves search efficiency, reduces the index creation time and realizes semantic ciphertext keyword search.
The present invention is realized with the following scheme: a fast multi-keyword semantic ranking search method protecting data privacy in cloud computing, involving a data owner, authorized users, a private cloud server and a public cloud server, and specifically comprising the following steps:
Step S1: the data owner extracts keywords from the plaintext document set F=(f1,f2,…,fm), obtaining the keyword set W=(w1,w2,…,wn);
Step S2: the data owner randomly generates an (n+2)-bit vector S and two (n+2)×(n+2) invertible matrices {M1,M2}; the key SK consists of the four-tuple {S,M1,M2,u}, where u is a positive integer and u | n; the data owner then generates a key sk for document encryption and sends the keys SK and sk to the authorized users;
Step S3: create the index: the data owner generates a document vector D_i of dimension n for every document f_i; if document f_i contains keyword w_j, the j-th entry of D_i is set to 1, otherwise to 0; the document vector D_i is then divided into u blocks, giving the document tag vector b_i=(bb_1,bb_2,…,bb_u) and e_i=(b_i,sid_i), where sid_i is the index identifier of f_i; after the term-frequency weight wf_{t,f} of keyword w_j in document f_i, the inverse document frequency idf_t and the domain weight score Z_ij are computed, every 1 in D_i is replaced by Z_ij·(wf_{t,f}×idf_t); D_i is then extended in dimension, position (n+1) being set to a random number ε and position (n+2) to 1, so that D_i is expressed as (D_i, ε, 1); the extended document vector D_i is encrypted with the secure kNN algorithm, which splits it into the two vectors D_i' and D_i'' and yields V_i={M_1^T D_i', M_2^T D_i''} and I_i=(sid_i,V_i); finally, the data owner sends E=(e_1,e_2,…,e_m) to the private cloud server and uploads the index set I=(I_1,I_2,…,I_m) to the public cloud server;
Step S4: the data owner encrypts the document set F=(f1,f2,…,fm) with a symmetric encryption algorithm, obtains the ciphertext set C=(c1,c2,…,cm) and uploads it to the public cloud server;
Step S5: build the trapdoor: when an authorized user searches, the user first inputs η search keywords Γ=(q1,q2,…,qs,…,qη); semantic extension is then performed: the semantic similarity between each original word qs and its expansion words is computed and ranked, and the σ most related expansion words are chosen as the final expansion words, giving the semantic expansion set Q=(q1,q2,…,qη,δ1,…,δσ) and the corresponding semantic similarity scores SC=(sc1,sc2,…,scη,scη+1,…,scη+σ); a query vector Q_v of dimension n is created from the semantic expansion set Q: if wj ∈ Q the j-th entry of Q_v is set to 1, otherwise to 0; Q_v is divided into u blocks, giving the query tag vector b_Q; every 1 in Q_v is then replaced by the corresponding semantic similarity score sc_j; Q_v is extended to (n+1) dimensions with position (n+1) set to 1, scaled by a random number r>0, and extended to (n+2) dimensions with position (n+2) set to η+σ, so that Q_v is expressed as (rQ_v, r, η+σ); the query vector Q_v is encrypted with the secure kNN algorithm, which splits it into Q_v' and Q_v'' and yields the trapdoor T_Q={M_1^{-1} Q_v', M_2^{-1} Q_v''}; finally, the authorized user sends the query tag vector b_Q to the private cloud server and uploads the trapdoor T_Q to the public cloud server;
Step S6: after the private cloud server receives the query tag vector b_Q sent by the authorized user, it matches b_Q against the document tag vectors b_i=(bb_1,bb_2,…,bb_u) and obtains the candidate index identifier set S_ID=(…,sid_i,…,sid_j,…,sid_z,…) of documents that may contain the search keywords;
the private cloud server uploads S_ID to the public cloud server; according to each index identifier sid_i, the public cloud server finds the corresponding secure index I_i, computes the similarity score of the document from the corresponding V_i and the trapdoor T_Q, ranks all scores, and returns the top-k documents to the user;
Step S7: the authorized user decrypts the returned top-k ciphertexts with the key sk distributed by the data owner and obtains the required plaintext documents.
Further, the method also comprises dividing the document vectors and the query vector into blocks respectively to generate document tag vectors and a query tag vector of much smaller dimension; by matching the document tag vectors with the query tag vector, a large number of irrelevant documents are quickly filtered out.
Further, dividing a document vector into blocks specifically comprises the following steps:
Step S11: the document vector D_i is divided into u blocks; if all entries of a block are 0, its tag value bb_s=0, otherwise bb_s=1, giving the document tag vector b_i=(bb_1,bb_2,…,bb_u);
Step S12: all documents are traversed, giving the document tag vector set B=(b_1,b_2,...,b_m).
Further, matching the document tag vectors with the query tag vector specifically comprises the following steps:
Step S21: after the private cloud server receives the query tag vector b_Q sent by the authorized user, it uses every 1 in b_Q in turn to check the corresponding block in e_i, i.e. whether the tag value bb_s of that block is 0;
Step S22: if the tag value bb_s of the block is 0, the corresponding block of that document does not contain the searched keywords; if it is 1, the corresponding index identifier sid_i is recorded, giving the candidate index identifier set S_ID=(…,sid_i,…,sid_j,…,sid_z,…) of documents that may contain the search keywords;
Step S23: the private cloud server uploads S_ID to the public cloud server; according to each index identifier sid_i, the public cloud server finds the corresponding secure index I_i, computes the similarity score of the document from the corresponding V_i and the trapdoor T_Q, ranks all scores, and returns the top-k documents to the user.
Further, the method also comprises segmenting the document vector and replacing the full matrix multiplication with matrix multiplications of reduced dimension for each segment, specifically comprising the following steps:
Step S31: the data owner randomly generates an (n+2)-bit vector S and two (n+2)×(n+2) invertible matrices {M1,M2}; the key SK consists of the four-tuple {S,M1,M2,u}, where u is a positive integer and u | n;
Step S32: when constructing the index, the data owner splits the document vector D_i into D_i' and D_i'' according to the secure kNN algorithm, then divides D_i' and D_i'' into h segments each, obtaining (D_{i,1}',…,D_{i,h}') and (D_{i,1}'',…,D_{i,h}''); each segment is then encrypted with the key SK, giving V_i={M_{1k}^T D_{i,k}', M_{2k}^T D_{i,k}''} (k=1,…,h), where M_{1k} and M_{2k} are invertible matrices of dimension ((n+2)/h)×((n+2)/h), and the corresponding index I_i=(sid_i,V_i); finally, the data owner sends E=(e_1,e_2,…,e_m) to the private cloud server and uploads the index set I=(I_1,I_2,…,I_m) to the public cloud server;
Step S33: when generating the trapdoor, the authorized user performs the same operations as in step S32 when constructing the index, obtaining the trapdoor T_Q={M_{1k}^{-1} Q_{v,k}', M_{2k}^{-1} Q_{v,k}''} (k=1,…,h); finally, the authorized user sends the query tag vector b_Q to the private cloud server and uploads the trapdoor T_Q to the public cloud server;
Step S34: when computing the relevance score, the public cloud server uses the following formula:
Score(f_i, Q) = Σ_{k=1}^{h} ( (M_{1k}^T D_{i,k}') · (M_{1k}^{-1} Q_{v,k}') + (M_{2k}^T D_{i,k}'') · (M_{2k}^{-1} Q_{v,k}'') ) = D_i · Q_v.
Further, the ranking uses a three-factor ranking method, in which keywords in different domains of a document are assigned different weights and semantic similarity, domain-weighted score and relevance score are combined, specifically comprising the following steps:
Step S41: compute semantic similarity using the information-content-based Resnik algorithm;
Step S42: compute the domain-weighted score: given a series of documents, suppose every document has l domains with corresponding weight coefficients g_1,…,g_l ∈ [0,1] satisfying
Σ_{i=1}^{l} g_i = 1;
let s_i be the matching score of the query in the i-th domain of the document, where 1 and 0 respectively indicate whether they match; the domain-weighted score is then defined as
Z = Σ_{i=1}^{l} g_i · s_i;
Step S43: compute the relevance score based on the tf-idf weighting method and the sublinear scaling of tf.
Compared with the prior art, the present invention has the following beneficial effects:
1. The present invention achieves high search efficiency. To address the low efficiency of existing searchable encryption schemes, a vector block tagging and matching algorithm is designed: the document vectors and the query vector are divided into blocks to generate document tag vectors and a query tag vector of much smaller dimension. By matching the document tag vectors with the query tag vector, a large number of irrelevant documents are quickly filtered out, reducing the time spent computing and ranking document similarity scores and improving retrieval efficiency.
2. The present invention reduces the index creation time. The dimension of the document vectors created by searchable encryption schemes is usually very large, so most of the index construction time is spent multiplying document vectors by matrices. The present invention designs a vector segmented encryption algorithm that divides each document vector into segments and multiplies each segment by a matrix of greatly reduced dimension, which substantially reduces the index construction time of the scheme.
3. The present invention returns accurate ranking results. The concept of domain-weighted scoring is introduced into document scoring for the first time: keywords in different domains of a document are assigned different weights, overcoming the defect that existing schemes ignore keyword location information, while also realizing semantic fuzzy retrieval. Semantic similarity, domain-weighted score and relevance score are combined into a three-factor ranking method, so that the cloud server can rank search results accurately before returning them to the search user.
Detailed description of the invention
Fig. 1 is the system framework diagram of the embodiment of the present invention.
Fig. 2 shows the index creation process of the present embodiment.
Fig. 3 shows the trapdoor construction process in the embodiment of the present invention.
Fig. 4 shows the process of the vector block tagging algorithm in the embodiment of the present invention when the number of documents m=10, the number of keywords n=100 and the tag vector dimension u=10.
Fig. 5 shows the process of the matching algorithm between document tag vectors and the query tag vector in the embodiment of the present invention when the number of documents m=10.
Fig. 6 shows the main process of the vector segmented encryption algorithm in the embodiment of the present invention.
Specific embodiment
The present invention will be further described below with reference to the accompanying drawings and embodiments.
Fig. 1 shows the system framework of the embodiment of the present invention, which comprises four entities: the data owner, the authorized user, the private cloud server and the public cloud server. The index and trapdoor of the present embodiment are encrypted with the secure kNN algorithm (see Wong W K, Cheung W L, Kao B, et al. Secure kNN computation on encrypted databases // Proceedings of the ACM SIGMOD International Conference on Management of Data. New York, USA, 2009: 139-152). The basic procedure of the present embodiment is as follows:
(1) Setup: the data owner extracts keywords from the plaintext document set F=(f1,f2,…,fm), obtaining the keyword set W=(w1,w2,…,wn).
(2) KeyGen(n): the data owner randomly generates an (n+2)-bit vector S and two (n+2)×(n+2) invertible matrices {M1,M2}; the key SK consists of the four-tuple {S,M1,M2,u}, where u is a positive integer and u | n. The data owner then generates a key sk for document encryption and sends the keys SK and sk to the authorized users.
(3) BuildIndex(F, W, SK): Fig. 2 shows the index creation process of the present embodiment. Based on the vector space model, the data owner generates a document vector D_i of dimension n for every document f_i: if document f_i contains keyword w_j, the j-th entry of D_i is set to 1, otherwise to 0. The document vector D_i is then divided into u blocks, giving the document tag vector b_i=(bb_1,bb_2,…,bb_u) and e_i=(b_i,sid_i). After the term-frequency weight wf_{t,f} of keyword w_j in document f_i, the inverse document frequency idf_t and the domain weight score Z_ij are computed, every 1 in D_i is replaced by Z_ij·(wf_{t,f}×idf_t). D_i is then extended in dimension: position (n+1) is set to a random number ε and position (n+2) is set to 1, so that D_i is expressed as (D_i, ε, 1). The extended document vector D_i is encrypted with the secure kNN algorithm, which splits it into D_i' and D_i'' and yields V_i={M_1^T D_i', M_2^T D_i''} and I_i=(sid_i,V_i). Finally, the data owner sends E=(e_1,e_2,…,e_m) to the private cloud server and uploads the index set I=(I_1,I_2,…,I_m) to the public cloud server.
(4) Encrypt(F, sk): the data owner encrypts the document set F=(f1,f2,…,fm) with a symmetric encryption algorithm, obtains the ciphertext set C=(c1,c2,…,cm) and uploads it to the public cloud server.
(5) Trapdoor(Γ, SK): Fig. 3 shows the trapdoor construction process in the embodiment of the present invention. When an authorized user searches, the user first inputs η search keywords Γ=(q1,q2,…,qs,…,qη). Semantic extension is then performed: the semantic similarity between each original word qs and its expansion words is computed and ranked, and the σ most related expansion words are chosen as the final expansion words, giving the semantic expansion set Q=(q1,q2,…,qη,δ1,…,δσ) and the corresponding semantic similarity scores SC=(sc1,sc2,…,scη,scη+1,…,scη+σ). A query vector Q_v of dimension n is created from the semantic expansion set Q: if wj ∈ Q the j-th entry of Q_v is set to 1, otherwise to 0. Q_v is divided into u blocks, giving the query tag vector b_Q. Every 1 in Q_v is then replaced by the corresponding semantic similarity score sc_j; Q_v is extended to (n+1) dimensions with position (n+1) set to 1, scaled by a random number r>0, and extended to (n+2) dimensions with position (n+2) set to (η+σ), so that Q_v is expressed as (rQ_v, r, η+σ). The query vector Q_v is encrypted with the secure kNN algorithm, which splits it into Q_v' and Q_v'' and yields the trapdoor T_Q={M_1^{-1} Q_v', M_2^{-1} Q_v''}. Finally, the authorized user sends the query tag vector b_Q to the private cloud server and uploads the trapdoor T_Q to the public cloud server.
(6) Query: after the private cloud server receives the query tag vector b_Q sent by the authorized user, it matches b_Q against the document tag vectors b_i=(bb_1,bb_2,…,bb_u) and obtains the candidate index identifier set S_ID=(…,sid_i,…,sid_j,…,sid_z,…) of documents that may contain the search keywords.
The private cloud server uploads S_ID to the public cloud server. According to each index identifier sid_i, the public cloud server finds the corresponding secure index I_i, computes the similarity score of the document from the corresponding V_i and the trapdoor T_Q, ranks all scores, and returns the top-k documents to the user. (An illustrative numerical sketch of steps (3), (5) and (6) follows step (7) below.)
(7) Decrypt(C, sk): the authorized user decrypts the returned top-k ciphertexts with the key sk distributed by the data owner and obtains the required plaintext documents.
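For illustration only, the following is a minimal numpy sketch of steps (3), (5) and (6) on toy data; it is not part of the original patent text. The dictionary size, the random matrices, the scored vector entries and all variable names are invented assumptions, and semantic extension is replaced by fixed scores. The sketch only demonstrates the property the scheme relies on: the inner product of the encrypted index V_i with the trapdoor T_Q equals the inner product of the extended plaintext vectors, so the public cloud server can rank documents without seeing D_i or Q_v.

```python
import numpy as np

rng = np.random.default_rng(0)
n, u = 8, 4                                   # toy dictionary size and number of tag blocks

# KeyGen: splitting bit-vector S and two invertible (n+2)x(n+2) matrices M1, M2
S = rng.integers(0, 2, n + 2)
M1, M2 = rng.random((n + 2, n + 2)), rng.random((n + 2, n + 2))

def split(vec, is_query):
    """Secure kNN split: at each position one of the pair is copied, the other is shared."""
    a, b = vec.copy(), vec.copy()
    for j, s in enumerate(S):
        if (s == 1) != is_query:              # documents split where S=1, queries where S=0
            r = rng.random()
            a[j], b[j] = vec[j] - r, r
    return a, b

def encrypt_doc(d_scored, eps):
    d_ext = np.concatenate([d_scored, [eps, 1.0]])           # (D_i, eps, 1)
    d1, d2 = split(d_ext, is_query=False)
    return (M1.T @ d1, M2.T @ d2)                             # V_i

def trapdoor(q_scored, r, t):
    q_ext = np.concatenate([r * q_scored, [r, t]])            # (r*Q_v, r, eta+sigma)
    q1, q2 = split(q_ext, is_query=True)
    return (np.linalg.inv(M1) @ q1, np.linalg.inv(M2) @ q2)   # T_Q

def score(V, T):
    return V[0] @ T[0] + V[1] @ T[1]                          # V_i . T_Q

# BuildIndex for one toy document: entries already hold Z_ij * (wf_{t,f} x idf_t)
d = np.array([0.0, 1.3, 0.0, 0.0, 0.7, 0.0, 0.0, 2.1])        # scored document vector D_i
b_i = np.array([1 if blk.any() else 0 for blk in np.split(d, u)])   # tag vector b_i
eps = rng.random()
V_i = encrypt_doc(d, eps)

# Trapdoor for a toy query: entries hold the semantic-similarity scores sc_j
q = np.array([0.0, 1.0, 0.0, 0.0, 0.8, 0.0, 0.0, 0.0])        # scored query vector Q_v
r, t = 2.5, 3.0
T_Q = trapdoor(q, r, t)

print(b_i)                                                    # [1 0 1 1]
print(np.isclose(score(V_i, T_Q), r * (d @ q) + r * eps + t)) # True
```

Because r and t are the same for every document while ε only adds a small controlled per-document perturbation, sorting by the encrypted scores preserves the ranking that D_i·Q_v would give.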
Preferably, the present embodiment also uses the vector block tagging and matching algorithm, which is as follows:
The dictionary created by a searchable encryption scheme is generally very large, so the dimension of the document vectors created in the MRSE scheme (Cao N, Wang C, Li M, et al. Privacy-preserving multi-keyword ranked search over encrypted cloud data. IEEE Transactions on Parallel and Distributed Systems, 2014, 25(1): 829-837) is usually very large. During retrieval the cloud server cannot tell which documents are relevant, so it computes and ranks similarity scores for all documents, wasting a large amount of time.
The present embodiment divides the document vectors and the query vector into blocks respectively, generating document tag vectors and a query tag vector of much smaller dimension. By matching the document tag vectors with the query tag vector, a large number of irrelevant documents are quickly filtered out, reducing the time spent computing and ranking document similarity scores and improving retrieval efficiency.
(1) Vector block tagging algorithm:
The document vector D_i is divided into u blocks; if all entries of a block are 0, its tag value bb_s=0, otherwise bb_s=1, giving the document tag vector b_i=(bb_1,bb_2,…,bb_u);
All documents are traversed, giving the document tag vector set B=(b_1,b_2,...,b_m).
Fig. 4 shows the process of the vector block tagging algorithm when the number of documents m=10, the number of keywords n=100 and the tag vector dimension u=10.
With n=100 keywords, the document vector D_1 has 100 dimensions. D_1 is divided into u=10 blocks of 10 elements each; if all 10 elements of a block are 0, the block is tagged bb_s=0, and as soon as one of the 10 elements is 1 the block is tagged 1, giving the document tag vector b_1=(1,0,0,0,0,1,0,0,1,0);
Applying the same operation to the 10 documents in turn gives the document tag vectors b_1 to b_10.
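As an illustration of the Fig. 4 step, the sketch below tags a randomly generated toy collection with the same parameters (m=10 documents, n=100 keywords, u=10 blocks); the 0/1 document vectors are invented rather than extracted from real documents, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, u = 10, 100, 10                         # parameters of the Fig. 4 example

# Sparse 0/1 document vectors D_1..D_m: each document contains only a few keywords
D = (rng.random((m, n)) < 0.05).astype(int)

def tag_vector(d, u):
    """b_i = (bb_1, ..., bb_u): bb_s = 1 iff block s of the document vector contains a 1."""
    return np.array([1 if block.any() else 0 for block in np.split(d, u)])

B = np.array([tag_vector(d, u) for d in D])   # document tag vector set B = (b_1, ..., b_m)
print(D.shape, "->", B.shape)                 # (10, 100) -> (10, 10)
print(B[0])                                   # e.g. a 10-bit tag such as [1 0 0 1 0 1 0 0 1 0]
```

Each 100-dimensional document vector is summarized by a 10-bit tag, which is all the private cloud server needs for the filtering step.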
(2) Matching algorithm between the query tag vector and the document tag vectors:
After the private cloud server receives the query tag vector b_Q sent by the authorized user, it uses every 1 in b_Q in turn to check the corresponding block in e_i, i.e. whether the tag value bb_s of that block is 0;
If the tag value bb_s of the block is 0, the corresponding block of that document does not contain the searched keywords; if it is 1, the corresponding index identifier sid_i is recorded, giving the candidate index identifier set S_ID=(…,sid_i,…,sid_j,…,sid_z,…) of documents that may contain the search keywords;
The private cloud server uploads S_ID to the public cloud server; according to each index identifier sid_i, the public cloud server finds the corresponding secure index I_i, computes the similarity score of the document from the corresponding V_i and the trapdoor T_Q, ranks all scores, and returns the top-k documents to the user.
Fig. 5 shows the process of the matching algorithm between document tag vectors and the query tag vector when the number of documents m=10.
When the query tag vector of the search keywords is b_Q, the second 1 in b_Q is compared with the value at the corresponding position of the tag vectors of the 10 documents; the document corresponding to b_3 may contain the queried keyword, so its index identifier sid_3 is recorded.
Comparing every 1 in b_Q with the values at the corresponding positions of the tag vectors of the 10 documents in turn gives the candidate index identifier set S_ID=(sid_3,sid_8,sid_5);
The private cloud server sends the candidate index identifier set S_ID to the public cloud server, which performs the similarity score computation and ranking.
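The filtering step of Fig. 5 can be sketched as follows, continuing the kind of toy tag matrix produced above; the tag values, identifiers and the block positions of the query keywords are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
m, u = 10, 10
B = (rng.random((m, u)) < 0.3).astype(int)    # toy document tag vectors b_1..b_10
sid = [f"sid_{i + 1}" for i in range(m)]      # index identifiers

def match_candidates(b_q, B, sid):
    """Return S_ID: identifiers of documents whose tag vector has a 1 in at least one
    block where the query tag vector b_q has a 1."""
    s_id = []
    for s in np.flatnonzero(b_q):             # take each 1 of the query tag vector in turn
        for i in range(B.shape[0]):
            if B[i, s] == 1 and sid[i] not in s_id:
                s_id.append(sid[i])           # block s of document i is non-empty: candidate
    return s_id

b_q = np.zeros(u, dtype=int)
b_q[[1, 6]] = 1                               # the query keywords fall into blocks 1 and 6
S_ID = match_candidates(b_q, B, sid)
print(S_ID)                                   # only these candidates are scored by the public cloud
```

Documents whose tag vectors share no non-zero block with the query are never forwarded to the public cloud server, so no similarity score is computed for them.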
Preferably, the present embodiment also uses the vector segmented encryption algorithm, which is as follows:
The dimension of the document vectors created by the MRSE scheme (Cao N, Wang C, Li M, et al. Privacy-preserving multi-keyword ranked search over encrypted cloud data. IEEE Transactions on Parallel and Distributed Systems, 2014, 25(1): 829-837) is usually very large, so most of the index construction time is spent multiplying document vectors by matrices. The present embodiment divides each document vector into segments and multiplies each segment by a matrix of greatly reduced dimension, which substantially reduces the index construction time of the scheme.
Fig. 6 shows the main process of the vector segmented encryption algorithm; the steps are as follows:
1) The data owner randomly generates an (n+2)-bit vector S and two (n+2)×(n+2) invertible matrices {M1,M2}; the key SK consists of the four-tuple {S,M1,M2,u}, where u is a positive integer and u | n.
2) When constructing the index, the data owner splits the document vector D_i into D_i' and D_i'' according to the secure kNN algorithm, then divides D_i' and D_i'' into h segments each, obtaining (D_{i,1}',…,D_{i,h}') and (D_{i,1}'',…,D_{i,h}''). Each segment is then encrypted with the key SK, giving V_i={M_{1k}^T D_{i,k}', M_{2k}^T D_{i,k}''} (k=1,…,h), where M_{1k} and M_{2k} are invertible matrices of dimension ((n+2)/h)×((n+2)/h), and the corresponding index I_i=(sid_i,V_i). Finally, the data owner sends E=(e_1,e_2,…,e_m) to the private cloud server and uploads the index set I=(I_1,I_2,…,I_m) to the public cloud server.
3) When generating the trapdoor, the authorized user performs the same segmentation as when constructing the index, obtaining the trapdoor T_Q={M_{1k}^{-1} Q_{v,k}', M_{2k}^{-1} Q_{v,k}''} (k=1,…,h). Finally, the authorized user sends the query tag vector b_Q to the private cloud server and uploads the trapdoor T_Q to the public cloud server.
4) When computing the relevance score, the public cloud server uses the following formula:
Score(f_i, Q) = Σ_{k=1}^{h} ( (M_{1k}^T D_{i,k}') · (M_{1k}^{-1} Q_{v,k}') + (M_{2k}^T D_{i,k}'') · (M_{2k}^{-1} Q_{v,k}'') ) = Σ_{k=1}^{h} ( D_{i,k}' · Q_{v,k}' + D_{i,k}'' · Q_{v,k}'' ) = D_i · Q_v
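Under the same toy assumptions as in the earlier sketch, the following illustrates the segmented variant of steps 2) to 4): each of the h segments of the split vectors is encrypted with a small ((n+2)/h)-dimensional matrix, and the segment-wise inner products sum to the same plaintext score. How the key SK is partitioned into the small matrices M_{1k}, M_{2k} is an assumption here (one independent pair per segment).

```python
import numpy as np

rng = np.random.default_rng(2)
n, h = 6, 4                                   # toy sizes: n + 2 = 8 is divisible by h
seg = (n + 2) // h

S = rng.integers(0, 2, n + 2)
M1 = [rng.random((seg, seg)) for _ in range(h)]   # one small invertible pair per segment
M2 = [rng.random((seg, seg)) for _ in range(h)]   # (assumed partitioning of the key SK)

def split(vec, is_query):
    a, b = vec.copy(), vec.copy()
    for j, s in enumerate(S):
        if (s == 1) != is_query:              # documents split where S=1, queries where S=0
            r = rng.random()
            a[j], b[j] = vec[j] - r, r
    return a, b

def encrypt_doc(d_ext):
    d1, d2 = split(d_ext, False)
    return ([M1[k].T @ blk for k, blk in enumerate(np.split(d1, h))],
            [M2[k].T @ blk for k, blk in enumerate(np.split(d2, h))])

def trapdoor(q_ext):
    q1, q2 = split(q_ext, True)
    return ([np.linalg.inv(M1[k]) @ blk for k, blk in enumerate(np.split(q1, h))],
            [np.linalg.inv(M2[k]) @ blk for k, blk in enumerate(np.split(q2, h))])

def score(V, T):                              # step 4): sum of segment-wise inner products
    return sum(v @ t for v, t in zip(V[0], T[0])) + sum(v @ t for v, t in zip(V[1], T[1]))

d_ext = np.concatenate([rng.random(n), [rng.random(), 1.0]])   # (D_i, eps, 1)
q_ext = np.concatenate([2.5 * rng.random(n), [2.5, 3.0]])      # (r*Q_v, r, eta+sigma)
V_i, T_Q = encrypt_doc(d_ext), trapdoor(q_ext)
print(np.isclose(score(V_i, T_Q), d_ext @ q_ext))              # True: same score, smaller matrices
```

Multiplying h vectors of length (n+2)/h by ((n+2)/h)×((n+2)/h) matrices costs about (n+2)²/h operations instead of (n+2)², which is where the reduction in index construction time comes from.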
Preferably, the present embodiment also uses a three-factor ranking method, which is as follows:
The present embodiment realizes semantic fuzzy retrieval: when an authorized user wants to find documents semantically related to the search keywords, or cannot input accurate keywords for various reasons, semantically related documents can still be matched and returned to the authorized user, satisfying the user's search needs.
The present embodiment introduces the concept of domain-weighted scoring into document scoring for the first time, assigning different weights to keywords in different domains of a document, and combines semantic similarity, domain-weighted score and relevance score into a three-factor ranking method, making the ranking results more accurate.
(1) Semantic similarity
The present embodiment computes semantic similarity with the information-content-based Resnik algorithm (see Resnik P. Using Information Content to Evaluate Semantic Similarity in a Taxonomy // Proceedings of the 14th International Joint Conference on Artificial Intelligence. Montreal, Canada, 1995: 448-453); the steps are as follows:
1) The similarity of two concepts c_1 and c_2 is computed with the Resnik algorithm:
sim(c_1, c_2) = -log p(lso(c_1, c_2)) = IC(lso(c_1, c_2))
where lso(c_1, c_2) denotes the lowest common ancestor of concepts c_1 and c_2 in the WordNet is-a tree.
p(c) is the probability that a noun belonging to concept c occurs in the WordNet corpus, computed as
p(c) = freq(c) / N
where N is the number of nouns in the WordNet corpus and freq(c) is the number of words in the corpus subsumed by concept c:
freq(c) = Σ_{w ∈ words(c)} count(w)
where words(c) is the set of words subsumed by concept c.
2) The Resnik similarity of two words w_1 and w_2 is
sim(w_1, w_2) = max_{c_1 ∈ s(w_1), c_2 ∈ s(w_2)} sim(c_1, c_2)
where s(w_1) and s(w_2) denote the sets of concepts to which words w_1 and w_2 belong; a word may belong to several concepts.
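The Resnik computation can be illustrated with the self-contained sketch below. The is-a taxonomy, the corpus counts and the word-to-concept mapping are all invented for illustration; the patent itself uses WordNet and its corpus statistics. The word-level similarity sim(w_1, w_2) would simply take the maximum of resnik(c_1, c_2) over the concept sets of the two words.

```python
import math

# Toy is-a taxonomy (child -> parent) and corpus counts, invented for illustration
parent = {"cat": "feline", "feline": "mammal", "dog": "canine", "canine": "mammal",
          "mammal": "animal", "animal": None}
count = {"cat": 10, "dog": 14, "feline": 2, "canine": 1, "mammal": 3, "animal": 4}
words_of = {c: [c] for c in parent}           # words(c): here every concept has a single word
N = sum(count.values())                       # number of noun occurrences in the toy corpus

def ancestors_or_self(c):
    out = []
    while c is not None:
        out.append(c)
        c = parent[c]
    return out

def subsumed(c):
    """All concepts at or below c in the is-a tree."""
    return [x for x in parent if c in ancestors_or_self(x)]

def freq(c):                                  # freq(c) = sum of counts of words under c
    return sum(count[w] for x in subsumed(c) for w in words_of[x])

def IC(c):                                    # IC(c) = -log p(c), with p(c) = freq(c) / N
    return -math.log(freq(c) / N)

def lso(c1, c2):                              # lowest common ancestor in the is-a tree
    anc2 = ancestors_or_self(c2)
    return next(c for c in ancestors_or_self(c1) if c in anc2)

def resnik(c1, c2):                           # sim(c1, c2) = IC(lso(c1, c2))
    return IC(lso(c1, c2))

print(round(resnik("cat", "dog"), 3))         # common ancestor "mammal": low information content
print(round(resnik("cat", "feline"), 3))      # closer pair: higher similarity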
(2) Domain-weighted scoring
The present embodiment introduces the concept of domain-weighted (weighted zone) scoring (see Manning C D, Raghavan P, Schütze H. Introduction to Information Retrieval. Cambridge: Cambridge University Press, 2008) into document scoring for the first time. Given a series of documents, suppose every document has l domains with corresponding weight coefficients g_1,…,g_l ∈ [0,1] satisfying
Σ_{i=1}^{l} g_i = 1.
Let s_i be the matching score of the query in the i-th domain of the document (1 and 0 respectively indicate whether they match); the domain-weighted score is then defined as
Z = Σ_{i=1}^{l} g_i · s_i.
(3) Relevance score
The relevance score is based on the tf-idf weighting method and the sublinear scale transformation of tf (see Jin L I, Chen X. Efficient multi-user keyword search over encrypted data in cloud computing. Computing & Informatics, 2013, 32(4): 723-738); the steps are as follows:
1) Compute the term-frequency weight wf_{t,f}:
wf_{t,f} = 1 + log(tf_{t,f}) if tf_{t,f} > 0, and wf_{t,f} = 0 otherwise,
where tf_{t,f} is the number of occurrences of term t in document f.
2) Compute the inverse document frequency idf_t:
idf_t = log(m / df_t),
where m is the number of documents in the collection and df_t is the number of documents that contain term t.
The present embodiment combines semantic similarity, domain-weighted score and relevance score into a three-factor ranking method, making the ranking results more accurate. The formula used by the public cloud server to compute the score of a document is
Score(f_i, Q) = Σ_{w_j ∈ Q} Z_ij · (wf_{t,f} × idf_t) · sc_j
where Z_ij, (wf_{t,f} × idf_t) and sc_j are respectively the domain-weighted score, the relevance score and the semantic similarity of keyword w_j in document f_i.
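A small end-to-end sketch of the three-factor score on invented inputs: for each query keyword, the domain-weighted score Z_ij, the sublinear tf-idf relevance weight wf_{t,f}×idf_t and the semantic-similarity score sc_j are multiplied and summed, which is exactly what the inner product D_i·Q_v computes once the document and query vectors are filled as described above. The zone weights, term statistics and similarity scores below are all assumed toy values.

```python
import math

g = {"title": 0.5, "abstract": 0.3, "body": 0.2}        # domain weights g_1..g_l, summing to 1
docs = [  # per document: keyword -> (term frequency, domains in which the keyword appears)
    {"cloud": (3, {"title", "body"}), "privacy": (1, {"body"})},
    {"cloud": (1, {"body"})},
    {"privacy": (4, {"abstract", "body"}), "ranking": (2, {"body"})},
]
m = len(docs)
query_sc = {"cloud": 1.0, "privacy": 1.0, "confidentiality": 0.62}  # originals + one expansion word

def wf(tf):                                   # sublinear tf scaling: wf = 1 + log(tf) for tf > 0
    return 1 + math.log(tf) if tf > 0 else 0.0

def idf(term):                                # idf_t = log(m / df_t)
    df = sum(1 for d in docs if term in d)
    return math.log(m / df) if df else 0.0

def zone_score(domains):                      # Z_ij: sum of the weights of the matched domains
    return sum(g[z] for z in domains)

def three_factor_score(doc):                  # Score(f_i, Q) = sum_j Z_ij * (wf x idf) * sc_j
    total = 0.0
    for term, sc in query_sc.items():
        if term in doc:
            tf, domains = doc[term]
            total += zone_score(domains) * wf(tf) * idf(term) * sc
    return total

ranking = sorted(range(m), key=lambda i: three_factor_score(docs[i]), reverse=True)
print([(i, round(three_factor_score(docs[i]), 3)) for i in ranking])  # top-k order for the query
```

An expansion word that matches no document simply contributes nothing, while matching documents are ordered by the combined three factors.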
The foregoing are merely preferred embodiments of the present invention; all equivalent changes and modifications made within the scope of the patent claims of the present invention shall be covered by the present invention.

Claims (6)

1. A fast multi-keyword semantic ranking search method protecting data privacy in cloud computing, characterized in that it involves a data owner, authorized users, a private cloud server and a public cloud server, and specifically comprises the following steps:
Step S1: the data owner extracts keywords from the plaintext document set F=(f1,f2,…,fm), obtaining the keyword set W=(w1,w2,…,wn);
Step S2: the data owner randomly generates an (n+2)-bit vector S and two (n+2)×(n+2) invertible matrices {M1,M2}; the key SK consists of the four-tuple {S,M1,M2,u}, where u is a positive integer and u | n; the data owner then generates a key sk for document encryption and sends the keys SK and sk to the authorized users;
Step S3: create the index: the data owner generates a document vector D_i of dimension n for every document f_i; if document f_i contains keyword w_j, the j-th entry of D_i is set to 1, otherwise to 0; the document vector D_i is then divided into u blocks, giving the document tag vector b_i=(bb_1,bb_2,…,bb_u) and e_i=(b_i,sid_i), where sid_i denotes the index identifier; after the term-frequency weight wf_{t,f} of keyword w_j in document f_i, the inverse document frequency idf_t and the domain weight score Z_ij are computed, every 1 in D_i is replaced by Z_ij·(wf_{t,f}×idf_t); D_i is then extended in dimension, position (n+1) being set to a random number ε and position (n+2) to 1, so that D_i is expressed as (D_i, ε, 1);
the extended document vector D_i is encrypted with the secure kNN algorithm, which splits it into the two vectors D_i' and D_i'' and yields V_i={M_1^T D_i', M_2^T D_i''} and I_i=(sid_i,V_i); finally, the data owner sends E=(e_1,e_2,…,e_m) to the private cloud server and uploads the index set I=(I_1,I_2,…,I_m) to the public cloud server;
Step S4: the data owner encrypts the document set F=(f1,f2,…,fm) with a symmetric encryption algorithm, obtains the ciphertext set C=(c1,c2,…,cm) and uploads it to the public cloud server;
Step S5: build the trapdoor: when an authorized user searches, the user first inputs η search keywords Γ=(q1,q2,…,qs,…,qη); semantic extension is then performed: the semantic similarity between each original word qs and its expansion words is computed and ranked, and the σ most related expansion words are chosen as the final expansion words, giving the semantic expansion set Q=(q1,q2,…,qη,δ1,…,δσ) and the corresponding semantic similarity scores SC=(sc1,sc2,…,scη,scη+1,…,scη+σ); a query vector Q_v of dimension n is created from the semantic expansion set Q: if wj ∈ Q the j-th entry of Q_v is set to 1, otherwise to 0; Q_v is divided into u blocks, giving the query tag vector b_Q; every 1 in Q_v is then replaced by the corresponding semantic similarity score sc_j; Q_v is extended to (n+1) dimensions with position (n+1) set to 1, scaled by a random number r>0, and extended to (n+2) dimensions with position (n+2) set to η+σ, so that Q_v is expressed as (rQ_v, r, η+σ); the query vector Q_v is encrypted with the secure kNN algorithm, which splits it into the two vectors Q_v' and Q_v'' and yields the trapdoor T_Q={M_1^{-1} Q_v', M_2^{-1} Q_v''}; finally, the authorized user sends the query tag vector b_Q to the private cloud server and uploads the trapdoor T_Q to the public cloud server;
Step S6: after the private cloud server receives the query tag vector b_Q sent by the authorized user, it matches b_Q against the document tag vectors b_i=(bb_1,bb_2,…,bb_u) and obtains the candidate index identifier set S_ID=(…,sid_i,…,sid_j,…,sid_z,…) of documents that may contain the search keywords;
the private cloud server uploads S_ID to the public cloud server; according to each index identifier sid_i, the public cloud server finds the corresponding secure index I_i, computes the similarity score of the document from the corresponding V_i and the trapdoor T_Q, ranks all scores, and returns the top-k documents to the user;
Step S7: the authorized user decrypts the returned top-k ciphertexts with the key sk distributed by the data owner and obtains the required plaintext documents.
2. The fast multi-keyword semantic ranking search method protecting data privacy in cloud computing according to claim 1, characterized in that the method further comprises dividing the document vectors and the query vector into blocks respectively to generate document tag vectors and a query tag vector of much smaller dimension, and quickly filtering out a large number of irrelevant documents by matching the document tag vectors with the query tag vector.
3. The fast multi-keyword semantic ranking search method protecting data privacy in cloud computing according to claim 2, characterized in that dividing a document vector into blocks specifically comprises the following steps:
Step S11: the document vector D_i is divided into u blocks; if all entries of a block are 0, its tag value bb_s=0, otherwise bb_s=1, giving the document tag vector b_i=(bb_1,bb_2,…,bb_u);
Step S12: all documents are traversed, giving the document tag vector set B=(b_1,b_2,...,b_m).
4. The fast multi-keyword semantic ranking search method protecting data privacy in cloud computing according to claim 2, characterized in that matching the document tag vectors with the query tag vector specifically comprises the following steps:
Step S21: after the private cloud server receives the query tag vector b_Q sent by the authorized user, it uses every 1 in b_Q in turn to check the corresponding block in e_i, i.e. whether the tag value bb_s of that block is 0;
Step S22: if the tag value bb_s of the block is 0, the corresponding block of that document does not contain the searched keywords; if it is 1, the corresponding index identifier sid_i is recorded, giving the candidate index identifier set S_ID=(…,sid_i,…,sid_j,…,sid_z,…) of documents that may contain the search keywords;
Step S23: the private cloud server uploads S_ID to the public cloud server; according to each index identifier sid_i, the public cloud server finds the corresponding secure index I_i, computes the similarity score of the document from the corresponding V_i and the trapdoor T_Q, ranks all scores, and returns the top-k documents to the user.
5. The fast multi-keyword semantic ranking search method protecting data privacy in cloud computing according to claim 1, characterized in that the method further comprises segmenting the document vector and replacing the full matrix multiplication with matrix multiplications of reduced dimension for each segment, specifically comprising the following steps:
Step S31: the data owner randomly generates an (n+2)-bit vector S and two (n+2)×(n+2) invertible matrices {M1,M2}; the key SK consists of the four-tuple {S,M1,M2,u}, where u is a positive integer and u | n;
Step S32: when constructing the index, the data owner splits the document vector D_i into D_i' and D_i'' according to the secure kNN algorithm, then divides D_i' and D_i'' into h segments each, obtaining (D_{i,1}',…,D_{i,h}') and (D_{i,1}'',…,D_{i,h}''); each segment is then encrypted with the key SK, giving V_i={M_{1k}^T D_{i,k}', M_{2k}^T D_{i,k}''} (k=1,…,h) and the corresponding index I_i=(sid_i,V_i); finally, the data owner sends E=(e_1,e_2,…,e_m) to the private cloud server and uploads the index set I=(I_1,I_2,…,I_m) to the public cloud server; the matrices M_{1k} and M_{2k} denote invertible matrices of dimension ((n+2)/h)×((n+2)/h);
Step S33: when generating the trapdoor, the authorized user performs the same operations as in step S32 when constructing the index, obtaining the trapdoor T_Q={M_{1k}^{-1} Q_{v,k}', M_{2k}^{-1} Q_{v,k}''} (k=1,…,h); finally, the authorized user sends the query tag vector b_Q to the private cloud server and uploads the trapdoor T_Q to the public cloud server;
Step S34: when computing the relevance score, the public cloud server uses the following formula:
Score(f_i, Q) = Σ_{k=1}^{h} ( (M_{1k}^T D_{i,k}') · (M_{1k}^{-1} Q_{v,k}') + (M_{2k}^T D_{i,k}'') · (M_{2k}^{-1} Q_{v,k}'') ) = D_i · Q_v.
6. The fast multi-keyword semantic ranking search method protecting data privacy in cloud computing according to claim 1, characterized in that the ranking uses a three-factor ranking method, in which keywords in different domains of a document are assigned different weights and semantic similarity, domain-weighted score and relevance score are combined, specifically comprising the following steps:
Step S41: compute semantic similarity using the information-content-based Resnik algorithm;
Step S42: compute the domain-weighted score: given a series of documents, suppose every document has l domains with corresponding weight coefficients g_1,…,g_l ∈ [0,1] satisfying
Σ_{i=1}^{l} g_i = 1;
let s_i be the matching score of the query in the i-th domain of the document, where 1 and 0 respectively indicate whether they match; the domain-weighted score is then defined as
Z = Σ_{i=1}^{l} g_i · s_i;
Step S43: compute the relevance score based on the tf-idf weighting method and the sublinear scaling of tf.
CN201710181664.6A 2017-03-24 2017-03-24 Fast multi-keyword semantic ranking search method protecting data privacy in cloud computing Active CN106951411B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710181664.6A CN106951411B (en) Fast multi-keyword semantic ranking search method protecting data privacy in cloud computing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710181664.6A CN106951411B (en) Fast multi-keyword semantic ranking search method protecting data privacy in cloud computing

Publications (2)

Publication Number Publication Date
CN106951411A CN106951411A (en) 2017-07-14
CN106951411B true CN106951411B (en) 2019-10-15

Family

ID=59472824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710181664.6A Active CN106951411B (en) Fast multi-keyword semantic ranking search method protecting data privacy in cloud computing

Country Status (1)

Country Link
CN (1) CN106951411B (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107734054A (en) * 2017-11-06 2018-02-23 福州大学 Encryption data searching system in safe cloud storage
CN108390855A (en) * 2018-01-11 2018-08-10 中国人民解放军战略支援部队信息工程大学 A kind of attribute base keyword search encryption system and method towards cloud storage
CN108549701A (en) * 2018-04-17 2018-09-18 上海海事大学 Cloud environment encrypts outsourcing data semantic extended search method and system
CN108647529A (en) * 2018-05-09 2018-10-12 上海海事大学 A kind of semantic-based multi-key word sorted search intimacy protection system and method
CN110555198B (en) * 2018-05-31 2023-05-23 北京百度网讯科技有限公司 Method, apparatus, device and computer readable storage medium for generating articles
CN108959567B (en) * 2018-07-04 2021-05-04 武汉大学 Safe retrieval method suitable for large-scale images in cloud environment
CN109165520B (en) * 2018-07-16 2020-08-11 哈尔滨工业大学(深圳) Data encryption method and device and data encryption retrieval system
CN109063509A (en) * 2018-08-07 2018-12-21 上海海事大学 It is a kind of that encryption method can search for based on keywords semantics sequence
CN110858251B (en) * 2018-08-22 2020-07-21 阿里巴巴集团控股有限公司 Data query method and device
CN109271485B (en) * 2018-09-19 2022-03-08 南京邮电大学 Cloud environment encrypted document sequencing and searching method supporting semantics
CN109376288B (en) * 2018-09-28 2021-04-23 邦道科技有限公司 Cloud computing platform for realizing semantic search and balancing method thereof
CN109308264B (en) * 2018-10-22 2021-11-16 北京天融信网络安全技术有限公司 Method for evaluating data desensitization effect, corresponding device and storage medium
CN109471964B (en) * 2018-10-23 2021-10-01 三亚哈尔滨工程大学南海创新发展基地 Synonym set-based fuzzy multi-keyword searchable encryption method
CN109815730B (en) * 2018-12-29 2020-11-20 中国科学院软件研究所 Searchable encryption method and system supporting skyline query
CN109766314A (en) * 2019-01-07 2019-05-17 西安电子科技大学 Ciphertext data multi-key word searching method based on probability trapdoor
CN109960757A (en) * 2019-02-27 2019-07-02 北京搜狗科技发展有限公司 Web search method and device
CN110175328B (en) * 2019-05-16 2023-02-17 镇江市高等专科学校 Method and system for measuring document similarity based on keyword sequence structure
CN110795544B (en) * 2019-09-10 2023-07-21 腾讯科技(深圳)有限公司 Content searching method, device, equipment and storage medium
CN110727951B (en) * 2019-10-14 2021-08-27 桂林电子科技大学 Lightweight outsourcing file multi-keyword retrieval method and system with privacy protection function
CN111966778B (en) * 2020-07-08 2022-07-29 南京邮电大学 Multi-keyword ciphertext sorting and searching method based on keyword grouping reverse index
CN112134872B (en) * 2020-09-16 2022-07-26 江苏省未来网络创新研究院 Network system with multi-application-layer cloud computing function
CN112966018B (en) * 2021-03-03 2022-04-01 北京邮电大学 Fuzzy retrieval method based on mixed cloud
CN113139106B (en) * 2021-05-07 2022-03-15 杭州世平信息科技有限公司 Event auditing method and device for security check
CN113987593B (en) * 2021-12-28 2022-03-15 北京妙医佳健康科技集团有限公司 Data processing method
CN115630154B (en) * 2022-12-19 2023-05-05 竞速信息技术(廊坊)有限公司 Big data environment-oriented dynamic abstract information construction method and system
CN117278216B (en) * 2023-11-23 2024-02-13 三亚学院 Encryption system based on cloud computing virtualization and network storage files

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104023051A (en) * 2014-05-22 2014-09-03 西安理工大学 Multi-user multi-keyword searchable encryption method in cloud storage
CN104765848A (en) * 2015-04-17 2015-07-08 中国人民解放军空军航空大学 Symmetrical searchable encryption method for supporting result high-efficiency sequencing in hybrid cloud storage
CN105335510A (en) * 2015-10-30 2016-02-17 成都博睿德科技有限公司 Text data efficient searching method
CN106407447A (en) * 2016-09-30 2017-02-15 福州大学 Simhash-based fuzzy sequencing searching method for encrypted cloud data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Ning Cao et al. Privacy-preserving multi-keyword ranked search over encrypted cloud data. 2011 Proceedings IEEE INFOCOM, 2011-04-15: 829-837 *
Jun Xu et al. Two-Step-Ranking Secure Multi-Keyword Search over Encrypted Cloud Data. 2012 International Conference on Cloud and Service Computing, 2012-11-24: 124-130 *

Also Published As

Publication number Publication date
CN106951411A (en) 2017-07-14

Similar Documents

Publication Publication Date Title
CN106951411B (en) Fast multi-keyword semantic ranking search method protecting data privacy in cloud computing
CN104765848B (en) Symmetric searchable encryption method supporting efficient result ranking in hybrid cloud storage
CN106997384B (en) Semantic fuzzy searchable encryption method with verifiable ranking
Chen et al. Collabseer: a search engine for collaboration discovery
Zhang et al. SE-PPFM: A searchable encryption scheme supporting privacy-preserving fuzzy multikeyword in cloud systems
CN108647529A (en) Semantic-based multi-keyword ranked search privacy protection system and method
CN109063509A (en) Searchable encryption method based on keyword semantic ranking
Zhong et al. Efficient dynamic multi-keyword fuzzy search over encrypted cloud data
Xia et al. An efficient and privacy-preserving semantic multi-keyword ranked search over encrypted cloud data
CN108171071A (en) Multi-keyword rankable ciphertext retrieval method for cloud computing
Liu et al. FSSE: An effective fuzzy semantic searchable encryption scheme over encrypted cloud data
CN109255244A (en) Data encryption method and device, and encrypted data search system
CN108549701A (en) Semantic extended search method and system for encrypted outsourced data in a cloud environment
Mittal et al. Privacy preserving synonym based fuzzy multi-keyword ranked search over encrypted cloud data
Van Zwol et al. Ranking entity facets based on user click feedback
Hu et al. Topical authority propagation on microblogs
CN108768651A (en) Searchable public-key encryption method supporting multi-dimensional range queries
Yang et al. A keyword-based scholar recommendation framework for biomedical literature
Er et al. Set of t-uples expansion by example
Raghavendra et al. Split keyword fuzzy and synonym search over encrypted cloud data
Xia et al. A multi-keyword ranked search over encrypted cloud data supporting semantic extension
Onal et al. Utilizing word embeddings for result diversification in tweet search
Hassanpour et al. An adaptive meta-search engine considering the user’s field of interest
Ma et al. Joining user profiles across online social networks: From the perspective of an adversary
CN106777233A (en) Privacy-preserving personalized search system based on cloud computing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant