CN106934063A

CN106934063A - A homomorphic encryption ciphertext retrieval method for cloud computing applications

Info

Publication number: CN106934063A
Application number: CN201710199651.1A
Authority: CN
Inventors: 拱长青; 肖芸; 林娜; 郭振洲; 李席广; 赵亮; 孟庆杰
Original assignee: Shenyang Aerospace University
Current assignee: Shenyang Aerospace University
Priority date: 2017-03-30
Filing date: 2017-03-30
Publication date: 2017-07-07
Anticipated expiration: 2037-03-30
Also published as: CN106934063B

Abstract

The present invention provides a homomorphic encryption ciphertext retrieval method for cloud computing applications. The preprocessing stage is carried out in the cloud, and the TF-IDF weight vectors of the documents are all computed in the cloud. Working on the homomorphically encrypted documents, the cloud can perform homomorphic operations and computations on the ciphertext, construct the TF-IDF weight vector of every document, and hold these vectors in plaintext. When a ciphertext search is performed, the client encrypts the search term and uploads it to the cloud; by operating on the ciphertext search term, the cloud likewise obtains the corresponding plaintext TF-IDF weight vector, then computes the similarity between the search term and each document in the document set and obtains a plaintext ranking result. Compared with schemes that perform the preprocessing stage on the client, this reduces the computational load on the client, makes full use of the computing and storage capacity of the cloud to operate on the data, and improves retrieval efficiency.

Description

A homomorphic encryption ciphertext retrieval method for cloud computing applications

Technical field:

The invention belongs to the technical field of cloud computing applications, and in particular relates to a homomorphic encryption ciphertext retrieval method for cloud computing applications.

Background:

Cloud computing is an innovative service model that lets users obtain almost unlimited computing power and a rich variety of information services over the Internet at any time; it is an evolution of distributed computing, parallel computing and grid computing. Cloud storage also falls within the scope of cloud computing and is increasingly widely used. With the rapid development of cloud computing, large amounts of sensitive information are being concentrated in the cloud. To prevent leakage of a client's private data, the data must be encrypted before it is stored in the cloud. Once the ciphertext data stored on cloud servers reaches a certain scale, effective retrieval over that ciphertext becomes a problem in urgent need of a solution.

In existing homomorphic encryption ciphertext retrieval schemes, heavy tasks such as preprocessing of the document set are generally carried out on the client. In 2014 a publication applied the TF-IDF vector retrieval model to a homomorphic encryption ciphertext retrieval scheme. The basic idea of that scheme is first to select keywords for the document set on the client, compute the TF-IDF weight of each keyword for each document in the document set, and build the TF-IDF weight vector of each document, yielding the weight vector set of the document set. Each vector in the set is then encrypted and uploaded to the cloud for storage. At retrieval time, the weight vector of the search term is first computed on the client, encrypted, and uploaded to the cloud. The cloud then uses the weight vector ciphertext of the search term and the weight vector set ciphertext of the document set to compute the similarity ciphertext between the search term and each document, and returns it to the user, who decrypts the ranking ciphertext on the client to obtain the plaintext ranking result.

Such a ciphertext retrieval scheme increases the computational load on the client and does not make full use of the computing capability of the cloud.

The homomorphic ciphertext retrieval method for cloud computing proposed by the present invention moves heavy tasks such as preprocessing of the document set to the cloud. The cloud preprocesses the encrypted documents, so its computing capability can be fully used and efficient homomorphic ciphertext retrieval is achieved.

Summary of the invention

Existing ciphertext retrieval schemes do not make full use of the computing capability of the cloud, which places a heavy computational load on the client. To overcome this shortcoming, the present invention proposes a method that makes full use of the cloud's computing capability, namely a homomorphic encryption ciphertext retrieval method for cloud computing applications. Compared with existing methods, it markedly improves retrieval efficiency and reduces the computational load on the client.

The technical solution adopted by the present invention to solve this technical problem is a homomorphic encryption ciphertext retrieval method for cloud computing applications, characterized in that the method comprises the following steps:

Step 1: Encrypt the document set on the client and upload it to the cloud

On the client, the user encrypts every document in the document set with a fully homomorphic encryption algorithm over the integers and then uploads the documents to the cloud for storage.

(1) Encryption algorithm

KeyGen: a randomly generated P-bit safe large prime is selected as the key p, where p ∈ [2^(P-1), 2^P];

Encrypt(m): a Q-bit safe large prime q is selected at random, where q ∈ [2^(Q-1), 2^Q] and P > Q > plaintext block length; a random number r is generated; the plaintext M is split into blocks M = m₁m₂m₃...m_t (each m_i has length L); and the ciphertext of each block is computed as c_i = m_i + 2pq + pqr;

Decrypt(c): compute m_i = c_i mod p to obtain the plaintext message M = m₁m₂m₃...m_t, i.e. the decrypted plaintext;

Homomorphism analysis: let m₁ and m₂ be two plaintexts with corresponding ciphertexts c₁ and c₂; then

c₁=m₁+2pq+pqr₁

c₂=m₂+2pq+pqr₂

Additive homomorphism analysis:

c₁+c₂=(m₁+m₂)+4pq+pq(r₁+r₂). Because (c₁+c₂) mod p=m₁+m₂, the algorithm satisfies additive homomorphism;

Multiplicative homomorphism analysis:

c₁*c₂=(m₁*m₂)+2m₁pq+m₁pqr₂+2m₂pq+4p²q²+2p²q²r₂+m₂pqr₁+2p²q²r₁+p²q²r₁r₂=(m₁*m₂)+2pq(m₁+m₂)+pq(m₁r₂+m₂r₁)+2p²q²(r₂+r₁)+4p²q²+p²q²r₁r₂,

Because (c₁*c₂) mod p=m₁*m₂, the algorithm satisfies multiplicative homomorphism.
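The encryption, decryption and homomorphism properties described above can be illustrated with a minimal Python sketch. The key p = 131 is the value used in the embodiment; q = 101, the range from which r is drawn, and the function names are hypothetical toy choices, and the primes are hard-coded rather than generated. These parameters are far too small for real use. Note that c mod p recovers a sum or product only while that result is smaller than p.

```python
import random

P_KEY = 131    # secret key p (value used in the embodiment)
Q_PRIME = 101  # hypothetical toy prime q, smaller than p

def encrypt(m, p=P_KEY, q=Q_PRIME):
    """Return c = m + 2pq + pqr for a fresh random r (the range of r is an assumption)."""
    r = random.randint(1, 1000)
    return m + 2 * p * q + p * q * r

def decrypt(c, p=P_KEY):
    """Return m = c mod p."""
    return c % p

m1, m2 = 7, 9
c1, c2 = encrypt(m1), encrypt(m2)

assert decrypt(c1) == m1
assert decrypt(c1 + c2) == m1 + m2   # additive homomorphism (16 < p)
assert decrypt(c1 * c2) == m1 * m2   # multiplicative homomorphism (63 < p)
```

Because r is drawn afresh on every call, encrypting the same plaintext twice generally yields different ciphertexts, which is why the method needs a separate matching formula for queries.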

(2) Matching query algorithm for keyword m_index

A. Using the above encryption algorithm, the client encrypts the keyword to obtain the corresponding ciphertext c_index=m_index+2pq+pqr and uploads it to the cloud;

B. After receiving the ciphertext keyword c_index, the cloud runs the ciphertext matching algorithm, where N=pq and the matching formula is Retrieval=(c_i-c_index) mod N=((m_i-m_index)+pq(r₁-r₂)) mod N;

If Retrieval=0, then m_i-m_index=0: when m_i=m_index, pq(r₁-r₂) mod N=0, so Retrieval=0. For retrieval, only N (where N=pq) needs to be uploaded to the cloud. The cloud cannot derive the user's key p from N, yet the server can perform matching queries directly on the user's ciphertext.
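A minimal sketch of the matching query, under the same toy assumptions (p = 131 from the embodiment; q = 101, the range of r, and all function and variable names are hypothetical). The cloud-side function uses only N and never sees p. The sketch illustrates the arithmetic of the match only, not the security of the scheme.

```python
import random

p, q = 131, 101   # toy key p (from the embodiment) and hypothetical prime q
N = p * q         # the only value shared with the cloud

def encrypt(m):
    """Client side: c = m + 2pq + pqr with a fresh random r."""
    r = random.randint(1, 1000)
    return m + 2 * p * q + p * q * r

def matches(c_i, c_index, n=N):
    """Cloud side: Retrieval = (c_i - c_index) mod N; zero means the plaintexts are equal."""
    return (c_i - c_index) % n == 0

stored = [encrypt(m) for m in (12, 57, 99)]   # ciphertext keywords held by the cloud
query = encrypt(57)                           # ciphertext keyword sent by the client
hits = [i for i, c in enumerate(stored) if matches(c, query)]
```

Here `hits` contains the position of the stored ciphertext whose plaintext equals the query, even though the two ciphertexts were produced with different random values.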

Step 2: Preprocess the ciphertext document set in the cloud

The cloud first generates a copy of the ciphertext document set, and all preprocessing is performed on this copy. Preprocessing is divided into three stages: word segmentation and filtering, building the inverted index, and generating the weight vector set of the document set.

(1) Word segmentation and filtering

English text is generally split into terms at spaces, whereas Chinese text cannot be split this way; the simplest approach is to treat each individual Chinese character as a term.

After word segmentation, keywords that carry no meaning for retrieval over the document set, i.e. stop words, must be removed. Each keyword is checked against the stop word list and, if present, is deleted from the document. Because the original document set has already been encrypted on the client, the ciphertext stop words are looked up in the encrypted state by linear matching. The matching query proceeds as follows:

In step 1, the client used formula (1) to encrypt the stop words and uploaded them to the cloud, which stores the ciphertext stop words; each keyword in the document set was encrypted using formula (2) and uploaded to the cloud for storage;

Let the ciphertext keyword in a cloud document be t_index and the ciphertext stop word stored in the cloud be t_i; then:

t_i=m_i+2pq+pqr₂    Formula (1)

t_index=m_index+2pq+pqr₁    Formula (2)

In the cloud, every keyword in every ciphertext document is filtered using formula (3), the matching formula for the query:

Retrieval=(t_i-t_index) mod N=((m_i-m_index)+pq(r₂-r₁)) mod N    Formula (3)

This matching operation is carried out in the cloud. Let the result be Retrieval. If Retrieval=0, then t_i-t_index≡0 (mod N), i.e. m_i=m_index, which indicates that a ciphertext stop word has been found in the ciphertext document. To use the above formula, only N (where N=pq) needs to be uploaded to the cloud; the key p cannot be derived from N, so the above formula is safe and reliable;

Finally, after all matched ciphertext stop words have been deleted, a new document set is generated, each document of which is denoted d_j.
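The stop word filtering stage can be sketched as a linear scan over ciphertext tokens. In this minimal Python sketch the word-to-integer table `CODES`, the prime q = 101 and all function names are hypothetical; the method itself does not specify how words are encoded into plaintext blocks. The sample sentence and stop word list are those of the embodiment.

```python
import random

p, q = 131, 101   # toy key p (from the embodiment) and hypothetical prime q
N = p * q         # modulus available to the cloud

# Hypothetical client-side encoding: each word maps to a small integer block.
WORDS = ["i", "happiness", "on", "the", "others", "my", "hands", "was", "is", "a"]
CODES = {w: i for i, w in enumerate(WORDS, start=1)}

def encrypt_word(word):
    """Client side: encrypt the integer code of a word."""
    r = random.randint(1, 1000)
    return CODES[word] + 2 * p * q + p * q * r

def filter_stop_words(doc_tokens, stop_tokens, n=N):
    """Cloud side: drop every token t_index for which some stop word t_i gives Retrieval = 0."""
    return [t for t in doc_tokens
            if not any((s - t) % n == 0 for s in stop_tokens)]

# Client side: encrypt the stop word list and one document, then upload both.
stop_tokens = [encrypt_word(w) for w in ["on", "the", "was", "is", "a"]]
doc_tokens = [encrypt_word(w) for w in "i happiness on the others my hands".split()]

kept = filter_stop_words(doc_tokens, stop_tokens)

# Client-side check only: decrypt the surviving tokens to see which words remain.
decode = {code: word for word, code in CODES.items()}
survivors = [decode[t % p] for t in kept]
```

The scan costs one modular subtraction per (token, stop word) pair, which is the linear matching the text refers to.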

(2) Building the inverted index

The present invention builds the inverted index on the server side: the term frequency TF_i and inverse document frequency IDF_i of each keyword k_i are counted under encryption, and the inverted index table of the ciphertext document set is finally generated;

Let the ciphertext keyword to be counted in the cloud be c_kindex and a ciphertext keyword stored in the cloud be c_i; then

c_kindex=m_kindex+2pq+pqr₁    Formula (4)

c_i=m_i+2pq+pqr₂    Formula (5)

Let

Retrieval=(c_i-c_kindex) mod N=((m_i-m_kindex)+pq(r₂-r₁)) mod N    Formula (6)

This matching operation is carried out in the cloud. In formula (6), let the result be Retrieval. If Retrieval=0, then c_i-c_kindex≡0 (mod N), i.e. m_i=m_kindex, which indicates that the ciphertext keyword c_kindex to be counted has been found in this ciphertext document. Each such match increases the term frequency count f_ij by 1. The document count n_i is the number of documents in the document set in which the counted ciphertext keyword c_kindex occurs. Counting finally yields f_ij and n_i for the ciphertext keyword. To use the above formula, only N (where N=pq) needs to be uploaded to the cloud; the key p cannot be derived from N, so the above formula is safe and reliable.

The data obtained by counting are recorded as shown in Table 1;

Table 1. Inverted index table

As shown in Table 1, document names and keywords remain encrypted in the cloud, whereas the document numbers, occurrence frequencies and document counts obtained by counting are held in the cloud in plaintext.
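A minimal sketch of how the cloud might count f_ij and n_i over ciphertext tokens. The table `CODES`, the prime q = 101, the dictionary layout of the index and the function names are all hypothetical; only the matching test (difference mod N equals zero) comes from the text above.

```python
import random

p, q = 131, 101   # toy key p (from the embodiment) and hypothetical prime q
N = p * q

CODES = {"father": 1, "wind": 2, "player": 3}   # hypothetical word encoding

def encrypt_word(word):
    """Client side: c = m + 2pq + pqr for the integer code m of a word."""
    r = random.randint(1, 1000)
    return CODES[word] + 2 * p * q + p * q * r

def build_index(docs, keywords, n=N):
    """Cloud side: for each ciphertext keyword, count f_ij per document and n_i overall.

    docs maps a plaintext document number to its list of ciphertext tokens.
    The counts are plaintext; the keywords and tokens stay encrypted.
    """
    index = {}
    for k_pos, c_k in enumerate(keywords):
        postings = {}
        for doc_no, tokens in docs.items():
            f_ij = sum(1 for c_i in tokens if (c_i - c_k) % n == 0)
            if f_ij:
                postings[doc_no] = f_ij
        index[k_pos] = {"postings": postings, "n_i": len(postings)}
    return index

docs = {
    2: [encrypt_word(w) for w in ["father", "wind"]],
    3: [encrypt_word(w) for w in ["father", "player", "player"]],
}
keywords = [encrypt_word(w) for w in ["father", "wind", "player"]]
index = build_index(docs, keywords)
```

In this layout `index[2]["postings"]` records that the third keyword occurs twice in document 3, and `index[2]["n_i"]` records that it occurs in one document.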

(3) Generating the document vector set

When the user performs a keyword search, the document set being searched actually corresponds to the weight vector set of the document set. After the inverted index table has been updated, the weight vector set corresponding to the document set is generated from the counted keyword term frequencies and inverse document frequencies. The weighting framework used in the present invention is TF-IDF, where TF denotes the term frequency of a keyword and IDF denotes its inverse document frequency. The weights are computed by formula (7).

In formula (7), w_ij denotes the TF-IDF weight of keyword k_i for document d_j; f_ij denotes the frequency of keyword k_i in document d_j, i.e. its number of occurrences; N denotes the total number of documents in the document set (distinct from the modulus N=pq used in the matching formulas); n_i denotes the number of documents in the document set that contain keyword k_i; and N/n_i represents the inverse document frequency of keyword k_i;

Suppose ciphertext document d_j contains t ciphertext keywords k_i that are mutually independent. The ciphertext document d_j is defined as a vector in t-dimensional space. The weight of each ciphertext keyword k_i in d_j is obtained from the above formula, which yields the plaintext weight vector of the ciphertext document; every weight obtained is in plaintext. The value of d_j is then given by formula (8):

d_j=(w_1j, w_2j, ..., w_tj)    Formula (8)
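A minimal sketch of the weight computation, assuming formula (7) takes the common TF-IDF form w_ij = f_ij · log₂(N/n_i). The logarithm and its base are assumptions chosen to be consistent with the symbol definitions above, not something the text confirms; the function names and the sample counts are hypothetical.

```python
import math

def tfidf_weight(f_ij, n_docs, n_i):
    """Assumed form of the weight: f_ij * log2(n_docs / n_i); zero if the keyword is absent."""
    if f_ij == 0 or n_i == 0:
        return 0.0
    return f_ij * math.log2(n_docs / n_i)

def document_vector(postings_by_keyword, doc_no, n_docs):
    """Build the plaintext weight vector of one document.

    postings_by_keyword is a list with one {doc_no: f_ij} mapping per keyword k_i,
    so len(mapping) is n_i, the number of documents containing that keyword.
    """
    return [tfidf_weight(post.get(doc_no, 0), n_docs, len(post))
            for post in postings_by_keyword]

# Hypothetical counts for three keywords over a set of four documents.
postings = [{2: 1}, {3: 2}, {1: 1, 4: 2}]
d3 = document_vector(postings, 3, n_docs=4)
d4 = document_vector(postings, 4, n_docs=4)
```

Only plaintext counts enter this computation, which is why the resulting weight vectors are plaintext even though the keywords they refer to remain encrypted.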

Step 3: Retrieval

The client encrypts the plaintext search term and uploads it to the cloud. The cloud preprocesses the ciphertext search term in the same way as the ciphertext document set: it performs word segmentation and filtering on the ciphertext search term (as with ciphertext documents, the cloud segments the text and filters out ciphertext stop words), builds the inverted index of the ciphertext search term, and finally generates its plaintext weight vector. Each component of this vector is the plaintext weight of the corresponding ciphertext keyword in the ciphertext search term, computed by formula (9).

Suppose the ciphertext search term q contains t ciphertext keywords k_i that are mutually independent. The plaintext weight of each ciphertext keyword k_i in q is obtained from the above formula, which yields the plaintext weight vector of the ciphertext search term; every weight obtained is in plaintext. The value of q is then given by formula (10):

q=(w_1q, w_2q, ..., w_tq)    Formula (10)

Retrieval is in fact a similarity computation between the plaintext weight vector of the ciphertext search term and the plaintext weight vector of each ciphertext document in the ciphertext document set. The similarity is computed as follows:

Let the plaintext weight vector of a ciphertext document be d_j and the plaintext weight vector of the ciphertext search term be q. According to the definition of the vector space model, their similarity sim(d_j, q) is

sim(d_j, q)=(d_j·q)/(|d_j|×|q|)    Formula (11)

where |d_j| and |q| are, respectively, the norm of the plaintext vector of the ciphertext document and the norm of the plaintext vector of the ciphertext search term, both computed in the cloud. The cloud uses formula (11) to compute the similarity between the ciphertext keywords and every ciphertext document and sorts the results by similarity, so that documents with higher similarity, i.e. those more relevant to the query, are ranked first, which makes them easier for the user to find. Finally, the cloud returns the similarity ranking to the client for the user to view.
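A minimal sketch of the ranking step, assuming formula (11) is the usual cosine similarity of the vector space model (dot product divided by the product of the vector norms). The function names and the sample vectors are hypothetical.

```python
import math

def cosine_similarity(d, q):
    """Cosine of the angle between two plaintext weight vectors of equal length."""
    dot = sum(a * b for a, b in zip(d, q))
    norm_d = math.sqrt(sum(a * a for a in d))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_d == 0 or norm_q == 0:
        return 0.0   # an all-zero vector shares no keywords with anything
    return dot / (norm_d * norm_q)

def rank(doc_vectors, q):
    """Return (document number, similarity) pairs sorted from most to least similar."""
    scores = {doc: cosine_similarity(vec, q) for doc, vec in doc_vectors.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical plaintext weight vectors for three documents and one search term.
doc_vectors = {1: [1.0, 0.0, 2.0], 2: [0.0, 3.0, 0.0], 3: [0.0, 1.0, 1.0]}
q = [0.0, 1.0, 0.0]
ranking = rank(doc_vectors, q)
```

Since both vectors are plaintext, the ranking needs no homomorphic operations and can be returned to the client without decryption.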

Step 4: Document download

After the desired document has been found in the previous step, the selected document can be downloaded. The user first enters the name of the document to be downloaded on the client; the name is encrypted using formula (12) and uploaded to the cloud for storage. The name of every document in the document set was encrypted using formula (13) and uploaded to the cloud for storage. Let the ciphertext file name of the file to be downloaded be c_index and the ciphertext file name of a cloud document be c_i; then

c_index=m_index+2pq+pqr₁    Formula (12)

c_i=m_i+2pq+pqr₂    Formula (13)

As before, because homomorphic encryption yields different ciphertexts when the same data is encrypted twice, the ciphertext name of the document to be downloaded must be matched against each ciphertext document name in the document set using formula (14). Let the matching result be Retrieval; then

Retrieval=(c_i-c_index) mod N=((m_i-m_index)+pq(r₂-r₁)) mod N    Formula (14)

This matching operation is carried out in the cloud, where formula (14) locates the document to be downloaded. If Retrieval=0, then c_i-c_index≡0 (mod N), i.e. m_i=m_index, which indicates that the ciphertext document has been found among the ciphertext documents. To use the above formula, only N (where N=pq) needs to be uploaded to the cloud, and the key p cannot be derived from N.

Beneficial effects of the present invention:

In the field of ciphertext retrieval for cloud computing, homomorphic encryption ciphertext retrieval methods based on the TF-IDF vector retrieval model have clear advantages. Most existing schemes, however, carry out heavy tasks such as preprocessing of the document set on the client: the client handles most of the work and the cloud only a small part. The drawback is that this increases the load on the client and leaves the cloud's own powerful computing capability unused.

The present invention carries out the preprocessing stage in the cloud. Compared with schemes that perform preprocessing on the client, the benefits are that the computational load on the client is reduced, the powerful computing and storage capacity of the cloud can be fully used to operate on the data, and retrieval efficiency is improved.

The innovations of the present invention can be summarized as follows:

(1) Heavy tasks such as preprocessing of the document set are moved to the cloud as far as possible, and the TF-IDF weight vectors of the documents are all computed in the cloud;

(2) Working on the homomorphically encrypted documents, the cloud can perform homomorphic operations and computations on the ciphertext and construct the TF-IDF weight vector of every document, which exists in plaintext;

(3) When a ciphertext search is performed, the client encrypts the search term and uploads it to the cloud; by operating on the ciphertext search term, the cloud likewise obtains the corresponding plaintext TF-IDF weight vector, then computes the similarity between the search term and each document in the document set and obtains a plaintext ranking result.

Brief description of the drawings

Fig. 1 is a flowchart of the overall steps of the proposed method;

Fig. 2 is a detailed flowchart of document encryption, upload and preprocessing in steps 1 and 2 of the present invention;

Fig. 3 is a detailed flowchart of the operations of steps 1, 2 and 3 of the present invention;

Fig. 4 is a detailed flowchart of document download in step 4 of the present invention.

Specific embodiment:

The present invention proposes a homomorphic ciphertext retrieval method for cloud computing that moves heavy tasks such as preprocessing of the document set to the cloud. The cloud preprocesses the encrypted documents, so its computing capability can be fully used and efficient homomorphic ciphertext retrieval is achieved. The method is described in detail below with reference to the drawings and an embodiment.

The flow of the proposed method is shown in Fig. 1 and comprises the following steps:

(1) Encryption algorithm

A key p=131 is selected at random;

q and the random number r are chosen at random.

Four documents are chosen as the document set, with 7 keywords selected at random in each document. The document contents are as follows:

Test1={ I happiness on the others my hands }

Test2={ I was father wind happiness I is }

Test3={ my father is a player player birthday }

Test4={ happiness father others was of my others }

These four documents are encrypted separately. The resulting ciphertext documents are as follows:

Test1={ dc47e285afe8b229b2463e12cfe69f1049ea107b84f51c034c4653ca01142196c9bd11a559d61541049eb2e90112cf2bc6f11300d8a78d53e129fb50cb3f56320459fe510dec9ba2a7 }

Test2={ 56e1d11107f444633216d92f822262c9ad76c9b1444676c9bbfb50afb4fc6981447609adc10888653213df64837ddf4d9598252359fea4d95edf6214d9542b75d }

Test3={ 4d9586fb6512cf29ba295980a7ccc6ed2fb1e59902540d3e12fb70f21049ec114217285a811d70d123a3f13327b285b7e8b2d8ba081173c7954e3af8bb70f37903e94f0211a56534c3b320756e3511d70d }

Test4={ f8971173af126bedb3f5e53c9a123a48d61543af8c412d2f895df6394a7bc4d95344467dc4a759fe6444764d95363abdc4a7285ba8ebc11300c7c6ff2b759f51b3c692daaa7ebd4464761959fdfa15811300d872d02 }

The encrypted document set is uploaded to the cloud and stored in ciphertext form.

(1) Word segmentation and filtering

English text is generally split into terms at spaces. After word segmentation, keywords that carry no meaning for retrieval over the document set, referred to as stop words, must be removed. The StandardAnalyzer analyzer is used here through its no-argument constructor StandardAnalyzer(). This no-argument constructor of the StandardAnalyzer class specifies a filter word array STOP_WORDS. STOP_WORDS must first be encrypted with the homomorphic encryption algorithm and uploaded to the cloud, where it is stored for later use.

The stop words in this embodiment are {on, the, was, is, a}. Because the original document set has already been encrypted on the client, the ciphertext stop words must be looked up in the encrypted state by linear matching. The matching query proceeds as follows:

Let the ciphertext keyword in a cloud document be t_index and the ciphertext stop word stored in the cloud be t_i. Using formula (3), a ciphertext stop word matching query is run against the ciphertext keywords of every document in the ciphertext document set. If Retrieval=0, then t_i-t_index≡0 (mod N), i.e. the underlying plaintexts are equal, which shows that the keyword t_index is a stop word, and it is deleted.

Finally, after all matched ciphertext stop words have been deleted, a new document set is generated, each document of which is denoted d_j;

(2) Building the inverted index

Let the ciphertext keyword to be counted in the cloud be c_wind={ 107b965d18369814bd43b }, and let each ciphertext keyword in the ciphertext document test1={ dc47e285afe8b229b2463e12cfe69f1049ea107b84f51c034c4653ca01142196c9bd11a559d61541049eb2e90112cf2bc6f11300d8a78d53e129fb50cb3f56320459fe510dec9ba2a7 } stored in the cloud be denoted c_i.

The computation then follows formula (6). If Retrieval=0, then c_i-c_wind≡0 (mod N), i.e. the underlying plaintexts are equal, which indicates that the ciphertext keyword c_wind to be counted has been found in ciphertext document test1: one match has been found, and the term frequency count f_ij is recorded as 1. The document count n_i is the number of documents in the document set in which the counted ciphertext keyword c_wind occurs. After matching against the four ciphertext documents, counting finally gives n_i=1 for the ciphertext keyword c_wind;

The data obtained by counting are shown in Table 2;

Table 2. Inverted index table

As shown in Table 2, document names and keywords remain encrypted in the cloud, whereas the document numbers, occurrence frequencies and document counts obtained by counting are held in the cloud in plaintext.

(3) Generating the document vector set

Formula (7) gives:

d₁={ i=1, happiness=0, others=1, my=0, hands=2, father=0, wind=0, player=0, birthday=0 }

d₂={ i=2, happiness=0, others=0, my=0, hands=0, father=0.6, wind=2, player=0, birthday=0 }

d₃={ i=0, happiness=0, others=0, my=0, hands=0, father=0.3, wind=0, player=4, birthday=2 }

d₄={ i=0, happiness=0, others=2, my=0, hands=0, father=0.3, wind=0, player=0, birthday=0 }

The weight vector set of the document set is finally obtained according to formula (8):

Step 3: Retrieval

For example, to query the search term "father", its weight vector is obtained by following steps 1 and 2. The similarity between the search term and the weight vectors of the document set is then computed with formula (11), and the results are returned to the client in descending order. The result obtained is:

Step 4: Document download

According to the retrieval in the previous step, document 2 has the highest similarity to the search term. If document 2 is selected for download, the procedure is as follows:

The user enters "test2" as the document to be downloaded. The client homomorphically encrypts this document name and uploads it to the cloud, where the matching operation is carried out.

The set of ciphertext document names stored in the cloud is {d2fc02540950b013217a7893ebcd83e12110184c C700e27ad4d95f25409d2fbf60331130099c9ad7d2fb1aaa784d95f1 1a525 }, and the ciphertext of the name of the document to be downloaded is { 18d8c3e121129d93222756fb1e }. The cloud can then locate the document to be downloaded using formula (14). If Retrieval=0, then c_i-c_index≡0 (mod N), i.e. the underlying plaintexts are equal, which indicates that the ciphertext document has been found among the ciphertext documents, and the user can confirm the download on the client.

Step 5: Document decryption after download

The client receives the downloaded ciphertext document test2={ 56e1d11107f444633216d92f822262c9ad76c9b1444676c9bbfb50afb4fc6981447609adc10888653213df64837ddf4d9598252359fea4d95edf6214d9542b75d } and then decrypts it. The decryption algorithm is:

Decrypt(c): compute m_i = c_i mod p to obtain the plaintext message M = m₁m₂m₃...m_t, i.e. the decrypted plaintext document test2={ I was father wind happiness I is }. The ciphertext document has thus been downloaded and decrypted correctly.

Claims

1. A homomorphic encryption ciphertext retrieval method for cloud computing applications, characterized in that the method comprises the following steps:

On the client, the user encrypts every document in the document set with a fully homomorphic encryption algorithm over the integers and then uploads the documents to the cloud for storage.

(1) Encryption algorithm

Encrypt(m): a Q-bit safe large prime q is selected at random, where q ∈ [2^(Q-1), 2^Q] and P > Q > plaintext block length; a random number r is generated at random; the plaintext M is split into blocks M = m₁m₂m₃...m_t (each m_i has length L); and the ciphertext of each block is computed as c_i = m_i + 2pq + pqr;

Decrypt(c): compute m_i = c_i mod p to obtain the plaintext message M = m₁m₂m₃...m_t, i.e. the decrypted plaintext;

c₁=m₁+2pq+pqr₁

c₂=m₂+2pq+pqr₂

Additive homomorphism analysis:

c₁+c₂=(m₁+m₂)+4pq+pq(r₁+r₂). Because (c₁+c₂) mod p=m₁+m₂, the algorithm satisfies additive homomorphism;

Multiplicative homomorphism analysis:

c₁*c₂=(m₁*m₂)+2m₁pq+m₁pqr₂+2m₂pq+4p²q²+2p²q²r₂+m₂pqr₁+2p²q²r₁+p²q²r₁r₂=(m₁*m₂)+2pq(m₁+m₂)+pq(m₁r₂+m₂r₁)+2p²q²(r₂+r₁)+4p²q²+p²q²r₁r₂,

(2) Matching query algorithm for keyword m_index

A. Using the above encryption algorithm, the client encrypts the keyword to obtain the corresponding ciphertext c_index=m_index+2pq+pqr and uploads it to the cloud;

B. After receiving the ciphertext keyword c_index, the cloud runs the ciphertext matching algorithm, where N=pq and the matching formula is Retrieval=(c_i-c_index) mod N=((m_i-m_index)+pq(r₁-r₂)) mod N;

The cloud first generates a copy of the ciphertext document set, and all preprocessing is performed on this copy. Preprocessing is divided into three stages: word segmentation and filtering, building the inverted index, and generating the weight vector set of the document set;

(1) Word segmentation and filtering

Keywords are selected and the document set is searched;

Each keyword is checked against the stop word list and, if present, is deleted from the document. Because the original document set has already been encrypted on the client, ciphertext keywords are looked up in the encrypted state by linear matching. The matching query proceeds as follows:

In step 1, the client used formula (1) to encrypt the stop words and uploaded them to the cloud, which stores the ciphertext stop words; each keyword in the document set was encrypted using formula (2) and uploaded to the cloud for storage;

t_i=m_i+2pq+pqr₂    Formula (1)

t_index=m_index+2pq+pqr₁    Formula (2)

In the cloud, every keyword in every ciphertext document is filtered using formula (3), the matching formula for the query:

Retrieval=(t_i-t_index) mod N=((m_i-m_index)+pq(r₂-r₁)) mod N    Formula (3)

This matching operation is carried out in the cloud. Let the result be Retrieval. If Retrieval=0, then t_i-t_index≡0 (mod N), i.e. m_i=m_index, which indicates that a ciphertext stop word has been found in the ciphertext document. To use the above formula, only N (where N=pq) needs to be uploaded to the cloud; the key p cannot be derived from N, so the above formula is safe and reliable;

Finally, after all matched ciphertext stop words have been deleted, a new document set is generated, each document of which is denoted d_j.

(2) Building the inverted index

The present invention builds the inverted index on the server side: the term frequency TF_i and inverse document frequency IDF_i of each keyword k_i are counted under encryption, and the inverted index table of the ciphertext document set is finally generated;

c_kindex=m_kindex+2pq+pqr₁    Formula (4)

c_i=m_i+2pq+pqr₂    Formula (5)

Let

Retrieval=(c_i-c_kindex) mod N=((m_i-m_kindex)+pq(r₂-r₁)) mod N    Formula (6)

This matching operation is carried out in the cloud. In formula (6), let the result be Retrieval. If Retrieval=0, then c_i-c_kindex≡0 (mod N), i.e. m_i=m_kindex, which indicates that the ciphertext keyword c_kindex to be counted has been found in this ciphertext document. Each such match increases the term frequency count f_ij by 1. The document count n_i is the number of documents in the document set in which the counted ciphertext keyword c_kindex occurs. Counting finally yields f_ij and n_i for the ciphertext keyword.

(3) Generating the document vector set

After the inverted index table has been updated, the weight vector set corresponding to the document set is generated from the counted keyword term frequencies and inverse document frequencies. The weighting framework used in the present invention is TF-IDF, where TF denotes the term frequency of a keyword and IDF denotes its inverse document frequency. The weights are computed by formula (7).

In formula (7), w_ij denotes the TF-IDF weight of keyword k_i for document d_j; f_ij denotes the frequency of keyword k_i in document d_j, i.e. its number of occurrences; N denotes the total number of documents in the document set; n_i denotes the number of documents in the document set that contain keyword k_i; and N/n_i represents the inverse document frequency of keyword k_i;

Suppose ciphertext document d_j contains t ciphertext keywords k_i that are mutually independent. The ciphertext document d_j is defined as a vector in t-dimensional space. The weight of each ciphertext keyword k_i in d_j is obtained from the above formula, which yields the plaintext weight vector of the ciphertext document; every weight obtained is in plaintext. The value of d_j is then given by formula (8):

d_j=(w_1j, w_2j, ..., w_tj)    Formula (8)

Step 3: Retrieval

The client encrypts the plaintext search term and uploads it to the cloud. The cloud preprocesses the ciphertext search term in the same way as the ciphertext document set and performs word segmentation and filtering on it. Each component of the resulting weight vector is the plaintext weight of the corresponding ciphertext keyword in the ciphertext search term, computed by formula (9).

Suppose the ciphertext search term q contains t ciphertext keywords k_i that are mutually independent. The plaintext weight of each ciphertext keyword k_i in q is obtained from the above formula, which yields the plaintext weight vector of the ciphertext search term; every weight obtained is in plaintext. The value of q is then given by formula (10):

q=(w_1q, w_2q, ..., w_tq)    Formula (10)

Retrieval is in fact a similarity computation between the plaintext weight vector of the ciphertext search term and the plaintext weight vector of each ciphertext document in the ciphertext document set. The similarity is computed as follows:

Let the plaintext weight vector of a ciphertext document be d_j and the plaintext weight vector of the ciphertext search term be q. According to the definition of the vector space model, their similarity sim(d_j, q) is

sim(d_j, q)=(d_j·q)/(|d_j|×|q|)    Formula (11)

where |d_j| and |q| are, respectively, the norm of the plaintext vector of the ciphertext document and the norm of the plaintext vector of the ciphertext search term, both computed in the cloud. The cloud uses formula (11) to compute the similarity between the ciphertext keywords and every ciphertext document and sorts the results by similarity, so that documents with higher similarity, i.e. those more relevant to the query, are ranked first, which makes them easier for the user to find. Finally, the cloud returns the similarity ranking to the client for the user to view.

Step 4: Document download

After the desired document has been found in the previous step, the selected document can be downloaded. The user first enters the name of the document to be downloaded on the client; the name is encrypted using formula (12) and uploaded to the cloud for storage. The name of every document in the document set was encrypted using formula (13) and uploaded to the cloud for storage. Let the ciphertext file name of the file to be downloaded be c_index and the ciphertext file name of a cloud document be c_i; then

c_index=m_index+2pq+pqr₁    Formula (12)

c_i=m_i+2pq+pqr₂    Formula (13)

As before, because homomorphic encryption yields different ciphertexts when the same data is encrypted twice, the ciphertext name of the document to be downloaded must be matched against each ciphertext document name in the document set using formula (14). Let the matching result be Retrieval; then

Retrieval=(c_i-c_index) mod N=((m_i-m_index)+pq(r₂-r₁)) mod N    Formula (14)

This matching operation is carried out in the cloud, where formula (14) locates the document to be downloaded. If Retrieval=0, then c_i-c_index≡0 (mod N), i.e. m_i=m_index, which indicates that the ciphertext document has been found among the ciphertext documents. To use the above formula, only N (where N=pq) needs to be uploaded to the cloud, so the key p cannot be derived from N.