CN106934063A - A homomorphic-encryption ciphertext retrieval method for cloud computing applications - Google Patents

A homomorphic-encryption ciphertext retrieval method for cloud computing applications

Info

Publication number
CN106934063A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710199651.1A
Other languages
Chinese (zh)
Other versions
CN106934063B (en)
Inventor
拱长青
肖芸
林娜
郭振洲
李席广
赵亮
孟庆杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang Aerospace University
Original Assignee
Shenyang Aerospace University

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2452 Query translation
    • G06F16/24522 Translation of natural language queries to structured queries
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

The present invention provides a homomorphic-encryption ciphertext retrieval method for cloud computing applications in which the preprocessing stage is carried out in the cloud, so that the TF-IDF weight vectors of the documents are all computed in the cloud. Working on the homomorphically encrypted documents, the cloud performs homomorphic operations and calculations on the ciphertext and constructs the TF-IDF weight vector of each document, which exists in plaintext form. When a ciphertext search is performed, the client encrypts the search term and uploads it to the cloud; the cloud operates on the ciphertext search term to obtain its corresponding plaintext TF-IDF weight vector, then computes the similarity between the search term and each document in the document set and obtains a plaintext ranking result. Compared with schemes that place the preprocessing stage on the client, this reduces the computational load on the client, makes full use of the computing and storage capacity of the cloud to operate on the data, and improves retrieval efficiency.

Description

A homomorphic-encryption ciphertext retrieval method for cloud computing applications
Technical field:
The invention belongs to the technical field of cloud computing applications, and more particularly relates to a homomorphic-encryption ciphertext retrieval method for cloud computing applications.
Background:
Cloud computing is an innovative service model that lets users obtain almost unlimited computing power and a rich variety of information services over the Internet at any time; it is an evolution of distributed computing, parallel computing and grid computing. Cloud storage also falls within the category of cloud computing and is increasingly widely used. With the rapid development of cloud computing, large amounts of sensitive information are concentrated in the cloud. To avoid leaking clients' private data, private data must be encrypted before being stored in the cloud. Once the ciphertext data stored on cloud servers reaches a certain scale, effective retrieval over ciphertext data becomes a problem that urgently needs solving.
In existing homomorphic-encryption ciphertext retrieval schemes, heavy work such as preprocessing the document set is usually placed on the client. In 2014 a paper applied the TF-IDF vector retrieval model to a homomorphic-encryption ciphertext retrieval scheme. Its basic idea is first to select keywords for the document set on the client, compute the TF-IDF weight of each keyword for each document in the document set, build the TF-IDF weight vector of each document, and so obtain the weight vector set of the document set. Each vector in the weight vector set is then encrypted and uploaded to the cloud for storage. At retrieval time, the weight vector of the search term is first computed on the client, encrypted, and uploaded to the cloud. The ciphertext weight vector of the search term and the ciphertext weight vector set of the document set are then used to compute the ciphertext similarity between the search term and each document in the document set, which is returned to the user; the user decrypts the ciphertext ranking on the client to obtain the plaintext ranking result.
The above ciphertext retrieval scheme increases the computational load on the client and does not make full use of the computing power of the cloud.
The homomorphic ciphertext retrieval method for cloud computing proposed by the present invention moves heavy work such as preprocessing the document set to the cloud, where preprocessing is performed on the encrypted documents. This makes full use of the computing power of the cloud and thereby achieves efficient homomorphic ciphertext retrieval.
Summary of the invention
To overcome the shortcoming of existing ciphertext retrieval schemes, which do not make full use of the computing power of the cloud and therefore place a heavy computational load on the client, the present invention proposes a homomorphic-encryption ciphertext retrieval method for cloud computing applications that makes full use of the cloud's computing power. Compared with existing methods, it markedly improves retrieval efficiency and reduces the computational load on the client.
The technical solution adopted by the present invention to solve the technical problem is a homomorphic-encryption ciphertext retrieval method for cloud computing applications, characterized in that the method comprises the following steps:
Step 1: Encrypt the document set on the client and upload it to the cloud
On the client, the user encrypts every document in the document set with an integer fully homomorphic encryption algorithm and then uploads the result to the cloud for storage.
(1) Encryption algorithm
KeyGen: randomly generate a $P$-bit secure large prime as the key $p$, where $p \in [2^{P-1}, 2^{P}]$;
Encrypt(m): randomly select a $Q$-bit secure large prime $q$, where $q \in [2^{Q-1}, 2^{Q}]$ and $P > Q >$ plaintext block length; randomly generate a random number $r$; split the plaintext $M$ into blocks $M = m_1 m_2 m_3 \ldots m_t$ (each $m_i$ of length $L$); and compute $c_i = m_i + 2pq + pqr$ for each block to obtain the ciphertext;
Decrypt(c): compute $m_i = c_i \bmod p$ to obtain the plaintext message $M = m_1 m_2 m_3 \ldots m_t$, that is, the decrypted plaintext;
Homomorphism analysis: let $m_1$ and $m_2$ be two plaintexts with corresponding ciphertexts $c_1$ and $c_2$; then
$c_1 = m_1 + 2pq + pqr_1$
$c_2 = m_2 + 2pq + pqr_2$
Additive homomorphism analysis:
$c_1 + c_2 = (m_1 + m_2) + 4pq + pq(r_1 + r_2)$; because $(c_1 + c_2) \bmod p = m_1 + m_2$, the algorithm satisfies additive homomorphism;
Multiplicative homomorphism analysis:
$c_1 c_2 = m_1 m_2 + 2m_1pq + m_1pqr_2 + 2m_2pq + 4p^2q^2 + 2p^2q^2r_2 + m_2pqr_1 + 2p^2q^2r_1 + p^2q^2r_1r_2 = m_1 m_2 + 2pq(m_1 + m_2) + pq(m_1r_2 + m_2r_1) + 2p^2q^2(r_2 + r_1) + 4p^2q^2 + p^2q^2r_1r_2$,
Because $(c_1 c_2) \bmod p = m_1 m_2$, the algorithm satisfies multiplicative homomorphism.
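The KeyGen, Encrypt and Decrypt steps and the two homomorphism checks can be sketched as follows. This is a toy illustration only: $p = 131$ is the key used in the embodiment, while $q = 101$, the range of $r$, and the plaintext values are hypothetical choices made for the example. Reduction modulo $p$ only recovers values smaller than $p$, so the sketch keeps $m_1 + m_2$ and $m_1 m_2$ below $p$.

```python
import secrets

def encrypt(m: int, p: int, q: int) -> int:
    """Encrypt one plaintext block as c = m + 2pq + pqr with a fresh random r."""
    r = secrets.randbelow(2**16) + 1  # illustrative range for r (assumption)
    return m + 2 * p * q + p * q * r

def decrypt(c: int, p: int) -> int:
    """Recover the block as m = c mod p (valid while 0 <= m < p)."""
    return c % p

# Toy parameters: p = 131 comes from the embodiment, q = 101 is hypothetical.
p, q = 131, 101
m1, m2 = 7, 9
c1, c2 = encrypt(m1, p, q), encrypt(m2, p, q)

# Additive homomorphism: decrypting c1 + c2 gives m1 + m2.
sum_ok = decrypt(c1 + c2, p) == m1 + m2
# Multiplicative homomorphism: decrypting c1 * c2 gives m1 * m2.
prod_ok = decrypt(c1 * c2, p) == m1 * m2
print(sum_ok, prod_ok)
```

Real parameters would be large primes; the small values here only make the arithmetic easy to follow.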
(2) Match query algorithm for keyword $m_{index}$
A. Using the above encryption algorithm, the client encrypts the keyword to obtain the corresponding ciphertext $c_{index} = m_{index} + 2pq + pqr$ and uploads it to the cloud;
B. After receiving the ciphertext keyword $c_{index}$, the cloud queries with the ciphertext matching algorithm, where $N = pq$ and the matching formula is $Retrieval = (c_i - c_{index}) \bmod N = ((m_i - m_{index}) + pq(r_1 - r_2)) \bmod N$;
If $Retrieval = 0$, then $m_i - m_{index} = 0$; that is, when $m_i = m_{index}$, $pq(r_1 - r_2) \bmod N = 0$, so $Retrieval = 0$. Only $N = pq$ needs to be uploaded to the cloud for retrieval; the cloud cannot derive the user's key $p$ from it, yet the server can perform match queries directly on the user's ciphertext.
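A minimal sketch of the match query, assuming the cloud holds only $N = pq$ and the two ciphertexts. The helper names, the value $q = 101$ and the explicit $r$ values are hypothetical and chosen so the example is deterministic.

```python
def encrypt(m: int, p: int, q: int, r: int) -> int:
    """c = m + 2pq + pqr, with r passed explicitly for a reproducible example."""
    return m + 2 * p * q + p * q * r

def is_match(c_i: int, c_index: int, n: int) -> bool:
    """Cloud-side test: Retrieval = (c_i - c_index) mod N equals 0 on a match."""
    return (c_i - c_index) % n == 0

p, q = 131, 101          # toy values; q is a hypothetical choice
n = p * q                # the only value the cloud needs for matching

c_doc = encrypt(42, p, q, r=5)      # keyword stored in a cloud document
c_query = encrypt(42, p, q, r=17)   # same keyword, encrypted again by the client
c_other = encrypt(43, p, q, r=17)   # a different keyword

same = is_match(c_doc, c_query, n)       # ciphertexts differ, plaintexts agree
different = is_match(c_other, c_query, n)
print(same, different)
```

The two encryptions of 42 are different integers, yet their difference is a multiple of $N$, which is what the matching formula tests.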
Step 2: Preprocess the ciphertext document set in the cloud
The cloud first generates a copy of the ciphertext document set, and all preprocessing operations are performed on the copy. Preprocessing is divided into three stages: word segmentation and filtering, building the inverted index, and generating the weight vector set of the document set.
(1) Word segmentation and filtering
English text is generally split into terms at spaces. Chinese text cannot be split in this way; the simplest approach is to treat each individual Chinese character as a term.
After segmentation, keywords that carry no meaning for retrieval over the document set, i.e. stop words, need to be removed. Each keyword is checked against the stop word list and, if present, is deleted from the document. Because the original document set has been encrypted on the client, ciphertext stop words are looked up under encryption by linear matching. The matching query proceeds as follows:
In step 1, the client encrypts the stop words using formula (1) and uploads them to the cloud, which stores the ciphertext stop words; each keyword in the document set is encrypted using formula (2) and uploaded to the cloud for storage;
Suppose the ciphertext keyword in a cloud document is $t_{index}$ and a ciphertext stop word stored in the cloud is $t_i$; then:
$t_i = m_i + 2pq + pqr_2$ Formula (1)
$t_{index} = m_{index} + 2pq + pqr_1$ Formula (2)
In the cloud, every keyword in every ciphertext document is filtered using formula (3); the matching formula for the query is:
$Retrieval = (t_i - t_{index}) \bmod N$ Formula (3)
This matching operation is carried out in the cloud with result $Retrieval$. If $Retrieval = 0$, then $t_i - t_{index} = 0$, which means a ciphertext stop word has been found in the ciphertext document. To use the above formula, only $N = pq$ needs to be uploaded to the cloud, and the key $p$ cannot be derived from $N$, so the formula is safe and reliable;
After all ciphertext stop words found are deleted, a new document set is generated, each document of which is denoted $d_j$.
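The linear-matching filter can be sketched as below. The word-to-integer mapping `word_ids`, the value $q = 101$ and the range of $r$ are hypothetical; the text does not specify how words are encoded as integer blocks.

```python
import random

def encrypt(m: int, p: int, q: int) -> int:
    """c = m + 2pq + pqr with a random r (range chosen for illustration)."""
    r = random.randint(1, 1000)
    return m + 2 * p * q + p * q * r

def is_match(c_a: int, c_b: int, n: int) -> bool:
    """Two ciphertexts hide the same plaintext when their difference is 0 mod N."""
    return (c_a - c_b) % n == 0

def filter_stop_words(doc: list[int], stop_words: list[int], n: int) -> list[int]:
    """Drop every ciphertext keyword that matches any ciphertext stop word."""
    return [c for c in doc if not any(is_match(c, s, n) for s in stop_words)]

p, q = 131, 101
n = p * q
word_ids = {"i": 1, "on": 2, "the": 3, "hands": 4}  # hypothetical encoding

doc = [encrypt(word_ids[w], p, q) for w in ["i", "on", "the", "hands"]]
stop_words = [encrypt(word_ids[w], p, q) for w in ["on", "the"]]

filtered = filter_stop_words(doc, stop_words, n)
# Decryption with p is done here only to inspect the result of the sketch.
remaining_ids = [c % p for c in filtered]
print(remaining_ids)
```

The filter compares each document keyword with each stop word, so its cost grows with the product of the two list lengths, which is the linear matching the text describes.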
(2) Building the inverted index
The present invention builds the inverted index on the server side: under encryption, it counts the term frequency $TF_i$ and inverse document frequency $IDF_i$ of each keyword $k_i$ and finally generates the inverted index table of the ciphertext document set;
Suppose the ciphertext keyword to be counted in the cloud is $ck_{index}$ and a ciphertext keyword stored in the cloud is $c_i$; then
$ck_{index} = mk_{index} + 2pq + pqr_1$ Formula (4)
$c_i = m_i + 2pq + pqr_2$ Formula (5)
Let
$Retrieval = (c_i - ck_{index}) \bmod N$ Formula (6)
This matching operation is carried out in the cloud. In formula (6), if the result $Retrieval = 0$, then $c_i - ck_{index} = 0$, which means the ciphertext keyword $ck_{index}$ to be counted has been found in this ciphertext document; for each such match the term frequency count $f_{ij}$ increases by 1. The document count $n_i$ is the number of documents in the document set in which the counted ciphertext keyword $ck_{index}$ occurs. Counting finally yields $f_{ij}$ and $n_i$ for the ciphertext keyword. To use the above formula, only $N = pq$ needs to be uploaded to the cloud, and the key $p$ cannot be derived from $N$, so the formula is safe and reliable.
The data obtained by counting are recorded as shown in Table 1;
Table 1: Inverted index table
As shown in Table 1, document names and keywords remain encrypted in the cloud, whereas the document numbers, frequencies of occurrence and document counts obtained by counting are in plaintext in the cloud.
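The counting step for one ciphertext keyword can be sketched as follows. The function name `count_keyword`, the integer word ids and $q = 101$ are hypothetical; the sketch returns the per-document frequencies $f_{ij}$ and the document count $n_i$ in plaintext, as the text describes.

```python
import random

def encrypt(m: int, p: int, q: int) -> int:
    """c = m + 2pq + pqr with a random r (range chosen for illustration)."""
    return m + 2 * p * q + p * q * random.randint(1, 1000)

def count_keyword(c_keyword: int, cipher_docs: list[list[int]], n: int):
    """Return ({document number: f_ij}, n_i) for one ciphertext keyword."""
    postings = {}
    for doc_no, doc in enumerate(cipher_docs, start=1):
        # Each ciphertext with (c - c_keyword) mod N == 0 is one occurrence.
        f_ij = sum(1 for c in doc if (c - c_keyword) % n == 0)
        if f_ij > 0:
            postings[doc_no] = f_ij
    return postings, len(postings)

p, q = 131, 101
n = p * q
plain_docs = [[1, 5, 4], [1, 6], [5, 5, 7]]   # hypothetical word ids per document
cipher_docs = [[encrypt(m, p, q) for m in doc] for doc in plain_docs]

postings, n_i = count_keyword(encrypt(5, p, q), cipher_docs, n)
print(postings, n_i)
```

Running this for every distinct keyword would fill one row of the inverted index table per keyword.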
(3) Generating the document vector set
When a user performs keyword retrieval, what is actually searched is the weight vector set of the document set. After the inverted index table has been updated, the weight vector set of the document set is generated from the counted keyword term frequencies and inverse document frequencies. The weighting scheme used in the present invention is TF-IDF, where TF denotes the term frequency of a keyword and IDF its inverse document frequency. The weight values used by the present invention are computed as shown in formula (7):
where $w_{ij}$ is the TF-IDF weight of keyword $k_i$ for document $d_j$; $f_{ij}$ is the frequency of keyword $k_i$ in document $d_j$, i.e. the number of times it occurs; $N$ is the total number of documents in the document set; $n_i$ is the number of documents in the document set that contain keyword $k_i$; and $N/n_i$ represents the inverse document frequency of keyword $k_i$;
Suppose ciphertext document $d_j$ has $t$ ciphertext keywords $k_i$ that are mutually independent, and define $d_j$ as a vector in $t$-dimensional space. The weight of each ciphertext keyword $k_i$ in $d_j$ is obtained from the above formula, which yields the plaintext weight vector of the ciphertext document; the weight obtained for each ciphertext keyword $k_i$ is in plaintext. The value of $d_j$ is then as shown in formula (8):
$d_j = (w_{1j}, w_{2j}, \ldots, w_{tj})$ Formula (8)
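A minimal sketch of weight-vector generation from the plaintext counts, assuming the common TF-IDF form $w_{ij} = f_{ij} \times \log_2(N / n_i)$. The logarithm and its base are assumptions; the text only states that $N/n_i$ represents the inverse document frequency. The keyword labels `kw_a` and `kw_b` stand in for ciphertext keywords and are hypothetical.

```python
import math

def tfidf_vectors(postings: dict[str, dict[int, int]], num_docs: int):
    """Build one weight vector per document from {keyword: {doc number: f_ij}}."""
    keywords = sorted(postings)              # fixed dimension order
    vectors = {}
    for j in range(1, num_docs + 1):
        vectors[j] = [
            # f_ij * log2(N / n_i); n_i is the number of documents with the keyword
            postings[k].get(j, 0) * math.log2(num_docs / len(postings[k]))
            for k in keywords
        ]
    return keywords, vectors

# Counts as they would come out of the inverted index (plaintext in the cloud).
postings = {
    "kw_a": {1: 1, 2: 2},   # occurs in 2 of 4 documents
    "kw_b": {1: 1},         # occurs in 1 of 4 documents
}
keywords, vectors = tfidf_vectors(postings, num_docs=4)
print(keywords, vectors)
```

Only the counts enter this computation, which is why the resulting vectors can exist in plaintext while the keywords themselves stay encrypted.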
Step 3: Retrieval
The client encrypts the plaintext search term and uploads it to the cloud, where the ciphertext search term undergoes the same preprocessing as the ciphertext document set: it is segmented and filtered, just as the ciphertext documents are segmented and their ciphertext stop words filtered out in the cloud; the inverted index of the ciphertext search term is then built; and finally the plaintext weight vector of the ciphertext search term is generated, each component of which is the plaintext weight of the corresponding ciphertext keyword in the ciphertext search term. The plaintext weight is computed as shown in formula (9):
Suppose the ciphertext search term $q$ has $t$ ciphertext keywords $k_i$ that are mutually independent. The plaintext weight of each ciphertext keyword $k_i$ in $q$ is obtained from the above formula, which yields the plaintext weight vector of the ciphertext search term; the weight obtained for each ciphertext keyword $k_i$ is in plaintext. The value of $q$ is then as shown in formula (10):
$q = (w_{1q}, w_{2q}, \ldots, w_{tq})$ Formula (10)
Retrieval is in fact a similarity calculation between the plaintext weight vector of the ciphertext search term and the plaintext weight vector of each ciphertext document in the ciphertext document set. The similarity is computed as follows:
Let the plaintext weight vector of a ciphertext document be $\vec{d_j}$ and the plaintext weight vector of the ciphertext search term be $\vec{q}$. By the definition of the vector space model, if their similarity is $sim(d_j, q)$, then
$sim(d_j, q) = \dfrac{\vec{d_j} \cdot \vec{q}}{|d_j| \times |q|}$ Formula (11)
where $|d_j|$ and $|q|$ are respectively the modulus of the plaintext vector of the ciphertext document and the modulus of the plaintext vector of the ciphertext search term, both computed in the cloud. Using formula (11), the cloud computes the similarity between the ciphertext keyword and every ciphertext document and sorts the results by similarity, so that documents with higher similarity, i.e. higher relevance to the query, are ranked first, which makes them easier for the user to find. Finally the cloud returns the similarity ranking to the client for the user to view.
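The similarity calculation and ranking can be sketched as follows, assuming the similarity is the usual cosine measure of the vector space model. The function names and the example vectors are hypothetical.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the moduli."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector shares no keyword weight with anything
    return dot / (norm_u * norm_v)

def rank(doc_vectors: dict[int, list[float]], query: list[float]):
    """Return (document number, similarity) pairs, most similar first."""
    scores = {j: cosine(vec, query) for j, vec in doc_vectors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

doc_vectors = {1: [1.0, 0.0], 2: [0.0, 2.0], 3: [1.0, 1.0]}
query = [0.0, 1.0]

ranking = rank(doc_vectors, query)
print(ranking)
```

Because the weight vectors are plaintext, this step needs no homomorphic operations and the ranking can be returned to the client directly.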
Step 4: Document download
After the desired document has been found in the previous step, the selected document can be downloaded. The name of the document to be downloaded is first entered on the client, encrypted using formula (12) and uploaded to the cloud for storage; the name of every document in the document set is encrypted using formula (13) and uploaded to the cloud for storage. Suppose the ciphertext file name of the document to be downloaded is $c_{index}$ and the ciphertext file name of a cloud document is $c_i$; then
$c_{index} = m_{index} + 2pq + pqr_1$ Formula (12)
$c_i = m_i + 2pq + pqr_2$ Formula (13)
Likewise, because homomorphic encryption produces different ciphertexts when the same data is encrypted twice, the ciphertext name of the document to be downloaded must be matched against each ciphertext document name in the document set using formula (14). If the matching result is $Retrieval$, then
$Retrieval = (c_i - c_{index}) \bmod N$ Formula (14)
This matching operation is carried out in the cloud, where the document to be downloaded can be found using formula (14). If $Retrieval = 0$, then $c_i - c_{index} = 0$, which means the ciphertext document has been found among the ciphertext documents. To use the above formula, only $N = pq$ needs to be uploaded to the cloud, and the key $p$ cannot be derived from $N$.
Beneficial effects of the present invention:
In the field of ciphertext retrieval for cloud computing, homomorphic-encryption ciphertext retrieval based on the TF-IDF vector retrieval model has clear advantages. Most existing schemes, however, place heavy work such as preprocessing the document set on the client: the client handles most of the work and the cloud only a small part. The drawback is that this increases the load on the client and fails to use the cloud's own powerful computing capability.
The present invention carries out the preprocessing stage in the cloud. Compared with schemes that carry out the preprocessing stage on the client, the benefits are that the computational load on the client is reduced, the powerful computing and storage capacity of cloud computing can be fully used to operate on the data, and retrieval efficiency is improved.
The innovations of the present invention can be summarized as follows:
(1) Heavy work such as preprocessing the document set is moved to the cloud as far as possible, and the TF-IDF weight vectors of the documents are all computed in the cloud;
(2) Working on the homomorphically encrypted documents, the cloud performs homomorphic operations and calculations on the ciphertext and constructs the TF-IDF weight vector of each document, which exists in plaintext form;
(3) When a ciphertext search is performed, the client encrypts the search term and uploads it to the cloud; the cloud operates on the ciphertext search term to obtain its corresponding plaintext TF-IDF weight vector, then computes the similarity between the search term and each document in the document set and obtains the plaintext ranking result.
Brief description of the drawings
Fig. 1 is a flow chart of the overall steps of the proposed method;
Fig. 2 is a detailed flow chart of steps 1 and 2 of the present invention, covering document encryption and upload and the preprocessing operations;
Fig. 3 is a detailed flow chart of the operations of steps 1, 2 and 3 of the present invention;
Fig. 4 is a detailed flow chart of step 4 of the present invention, document download.
Detailed description of the embodiments:
The present invention proposes a homomorphic ciphertext retrieval method for cloud computing that moves heavy work such as preprocessing the document set to the cloud, where preprocessing is performed on the encrypted documents, so that the computing power of the cloud is fully used and efficient homomorphic ciphertext retrieval is achieved. It is described in detail below with reference to the drawings and an embodiment.
The flow of the proposed method is shown in Fig. 1 and comprises the following steps:
Step 1: Encrypt the document set on the client and upload it to the cloud
(1) Encryption algorithm
Randomly select a key $p = 131$;
Randomly select $q$ and the random number $r$.
Four documents are selected as the document set, with 7 keywords randomly selected from each document. The document contents are as follows:
Test1={ I happiness on the others my hands }
Test2={ I was father wind happiness I is }
Test3={ my father is a player player birthday }
Test4={ happiness father others was of my others }
The four documents are encrypted separately, giving the following ciphertext documents:
Test1={ dc47e285afe8b229b2463e12cfe69f1049ea107b84f51c034c4653ca01142196c9bd11a559d61541049eb2e90112cf2bc6f11300d8a78d53e129fb50cb3f56320459fe510dec9ba2a7 }
Test2={ 56e1d11107f444633216d92f822262c9ad76c9b1444676c9bbfb50afb4fc6981447609adc10888653213df64837ddf4d9598252359fea4d95edf6214d9542b75d }
Test3={ 4d9586fb6512cf29ba295980a7ccc6ed2fb1e59902540d3e12fb70f21049ec114217285a811d70d123a3f13327b285b7e8b2d8ba081173c7954e3af8bb70f37903e94f0211a56534c3b320756e3511d70d }
Test4={ f8971173af126bedb3f5e53c9a123a48d61543af8c412d2f895df6394a7bc4d95344467dc4a759fe6444764d95363abdc4a7285ba8ebc11300c7c6ff2b759f51b3c692daaa7ebd4464761959fdfa15811300d872d02 }
The encrypted document set is uploaded to the cloud and stored in ciphertext form.
Step 2: Preprocess the ciphertext document set in the cloud
(1) Word segmentation and filtering
English text is generally split into terms at spaces. After segmentation, keywords that carry no meaning for retrieval over the document set, called stop words, need to be removed. Here the StandardAnalyzer analyzer is used through its parameterless constructor StandardAnalyzer(), which specifies a filter word array STOP_WORDS. This STOP_WORDS array must first be encrypted with the homomorphic encryption algorithm and uploaded to the cloud, where it is kept for later use.
The stop words present in this embodiment are {on, the, was, is, a}. Because the original document set has been encrypted on the client, ciphertext stop words must be looked up under encryption by linear matching. The matching query proceeds as follows:
Suppose the ciphertext keyword in a cloud document is $t_{index}$ and a ciphertext stop word stored in the cloud is $t_i$. Ciphertext stop word matching is run on the ciphertext keywords of every document in the ciphertext document set using formula (3). If $Retrieval = 0$, then $t_i - t_{index} = 0$, so $t_i = t_{index}$, which shows that keyword $t_{index}$ is a stop word, and it is deleted.
After all ciphertext stop words found are deleted, a new document set is generated, each document of which is denoted $d_j$;
(2) Building the inverted index
Suppose the ciphertext keyword to be counted in the cloud is $c_{wind}$ = {107b965d18369814bd43b}, and each ciphertext keyword in the cloud ciphertext document test1={ dc47e285afe8b229b2463e12cfe69f1049ea107b84f51c034c4653ca01142196c9bd11a559d61541049eb2e90112cf2bc6f11300d8a78d53e129fb50cb3f56320459fe510dec9ba2a7 } is denoted $c_i$.
Then, computing with formula (6), if $Retrieval = 0$, then $c_i - c_{wind} = 0$, so $c_i = c_{wind}$, which means the ciphertext keyword $c_{wind}$ to be counted has been found in ciphertext document test1; that is, one match is found and the term frequency count $f_{ij}$ is recorded as 1. The document count $n_i$ is the number of documents in the document set in which the counted ciphertext keyword $c_{wind}$ occurs. After matching against the four ciphertext documents, counting finally gives $n_i = 1$ for ciphertext keyword $c_{wind}$;
The data obtained by counting are shown in Table 2;
Table 2: Inverted index table
As shown in Table 2, document names and keywords remain encrypted in the cloud, whereas the document numbers, frequencies of occurrence and document counts obtained by counting are in plaintext in the cloud.
(3) Generating the document vector set
From formula (7):
d1={ i=1, happiness=0, others=1, my=0, hands=2, father=0, wind=0, player=0, birthday=0 }
d2={ i=2, happiness=0, others=0, my=0, hands=0, father=0.6, wind=2, player=0, birthday=0 }
d3={ i=0, happiness=0, others=0, my=0, hands=0, father=0.3, wind=0, player=4, birthday=2 }
d4={ i=0, happiness=0, others=2, my=0, hands=0, father=0.3, wind=0, player=0, birthday=0 }
According to formula (8), the weight vector set of the document set is finally obtained:
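As a rough cross-check of the embodiment, the sketch below recomputes weights for the four example documents, assuming $w_{ij} = f_{ij} \times \log_2(N / n_i)$ (an assumed form, not confirmed by the text). It works on the plaintext words purely for readability; in the method the counts would come from ciphertext matching in the cloud. Under this assumption the entries for keywords that occur in one or two documents (for example hands, player, others, wind) agree with the values listed, while the listed entries for keywords that occur in three of the four documents (happiness, my, father) do not follow from the assumed form.

```python
import math
from collections import Counter

STOP_WORDS = {"on", "the", "was", "is", "a"}   # stop words of the embodiment
docs = {
    1: "I happiness on the others my hands",
    2: "I was father wind happiness I is",
    3: "my father is a player player birthday",
    4: "happiness father others was of my others",
}

# Term frequencies f_ij per document after stop-word removal.
tf = {j: Counter(w for w in text.lower().split() if w not in STOP_WORDS)
      for j, text in docs.items()}
# Document frequency n_i: number of documents containing each keyword.
df = Counter(word for counts in tf.values() for word in counts)
num_docs = len(docs)

def weight(word: str, j: int) -> float:
    """Assumed TF-IDF weight f_ij * log2(N / n_i)."""
    return tf[j][word] * math.log2(num_docs / df[word])

for word, j in [("hands", 1), ("i", 2), ("wind", 2), ("player", 3), ("others", 4)]:
    print(word, j, weight(word, j))
```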
Step 3: Retrieval
For example, to query the search term "father", its weight vector is obtained as in steps 1 and 2. The similarity between the search term and each weight vector of the document set is then computed with formula (11), and the results are returned to the client in descending order.
Step 4: Document download
According to the retrieval in the previous step, document 2 has the highest similarity to the search term. Taking the download of document 2 as an example, the procedure is as follows:
The name of the document to be downloaded, "test2", is entered; the client homomorphically encrypts the document name and uploads it to the cloud, and the matching operation is carried out in the cloud.
The set of ciphertext document names stored in the cloud is {d2fc02540950b013217a7893ebcd83e12110184c C700e27ad4d95f25409d2fbf60331130099c9ad7d2fb1aaa784d95f1 1a525 }, and the ciphertext of the name of the document to be downloaded is {18d8c3e121129d93222756fb1e}. The document to be downloaded can then be found in the cloud using formula (14): if $Retrieval = 0$, then $c_i - c_{index} = 0$, which means the ciphertext document has been found among the ciphertext documents, and the user can confirm the download on the client.
Step 5: Decryption of the downloaded document
The client receives the downloaded ciphertext document test2={ 56e1d11107f444633216d92f822262c9ad76c9b1444676c9bbfb50afb4fc6981447609adc10888653213df64837ddf4d9598252359fea4d95edf6214d9542b75d } and decrypts it with the decryption algorithm:
Decrypt(c): compute $m_i = c_i \bmod p$ to obtain the plaintext message $M = m_1 m_2 m_3 \ldots m_t$, that is, the decrypted plaintext document test2={ I was father wind happiness I is }; the ciphertext document has thus been correctly downloaded and decrypted.

Claims (1)

1. a kind of homomorphic cryptography cipher text retrieval method towards cloud computing application, it is characterised in that the method is comprised the following steps:
Step 1:Document sets are encrypted in client and is uploaded to high in the clouds
In client, user is encrypted to every document in document sets using the full homomorphic encryption algorithm of integer, then uploaded Preserved to high in the clouds.
(1) AES
KeyGen:The Safety Big Prime Number of a selection one P for randomly generating is used as key p, wherein p ∈ [2P-1,2P];
Encrypt(m):A Q Safety Big Prime Number q is randomly selected, wherein, q ∈ [2Q-1,2Q], P>Q>Clear packets length, with Machine produces a random number r, and packet M=m is carried out to M1m2m3...mt(miLength be L), calculate ciphertext ci=mi+2pq+ Pqr, you can obtain ciphertext;
Decrypt(c):Calculate mi=ciMod p, obtain clear-text message M=m1m2m3...Mt, that is, the plaintext after being decrypted;
Isomorphism is analyzed:It is provided with two plaintext m1, m2, its corresponding ciphertext is respectively c1, c2, then
c1=m1+2pq+pqr1
c2=m2+2pq+pqr2
Additive homomorphism is analyzed:
c1+c2=(m1+m2)+4pq+pq(r1+r2), because (c1+c2) mod p=m1+m2, therefore the algorithm meets additive homomorphism;
Multiplicative homomorphic is analyzed:
c1*c2=(m1*m2)+2m1pq+m1pqr2+2m2pq+4p2q2+2p2q2r2+m2pqr1+2p2q2r1+p2q2r1r2=(m1*m2)+ 2pq(m1+m2)+pq(m1r2+m2r1)+2p2q2(r2+r1)+4p2q2+p2q2r1r2,
Because (c1*c2) mod p=m1*m2, therefore the algorithm meets multiplicative homomorphic.
(2) Match-query algorithm for keyword m_index
A. Using the encryption algorithm above, the client encrypts the keyword to obtain the corresponding ciphertext c_index = m_index + 2pq + pq r, and uploads it to the cloud;
B. After receiving the ciphertext keyword c_index, the cloud queries with the ciphertext matching algorithm, where N = pq and the matching formula is Retrieval = (c_i - c_index) mod N = ((m_i - m_index) + pq(r_1 - r_2)) mod N;
If Retrieval = 0, then m_i - m_index = 0; that is, when m_i = m_index, pq(r_1 - r_2) mod N = 0, so Retrieval = 0. Only N = pq needs to be uploaded to the cloud for retrieval; the cloud cannot derive the user's key p from it, yet the server can run match queries directly on the user's ciphertext.
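The matching formula can be sketched in a few lines. The primes and the function names `retrieval` and `matches` are assumptions made for this illustration; the sketch shows the arithmetic only and uses toy parameters.

```python
import random

# Toy parameters (assumptions), not safe primes.
P_KEY = 2**127 - 1
Q_PRIME = 2**89 - 1
N = P_KEY * Q_PRIME   # the only value the cloud receives


def encrypt(m):
    """Client side: c = m + 2pq + pqr with a fresh random r."""
    r = random.randrange(1, 2**32)
    return m + 2 * N + N * r


def retrieval(c_i, c_index, n):
    """Cloud side: Retrieval = (c_i - c_index) mod N."""
    return (c_i - c_index) % n


def matches(c_i, c_index, n):
    """Two ciphertexts hide the same plaintext exactly when Retrieval is 0."""
    return retrieval(c_i, c_index, n) == 0


stored = encrypt(42)      # ciphertext already held by the cloud
query_same = encrypt(42)  # same plaintext, different random r
query_other = encrypt(43)
```

Because the random parts are multiples of N, they vanish under mod N and only the plaintext difference remains.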
Step 2: Preprocess the ciphertext document set in the cloud
The cloud first generates a copy of the ciphertext document set, and all preprocessing is performed on that copy. Preprocessing is divided into three stages: word segmentation and filtering, building the inverted index, and generating the weight vector set of the document set;
(1) Word segmentation and filtering
Select keywords and search the document set;
For each keyword, check whether it appears in the stop-word list; if it does, filter it out of the document. Because the original document set was encrypted on the client, ciphertext keywords are looked up by linear matching over the encrypted data. The matching query works as follows:
In step 1, the client encrypts the stop words using formula (1) and uploads them to the cloud, which stores the ciphertext stop words; each keyword in the document set is encrypted using formula (2) and uploaded to the cloud for storage;
Assume a ciphertext keyword in a cloud document is t_index and a ciphertext stop word stored in the cloud is t_i; then:
t_i = m_i + 2pq + pq r_2   Formula (1)
t_index = m_index + 2pq + pq r_1   Formula (2)
In the cloud, every keyword in every ciphertext document is filtered using formula (3), whose query-matching formula is:
Retrieval = (t_i - t_index) mod N = ((m_i - m_index) + pq(r_2 - r_1)) mod N   Formula (3)
This matching operation is performed in the cloud, and its result is Retrieval. If Retrieval = 0, then m_i - m_index = 0, which means a ciphertext stop word has been found in the ciphertext document. When using the above formula, only N = pq needs to be uploaded to the cloud; since the key p cannot be derived from N, the formula is safe and reliable;
Finally, all ciphertext stop words that were found are deleted, producing a new document set in which each document is denoted d_j.
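Stop-word filtering by linear matching over ciphertexts might look like the sketch below. It assumes, for simplicity, that each word fits in a single plaintext block of at most 11 bytes; the primes and the helpers `encrypt_word`, `decrypt_word`, and `filter_stop_words` are hypothetical names introduced for the example.

```python
import random

# Toy parameters (assumptions), not safe primes.
P_KEY = 2**127 - 1
Q_PRIME = 2**89 - 1
N = P_KEY * Q_PRIME


def encrypt_word(word):
    """Client side: encode a short word as one integer block and encrypt it."""
    data = word.encode("utf-8")
    if len(data) > 11:
        raise ValueError("word does not fit in one block in this sketch")
    r = random.randrange(1, 2**32)
    return int.from_bytes(data, "big") + 2 * N + N * r


def decrypt_word(c):
    """Client side: m = c mod p, then back to text."""
    m = c % P_KEY
    return m.to_bytes((m.bit_length() + 7) // 8, "big").decode("utf-8")


def filter_stop_words(cipher_doc, cipher_stop_words, n):
    """Cloud side: drop every t_index that matches some stop word t_i mod N."""
    kept = []
    for t_index in cipher_doc:
        if not any((t_i - t_index) % n == 0 for t_i in cipher_stop_words):
            kept.append(t_index)
    return kept


stop_words = [encrypt_word(w) for w in ["the", "is", "a"]]
document = [encrypt_word(w) for w in ["the", "cloud", "is", "a", "search", "index"]]
filtered = filter_stop_words(document, stop_words, N)
remaining = [decrypt_word(c) for c in filtered]  # decrypted here only to inspect the result
```

The cost is one modular subtraction per (keyword, stop word) pair, which is the linear matching the text describes.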
(2) Building the inverted index
The present invention builds the inverted index on the server side: the term frequency TF_i and inverse document frequency IDF_i of each keyword k_i are counted on encrypted data, and the inverted index table of the ciphertext document set is finally generated;
Assume the ciphertext keyword to be counted in the cloud is ck_index and a ciphertext keyword stored in the cloud is c_i; then
ck_index = mk_index + 2pq + pq r_1   Formula (4)
c_i = m_i + 2pq + pq r_2   Formula (5)
Let
Retrieval = (c_i - ck_index) mod N = ((m_i - mk_index) + pq(r_2 - r_1)) mod N   Formula (6)
This matching operation is performed in the cloud. In formula (6), if Retrieval = 0, then m_i - mk_index = 0, which means the ciphertext keyword ck_index to be counted has been found in this ciphertext document; for each such match, the term-frequency counter f_ij is incremented by 1. The document count n_i is the number of documents in the document set in which ck_index appears. The statistics finally yield f_ij and n_i for that ciphertext keyword;
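Counting f_ij and n_i for one ciphertext keyword can be sketched as below. The single-block word encoding, the toy primes, and the function name `count_keyword` are assumptions made for the illustration.

```python
import random

# Toy parameters (assumptions), not safe primes.
P_KEY = 2**127 - 1
Q_PRIME = 2**89 - 1
N = P_KEY * Q_PRIME


def encrypt_word(word):
    """Client side: one short word per block, c = m + 2pq + pqr."""
    m = int.from_bytes(word.encode("utf-8"), "big")
    return m + 2 * N + N * random.randrange(1, 2**32)


def count_keyword(cipher_docs, ck_index, n):
    """Cloud side: return ({j: f_ij}, n_i) for the ciphertext keyword ck_index."""
    frequencies = {}
    for j, doc in enumerate(cipher_docs):
        hits = sum(1 for c_i in doc if (c_i - ck_index) % n == 0)
        if hits:
            frequencies[j] = hits   # f_ij: occurrences in document j
    return frequencies, len(frequencies)  # n_i: documents containing the keyword


plain_docs = [["cloud", "search", "cloud"], ["search", "index"], ["cloud"]]
cipher_docs = [[encrypt_word(w) for w in doc] for doc in plain_docs]
f_cloud, n_cloud = count_keyword(cipher_docs, encrypt_word("cloud"), N)
```

Repeating the call for every distinct keyword gives the rows of an inverted index table without decrypting any keyword.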
(3) Generating the document vector set
After the inverted index table has been updated, the weight vector set of the document set is generated from the counted term frequencies and inverse document frequencies. The weighting scheme used by the present invention is TF-IDF, where TF is the term frequency of a keyword and IDF is its inverse document frequency. The weights are computed as shown in formula (7):
where w_ij is the TF-IDF weight of keyword k_i in document d_j, f_ij is the frequency (number of occurrences) of keyword k_i in document d_j, N is the total number of documents in the document set, n_i is the number of documents in the document set that contain keyword k_i, and N/n_i represents the inverse document frequency of keyword k_i;
If a ciphertext document d_j contains t mutually independent ciphertext keywords k_i, d_j is defined as a vector in t-dimensional space. The weight of each ciphertext keyword k_i in d_j is obtained from the formula above, which yields the plaintext weight vector of the ciphertext document; the weight obtained for each ciphertext keyword k_i is plaintext. The value of d_j is then as shown in formula (8):
d_j = (w_1j, w_2j, ..., w_tj)   Formula (8)
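A weight-vector computation in the spirit of this step is sketched below. Formula (7) is not reproduced in this text, so the sketch assumes the common TF-IDF form w_ij = f_ij × log(N / n_i); that form, the function name `tfidf_vectors`, and the input layout are assumptions, not the patent's definition. The document count is called `num_docs` to avoid confusion with N = pq.

```python
import math


def tfidf_vectors(freqs, num_docs):
    """Build one weight vector per document.

    freqs maps a keyword id to {document index: f_ij}; the keyword ids stand in
    for ciphertext keywords, while the counts and weights are plain numbers.
    Assumed weight: w_ij = f_ij * log(num_docs / n_i).
    """
    keywords = sorted(freqs)
    n = {k: len(freqs[k]) for k in keywords}   # n_i: documents containing k_i
    vectors = []
    for j in range(num_docs):
        vectors.append(
            [freqs[k].get(j, 0) * math.log(num_docs / n[k]) for k in keywords]
        )
    return vectors


# Hypothetical counts, e.g. taken from an inverted index over three documents.
freqs = {"k1": {0: 2, 2: 1}, "k2": {0: 1, 1: 1, 2: 1}}
vectors = tfidf_vectors(freqs, num_docs=3)
```

Under this assumed form, a keyword that occurs in every document (k2 here) gets weight 0 everywhere, since log(N / n_i) = log(1) = 0.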
Step 3: Retrieval
The client encrypts the plaintext query and uploads it to the cloud. The cloud preprocesses the ciphertext query in the same way as the ciphertext document set: the query undergoes word segmentation and filtering, and each of its ciphertext keywords is assigned a plaintext weight for use in ciphertext retrieval. The plaintext weight is computed as shown in formula (9):
If the ciphertext query q contains t mutually independent ciphertext keywords k_i, the plaintext weight of each ciphertext keyword k_i in q is obtained from the formula above, which yields the plaintext weight vector of the ciphertext query; the weight obtained for each ciphertext keyword k_i is plaintext. The value of q is then as shown in formula (10):
q = (w_1q, w_2q, ..., w_tq)   Formula (10)
Retrieval is in essence a similarity computation between the plaintext weight vector of the ciphertext query and the plaintext weight vector of each ciphertext document in the ciphertext document set. The similarity is computed as follows:
Let the plaintext weight vector of a ciphertext document be d_j and the plaintext weight vector of the ciphertext query be q. By the definition of the vector space model, let their similarity be sim(d_j, q); then
sim(d_j, q) = (d_j · q) / (|d_j| × |q|)   Formula (11)
where |d_j| and |q| are, respectively, the norm of the plaintext vector of the ciphertext document and the norm of the plaintext vector of the ciphertext query, both computed in the cloud. The cloud uses formula (11) to compute the similarity between the ciphertext query and every ciphertext document and sorts the results by similarity, so that documents more relevant to the query are ranked first, which makes them easier for the user to find. Finally, the cloud returns the ranked results to the client for the user to review.
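Similarity ranking in the vector space model can be sketched with the standard cosine measure. The function names `cosine` and `rank` and the sample vectors are assumptions for the illustration; the weights are plain numbers, as the text states.

```python
import math


def cosine(d, q):
    """Cosine similarity: (d . q) / (|d| * |q|); 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(d, q))
    norm = math.sqrt(sum(a * a for a in d)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def rank(doc_vectors, query_vector):
    """Return (document index, similarity) pairs, most similar first."""
    scores = [(j, cosine(d, query_vector)) for j, d in enumerate(doc_vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical two-keyword weight vectors for three documents and one query.
doc_vectors = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
query_vector = [1.0, 0.0]
ranking = rank(doc_vectors, query_vector)
```

Python's `sorted` is stable, so documents with equal similarity keep their original order.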
Step 4: Document download
After the desired document has been found in the previous step, the selected document can be downloaded. The user first enters the name of the document to be downloaded on the client; the name is encrypted using formula (12) and uploaded to the cloud for storage, and the name of every document in the document set is encrypted using formula (13) and uploaded to the cloud for storage. Assume the ciphertext file name of the document to be downloaded is c_index and the ciphertext file name of a cloud document is c_i; then
c_index = m_index + 2pq + pq r_1   Formula (12)
c_i = m_i + 2pq + pq r_2   Formula (13)
Likewise, because homomorphic encryption produces a different ciphertext each time the same data is encrypted, the ciphertext name of the document to be downloaded must be matched against every ciphertext document name in the document set using formula (14). Let the matching result be Retrieval; then
Retrieval = (c_i - c_index) mod N = ((m_i - m_index) + pq(r_2 - r_1)) mod N   Formula (14)
This matching operation is performed in the cloud, and formula (14) lets the cloud locate the document to be downloaded. If Retrieval = 0, then m_i - m_index = 0, which means the requested ciphertext document name has been matched in the ciphertext document set. When using the above formula, only N = pq needs to be uploaded to the cloud, so the key p cannot be derived from N.
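Locating a document by its encrypted name can be sketched as follows. The sketch assumes each name fits in one plaintext block; the primes and the helper names `encrypt_name` and `find_document` are hypothetical.

```python
import random

# Toy parameters (assumptions), not safe primes.
P_KEY = 2**127 - 1
Q_PRIME = 2**89 - 1
N = P_KEY * Q_PRIME


def encrypt_name(name):
    """Client side: encode a short document name as one block and encrypt it."""
    m = int.from_bytes(name.encode("utf-8"), "big")
    return m + 2 * N + N * random.randrange(1, 2**32)


def find_document(cipher_names, c_index, n):
    """Cloud side: index of the first name with (c_i - c_index) mod N == 0, else None."""
    for position, c_i in enumerate(cipher_names):
        if (c_i - c_index) % n == 0:
            return position
    return None


stored_names = [encrypt_name(name) for name in ["test1", "test2", "test3"]]
request = encrypt_name("test2")   # fresh encryption, so it differs from the stored copy
found_at = find_document(stored_names, request, N)
```

The returned position would then identify which stored ciphertext document to send back to the client.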
CN201710199651.1A 2017-03-30 2017-03-30 Homomorphic encrypted ciphertext retrieval method oriented to cloud computing application Active CN106934063B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710199651.1A CN106934063B (en) 2017-03-30 2017-03-30 Homomorphic encrypted ciphertext retrieval method oriented to cloud computing application

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710199651.1A CN106934063B (en) 2017-03-30 2017-03-30 Homomorphic encrypted ciphertext retrieval method oriented to cloud computing application

Publications (2)

Publication Number Publication Date
CN106934063A true CN106934063A (en) 2017-07-07
CN106934063B CN106934063B (en) 2020-08-07

Family

ID=59424866

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710199651.1A Active CN106934063B (en) 2017-03-30 2017-03-30 Homomorphic encrypted ciphertext retrieval method oriented to cloud computing application

Country Status (1)

Country Link
CN (1) CN106934063B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120213359A1 (en) * 2011-02-17 2012-08-23 Gradiant Method and apparatus for secure iterative processing
CN105323209A (en) * 2014-06-05 2016-02-10 江苏博智软件科技有限公司 Cloud data security protection method adopting fully homomorphic encryption technology and multiple digital watermarking technology
CN105610910A (en) * 2015-12-18 2016-05-25 中南民族大学 Cloud storage oriented ciphertext full-text search method and system based on full homomorphic ciphers

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
吕文斌 等: "基于同态加密的密文检索方案研究", 《计算机测量与控制》 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019024838A1 (en) * 2017-07-31 2019-02-07 腾讯科技(深圳)有限公司 Search item generation method and relevant apparatus
US11416708B2 (en) 2017-07-31 2022-08-16 Tencent Technology (Shenzhen) Company Limited Search item generation method and related device
CN108282328A (en) * 2018-02-02 2018-07-13 沈阳航空航天大学 A kind of ciphertext statistical method based on homomorphic cryptography
CN108282328B (en) * 2018-02-02 2021-03-12 沈阳航空航天大学 Ciphertext statistical method based on homomorphic encryption
CN110737912A (en) * 2018-09-26 2020-01-31 杨思琦 thesis duplicate checking method based on homomorphic encryption
CN110309674A (en) * 2019-07-04 2019-10-08 浙江理工大学 A kind of sort method based on full homomorphic cryptography
CN110309674B (en) * 2019-07-04 2021-10-01 浙江理工大学 Ordering method based on fully homomorphic encryption
CN111737719A (en) * 2020-07-17 2020-10-02 支付宝(杭州)信息技术有限公司 Privacy-protecting text classification method and device
CN117278216A (en) * 2023-11-23 2023-12-22 三亚学院 Encryption system based on cloud computing virtualization and network storage files
CN117278216B (en) * 2023-11-23 2024-02-13 三亚学院 Encryption system based on cloud computing virtualization and network storage files

Also Published As

Publication number Publication date
CN106934063B (en) 2020-08-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant