CN106934063A - Homomorphic-encryption ciphertext retrieval method for cloud computing applications - Google Patents
- Publication number
- CN106934063A (application CN201710199651.1A)
- Authority
- CN
- China
- Prior art keywords
- ciphertext
- document
- clouds
- keyword
- formula
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2452—Query translation
- G06F16/24522—Translation of natural language queries to structured queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Computational Linguistics (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Storage Device Security (AREA)
Abstract
The present invention provides a homomorphic-encryption ciphertext retrieval method for cloud computing applications. The preprocessing stage is moved to the cloud, and the TF-IDF weight vectors of all documents are computed there. Working on documents encrypted with a homomorphic scheme, the cloud performs homomorphic operations and computations directly on the ciphertexts and constructs a TF-IDF weight vector for every document, held in plaintext form. During a ciphertext search, the client encrypts the search terms and uploads them to the cloud; the cloud processes the ciphertext search terms in the same way to obtain their plaintext TF-IDF weight vector, then computes the similarity between the search terms and each document in the document set and produces a plaintext ranking. Compared with schemes that perform preprocessing on the client, this reduces the client's computational load, makes full use of the cloud's computing and storage capacity, and improves retrieval efficiency.
Description
Technical field:
The invention belongs to the field of cloud computing applications, and in particular relates to a homomorphic-encryption ciphertext retrieval method for cloud computing applications.
Background:
Cloud computing is an innovative service model that lets users obtain almost unlimited computing power and a wide range of information services over the Internet at any time; it is an evolution of distributed, parallel and grid computing. Cloud storage also falls within cloud computing and is becoming increasingly widespread. With the rapid development of cloud computing, large amounts of sensitive information are concentrated in the cloud. To prevent leakage of clients' private data, the data must be encrypted before being stored in the cloud. Once the ciphertext data stored on cloud servers reach a certain scale, effective retrieval over those ciphertexts becomes a pressing problem.
In existing homomorphic-encryption ciphertext retrieval schemes, heavy work such as preprocessing of the document set is usually performed on the client. A 2014 publication applied the TF-IDF vector retrieval model to homomorphic ciphertext retrieval. Its basic idea is as follows: the client first selects keywords for the document set, computes the TF-IDF weight of each keyword for each document, builds a TF-IDF weight vector for each document, and thus obtains the weight-vector set of the document set. Each vector in this set is then encrypted and uploaded to the cloud for storage. At retrieval time, the client first computes the weight vector of the search terms, encrypts it and uploads it to the cloud; the cloud then uses the encrypted query vector and the encrypted weight-vector set to compute the encrypted similarity between the search terms and each document and returns these ciphertexts to the user, who decrypts them on the client to obtain the plaintext ranking.
The above scheme increases the client's computational load and does not make full use of the cloud's computing power. The homomorphic ciphertext retrieval method for cloud computing proposed by the present invention moves heavy work such as document-set preprocessing to the cloud, which preprocesses the encrypted documents; this makes full use of the cloud's computing power and enables efficient homomorphic ciphertext retrieval.
Summary of the invention
To overcome the shortcoming of existing ciphertext retrieval schemes, which do not make full use of the cloud's computing power and therefore place a heavy computational load on the client, the present invention proposes a homomorphic-encryption ciphertext retrieval method for cloud computing applications that makes full use of the cloud's computing power. Compared with existing methods, it markedly improves retrieval efficiency and reduces the client's computational load.
The technical solution adopted by the present invention to solve this problem is a homomorphic-encryption ciphertext retrieval method for cloud computing applications, characterized in that the method comprises the following steps:
Step 1: Encrypt the document set on the client and upload it to the cloud
On the client, the user encrypts every document in the document set with an integer-based fully homomorphic encryption algorithm and then uploads it to the cloud for storage.
(1) Encryption algorithm
KeyGen: randomly generate a P-bit safe prime p as the key, where $p \in [2^{P-1}, 2^{P}]$;
Encrypt(m): randomly select a Q-bit safe prime q, where $q \in [2^{Q-1}, 2^{Q}]$ and P > Q > L (the plaintext block length); randomly select a random number r within a predetermined interval; split the message into blocks $M = m_1 m_2 m_3 \ldots m_t$, each $m_i$ of length L; and compute the ciphertext $c_i = m_i + 2pq + pqr$;
Decrypt(c): compute $m_i = c_i \bmod p$ to obtain the plaintext message $M = m_1 m_2 m_3 \ldots m_t$, i.e., the decrypted plaintext.
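The following Python sketch mirrors KeyGen, Encrypt and Decrypt on toy parameters. Assumptions that are not in the patent: P = 32, Q = 24, a block length L of 8 bits (one byte per block), a 16-bit random r, Miller-Rabin for primality testing, and a q that is generated once and reused for every ciphertext (the keyword matching step relies on a single N = pq). All function names are hypothetical, and these sizes are far too small for real use.

```python
import secrets

def is_probable_prime(n: int, rounds: int = 20) -> bool:
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for sp in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % sp == 0:
            return n == sp
    d, s = n - 1, 0
    while d % 2 == 0:          # write n - 1 = d * 2^s with d odd
        d //= 2
        s += 1
    for _ in range(rounds):
        a = 2 + secrets.randbelow(n - 3)   # random witness in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False       # a proves that n is composite
    return True

def random_safe_prime(bits: int) -> int:
    """Random safe prime p (p and (p - 1) / 2 both prime) with 2^(bits-1) <= p < 2^bits."""
    while True:
        p = secrets.randbits(bits) | (1 << (bits - 1)) | 1   # top bit set, odd
        if is_probable_prime((p - 1) // 2) and is_probable_prime(p):
            return p

def keygen(P: int = 32, Q: int = 24) -> tuple[int, int]:
    """Secret key p; q is fixed per user here (assumption) so all ciphertexts share N = pq."""
    return random_safe_prime(P), random_safe_prime(Q)

def encrypt(message: bytes, p: int, q: int, r_bits: int = 16) -> list[int]:
    """Split M into 8-bit blocks m_i and return c_i = m_i + 2pq + pqr, fresh r per block."""
    return [m + 2 * p * q + p * q * secrets.randbits(r_bits) for m in message]

def decrypt(ciphertext: list[int], p: int) -> bytes:
    """m_i = c_i mod p; correct because every block m_i < 2^8 < p."""
    return bytes(c % p for c in ciphertext)

p, q = keygen()
ct = encrypt(b"father", p, q)
print(decrypt(ct, p))  # b'father'
```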
Homomorphism analysis: let $m_1, m_2$ be two plaintexts with corresponding ciphertexts $c_1, c_2$; then
$$c_1 = m_1 + 2pq + pqr_1, \qquad c_2 = m_2 + 2pq + pqr_2$$
Additive homomorphism:
$$c_1 + c_2 = (m_1 + m_2) + 4pq + pq(r_1 + r_2)$$
Because $(c_1 + c_2) \bmod p = m_1 + m_2$ (provided $m_1 + m_2 < p$), the algorithm is additively homomorphic;
Multiplicative homomorphism:
$$\begin{aligned} c_1 c_2 &= m_1 m_2 + 2m_1pq + m_1pqr_2 + 2m_2pq + 4p^2q^2 + 2p^2q^2r_2 + m_2pqr_1 + 2p^2q^2r_1 + p^2q^2r_1r_2 \\ &= m_1 m_2 + 2pq(m_1 + m_2) + pq(m_1r_2 + m_2r_1) + 2p^2q^2(r_1 + r_2) + 4p^2q^2 + p^2q^2r_1r_2 \end{aligned}$$
Because $(c_1 c_2) \bmod p = m_1 m_2$ (provided $m_1 m_2 < p$), the algorithm is multiplicatively homomorphic.
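A quick numeric check of the two homomorphisms, using small illustrative values (p = 1019, q = 83 and fixed r values) that are assumptions rather than the patent's parameters. The last line shows that recovery only works while the product stays below p.

```python
# Small illustrative parameters (assumed, not taken from the patent).
p, q = 1019, 83
m1, m2 = 12, 7
r1, r2 = 5, 9

def enc(m: int, r: int) -> int:
    """c = m + 2pq + pqr with an explicit r so the example is deterministic."""
    return m + 2 * p * q + p * q * r

c1, c2 = enc(m1, r1), enc(m2, r2)

# Additive homomorphism: (c1 + c2) mod p recovers m1 + m2.
print((c1 + c2) % p)  # 19
# Multiplicative homomorphism: (c1 * c2) mod p recovers m1 * m2.
print((c1 * c2) % p)  # 84

# The expanded product from the derivation equals c1 * c2 exactly.
expanded = (m1 * m2 + 2 * p * q * (m1 + m2) + p * q * (m1 * r2 + m2 * r1)
            + 2 * p**2 * q**2 * (r1 + r2) + 4 * p**2 * q**2 + p**2 * q**2 * r1 * r2)
assert expanded == c1 * c2

# Recovery fails once the result reaches p: 100 * 50 = 5000 > 1019.
print((enc(100, 1) * enc(50, 2)) % p)  # 924, i.e. 5000 mod 1019
```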
(2) Matching query algorithm for a keyword $m_{index}$
A. Using the encryption algorithm above, the client encrypts the keyword to obtain the ciphertext $c_{index} = m_{index} + 2pq + pqr$ and uploads it to the cloud;
B. After receiving the ciphertext keyword $c_{index}$, the cloud queries with the ciphertext matching algorithm, where $N = pq$. The matching formula is
$$\mathrm{Retrieval} = (c_i - c_{index}) \bmod N = \big((m_i - m_{index}) + pq(r_1 - r_2)\big) \bmod N$$
where $r_1$ and $r_2$ are the random numbers used to produce $c_i$ and $c_{index}$.
If $m_i = m_{index}$, then $m_i - m_{index} = 0$ and $pq(r_1 - r_2) \bmod N = 0$, so Retrieval = 0; conversely, because both plaintexts are shorter than L bits and therefore smaller than N, Retrieval = 0 implies $m_i = m_{index}$. Only N = pq needs to be uploaded to the cloud for retrieval; the cloud cannot derive the user's key p from it, yet the server can run matching queries directly on the user's ciphertexts.
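A minimal Python sketch of the cloud-side matching test. It assumes each keyword is encoded as a single integer block (hypothetical `encode`) and uses the Mersenne primes 2^127 - 1 and 2^89 - 1 as stand-ins for the safe primes p and q. Python's `%` returns a non-negative result for a positive modulus, so a negative difference c_i - c_index needs no special handling.

```python
import secrets

# Stand-in key material (assumption): the Mersenne primes 2^127 - 1 and 2^89 - 1
# replace the randomly generated safe primes p and q of the patent.
P_KEY = 2**127 - 1
Q_KEY = 2**89 - 1
N = P_KEY * Q_KEY  # the value the client uploads to the cloud

def encode(word: str) -> int:
    """Hypothetical encoding: the keyword's UTF-8 bytes read as one integer block."""
    m = int.from_bytes(word.encode("utf-8"), "big")
    if m >= 2**88:
        raise ValueError("keyword too long for this toy parameter set")
    return m

def encrypt(m: int) -> int:
    """c = m + 2pq + pqr with a fresh 32-bit random r."""
    return m + 2 * P_KEY * Q_KEY + P_KEY * Q_KEY * secrets.randbits(32)

def retrieval(c_i: int, c_index: int, n: int = N) -> int:
    """Cloud-side matching value (c_i - c_index) mod N; zero means the keywords match."""
    return (c_i - c_index) % n

stored = encrypt(encode("father"))  # keyword already held by the cloud
query = encrypt(encode("father"))   # same keyword, new randomness
other = encrypt(encode("wind"))

print(retrieval(stored, query) == 0)   # True
print(retrieval(stored, other) == 0)   # False
print(stored % N == encode("father"))  # True
```

One consequence of the algebra is visible in the last line: because every ciphertext has the form m + pq(2 + r), reducing it modulo N alone already returns m, so a party that holds N can read the plaintext blocks directly.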
Step 2: Preprocess the ciphertext document set in the cloud
The cloud first makes a copy of the ciphertext document set, and all preprocessing is performed on this copy. Preprocessing has three stages: tokenization and filtering, building the inverted index, and generating the weight-vector set of the document set.
(1) Tokenization and filtering
English text is generally split into terms at spaces; Chinese text cannot be split this way, and the simplest approach is to treat each individual Chinese character as a term.
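A small tokenizer sketch for the rule above: runs of Latin letters and digits become English terms, and each character in the CJK Unified Ideographs block (U+4E00 to U+9FFF) becomes its own term. The function name is hypothetical, and lower-casing is an assumption suggested by the lower-case keywords in the embodiment's vectors.

```python
import re

# An English word (Latin letters/digits) or a single CJK Unified Ideograph.
TOKEN = re.compile(r"[A-Za-z0-9]+|[\u4e00-\u9fff]")

def tokenize(text: str) -> list[str]:
    """Split English at non-letters (e.g. spaces) and Chinese into one term per character."""
    return [t.lower() for t in TOKEN.findall(text)]

print(tokenize("I was father wind happiness I is"))
# ['i', 'was', 'father', 'wind', 'happiness', 'i', 'is']
print(tokenize("云计算 cloud"))
# ['云', '计', '算', 'cloud']
```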
After tokenization, keywords that carry no meaning for retrieval over the document set, i.e., stop words, must be removed. Each keyword is checked against the stop-word list and, if present, deleted from the document. Because the original document set was already encrypted on the client, ciphertext stop words must be located in encrypted form, which can be done by linear matching, as follows:
In Step 1, the client encrypted the stop words with formula (1) and uploaded them to the cloud, which stores the ciphertext stop words; each keyword in the document set was encrypted with formula (2) and uploaded to the cloud for storage.
Let $t_{index}$ be a ciphertext keyword in a cloud document and $t_i$ a ciphertext stop word stored in the cloud; then
$$t_i = m_i + 2pq + pqr_2 \quad (1)$$
$$t_{index} = m_{index} + 2pq + pqr_1 \quad (2)$$
In the cloud, every keyword in each ciphertext document is filtered with formula (3); the matching formula for the query is:
$$\mathrm{Retrieval} = (t_i - t_{index}) \bmod N = \big((m_i - m_{index}) + pq(r_2 - r_1)\big) \bmod N \quad (3)$$
This matching operation is performed in the cloud. If Retrieval = 0, then $m_i = m_{index}$, meaning a ciphertext stop word has been found in the ciphertext document. Using this formula requires uploading only N = pq to the cloud, and the key p cannot be derived from N, so the formula is considered secure;
After all ciphertext stop words found have been deleted, a new document set is generated, in which each document is denoted $d_j$.
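A minimal sketch of the linear stop-word scan as a toy: the Mersenne primes 2^127 - 1 and 2^89 - 1 stand in for the safe primes, each keyword is encoded as one integer (its UTF-8 bytes), and the helper names are hypothetical. The final decryption line is only a client-side check of the result; the cloud never performs it.

```python
import secrets

P_KEY, Q_KEY = 2**127 - 1, 2**89 - 1  # stand-in primes (assumption), not safe primes
N = P_KEY * Q_KEY

def encode(word: str) -> int:
    return int.from_bytes(word.encode("utf-8"), "big")

def decode(m: int) -> str:
    return m.to_bytes((m.bit_length() + 7) // 8, "big").decode("utf-8")

def encrypt(word: str) -> int:
    """m + 2pq + pqr with a fresh random r."""
    return encode(word) + 2 * P_KEY * Q_KEY + P_KEY * Q_KEY * secrets.randbits(32)

def filter_stop_words(doc_ct: list[int], stop_ct: list[int], n: int) -> list[int]:
    """Cloud side: keep t_index only if no stop word t_i gives (t_i - t_index) mod N == 0."""
    return [t_index for t_index in doc_ct
            if all((t_i - t_index) % n != 0 for t_i in stop_ct)]

stop_ct = [encrypt(w) for w in ["on", "the", "was", "is", "a"]]              # formula (1)
doc_ct = [encrypt(w) for w in "I happiness on the others my hands".split()]  # formula (2)
kept = filter_stop_words(doc_ct, stop_ct, N)

# Client-side check after decryption (m = c mod p):
print([decode(c % P_KEY) for c in kept])  # ['I', 'happiness', 'others', 'my', 'hands']
```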
(2) Building the inverted index
The present invention builds the inverted index on the server side, counting the term frequency $TF_i$ and inverse document frequency $IDF_i$ of each keyword $k_i$ under encryption, and finally producing the inverted index table of the ciphertext document set;
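Let $c_{kindex}$ be the ciphertext keyword to be counted in the cloud and $c_i$ a ciphertext keyword stored in the cloud; then
$$c_{kindex} = m_{kindex} + 2pq + pqr_1 \quad (4)$$
$$c_i = m_i + 2pq + pqr_2 \quad (5)$$
Let
$$\mathrm{Retrieval} = (c_i - c_{kindex}) \bmod N = \big((m_i - m_{kindex}) + pq(r_2 - r_1)\big) \bmod N \quad (6)$$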
This matching operation is performed in the cloud. In formula (6), if Retrieval = 0, then $m_i = m_{kindex}$, meaning the ciphertext keyword $c_{kindex}$ being counted has been found in this ciphertext document; for each match, the term-frequency counter $f_{ij}$ is incremented by 1. $n_i$ denotes the number of documents in the document set in which the counted ciphertext keyword $c_{kindex}$ occurs. Counting finally yields $f_{ij}$ and $n_i$ for this ciphertext keyword. Using this formula requires uploading only N = pq to the cloud, and the key p cannot be derived from N, so the formula is considered secure.
The counted data are recorded as shown in Table 1;
Table 1: Inverted index table
As shown in Table 1, the document names and keywords remain encrypted in the cloud, whereas the counted document IDs, occurrence frequencies and document counts are held in plaintext in the cloud.
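The counting of $f_{ij}$ and $n_i$ described above can be sketched as follows, with the same toy assumptions (stand-in Mersenne primes, one integer per keyword, hypothetical helper names). The document lists are the embodiment's Test1 to Test4 after removing its stop words {on, the, was, is, a}.

```python
import secrets

P_KEY, Q_KEY = 2**127 - 1, 2**89 - 1  # stand-in primes (assumption)
N = P_KEY * Q_KEY

def encrypt(word: str) -> int:
    """Encrypt one keyword as a single block: m + 2pq + pqr."""
    m = int.from_bytes(word.encode("utf-8"), "big")
    return m + 2 * P_KEY * Q_KEY + P_KEY * Q_KEY * secrets.randbits(32)

def count_keyword(docs_ct: list[list[int]], c_kindex: int, n: int) -> tuple[list[int], int]:
    """Return (f_ij for each document j, n_i) using the (c_i - c_kindex) mod N test."""
    freqs = [sum(1 for c_i in doc if (c_i - c_kindex) % n == 0) for doc in docs_ct]
    n_i = sum(1 for f in freqs if f > 0)  # documents containing the keyword
    return freqs, n_i

docs = [["I", "happiness", "others", "my", "hands"],
        ["I", "father", "wind", "happiness", "I"],
        ["my", "father", "player", "player", "birthday"],
        ["happiness", "father", "others", "of", "my", "others"]]
docs_ct = [[encrypt(w) for w in doc] for doc in docs]

print(count_keyword(docs_ct, encrypt("wind"), N))    # ([0, 1, 0, 0], 1)
print(count_keyword(docs_ct, encrypt("player"), N))  # ([0, 0, 2, 0], 1)
```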
(3) Generating the document vector set
When a user performs keyword retrieval, what is actually searched is the weight-vector set of the document set. After the inverted index table has been updated, the weight-vector set of the document set is generated from the counted keyword term frequencies and inverse document frequencies. The weighting framework used in the present invention is TF-IDF, where TF is the term frequency of a keyword and IDF its inverse document frequency; the weight is computed as shown in formula (7):
where $w_{ij}$ is the TF-IDF weight of keyword $k_i$ for document $d_j$; $f_{ij}$ is the frequency of $k_i$ in $d_j$, i.e., its number of occurrences; N is the total number of documents in the document set (here N denotes the document count, not the modulus pq); $n_i$ is the number of documents in the set that contain $k_i$; and $N/n_i$ represents the inverse document frequency of $k_i$;
Suppose ciphertext document $d_j$ contains t ciphertext keywords $k_i$ that are mutually independent; $d_j$ is then defined as a vector in t-dimensional space. The weight of each ciphertext keyword $k_i$ in $d_j$ is computed with the formula above, producing the plaintext weight vector of the ciphertext document; every weight obtained for a ciphertext keyword $k_i$ is in plaintext. $d_j$ is given by formula (8):
$$d_j = (w_{1j}, w_{2j}, \ldots, w_{tj}) \quad (8)$$
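The weight-vector construction can be sketched from an inverted index of the kind shown in Table 1. Because formula (7) is not reproduced in this text, the sketch assumes the common form $w_{ij} = f_{ij} \cdot \log_2(N/n_i)$; the index layout {keyword: {document number: f_ij}}, the example index and the function name are likewise hypothetical.

```python
import math

def weight_vectors(inverted_index: dict[str, dict[int, int]], n_docs: int) -> dict[int, list[float]]:
    """Build one weight vector per document from {keyword k_i: {document j: f_ij}}.

    Assumed formula (7): w_ij = f_ij * log2(N / n_i), with N = n_docs and
    n_i = number of documents in the posting list of k_i.
    """
    keywords = sorted(inverted_index)  # fixed component order for every vector
    vectors = {j: [0.0] * len(keywords) for j in range(1, n_docs + 1)}
    for pos, keyword in enumerate(keywords):
        postings = inverted_index[keyword]
        idf = math.log2(n_docs / len(postings))
        for j, f_ij in postings.items():
            vectors[j][pos] = f_ij * idf
    return vectors

# Hypothetical index over N = 4 documents; components are ordered cipher, cloud, index.
index = {"cloud": {1: 2, 2: 1}, "cipher": {2: 3}, "index": {1: 1, 2: 1, 3: 1, 4: 1}}
print(weight_vectors(index, 4))
# {1: [0.0, 2.0, 0.0], 2: [6.0, 1.0, 0.0], 3: [0.0, 0.0, 0.0], 4: [0.0, 0.0, 0.0]}
```

A keyword that occurs in every document ("index" here) gets weight 0 under this assumed formula, since $\log_2(N/N) = 0$.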
Step 3: Retrieval
The client encrypts the plaintext search terms and uploads them to the cloud. The cloud applies to the ciphertext search terms the same preprocessing it applies to the ciphertext document set: it tokenizes them, filters out ciphertext stop words, builds the inverted index of the ciphertext search terms and finally generates their plaintext weight vector, each component of which is the plaintext weight of the corresponding ciphertext keyword in the search terms. The plaintext weight is computed as shown in formula (9):
Suppose the ciphertext search terms q contain t mutually independent ciphertext keywords $k_i$. The plaintext weight of each ciphertext keyword $k_i$ in q is computed with the formula above, producing the plaintext weight vector of the ciphertext search terms; every weight obtained for a ciphertext keyword $k_i$ is in plaintext. q is given by formula (10):
$$q = (w_{1q}, w_{2q}, \ldots, w_{tq}) \quad (10)$$
Retrieval is in fact a similarity computation between the plaintext weight vector of the ciphertext search terms and the plaintext weight vector of each ciphertext document in the set. The similarity is computed as follows.
Let the plaintext weight vector of a ciphertext document be $d_j = (w_{1j}, w_{2j}, \ldots, w_{tj})$ and that of the ciphertext search terms be $q = (w_{1q}, w_{2q}, \ldots, w_{tq})$. By the definition of the vector space model, their similarity $sim(d_j, q)$ is
$$sim(d_j, q) = \frac{d_j \cdot q}{|d_j| \times |q|} = \frac{\sum_{i=1}^{t} w_{ij} w_{iq}}{\sqrt{\sum_{i=1}^{t} w_{ij}^2} \times \sqrt{\sum_{i=1}^{t} w_{iq}^2}} \quad (11)$$
where $|d_j|$ and $|q|$ are the norms of the plaintext vectors of the ciphertext document and of the ciphertext search terms, both computed in the cloud. Using formula (11), the cloud computes the similarity between the ciphertext search terms and every ciphertext document and sorts the results by similarity, so that documents with higher similarity, i.e., higher relevance to the query, come first, which makes searching easier for the user. Finally, the cloud returns the sorted similarity results to the client for the user to view.
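Formula (11) and the ranking step can be sketched as plain cosine similarity over the plaintext weight vectors. The function names are hypothetical, and giving a zero vector similarity 0 (to avoid division by zero) is a choice the patent does not specify.

```python
import math

def cosine(d: list[float], q: list[float]) -> float:
    """sim(d_j, q) = (d_j . q) / (|d_j| |q|); 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(d, q))
    norm_d = math.sqrt(sum(a * a for a in d))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_d == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_d * norm_q)

def rank(doc_vectors: list[list[float]], q: list[float]) -> list[tuple[int, float]]:
    """(document number, similarity) pairs from most to least similar; numbering starts at 1."""
    scores = [(j, cosine(d, q)) for j, d in enumerate(doc_vectors, start=1)]
    return sorted(scores, key=lambda item: item[1], reverse=True)

print(rank([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 0.0]))
```

In this small example, document 1 (parallel to the query) scores 1.0, document 3 scores 1/√2 and document 2 scores 0, so the order is 1, 3, 2.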
Step 4: Document download
After the desired documents have been found in the previous step, the selected document can be downloaded. The client first takes the name of the document to download, encrypts it with formula (12) and uploads it to the cloud; the name of every document in the set was encrypted with formula (13) and uploaded to the cloud for storage. Let $c_{index}$ be the ciphertext name of the document to download, stored in the cloud, and $c_i$ the ciphertext name of a cloud document; then
$$c_{index} = m_{index} + 2pq + pqr_1 \quad (12)$$
$$c_i = m_i + 2pq + pqr_2 \quad (13)$$
Likewise, because homomorphic encryption yields different ciphertexts when the same data are encrypted twice, the ciphertext name of the document to download is matched against each ciphertext document name in the set with formula (14), whose result is Retrieval:
$$\mathrm{Retrieval} = (c_i - c_{index}) \bmod N = \big((m_i - m_{index}) + pq(r_2 - r_1)\big) \bmod N \quad (14)$$
This matching operation is performed in the cloud, and formula (14) locates the document to download. If Retrieval = 0, then $m_i = m_{index}$, meaning the requested ciphertext document has been found among the ciphertext documents. Using this formula requires uploading only N = pq to the cloud, and the key p cannot be derived from N.
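The download lookup uses the same test as keyword matching. This sketch, again with stand-in Mersenne primes and hypothetical names, encrypts the four embodiment file names and locates "test2". A direct equality check between the request and the stored ciphertexts would not work, because each encryption draws a fresh r.

```python
import secrets

P_KEY, Q_KEY = 2**127 - 1, 2**89 - 1  # stand-in primes (assumption)
N = P_KEY * Q_KEY

def encrypt(name: str) -> int:
    """Encrypt a document name as one integer block: m + 2pq + pqr."""
    m = int.from_bytes(name.encode("utf-8"), "big")
    return m + 2 * P_KEY * Q_KEY + P_KEY * Q_KEY * secrets.randbits(32)

def find_download(names_ct: list[int], c_index: int, n: int) -> list[int]:
    """Cloud side: zero-based positions j with (c_j - c_index) mod N == 0, as in formula (14)."""
    return [j for j, c_j in enumerate(names_ct) if (c_j - c_index) % n == 0]

names_ct = [encrypt(f"test{k}") for k in range(1, 5)]  # stored names, formula (13)
request = encrypt("test2")                              # requested name, formula (12)
print(find_download(names_ct, request, N))              # [1]: position of test2
```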
Beneficial effects of the present invention:
In cloud ciphertext retrieval, homomorphic-encryption ciphertext retrieval based on the TF-IDF vector retrieval model has clear advantages. Existing schemes mostly place heavy work such as document-set preprocessing on the client, which handles most of the work while the cloud handles only a small part; the drawback is an increased load on the client and no use of the cloud's own powerful computing capability.
The present invention moves the preprocessing stage to the cloud. Compared with schemes that perform preprocessing on the client, this reduces the client's computational load, makes full use of the cloud's computing and storage capacity to operate on the data, and improves retrieval efficiency.
The innovations of the invention can be summarized as follows:
(1) heavy work such as document-set preprocessing is moved to the cloud as far as possible, and the TF-IDF weight vectors of all documents are computed in the cloud;
(2) working on homomorphically encrypted documents, the cloud performs homomorphic operations and computations on the ciphertexts and constructs the TF-IDF weight vector of every document, held in plaintext form;
(3) during a ciphertext search, the client encrypts the search terms and uploads them to the cloud; the cloud processes the ciphertext search terms to obtain their plaintext TF-IDF weight vector, then computes the similarity between the search terms and each document in the document set and produces a plaintext ranking.
Brief description of the drawings
Fig. 1 is a flowchart of the overall steps of the proposed method;
Fig. 2 is a detailed flowchart of Steps 1 and 2 (document encryption and upload, and preprocessing);
Fig. 3 is a detailed flowchart of Steps 1, 2 and 3;
Fig. 4 is a detailed flowchart of the document download in Step 4.
Specific embodiment:
The present invention proposes a homomorphic ciphertext retrieval method for cloud computing that moves heavy work such as document-set preprocessing to the cloud, which preprocesses the encrypted documents; this makes full use of the cloud's computing power and enables efficient homomorphic ciphertext retrieval. The method is described in detail below with reference to the drawings and an embodiment.
The proposed method, shown in Fig. 1, comprises the following steps:
Step 1: Encrypt the document set on the client and upload it to the cloud
(1) Encryption algorithm
Randomly select the key p = 131;
Randomly select q and the random number r within a predetermined interval.
Four documents are selected as the document set, with 7 keywords chosen at random for each document:
Test1={ I happiness on the others my hands }
Test2={ I was father wind happiness I is }
Test3={ my father is a player player birthday }
Test4={ happiness father others was of my others }
These four documents are encrypted, giving the following ciphertext documents:
Test1={dc47e285afe8b229b2463e12cfe69f1049ea107b84f51c034c4653ca01142196c9bd11a559d61541049eb2e90112cf2bc6f11300d8a78d53e129fb50cb3f56320459fe510dec9ba2a7}
Test2={56e1d11107f444633216d92f822262c9ad76c9b1444676c9bbfb50afb4fc6981447609adc10888653213df64837ddf4d9598252359fea4d95edf6214d9542b75d}
Test3={4d9586fb6512cf29ba295980a7ccc6ed2fb1e59902540d3e12fb70f21049ec114217285a811d70d123a3f13327b285b7e8b2d8ba081173c7954e3af8bb70f37903e94f0211a56534c3b320756e3511d70d}
Test4={f8971173af126bedb3f5e53c9a123a48d61543af8c412d2f895df6394a7bc4d95344467dc4a759fe6444764d95363abdc4a7285ba8ebc11300c7c6ff2b759f51b3c692daaa7ebd4464761959fdfa15811300d872d02}
The encrypted document set is uploaded to the cloud and stored in ciphertext form.
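The embodiment's ciphertexts are printed as hex strings whose encoding is not described, so they cannot be reproduced here. The sketch below only illustrates the scheme with the embodiment's key p = 131, under assumptions that are not in the patent: q = 127, a 16-bit r, and one block per character using the character's code point (ASCII letters and spaces are all below 131). The function names are hypothetical.

```python
import secrets

P_EMBODIMENT = 131  # key p from the embodiment
Q_ASSUMED = 127     # hypothetical: the embodiment does not state q

def encrypt_text(text: str, p: int = P_EMBODIMENT, q: int = Q_ASSUMED) -> list[int]:
    """One block per character (code point m < p): c = m + 2pq + pqr."""
    blocks = []
    for ch in text:
        m = ord(ch)
        if m >= p:
            raise ValueError(f"character {ch!r} does not fit below p = {p}")
        blocks.append(m + 2 * p * q + p * q * secrets.randbits(16))
    return blocks

def decrypt_text(blocks: list[int], p: int = P_EMBODIMENT) -> str:
    """m = c mod p for every block."""
    return "".join(chr(c % p) for c in blocks)

test2 = "I was father wind happiness I is"
ciphertext = encrypt_text(test2)
print(decrypt_text(ciphertext))  # I was father wind happiness I is
```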
Step 2: Preprocess the ciphertext document set in the cloud
(1) Tokenization and filtering
English text is generally split into terms at spaces. After tokenization, keywords that carry no meaning for retrieval over the document set, which we call stop words, must be removed. Here the StandardAnalyzer analyzer is used with its no-argument constructor StandardAnalyzer(); this constructor specifies a filter-word array STOP_WORDS, which must first be encrypted with the homomorphic encryption algorithm and uploaded to the cloud for later use.
The stop words in this embodiment are {on, the, was, is, a}. Because the original document set was encrypted on the client, ciphertext stop words must be located in encrypted form by linear matching, as follows:
Let $t_{index}$ be a ciphertext keyword in a cloud document and $t_i$ a ciphertext stop word stored in the cloud. Formula (3) performs ciphertext stop-word matching on the ciphertext keywords of every document in the encrypted document set: if Retrieval = 0, then $m_i = m_{index}$, which shows that keyword $t_{index}$ is a stop word, and it is deleted.
After all ciphertext stop words found have been deleted, a new document set is generated, in which each document is denoted $d_j$;
(2) Building the inverted index
Let the ciphertext keyword to be counted in the cloud be $c_{wind}$ = {107b965d18369814bd43b}, and denote each ciphertext keyword in the cloud ciphertext document test1={dc47e285afe8b229b2463e12cfe69f1049ea107b84f51c034c4653ca01142196c9bd11a559d61541049eb2e90112cf2bc6f11300d8a78d53e129fb50cb3f56320459fe510dec9ba2a7} by $c_i$.
Formula (6) is then evaluated. If Retrieval = 0, the plaintexts behind $c_i$ and $c_{wind}$ are equal, meaning the ciphertext keyword $c_{wind}$ to be counted has been found in the ciphertext document test1; for this match the term-frequency count $f_{ij}$ is 1. $n_i$, the number of documents in the set in which the counted ciphertext keyword $c_{wind}$ occurs, is obtained by matching against all four ciphertext documents; counting finally gives $n_i = 1$ for $c_{wind}$;
The counted data are shown in Table 2;
Table 2: Inverted index table
As shown in Table 2, the document names and keywords remain encrypted in the cloud, whereas the counted document IDs, occurrence frequencies and document counts are held in plaintext in the cloud.
(3) Generating the document vector set
Applying formula (7) gives:
d1 = {i=1, happiness=0, others=1, my=0, hands=2, father=0, wind=0, player=0, birthday=0}
d2 = {i=2, happiness=0, others=0, my=0, hands=0, father=0.6, wind=2, player=0, birthday=0}
d3 = {i=0, happiness=0, others=0, my=0, hands=0, father=0.3, wind=0, player=4, birthday=2}
d4 = {i=0, happiness=0, others=2, my=0, hands=0, father=0.3, wind=0, player=0, birthday=0}
The weight-vector set of the document set finally obtained from formula (8) is:
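To relate formula (7) to the vectors d1 to d4 above, the sketch below assumes $w_{ij} = f_{ij} \cdot \log_2(N/n_i)$ (formula (7) itself is not reproduced in this text) and applies it to the four embodiment documents after stop-word removal, lower-cased like the listed keywords. The helper name is hypothetical.

```python
import math
from collections import Counter

# Test1..Test4 with the stop words {on, the, was, is, a} removed, lower-cased.
docs = [["i", "happiness", "others", "my", "hands"],
        ["i", "father", "wind", "happiness", "i"],
        ["my", "father", "player", "player", "birthday"],
        ["happiness", "father", "others", "of", "my", "others"]]
vocab = ["i", "happiness", "others", "my", "hands", "father", "wind", "player", "birthday"]

def weights(docs: list[list[str]], vocab: list[str]) -> list[dict[str, float]]:
    """Assumed formula (7): w_ij = f_ij * log2(N / n_i)."""
    counts = [Counter(doc) for doc in docs]                        # f_ij per document
    n_docs = len(docs)                                             # N
    n_i = {k: sum(1 for c in counts if c[k] > 0) for k in vocab}   # documents containing k_i
    return [{k: c[k] * math.log2(n_docs / n_i[k]) for k in vocab} for c in counts]

for j, w in enumerate(weights(docs, vocab), start=1):
    print(f"d{j}", {k: round(v, 3) for k, v in w.items()})
```

Under this assumption the sketch reproduces the listed i, others, hands, wind, player and birthday weights; happiness, my and father (each found in three of the four documents) come out as log2(4/3) ≈ 0.415 per occurrence instead of the listed values, so the patent's actual formula (7) may differ from the one assumed here.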
Step 3: Retrieval
For example, for the search term "father", its weight vector is obtained as in Steps 1 and 2. The similarity between the search term and each weight vector of the document set is then computed with formula (11), and the results are returned to the client in descending order. The result obtained is:
Step 4: Document download
The retrieval in the previous step shows that document 2 has the highest similarity to the search term. If document 2 is selected for download, the procedure is as follows:
The name of the document to download, "test2", is entered; the client homomorphically encrypts this file name and uploads it to the cloud, where the matching operation is performed.
The set of ciphertext document names stored in the cloud is {d2fc02540950b013217a7893ebcd83e12110184cc700e27ad4d95f25409d2fbf60331130099c9ad7d2fb1aaa784d95f11a525}, and the ciphertext of the name to download is {18d8c3e121129d93222756fb1e}. Formula (14) then locates the document to download in the cloud: if Retrieval = 0, the plaintexts behind $c_i$ and $c_{index}$ are equal, meaning the requested ciphertext document has been found, and the user can confirm the download on the client.
Step 5: Decryption of the downloaded document
The client receives the downloaded ciphertext document test2={56e1d11107f444633216d92f822262c9ad76c9b1444676c9bbfb50afb4fc6981447609adc10888653213df64837ddf4d9598252359fea4d95edf6214d9542b75d} and decrypts it with the decryption algorithm:
Decrypt(c): compute $m_i = c_i \bmod p$ to obtain the plaintext message $M = m_1 m_2 m_3 \ldots m_t$, i.e., the decrypted plaintext document test2={I was father wind happiness I is}, showing that the ciphertext document was downloaded and decrypted correctly.
Claims (1)
1. A homomorphic-encryption ciphertext retrieval method for cloud computing applications, characterized in that the method comprises the following steps:
Step 1: Encrypt the document set on the client and upload it to the cloud
On the client, the user encrypts every document in the document set with an integer-based fully homomorphic encryption algorithm and then uploads it to the cloud for storage.
(1) Encryption algorithm
KeyGen: randomly generate a P-bit safe prime p as the key, where $p \in [2^{P-1}, 2^{P}]$;
Encrypt(m): randomly select a Q-bit safe prime q, where $q \in [2^{Q-1}, 2^{Q}]$ and P > Q > L (the plaintext block length); randomly generate a random number r; split the message into blocks $M = m_1 m_2 m_3 \ldots m_t$, each $m_i$ of length L; and compute the ciphertext $c_i = m_i + 2pq + pqr$;
Decrypt(c): compute $m_i = c_i \bmod p$ to obtain the plaintext message $M = m_1 m_2 m_3 \ldots m_t$, i.e., the decrypted plaintext;
Homomorphism analysis: let $m_1, m_2$ be two plaintexts with corresponding ciphertexts $c_1, c_2$; then
$$c_1 = m_1 + 2pq + pqr_1, \qquad c_2 = m_2 + 2pq + pqr_2$$
Additive homomorphism:
$$c_1 + c_2 = (m_1 + m_2) + 4pq + pq(r_1 + r_2)$$
Because $(c_1 + c_2) \bmod p = m_1 + m_2$ (provided $m_1 + m_2 < p$), the algorithm is additively homomorphic;
Multiplicative homomorphism:
$$\begin{aligned} c_1 c_2 &= m_1 m_2 + 2m_1pq + m_1pqr_2 + 2m_2pq + 4p^2q^2 + 2p^2q^2r_2 + m_2pqr_1 + 2p^2q^2r_1 + p^2q^2r_1r_2 \\ &= m_1 m_2 + 2pq(m_1 + m_2) + pq(m_1r_2 + m_2r_1) + 2p^2q^2(r_1 + r_2) + 4p^2q^2 + p^2q^2r_1r_2 \end{aligned}$$
Because $(c_1 c_2) \bmod p = m_1 m_2$ (provided $m_1 m_2 < p$), the algorithm is multiplicatively homomorphic.
(2) Matching query algorithm for a keyword $m_{index}$
A. Using the encryption algorithm above, the client encrypts the keyword to obtain the ciphertext $c_{index} = m_{index} + 2pq + pqr$ and uploads it to the cloud;
B. After receiving the ciphertext keyword $c_{index}$, the cloud queries with the ciphertext matching algorithm, where $N = pq$. The matching formula is
$$\mathrm{Retrieval} = (c_i - c_{index}) \bmod N = \big((m_i - m_{index}) + pq(r_1 - r_2)\big) \bmod N$$
where $r_1$ and $r_2$ are the random numbers used to produce $c_i$ and $c_{index}$.
If $m_i = m_{index}$, then $m_i - m_{index} = 0$ and $pq(r_1 - r_2) \bmod N = 0$, so Retrieval = 0. Only N = pq needs to be uploaded to the cloud for retrieval; the cloud cannot derive the user's key p from it, yet the server can run matching queries directly on the user's ciphertexts.
Step 2: Preprocess the ciphertext document set in the cloud
The cloud first makes a copy of the ciphertext document set, and all preprocessing is performed on this copy. Preprocessing has three stages: tokenization and filtering, building the inverted index, and generating the weight-vector set of the document set;
(1) Tokenization and filtering
Select keywords and search the document set;
Each keyword is checked against the stop-word list and, if present, deleted from the document. Because the original document set was already encrypted on the client, ciphertext keywords must be searched in encrypted form, which can be done by linear matching, as follows:
In Step 1, the client encrypted the stop words with formula (1) and uploaded them to the cloud, which stores the ciphertext stop words; each keyword in the document set was encrypted with formula (2) and uploaded to the cloud for storage.
Assume a ciphertext keyword in a cloud document is $t_{index}$ and a ciphertext stop word stored in the cloud is $t_i$; then:
$$t_i = m_i + 2pq + pqr_2 \qquad (1)$$
$$t_{index} = m_{index} + 2pq + pqr_1 \qquad (2)$$
The cloud filters every keyword in each ciphertext document using formula (3); the query matching formula is:
$$\text{Retrieval} = (t_i - t_{index}) \bmod N = \big((m_i - m_{index}) + pq(r_2 - r_1)\big) \bmod N \qquad (3)$$
This matching is carried out in the cloud. If $\text{Retrieval} = 0$, then $m_i - m_{index} = 0$, meaning a ciphertext stop word has been found in the ciphertext document. Using this formula requires uploading only $N = pq$ to the cloud, and the key $p$ cannot be derived from $N$, so the formula is safe and reliable.
Finally, all ciphertext stop words found are deleted, producing a new document set in which each document is denoted $d_j$.
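The stop-word filtering of formulas (1) to (3) can be sketched as a linear scan performed in the cloud. The word-to-integer encoding (`encode`, the UTF-8 bytes of a word read as a big-endian integer) is a hypothetical choice, since the text does not say how words become plaintext integers; p, q and the range of r are toy values.

```python
import secrets

# Toy parameters (hypothetical): Mersenne primes, far too small for real use.
P = 2**127 - 1   # secret key p
Q = 2**61 - 1    # second prime q
N = P * Q        # the only value the cloud receives


def encode(word: str) -> int:
    """Hypothetical encoding: UTF-8 bytes of a word as a big-endian integer below p."""
    m = int.from_bytes(word.encode("utf-8"), "big")
    if m >= P:
        raise ValueError("word too long for this toy modulus")
    return m


def encrypt(m: int) -> int:
    """t = m + 2pq + pq*r, as in formulas (1) and (2)."""
    r = secrets.randbelow(2**64) + 1
    return m + 2 * P * Q + P * Q * r


def filter_stop_words(doc: list[int], stop_words: list[int], n: int) -> list[int]:
    """Cloud side: keep t_index only if (t_i - t_index) mod N != 0 for every stop word t_i."""
    return [t_index for t_index in doc
            if all((t_i - t_index) % n != 0 for t_i in stop_words)]


# Client side: encrypt stop words and document keywords, then upload them.
stop_cipher = [encrypt(encode(w)) for w in ("the", "of", "a")]
doc_cipher = [encrypt(encode(w)) for w in ("the", "cloud", "of", "cipher")]
# Cloud side: "the" and "of" are filtered out.
kept = filter_stop_words(doc_cipher, stop_cipher, N)
```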
(2) Building the inverted index
In the present invention the inverted index is built on the server side: the term frequency $TF_i$ and inverse document frequency $IDF_i$ of each keyword $k_i$ are counted under encryption, ultimately producing the inverted index table of the ciphertext document set;
Assume the ciphertext keyword to be counted in the cloud is $ck_{index}$ and a ciphertext keyword stored in the cloud is $c_i$; then
$$ck_{index} = m_{kindex} + 2pq + pqr_1 \qquad (4)$$
$$c_i = m_i + 2pq + pqr_2 \qquad (5)$$
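Let
$$\text{Retrieval} = (c_i - ck_{index}) \bmod N = \big((m_i - m_{kindex}) + pq(r_2 - r_1)\big) \bmod N \qquad (6)$$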
This matching is carried out in the cloud. In formula (6), if $\text{Retrieval} = 0$, then $m_i - m_{kindex} = 0$, meaning the ciphertext keyword $ck_{index}$ being counted has been found in this ciphertext document; each such match increments the term-frequency count $f_{ij}$ by 1. The document count $n_i$ is the number of documents in the set in which $ck_{index}$ appears. Counting thus yields $f_{ij}$ and $n_i$ for the ciphertext keyword.
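Counting $f_{ij}$ and $n_i$ with formula (6) can be sketched as follows. As in the earlier sketches, `encode`, the toy primes and the range of r are hypothetical; `count_keyword` is an illustrative name, and documents are represented simply as lists of ciphertext keywords.

```python
import secrets

P = 2**127 - 1   # hypothetical secret prime p (toy)
Q = 2**61 - 1    # hypothetical second prime q (toy)
N = P * Q


def encode(word: str) -> int:
    """Hypothetical encoding of a word as an integer below p."""
    m = int.from_bytes(word.encode("utf-8"), "big")
    if m >= P:
        raise ValueError("word too long for this toy modulus")
    return m


def encrypt(m: int) -> int:
    r = secrets.randbelow(2**64) + 1
    return m + 2 * P * Q + P * Q * r


def count_keyword(docs: list[list[int]], ck_index: int, n: int) -> tuple[list[int], int]:
    """Return ([f_i1, ..., f_iD], n_i) for one ciphertext keyword ck_index.

    Each zero of (c_i - ck_index) mod N counts as one occurrence (formula (6));
    n_i is the number of documents with at least one occurrence.
    """
    f = [sum(1 for c_i in doc if (c_i - ck_index) % n == 0) for doc in docs]
    n_i = sum(1 for f_ij in f if f_ij > 0)
    return f, n_i


plain_docs = [["cloud", "cipher", "cloud"], ["index"], ["cipher", "cloud"]]
docs = [[encrypt(encode(w)) for w in d] for d in plain_docs]
f, n_i = count_keyword(docs, encrypt(encode("cloud")), N)
```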
(3) Generating the document vector set
After the inverted index table has been updated, the weight-vector set of the document set is generated from the counted keyword term frequencies and inverse document frequencies. The present invention uses the TF-IDF weighting framework, where TF is the keyword's term frequency and IDF its inverse document frequency. The weight is computed as shown in formula (7):
where $w_{ij}$ is the TF-IDF weight of keyword $k_i$ for document $d_j$; $f_{ij}$ is the frequency of $k_i$ in $d_j$, i.e., the number of times it occurs; $N$ is the total number of documents in the set (here not the modulus $pq$); $n_i$ is the number of documents in the set that contain $k_i$; and $N/n_i$ represents the inverse document frequency of $k_i$;
If ciphertext document $d_j$ contains $t$ mutually independent ciphertext keywords $k_i$, $d_j$ is defined as a vector in a $t$-dimensional space. The weight of each ciphertext keyword $k_i$ in $d_j$ is computed with the formula above, producing the plaintext weight vector of the ciphertext document, in which every weight is a plaintext value. The value of $d_j$ is then given by formula (8):
$$d_j = (w_{1j}, w_{2j}, \ldots, w_{tj}) \qquad (8)$$
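The weight computation runs on the plaintext counts $f_{ij}$ and $n_i$, so it can be sketched directly. Because formula (7) itself is not reproduced in the text, this sketch assumes the common form $w_{ij} = f_{ij} \cdot \log(N/n_i)$; the function name `weight_vectors` and the layout of the frequency table (`f[i][j]`) are hypothetical.

```python
import math


def weight_vectors(f: list[list[int]]) -> list[list[float]]:
    """Build d_j = (w_1j, ..., w_tj) for every document j.

    f[i][j] is the plaintext count f_ij of keyword k_i in document d_j.
    ASSUMPTION: w_ij = f_ij * log(num_docs / n_i); formula (7) is not shown in
    the text, so the logarithmic IDF is a common choice, not a confirmed form.
    """
    t = len(f)
    num_docs = len(f[0]) if f else 0
    n = [sum(1 for x in row if x > 0) for row in f]  # n_i: documents containing k_i
    return [
        [f[i][j] * math.log(num_docs / n[i]) if n[i] else 0.0 for i in range(t)]
        for j in range(num_docs)
    ]


# Hypothetical counts for keywords (cloud, cipher, index) over three documents.
f = [[2, 0, 1],
     [1, 0, 1],
     [0, 1, 0]]
vectors = weight_vectors(f)   # vectors[j] is the plaintext weight vector d_j
```

With this IDF form, a keyword that occurs in every document receives weight 0, which is the usual TF-IDF behaviour.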
Step 3: Retrieval
The client encrypts the plaintext search terms and uploads them to the cloud, where the ciphertext query undergoes the same preprocessing as the ciphertext document set: word segmentation and filtering are applied to the ciphertext query terms, and for each resulting ciphertext keyword its plaintext weight in the query is computed as shown in formula (9):
If ciphertext query $q$ contains $t$ mutually independent ciphertext keywords $k_i$, the plaintext weight of each $k_i$ in $q$ is computed with the formula above, producing the plaintext weight vector of the ciphertext query, in which every weight is a plaintext value. The value of $q$ is then given by formula (10):
$$q = (w_{1q}, w_{2q}, \ldots, w_{tq}) \qquad (10)$$
Retrieval amounts to computing the similarity between the plaintext weight vector of the ciphertext query and the plaintext weight vector of each ciphertext document in the set. Similarity is computed as follows:
Let the plaintext weight vector of a ciphertext document be $\vec{d_j}$ and that of the ciphertext query be $\vec{q}$. Following the definition of the vector space model, denote their similarity by $\mathrm{sim}(d_j, q)$; then
$$\mathrm{sim}(d_j, q) = \frac{\vec{d_j} \cdot \vec{q}}{|\vec{d_j}|\,|\vec{q}|} \qquad (11)$$
where $|d_j|$ and $|q|$ are the moduli (Euclidean norms) of the plaintext vectors of the ciphertext document and of the ciphertext query, both computed in the cloud. The cloud applies formula (11) to compute the similarity between the ciphertext query and every ciphertext document, then sorts the results by similarity so that documents with higher similarity, i.e., higher relevance to the query, come first, which helps the user find them. Finally, the cloud returns the sorted similarity results to the client for the user to view.
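The similarity and ranking step can be sketched on plaintext weight vectors. The document names and vector values below are hypothetical; `cosine` implements the vector-space-model similarity of formula (11), and `rank` is an illustrative helper that sorts by it.

```python
import math


def cosine(d: list[float], q: list[float]) -> float:
    """sim(d_j, q) = (d_j . q) / (|d_j| |q|); returns 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(d, q))
    norm = math.sqrt(sum(a * a for a in d)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def rank(docs: dict[str, list[float]], q: list[float]) -> list[tuple[str, float]]:
    """Score every document against q and sort, most similar first."""
    scores = [(name, cosine(vec, q)) for name, vec in docs.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical plaintext weight vectors held in the cloud, and a query vector.
docs = {"d1": [1.0, 0.0], "d2": [1.0, 1.0], "d3": [0.0, 1.0]}
ranking = rank(docs, [1.0, 0.0])
```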
Step 4: Document download
After the desired document has been found in the previous step, the selected document can be downloaded. The client first takes as input the name of the document to be downloaded, encrypts it with formula (12), and uploads it to the cloud; every document name in the document set has been encrypted with formula (13) and uploaded to the cloud for storage. Assume the ciphertext name of the document to be downloaded, as stored in the cloud, is $c_{index}$, and the ciphertext name of a cloud document is $c_i$; then
$$c_{index} = m_{index} + 2pq + pqr_1 \qquad (12)$$
$$c_i = m_i + 2pq + pqr_2 \qquad (13)$$
Likewise, because the homomorphic encryption produces different ciphertexts when the same data is encrypted twice, the ciphertext name of the document to be downloaded must be matched against each ciphertext document name in the set using formula (14). Denoting the matching result by Retrieval,
$$\text{Retrieval} = (c_i - c_{index}) \bmod N = \big((m_i - m_{index}) + pq(r_2 - r_1)\big) \bmod N \qquad (14)$$
This matching is carried out in the cloud, and formula (14) locates the document to be downloaded. If $\text{Retrieval} = 0$, then $m_i - m_{index} = 0$, meaning the requested document name has been found among the ciphertext documents. Using this formula requires uploading only $N = pq$ to the cloud, so the key $p$ cannot be derived from $N$.
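The download lookup can be sketched the same way: the same name encrypted twice gives different ciphertexts, yet formula (14) still finds it. The name encoding, toy primes, range of r, the file names, and the helper `locate` are all hypothetical.

```python
import secrets
from typing import Optional

P = 2**127 - 1   # hypothetical secret prime p (toy)
Q = 2**61 - 1    # hypothetical second prime q (toy)
N = P * Q


def encode(name: str) -> int:
    """Hypothetical encoding of a document name as an integer below p."""
    m = int.from_bytes(name.encode("utf-8"), "big")
    if m >= P:
        raise ValueError("name too long for this toy modulus")
    return m


def encrypt(m: int) -> int:
    """c = m + 2pq + pq*r with fresh randomness, as in formulas (12) and (13)."""
    r = secrets.randbelow(2**64) + 1
    return m + 2 * P * Q + P * Q * r


def locate(stored: list[int], c_index: int, n: int) -> Optional[int]:
    """Return the position of the stored name c_i with (c_i - c_index) mod N == 0."""
    for pos, c_i in enumerate(stored):
        if (c_i - c_index) % n == 0:
            return pos
    return None


names = ["report.txt", "notes.doc", "plan.pdf"]
stored = [encrypt(encode(name)) for name in names]   # uploaded earlier, formula (13)
request = encrypt(encode("notes.doc"))               # new request, formula (12)
position = locate(stored, request, N)
```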
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710199651.1A CN106934063B (en) | 2017-03-30 | 2017-03-30 | Homomorphic encrypted ciphertext retrieval method oriented to cloud computing application |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106934063A (en) | 2017-07-07 |
CN106934063B (en) | 2020-08-07 |
Family ID: 59424866
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710199651.1A Active CN106934063B (en) | 2017-03-30 | 2017-03-30 | Homomorphic encrypted ciphertext retrieval method oriented to cloud computing application |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106934063B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120213359A1 (en) * | 2011-02-17 | 2012-08-23 | Gradiant | Method and apparatus for secure iterative processing |
CN105323209A (en) * | 2014-06-05 | 2016-02-10 | 江苏博智软件科技有限公司 | Cloud data security protection method adopting fully homomorphic encryption technology and multiple digital watermarking technology |
CN105610910A (en) * | 2015-12-18 | 2016-05-25 | 中南民族大学 | Cloud storage oriented ciphertext full-text search method and system based on full homomorphic ciphers |
- 2017-03-30: CN201710199651.1A filed in CN (CN106934063B), status Active
Non-Patent Citations (1)
Title |
---|
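吕文斌 et al.: "Research on a Ciphertext Retrieval Scheme Based on Homomorphic Encryption" (基于同态加密的密文检索方案研究), Computer Measurement & Control (《计算机测量与控制》) * |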
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019024838A1 (en) * | 2017-07-31 | 2019-02-07 | 腾讯科技(深圳)有限公司 | Search item generation method and relevant apparatus |
US11416708B2 (en) | 2017-07-31 | 2022-08-16 | Tencent Technology (Shenzhen) Company Limited | Search item generation method and related device |
CN108282328A (en) * | 2018-02-02 | 2018-07-13 | 沈阳航空航天大学 | A kind of ciphertext statistical method based on homomorphic cryptography |
CN108282328B (en) * | 2018-02-02 | 2021-03-12 | 沈阳航空航天大学 | Ciphertext statistical method based on homomorphic encryption |
CN110737912A (en) * | 2018-09-26 | 2020-01-31 | 杨思琦 | thesis duplicate checking method based on homomorphic encryption |
CN110309674A (en) * | 2019-07-04 | 2019-10-08 | 浙江理工大学 | A kind of sort method based on full homomorphic cryptography |
CN110309674B (en) * | 2019-07-04 | 2021-10-01 | 浙江理工大学 | Ordering method based on fully homomorphic encryption |
CN111737719A (en) * | 2020-07-17 | 2020-10-02 | 支付宝(杭州)信息技术有限公司 | Privacy-protecting text classification method and device |
CN117278216A (en) * | 2023-11-23 | 2023-12-22 | 三亚学院 | Encryption system based on cloud computing virtualization and network storage files |
CN117278216B (en) * | 2023-11-23 | 2024-02-13 | 三亚学院 | Encryption system based on cloud computing virtualization and network storage files |
Also Published As
Publication number | Publication date |
---|---|
CN106934063B (en) | 2020-08-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106934063A (en) | | A kind of homomorphic cryptography cipher text retrieval method towards cloud computing application |
Kellaris et al. | | Generic attacks on secure outsourced databases |
US8904171B2 (en) | | Secure search and retrieval |
US20170242924A1 (en) | | Masking query data access pattern in encrypted data |
Tahir et al. | | A new secure and lightweight searchable encryption scheme over encrypted cloud data |
EP3511845B1 (en) | | Encrypted message search method, message transmission/reception system, server, terminal and programme |
Buyrukbilen et al. | | Secure similar document detection with simhash |
CN115314295B (en) | | Block chain-based searchable encryption technical method |
CN104468121B (en) | | The encrypted public key of support multi-key cipher based on given server can search for encryption method |
CN107704768A (en) | | A kind of multiple key classification safety search method of ciphertext |
Kissel et al. | | Verifiable phrase search over encrypted data secure against a semi-honest-but-curious adversary |
CN110737912A (en) | | thesis duplicate checking method based on homomorphic encryption |
CN109213731A (en) | | Multi-key word cipher text retrieval method in cloud environment based on iterative cryptographic |
Cui et al. | | Harnessing encrypted data in cloud for secure and efficient image sharing from mobile devices |
CN107454059A (en) | | Search encryption method based on stream cipher under a kind of cloud storage condition |
Ibrahim et al. | | Approximate keyword-based search over encrypted cloud data |
Moataz et al. | | Privacy-preserving multiple keyword search on outsourced data in the clouds |
Liu et al. | | Achieving secure and efficient cloud search services: Cross-lingual multi-keyword rank search over encrypted cloud data |
Agun et al. | | Privacy and efficiency tradeoffs for multiword top k search with linear additive rank scoring |
Saha et al. | | Efficient protocols for private database queries |
Gopal et al. | | Secure similarity based document retrieval system in cloud |
Devi et al. | | A comparative study on homomorphic encryption algorithms for data security in cloud environment |
Aritomo et al. | | A privacy-preserving similarity search scheme over encrypted word embeddings |
Uplavikar et al. | | Lucene-P²: A Distributed Platform for Privacy-Preserving Text-Based Search |
Shaon et al. | | A practical framework for executing complex queries over encrypted multimedia data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |