CN106934063B - Homomorphic encrypted ciphertext retrieval method oriented to cloud computing application - Google Patents

Publication number
CN106934063B
Authority
CN
China
Prior art keywords
ciphertext
document
cloud
formula
retrieval
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710199651.1A
Other languages
Chinese (zh)
Other versions
CN106934063A (en
Inventor
拱长青
肖芸
林娜
郭振洲
李席广
赵亮
孟庆杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang Aerospace University
Original Assignee
Shenyang Aerospace University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang Aerospace University
Priority to CN201710199651.1A
Publication of CN106934063A
Application granted
Publication of CN106934063B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2452 Query translation
    • G06F16/24522 Translation of natural language queries to structured queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

The invention provides a homomorphic encrypted ciphertext retrieval method for cloud computing applications in which the preprocessing stage is carried out at the cloud, and the TF-IDF weight vectors of the documents are computed there. Working on the homomorphically encrypted documents, the cloud performs homomorphic operations on the ciphertext to construct a TF-IDF weight vector for each document, and these weight vectors exist in plaintext. For ciphertext retrieval, the client encrypts the search term and uploads it to the cloud; the cloud operates on the ciphertext search term to obtain the corresponding plaintext TF-IDF weight vector, then calculates the similarity between the search term and each document in the document set to obtain a plaintext ranking result. Compared with schemes that perform the preprocessing stage at the client, the method reduces the computing load on the client, makes full use of the computing and storage capacity of the cloud, and improves retrieval efficiency.

Description

Homomorphic encrypted ciphertext retrieval method oriented to cloud computing application
Technical field:
The invention belongs to the technical field of cloud computing applications, and in particular relates to a homomorphic encrypted ciphertext retrieval method for cloud computing applications.
Background art:
Cloud computing is an innovative service model that lets users obtain nearly unlimited computing power and rich, diverse information services over the internet at any time; it is an evolution of distributed computing, parallel computing and grid computing. Cloud storage also belongs to cloud computing, and its use is gradually becoming widespread. With the rapid development of cloud computing, a large amount of sensitive information is concentrated in the cloud. To avoid leaking a client's private data, the data must be encrypted before it is stored in the cloud. Once the ciphertext data stored on the cloud server reaches a certain scale, retrieving it effectively becomes an urgent problem.
In existing homomorphic encrypted ciphertext retrieval schemes, most of the work, such as preprocessing the document set, is generally carried out at the client. In 2014, a paper applied the TF-IDF vector retrieval model to homomorphic encrypted ciphertext retrieval. The basic idea of that scheme is to select keywords for the document set at the client, calculate the TF-IDF weight of each keyword for every document in the set, and construct the TF-IDF weight vector of each document, giving the weight vector set of the document set. The vectors in the weight vector set are then encrypted one by one and uploaded to the cloud for storage. During retrieval, the weight vector of the search term is first calculated at the client, encrypted, and uploaded to the cloud. The cloud then uses the weight vector ciphertext of the search term and the weight vector set ciphertext of the document set to calculate the similarity ciphertext between the search term and each document, and returns it to the user, who decrypts the ranking ciphertext at the client to obtain the plaintext ranking result.
This ciphertext retrieval scheme increases the computing load on the client and does not make full use of the computing capability of the cloud.
In the cloud-oriented homomorphic ciphertext retrieval method of the invention, most of the work, such as preprocessing the document set, is moved to the cloud, and the cloud preprocesses the encrypted documents. The computing capability of the cloud can thus be fully used, and efficient homomorphic ciphertext retrieval is achieved.
Disclosure of Invention
Existing ciphertext retrieval schemes do not make full use of the computing power of the cloud, so the computing load on the client is very high. To overcome this defect, the invention provides a homomorphic encrypted ciphertext retrieval method for cloud computing applications that makes full use of the computing power of the cloud.
The technical scheme adopted by the invention to solve the technical problem is a homomorphic encrypted ciphertext retrieval method for cloud computing applications, characterized by comprising the following steps:
Step 1: Encrypt the document set at the client and upload it to the cloud
At the client, the user encrypts each document in the document set with an integer homomorphic encryption algorithm and uploads the documents to the cloud for storage.
(1) Encryption algorithm
KeyGen: randomly generate a secure large prime $p$ of $P$ bits as the key, where $p \in [2^{P-1}, 2^{P}]$;
Encrypt(m): randomly select a secure large prime $q$ of $Q$ bits, where $q \in [2^{Q-1}, 2^{Q}]$ and $P > Q >$ the plaintext block length; randomly select a random number $r$ within a predetermined time interval; divide the plaintext $M$ into blocks $M = m_1 m_2 m_3 \ldots m_t$ (each $m_i$ has length $L$); and calculate $c_i = m_i + 2pq + pqr$ to obtain the ciphertext;
Decrypt(c): calculate $m_i = c_i \bmod p$ to obtain the plaintext message $M = m_1 m_2 m_3 \ldots m_t$, i.e. the decrypted plaintext;
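The KeyGen, Encrypt and Decrypt steps above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function names, the value of `q`, the range from which `r` is drawn and the sample blocks are all assumptions chosen so that the condition $P > Q >$ block length holds for the toy key $p = 131$.

```python
import secrets

def encrypt_block(m: int, p: int, q: int) -> int:
    """Encrypt one plaintext block as c = m + 2pq + pqr with a fresh random r."""
    if not 0 <= m < q:
        raise ValueError("plaintext block must be smaller than q")
    # Assumed range for r; the text only says r is chosen at random.
    r = secrets.randbelow(2**16) + 1
    return m + 2 * p * q + p * q * r

def decrypt_block(c: int, p: int) -> int:
    """Recover the plaintext block as m = c mod p."""
    return c % p

p = 131                  # toy key (8 bits)
q = 127                  # hypothetical 7-bit prime, so P = 8 > Q = 7
blocks = [9, 23, 1, 19]  # hypothetical plaintext blocks, each below 2**6

ciphertext = [encrypt_block(m, p, q) for m in blocks]
recovered = [decrypt_block(c, p) for c in ciphertext]
```

Decryption works because both $2pq$ and $pqr$ are multiples of $p$, so only $m_i$ survives the reduction modulo $p$.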
Homomorphism analysis: let two plaintexts $m_1, m_2$ have the corresponding ciphertexts $c_1, c_2$; then
$c_1 = m_1 + 2pq + pqr_1$
$c_2 = m_2 + 2pq + pqr_2$
Additive homomorphism analysis:
$c_1 + c_2 = (m_1 + m_2) + 4pq + pq(r_1 + r_2)$. Because $(c_1 + c_2) \bmod p = m_1 + m_2$, the algorithm satisfies additive homomorphism;
Multiplicative homomorphism analysis:
$c_1 c_2 = m_1 m_2 + 2m_1pq + m_1pqr_2 + 2m_2pq + 4p^2q^2 + 2p^2q^2r_2 + m_2pqr_1 + 2p^2q^2r_1 + p^2q^2r_1r_2 = m_1 m_2 + 2pq(m_1 + m_2) + pq(m_1r_2 + m_2r_1) + 2p^2q^2(r_2 + r_1) + 4p^2q^2 + p^2q^2r_1r_2$,
Because $(c_1 c_2) \bmod p = m_1 m_2$, the algorithm satisfies multiplicative homomorphism.
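The two identities can be checked numerically. The sketch below uses hypothetical toy values (`q = 127`, fixed `r` values, small plaintexts) and a helper name of our own choosing; note that the reductions only return the exact sum and product when these are themselves smaller than $p$, which holds for the values chosen here.

```python
def encrypt_block(m: int, p: int, q: int, r: int) -> int:
    """c = m + 2pq + pqr, with r passed in explicitly so the check is repeatable."""
    return m + 2 * p * q + p * q * r

p, q = 131, 127          # toy key p and hypothetical prime q
m1, m2 = 5, 7            # hypothetical plaintexts with m1 + m2 < p and m1 * m2 < p

c1 = encrypt_block(m1, p, q, r=3)
c2 = encrypt_block(m2, p, q, r=11)

sum_decrypted = (c1 + c2) % p       # 12, equal to m1 + m2
product_decrypted = (c1 * c2) % p   # 35, equal to m1 * m2
```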
(2) Query matching algorithm for keyword $m_{index}$
a. The keyword is encrypted at the client with the encryption algorithm above to obtain the corresponding ciphertext $c_{index} = m_{index} + 2pq + pqr$, which is uploaded to the cloud;
b. After the cloud receives the ciphertext keyword $c_{index}$, it queries with the ciphertext matching algorithm, where $N = pq$ and the matching formula is $\mathrm{Retrieval} = (c_i - c_{index}) \bmod N = ((m_i - m_{index}) + pq(r_1 - r_2)) \bmod N$;
If Retrieval = 0, then $m_i - m_{index} = 0$; that is, when $m_i = m_{index}$, $pq(r_1 - r_2) \bmod N = 0$, so Retrieval = 0. During retrieval only $N$, where $N = pq$, needs to be uploaded to the cloud; the cloud cannot derive the user's key $p$ from it, yet the server can perform query matching directly on the user's ciphertext.
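As a rough illustration of the matching formula, the sketch below encrypts the same plaintext twice with different random numbers and shows that the difference still reduces to zero modulo $N$, while a different plaintext does not. The helper names and all numeric values are assumptions made for the example.

```python
def encrypt_block(m: int, p: int, q: int, r: int) -> int:
    """Client side: c = m + 2pq + pqr."""
    return m + 2 * p * q + p * q * r

def ciphertext_match(c_i: int, c_index: int, n: int) -> bool:
    """Cloud side: Retrieval = (c_i - c_index) mod N, a match when it is 0."""
    return (c_i - c_index) % n == 0

p, q = 131, 127   # toy values held by the client
n = p * q         # N = pq, the only value uploaded for matching

stored = encrypt_block(42, p, q, r=5)        # ciphertext already in the cloud
query_same = encrypt_block(42, p, q, r=9)    # same plaintext, different r
query_other = encrypt_block(43, p, q, r=9)   # different plaintext

hit = ciphertext_match(stored, query_same, n)     # True
miss = ciphertext_match(stored, query_other, n)   # False
```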
Step 2: Perform the preprocessing operation on the ciphertext document set at the cloud
The cloud first generates a copy of the ciphertext document set and performs the preprocessing operation on the copy. Preprocessing is divided into three stages: word segmentation and filtering, building the inverted index, and generating the weight vector set of the document set.
(1) Word segmentation and filtering
English text is generally split into terms at spaces. Chinese text cannot be split this way; the simplest approach is to treat each single Chinese character as a term.
After word segmentation, keywords that carry no meaning for retrieval of the document set, known as stop words, must be removed. For each keyword, it is judged whether the keyword is in the stop word list, and if so the keyword is deleted from the document. Because the original document set is encrypted at the client, ciphertext stop words are searched for by linear matching under encryption. The matching query is performed as follows:
In step 1, the client encrypts the stop words with formula (1) and uploads them to the cloud, where the ciphertext stop words are stored; each keyword in the document set is encrypted with formula (2) and uploaded to the cloud for storage;
Assume that the ciphertext keyword in the cloud document is $t_{index}$ and the ciphertext stop word stored in the cloud is $t_i$; then:
$t_i = m_i + 2pq + pqr_2$ formula (1)
$t_{index} = m_{index} + 2pq + pqr_1$ formula (2)
Each keyword in each ciphertext document in the cloud is filtered by formula (3). The query matching formula is:
$\mathrm{Retrieval} = (t_i - t_{index}) \bmod N$ formula (3)
The matching operation is performed at the cloud. Let the result be Retrieval; if Retrieval = 0, then $t_i - t_{index} = 0$, which shows that the keyword $t_{index}$ is a stop word, and it is deleted. When the formula is used, only $N$, where $N = pq$, needs to be uploaded to the cloud; the key $p$ cannot be obtained from $N$, so the formula is safe and reliable;
Finally, after all ciphertext stop words found have been deleted, a new document set is generated, in which each document is denoted $d_j$.
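The linear stop-word matching can be sketched as follows. Everything specific here is an assumption for illustration: words are encoded as integers with `int.from_bytes`, the primes are two Mersenne primes picked only because they are easy to write down, and the function names are hypothetical. The cloud-side function sees only ciphertext tokens and $N$.

```python
import secrets

P_KEY = 2**127 - 1   # hypothetical key p (a Mersenne prime), kept by the client
Q = 2**89 - 1        # hypothetical prime q, larger than any encoded word below
N = P_KEY * Q        # N = pq, the only value uploaded to the cloud

def encrypt_word(word: str) -> int:
    """Client side: encode a word as one integer block and encrypt it."""
    m = int.from_bytes(word.encode("utf-8"), "big")
    if m >= Q:
        raise ValueError("word too long for this toy block size")
    r = secrets.randbelow(2**32) + 1
    return m + 2 * N + N * r

def decrypt_word(c: int) -> str:
    """Client side: m = c mod p, then decode the integer back to text."""
    m = c % P_KEY
    return m.to_bytes((m.bit_length() + 7) // 8, "big").decode("utf-8")

def filter_stop_words(doc_tokens, stop_tokens, n):
    """Cloud side: drop every token that matches some ciphertext stop word."""
    return [t for t in doc_tokens
            if not any((t - s) % n == 0 for s in stop_tokens)]

stop_tokens = [encrypt_word(w) for w in ["on", "the", "was", "is", "a"]]
doc_tokens = [encrypt_word(w)
              for w in "i happiness on the others my hands".split()]

kept = filter_stop_words(doc_tokens, stop_tokens, N)
kept_plain = [decrypt_word(c) for c in kept]   # client-side check only
```

Each document token is compared against every stop word, so the cost is proportional to the number of tokens times the number of stop words, which is what "linear matching" implies here.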
(2) Establishing the inverted index
In the invention the inverted index is built at the server side. Under encryption, the word frequency $TF_i$ and the inverse document frequency $IDF_i$ of each keyword $k_i$ are counted, and the inverted index table of the ciphertext document set is finally generated;
Let the ciphertext keyword to be counted at the cloud be $c_{kindex}$ and a ciphertext keyword stored in the cloud be $c_i$; then
$c_{kindex} = m_{kindex} + 2pq + pqr_1$ formula (4)
$c_i = m_i + 2pq + pqr_2$ formula (5)
Let
$\mathrm{Retrieval} = (c_i - c_{kindex}) \bmod N$ formula (6)
The matching operation is performed at the cloud. In formula (6), let the result be Retrieval; if Retrieval = 0, then $c_i - c_{kindex} = 0$, which means that the ciphertext keyword $c_{kindex}$ to be counted has been found in the ciphertext document. For each match, the word-frequency count $f_{ij}$ is increased by 1; $n_i$ denotes the number of documents in the document set in which the ciphertext keyword $c_{kindex}$ appears. The statistics finally give $f_{ij}$ and $n_i$ for the ciphertext keyword. When the formula is used, only $N$, where $N = pq$, needs to be uploaded to the cloud; the key $p$ cannot be obtained from $N$, so the formula is safe and reliable.
The data obtained from the statistics are recorded as shown in Table 1;
TABLE 1 Inverted index table
[Table 1 appears as an image in the original publication.]
As shown in Table 1, the document names and keywords remain encrypted at the cloud, while the document numbers, the occurrence frequencies and the numbers of documents are obtained at the cloud in plaintext.
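The counting of $f_{ij}$ and $n_i$ for one ciphertext keyword can be sketched as below. The encoding, the primes and the function names are assumptions for illustration; the toy documents are the embodiment's four documents with the stop words removed, and plaintext document names stand in for the encrypted names that the cloud would actually hold.

```python
import secrets

P_KEY = 2**127 - 1   # hypothetical key p, kept by the client
Q = 2**89 - 1        # hypothetical prime q
N = P_KEY * Q        # N = pq, uploaded to the cloud

def encrypt_word(word: str) -> int:
    """Client side: c = m + 2pq + pqr for an integer-encoded word."""
    m = int.from_bytes(word.encode("utf-8"), "big")
    if m >= Q:
        raise ValueError("word too long for this toy block size")
    return m + 2 * N + N * (secrets.randbelow(2**32) + 1)

def keyword_statistics(c_keyword, documents, n):
    """Cloud side: return ({document: f_ij}, n_i) for one ciphertext keyword."""
    frequencies = {}
    for name, tokens in documents.items():
        f_ij = sum(1 for c_i in tokens if (c_i - c_keyword) % n == 0)
        if f_ij:
            frequencies[name] = f_ij
    return frequencies, len(frequencies)

plain_docs = {
    "test1": "i happiness others my hands",
    "test2": "i father wind happiness i",
    "test3": "my father player player birthday",
    "test4": "happiness father others of my others",
}
cipher_docs = {name: [encrypt_word(w) for w in text.split()]
               for name, text in plain_docs.items()}

freqs, n_i = keyword_statistics(encrypt_word("others"), cipher_docs, N)
```

The counts come out in plaintext even though the keyword and the document contents stay encrypted, which matches the description of the inverted index table.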
(3) Generating the document vector set
When a user searches for keywords, the document set to be searched corresponds to the weight vector set of the document set. After the inverted index table has been updated, the weight vector set of the document set is generated from the counted keyword word frequencies and inverse document frequencies. The weight calculation framework used by the invention is TF-IDF, where TF denotes the word frequency of a keyword and IDF denotes its inverse document frequency. The weight value used by the invention is given by formula (7):
[Formula (7), which defines the weight $w_{ij}$ in terms of $f_{ij}$, $N$ and $n_i$, appears as an image in the original publication.]
where $w_{ij}$ denotes the TF-IDF weight of keyword $k_i$ for document $d_j$; $f_{ij}$ denotes the frequency, i.e. the number of occurrences, of keyword $k_i$ in document $d_j$; $N$ denotes the total number of documents in the document set; $n_i$ denotes the number of documents in the document set that contain keyword $k_i$; and $N/n_i$ denotes the inverse document frequency of keyword $k_i$;
Let the ciphertext document $d_j$ contain $t$ mutually independent ciphertext keywords $k_i$, and define the ciphertext document $d_j$ as a vector in $t$-dimensional space. The weight of each ciphertext keyword $k_i$ in the ciphertext document $d_j$ is obtained from the formula, giving the plaintext weight vector of the ciphertext document, in which every weight is plaintext. The value of $d_j$ is then given by formula (8):
$\vec{d_j} = (w_{1j}, w_{2j}, \ldots, w_{tj})$ formula (8)
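Since formula (7) is not reproduced in this text, the sketch below assumes one common TF-IDF form, $w_{ij} = f_{ij} \cdot \log_2(N / n_i)$; the logarithm base and the function name are assumptions, not taken from the patent. Plain strings stand in for the ciphertext keywords, because at this stage the cloud works only with the plaintext counts $f_{ij}$ and $n_i$.

```python
import math

def tfidf_vector(term_freqs, doc_counts, total_docs, vocabulary):
    """Build (w_1j, ..., w_tj) with the assumed weight f_ij * log2(N / n_i)."""
    return [term_freqs.get(k, 0) * math.log2(total_docs / doc_counts[k])
            for k in vocabulary]

# Hypothetical counts for three keywords in a four-document set.
vocabulary = ["i", "others", "hands"]
doc_counts = {"i": 2, "others": 2, "hands": 1}   # n_i per keyword
d1_freqs = {"i": 1, "others": 1, "hands": 1}     # f_ij for one document

d1_vector = tfidf_vector(d1_freqs, doc_counts, total_docs=4,
                         vocabulary=vocabulary)  # [1.0, 1.0, 2.0]
```

A keyword that appears in every document gets weight 0 under this assumed form, since $\log_2(N/N) = 0$.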
Step 3: Retrieval process
The client encrypts the plaintext search term and uploads it to the cloud. The cloud preprocesses the ciphertext search term in the same way as the ciphertext document set: the ciphertext search term is segmented and its ciphertext stop words are filtered out, exactly as for the ciphertext documents; an inverted index of the ciphertext search term is then built; and finally the plaintext weight vector of the ciphertext search term is generated, in which each entry is the plaintext weight of the corresponding ciphertext keyword in the search term. The plaintext weight is calculated by formula (9):
[Formula (9), which defines the weight of keyword $k_i$ in the search term $q$, appears as an image in the original publication.]
Let the ciphertext search term $q$ contain $t$ mutually independent ciphertext keywords $k_i$. The weight of each ciphertext keyword $k_i$ in $q$ is obtained from the formula above, giving the plaintext weight vector of the ciphertext search term, in which every weight is plaintext. The value of $q$ is then given by formula (10):
$\vec{q} = (w_{1q}, w_{2q}, \ldots, w_{tq})$ formula (10)
During retrieval, the similarity between the plaintext weight vector of the ciphertext search term and the plaintext weight vector of each ciphertext document in the ciphertext document set is calculated. The similarity is calculated as follows:
Let the plaintext weight vector of the ciphertext document be
$\vec{d_j} = (w_{1j}, w_{2j}, \ldots, w_{tj})$
and the plaintext weight vector of the ciphertext search term be
$\vec{q} = (w_{1q}, w_{2q}, \ldots, w_{tq})$
According to the definition of the vector space model, let the similarity between the two be
$\mathrm{sim}(d_j, q)$
Then
$\mathrm{sim}(d_j, q) = \dfrac{\vec{d_j} \cdot \vec{q}}{|\vec{d_j}|\,|\vec{q}|} = \dfrac{\sum_{i=1}^{t} w_{ij} w_{iq}}{\sqrt{\sum_{i=1}^{t} w_{ij}^2}\,\sqrt{\sum_{i=1}^{t} w_{iq}^2}}$ formula (11)
where $|\vec{d_j}|$ and $|\vec{q}|$ are the norms of the plaintext vector of the ciphertext document and of the plaintext vector of the ciphertext search term, both calculated at the cloud. The cloud uses formula (11) to calculate the similarity between the ciphertext keywords and each ciphertext document and sorts the results by similarity, so that documents with high similarity, i.e. documents highly relevant to the query, come first and are easier for the user to find. Finally, the cloud returns the similarity ranking to the client for the user to view.
Step 4: Document download
After the desired document has been found in the previous step, the selected document can be downloaded. First, the name of the document to be downloaded is entered at the client, encrypted with formula (12) and uploaded to the cloud; each document name in the document set is encrypted with formula (13) and uploaded to the cloud for storage. Suppose the ciphertext name of the document to be downloaded is $cf_{index}$ and a ciphertext document name in the cloud is $cf_i$; then
$cf_{index} = mf_{index} + 2pq + pqr_1$ formula (12)
$cf_i = mf_i + 2pq + pqr_2$ formula (13)
Similarly, because homomorphic encryption produces different ciphertexts when the same data is encrypted twice, the ciphertext name of the document to be downloaded must be matched against each ciphertext document name in the document set by formula (14). Let the matching result be Retrieval:
$\mathrm{Retrieval} = (cf_i - cf_{index}) \bmod N$ formula (14)
The matching operation is performed at the cloud, and the document to be downloaded can be found at the cloud through formula (14): if Retrieval = 0, then $cf_i - cf_{index} = 0$. When the formula is used, only $N$, where $N = pq$, needs to be uploaded to the cloud, so the key $p$ cannot be obtained from $N$.
Technical characteristics and beneficial effects of the invention:
In cloud computing ciphertext retrieval, homomorphic encrypted ciphertext retrieval based on the TF-IDF vector retrieval model has advantages. However, most existing schemes carry out the bulk of the work, such as preprocessing the document set, at the client, while the cloud handles only a small part. The shortcoming is that the load on the client increases and the strong computing capability of the cloud goes unused.
In the invention the preprocessing stage is carried out at the cloud. Compared with schemes that carry out preprocessing at the client, this reduces the computing load on the client, makes full use of the computing and storage capacity of the cloud to operate on the data, and improves retrieval efficiency.
The innovations of the invention can be summarized as follows:
(1) as much of the work as possible, such as preprocessing the document set, is migrated to the cloud, and the TF-IDF weight vectors of the documents are computed at the cloud;
(2) working on the homomorphically encrypted documents, the cloud performs homomorphic operations on the ciphertext to construct the TF-IDF weight vector of each document, and the TF-IDF weight vectors exist in plaintext;
(3) for ciphertext retrieval, the client encrypts the search term and uploads it to the cloud; the cloud operates on the ciphertext search term to obtain the corresponding plaintext TF-IDF weight vector, then calculates the similarity between the search term and each document in the document set and obtains a plaintext ranking result.
Drawings
FIG. 1 is a flowchart of the overall steps of the proposed method;
FIG. 2 is a detailed flowchart of steps 1 and 2, i.e. encrypting and uploading the documents and the preprocessing operation;
FIG. 3 is a flowchart of the operation of steps 1, 2 and 3 of the invention;
FIG. 4 is a detailed flowchart of the document download in step 4 of the invention.
Detailed description:
The invention provides a homomorphic ciphertext retrieval method for cloud computing in which most of the work, such as preprocessing the document set, is moved to the cloud. The cloud preprocesses the encrypted documents, so its computing capability can be fully used and efficient homomorphic ciphertext retrieval is achieved. A detailed description is given below with reference to the accompanying drawings and an embodiment.
The flow of the method provided by the invention is shown in FIG. 1 and comprises the following steps:
Step 1: Encrypt the document set at the client and upload it to the cloud
(1) Encryption algorithm
The key is chosen at random as $p = 131$;
$q$ and the random number $r$ are randomly selected within a specified time interval.
Four documents are selected as the document set, with 7 keywords randomly selected for each document. The document contents are as follows:
test1={I happiness on the others my hands}
test2={I was father wind happiness I is}
test3={my father is a player player birthday}
test4={happiness father others was of my others}
The four documents are encrypted separately, and the resulting ciphertext documents are as follows:
test1={dc47e285afe8b229b2463e12cfe69f1049ea107b84f51c034c4653ca01142196c9bd11a559d61541049eb2e90112cf2bc6f11300d8a78d53e129fb50cb3f56320459fe510dec9ba2a7}
test2={56e1d11107f444633216d92f822262c9ad76c9b1444676c9bbfb50afb4fc6981447609adc10888653213df64837ddf4d9598252359fea4d95edf6214d9542b75d}
test3={4d9586fb6512cf29ba295980a7ccc6ed2fb1e59902540d3e12fb70f21049ec114217285a811d70d123a3f13327b285b7e8b2d8ba081173c7954e3af8bb70f37903e94f0211a56534c3b320756e3511d70d}
test4={f8971173af126bedb3f5e53c9a123a48d61543af8c412d2f895df6394a7bc4d95344467dc4a759fe6444764d95363abdc4a7285ba8ebc11300c7c6ff2b759f51b3c692daaa7ebd4464761959fdfa15811300d872d02}
The encrypted document set is uploaded to the cloud in ciphertext form for storage.
Step 2: Perform the preprocessing operation on the ciphertext document set at the cloud
(1) Word segmentation and filtering
English text is generally split into terms at spaces. After word segmentation, keywords that carry no meaning for retrieval of the document set, called stop words, must be removed. StandardAnalyzer is used with its parameterless constructor StandardAnalyzer(); in the parameterless constructor of the StandardAnalyzer class a filter array STOP_WORD is specified, and STOP_WORD must be encrypted with the homomorphic encryption algorithm and uploaded to the cloud for storage.
In this embodiment the stop words are {on, the, was, is, a}. Because the original document set is encrypted at the client, ciphertext stop words must be searched for by linear matching under encryption. The matching query is performed as follows:
Assume that the ciphertext keyword in the cloud document is $t_{index}$ and the ciphertext stop word stored in the cloud is $t_i$. According to formula (3), a ciphertext stop word matching query can be performed on the ciphertext keywords of each document in the ciphertext document set. If Retrieval = 0, then $t_i - t_{index} = 0$, so $t_i = t_{index}$, which shows that the keyword $t_{index}$ is a stop word, and it is deleted.
Finally, after all ciphertext stop words found have been deleted, a new document set is generated, in which each document is denoted $d_j$;
(2) Establishing the inverted index
Let the ciphertext keyword to be counted at the cloud be $c_{wind}$ = {84107b965d18369814bd43b}, and let each ciphertext keyword stored in the cloud ciphertext document test1 = {dc47e285afe8b229b2463e12cfe69f1049ea107b84f51c034c4653ca01142196c9bd11a559d61541049eb2e90112cf2bc6f11300d8a78d53e129fb50cb3f56320459fe510dec9ba2a7} be denoted $c_i$.
The calculation then follows formula (6). If Retrieval = 0, then $c_i - c_{wind} = 0$, so $c_i = c_{wind}$, which means that the ciphertext keyword $c_{wind}$ to be counted has been found in the ciphertext document test1. For each match, the word-frequency count $f_{ij}$ is increased by 1; $n_i$ denotes the number of documents in the document set in which the counted ciphertext keyword $c_{wind}$ appears. After matching against the four ciphertext documents, the statistics finally give $n_i = 1$ for the ciphertext keyword $c_{wind}$;
The data obtained from the statistics are shown in Table 2;
TABLE 2 Inverted index table
[Table 2 appears as an image in the original publication.]
As shown in Table 2, the document names and keywords remain encrypted at the cloud, while the document numbers, the occurrence frequencies and the numbers of documents are obtained at the cloud in plaintext.
(3) Generating the document vector set
According to formula (7):
d1={i=1,happiness=0,others=1,my=0,hands=2,father=0,wind=0,player=0,birthday=0}
d2={i=2,happiness=0,others=0,my=0,hands=0,father=0.6,wind=2,player=0,birthday=0}
d3={i=0,happiness=0,others=0,my=0,hands=0,father=0.3,wind=0,player=4,birthday=2}
d4={i=0,happiness=0,others=2,my=0,hands=0,father=0.3,wind=0,player=0,birthday=0}
The weight vector set of the document set is finally obtained according to formula (8) as follows:
$\vec{d_1} = (1, 0, 1, 0, 2, 0, 0, 0, 0)$
$\vec{d_2} = (2, 0, 0, 0, 0, 0.6, 2, 0, 0)$
$\vec{d_3} = (0, 0, 0, 0, 0, 0.3, 0, 4, 2)$
$\vec{d_4} = (0, 0, 2, 0, 0, 0.3, 0, 0, 0)$
Step 3: Retrieval process
For example, for the search term "father", the weight vector is obtained in the same way as in steps 1 and 2:
[The weight vector of the search term appears as an image in the original publication.]
Finally, the similarity between the weight vector of the search term and each weight vector of the document set is calculated according to formula (11), and the results are returned to the client in descending order of similarity. The results obtained are as follows:
[The four similarity values appear as images in the original publication.]
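The ranking step can be sketched with cosine similarity, the usual similarity measure of the vector space model. The document vectors are the weight lists d1 to d4 given above; the query vector, with a single non-zero weight in the "father" position, and the function name are assumptions, since the query vector and the numeric results are not reproduced in this text.

```python
import math

def cosine_similarity(d, q):
    """Dot product of d and q divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(d, q))
    norm_d = math.sqrt(sum(a * a for a in d))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_d == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_d * norm_q)

# Component order: i, happiness, others, my, hands, father, wind, player, birthday
doc_vectors = {
    "test1": [1, 0, 1, 0, 2, 0, 0, 0, 0],
    "test2": [2, 0, 0, 0, 0, 0.6, 2, 0, 0],
    "test3": [0, 0, 0, 0, 0, 0.3, 0, 4, 2],
    "test4": [0, 0, 2, 0, 0, 0.3, 0, 0, 0],
}
query = [0, 0, 0, 0, 0, 1, 0, 0, 0]   # assumed query vector

scores = {name: cosine_similarity(vec, query)
          for name, vec in doc_vectors.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Under these assumptions test2 comes first in the ranking, because its "father" weight is the largest relative to the length of its vector.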
Step 4: Document download
According to the retrieval in the previous step, document 2 has the highest similarity to the search term. If document 2 is selected for downloading, the procedure is as follows:
"test2" is entered as the name of the document to be downloaded; the client homomorphically encrypts the name and uploads it to the cloud, where the matching operation is performed.
The set of ciphertext document names stored in the cloud is {d2fc02540950b013217a7893ebcd83e12110184cc700e27ad4d95f25409d2fbf60331130099c9ad7d2fb1aaa784d95f11a525}, and the ciphertext of the name of the document to be downloaded is {18d8c3e121129d93222756fb1e}. The document to be downloaded can be found at the cloud through formula (14): if Retrieval = 0, then $cf_i - cf_{index} = 0$, which indicates that the requested document has been found among the ciphertext documents, and the user can then confirm the download to the client.
Step 4: Decryption of the downloaded document
The client receives the downloaded ciphertext document test2 = {56e1d11107f444633216d92f822262c9ad76c9b1444676c9bbfb50afb4fc6981447609adc10888653213df64837ddf4d9598252359fea4d95edf6214d9542b75d} and then decrypts it. The decryption algorithm is as follows:
Decrypt(c): calculate $m_i = c_i \bmod p$ to obtain the plaintext message $M = m_1 m_2 m_3 \ldots m_t$, giving the decrypted plaintext document test2 = {I was father wind happiness I is}; that is, the ciphertext document has been correctly downloaded and decrypted.

Claims (1)

1. A homomorphic encrypted ciphertext retrieval method for cloud computing application is characterized by comprising the following steps:
step 1: encrypting a document set at a client and uploading the document set to a cloud
At a client, a user encrypts each document in a document set by adopting an integer fully homomorphic encryption algorithm, and uploads the documents to a cloud for storage;
(1) encryption algorithm
KeyGen chooses a randomly generated P-bit secure large prime number as the key P, where P ∈ [2 ]P-1,2P];
Encrypt (m) randomly selecting a Q-bit safe big prime number Q, wherein Q ∈ [2 ]Q-1,2Q],P>Q>Plaintext block length, randomly generating a random number r, and grouping M into M1m2m3...mt(miLength of L), the ciphertext c is calculatedi=mi+2pq + pqr, and a ciphertext can be obtained;
decrypt (c): calculate mi=cimod p, get M plaintext message1m2m3...mtObtaining a decrypted plaintext;
Homomorphism analysis: let two plaintexts $m_1, m_2$ have corresponding ciphertexts $c_1, c_2$; then
$c_1 = m_1 + 2pq + pqr_1$
$c_2 = m_2 + 2pq + pqr_2$
Additive homomorphism analysis:
$c_1 + c_2 = (m_1 + m_2) + 4pq + pq(r_1 + r_2)$. Every term other than $m_1 + m_2$ is a multiple of $p$, so $(c_1 + c_2) \bmod p = m_1 + m_2$ (given $m_1 + m_2 < p$), and the algorithm satisfies additive homomorphism;
Multiplicative homomorphism analysis:
$c_1 c_2 = m_1 m_2 + 2m_1pq + m_1pqr_2 + 2m_2pq + 4p^2q^2 + 2p^2q^2r_2 + m_2pqr_1 + 2p^2q^2r_1 + p^2q^2r_1r_2 = m_1 m_2 + 2pq(m_1 + m_2) + pq(m_1r_2 + m_2r_1) + 2p^2q^2(r_1 + r_2) + 4p^2q^2 + p^2q^2r_1r_2$.
Every term other than $m_1 m_2$ is a multiple of $p$, so $(c_1 c_2) \bmod p = m_1 m_2$ (given $m_1 m_2 < p$), and the algorithm satisfies multiplicative homomorphism;
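The two homomorphic properties can be checked numerically. The sketch below uses small fixed primes purely for illustration; the values of `p`, `q`, the range of `r`, and the sample plaintexts are assumptions and are far too small for real use. It also shows that decryption of a product only returns the plaintext product while that product stays below `p`.

```python
import random

p = 1_000_003  # toy secret key (a small prime), for illustration only
q = 10_007     # toy prime q


def encrypt(m):
    """c = m + 2pq + pqr with a fresh random r."""
    r = random.randrange(1, 1000)
    return m + 2 * p * q + p * q * r


def decrypt(c):
    """m = c mod p."""
    return c % p


m1, m2 = 123, 456
c1, c2 = encrypt(m1), encrypt(m2)

add_ok = decrypt(c1 + c2) == m1 + m2  # additive homomorphism
mul_ok = decrypt(c1 * c2) == m1 * m2  # multiplicative homomorphism (m1*m2 < p)

# When the plaintext product reaches p, reduction mod p wraps around.
big = 2000
wrapped = decrypt(encrypt(big) * encrypt(big))  # 4_000_000 mod p
```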
(2) Query matching algorithm for the keyword $m_{index}$
a. The client encrypts the keyword with the above encryption algorithm to obtain the corresponding ciphertext $c_{index} = m_{index} + 2pq + pqr$ and uploads it to the cloud;
b. On receiving the ciphertext keyword $c_{index}$, the cloud queries with the ciphertext matching algorithm, where $N = pq$ and the matching formula is $Retrieval = (c_i - c_{index}) \bmod N = ((m_i - m_{index}) + pq(r_1 - r_2)) \bmod N$, with $r_1$ and $r_2$ the random numbers used for $c_i$ and $c_{index}$;
If Retrieval = 0, then $m_i - m_{index} = 0$: when $m_i = m_{index}$, $pq(r_1 - r_2) \bmod N = 0$, so Retrieval = 0. Only $N = pq$ is uploaded to the cloud at retrieval time; the cloud cannot derive the user's key $p$ from $N$, and the server can perform the query matching operation directly on the user's ciphertext;
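As a rough sketch of the matching step, the cloud-side test reduces the difference of two ciphertexts modulo N. The toy primes, the range of `r`, and the function names `encrypt` and `retrieval` are assumptions for illustration; only `retrieval` is meant to run at the cloud, and it uses N but never `p`.

```python
import random

p, q = 1_000_003, 10_007  # toy client-side values
N = p * q                 # the only value shared with the cloud


def encrypt(m):
    """Client side: c = m + 2pq + pqr with a fresh random r."""
    r = random.randrange(1, 10**6)
    return m + 2 * p * q + p * q * r


def retrieval(c_i, c_index, N):
    """Cloud side: (c_i - c_index) mod N, computed without p."""
    return (c_i - c_index) % N


stored = [encrypt(m) for m in (101, 202, 303)]  # ciphertext keywords in the cloud
query = encrypt(202)                            # ciphertext keyword to look up

# Indices of stored ciphertexts whose plaintext equals the query plaintext.
hits = [i for i, c_i in enumerate(stored) if retrieval(c_i, query, N) == 0]
```

Because two encryptions of the same plaintext differ only by a multiple of N, the comparison works even though the ciphertexts themselves are different integers.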
step 2: preprocessing operation is carried out on ciphertext document set at cloud
The cloud end firstly generates a copy of a ciphertext document set, preprocessing operation is carried out on the copy, and preprocessing is divided into three stages, namely word segmentation filtering, reverse index establishment and weight vector set generation of the document set;
(1) word segmentation filtering
Selecting keywords and retrieving a document set;
for each keyword, judging whether the keyword exists in a stop word list or not, and if so, deleting and filtering the keyword from the document; because the original document set is encrypted at the client, the ciphertext keywords can be searched by adopting a linear matching method when the original document set is encrypted; the method for performing matching query is as follows:
In step 1, the client encrypts the stop words with formula (1) and uploads them to the cloud, which stores the ciphertext stop words; each keyword in the document set is encrypted with formula (2) and uploaded to the cloud for storage;
Let a ciphertext keyword in a cloud document be $t_{index}$ and a ciphertext stop word stored in the cloud be $t_i$; then:
$t_i = m_i + 2pq + pqr_2$    Formula (1)
$t_{index} = m_{index} + 2pq + pqr_1$    Formula (2)
Each keyword in each ciphertext document in the cloud is filtered with formula (3); the query matching formula is as follows:
$Retrieval = (t_i - t_{index}) \bmod N = ((m_i - m_{index}) + pq(r_2 - r_1)) \bmod N$    Formula (3)
The matching operation is performed in the cloud. Let the result be Retrieval; if Retrieval = 0, then $t_i - t_{index} \equiv 0 \pmod{N}$, i.e. the keyword matches a stop word. When this formula is used, only $N = pq$ needs to be uploaded to the cloud, and the key $p$ cannot be obtained from $N$, so the formula is safe and reliable;
Finally, after all matched ciphertext stop words have been deleted, a new document set is generated, in which each document is denoted $d_j$;
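The stop-word filtering stage can be sketched as a linear scan in which every ciphertext keyword is compared against every ciphertext stop word. The fixed Mersenne primes used as `P_KEY` and `Q`, the byte-to-integer word encoding, and all function names are assumptions made for this illustration. `filter_stop_words` represents the cloud side and uses only N; `decrypt_word` is a client-side check.

```python
import random

P_KEY = 2**127 - 1  # toy secret key p (a Mersenne prime), known to the client only
Q = 2**89 - 1       # toy prime q (a Mersenne prime)
N = P_KEY * Q       # shared with the cloud for matching


def encrypt_word(word):
    """Client side: encode the word as an integer m, then c = m + 2N + N*r."""
    m = int.from_bytes(word.encode("utf-8"), "big")
    return m + 2 * N + N * random.randrange(1, 2**32)


def decrypt_word(c):
    """Client side: m = c mod p, decoded back to text."""
    m = c % P_KEY
    return m.to_bytes((m.bit_length() + 7) // 8, "big").decode("utf-8")


def is_match(t_i, t_index, N):
    """Cloud side: Retrieval = (t_i - t_index) mod N equals 0 on a match."""
    return (t_i - t_index) % N == 0


def filter_stop_words(doc, stop_words, N):
    """Cloud side: drop every ciphertext keyword that matches a stop word."""
    return [t for t in doc
            if not any(is_match(s, t, N) for s in stop_words)]


stop_words = [encrypt_word(w) for w in ("the", "is")]
document = [encrypt_word(w) for w in "the cloud is secure".split()]
filtered = filter_stop_words(document, stop_words, N)
remaining = [decrypt_word(c) for c in filtered]
```

The scan costs one modular reduction per (keyword, stop word) pair, which is what "linear matching" amounts to in this sketch.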
(2) Building the inverted index
The inverted index is built at the server side: under encryption, the term frequency $TF_i$ and inverse document frequency $IDF_i$ of each keyword $k_i$ are counted, and the inverted index table of the ciphertext document set is finally generated;
Let the ciphertext keyword to be counted at the cloud be $c_{kindex}$ and a ciphertext keyword stored in the cloud be $c_i$; then
$c_{kindex} = m_{kindex} + 2pq + pqr_1$    Formula (4)
$c_i = m_i + 2pq + pqr_2$    Formula (5)
Let
$Retrieval = (c_i - c_{kindex}) \bmod N = ((m_i - m_{kindex}) + pq(r_2 - r_1)) \bmod N$    Formula (6)
The matching operation is performed in the cloud. In formula (6), let the result be Retrieval; if Retrieval = 0, then $c_i - c_{kindex} \equiv 0 \pmod{N}$, which means that the ciphertext keyword $c_{kindex}$ to be counted has been found in the ciphertext document, i.e. one match, and the term frequency count $f_{ij}$ is increased by 1. The document count $n_i$ is the number of documents in the document set in which the counted ciphertext keyword $c_{kindex}$ appears. The statistics finally yield $f_{ij}$ and $n_i$;
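The counting described above can be sketched as follows. For one ciphertext keyword, the cloud scans each ciphertext document, increments the per-document frequency on every match, and counts the documents with at least one match. The toy primes, the word encoding, and the names `encrypt_word` and `count_keyword` are assumptions for illustration.

```python
import random

P_KEY = 2**127 - 1  # toy secret key p, known to the client only
Q = 2**89 - 1       # toy prime q
N = P_KEY * Q       # shared with the cloud for matching


def encrypt_word(word):
    """Client side: encode the word as an integer m, then c = m + 2N + N*r."""
    m = int.from_bytes(word.encode("utf-8"), "big")
    return m + 2 * N + N * random.randrange(1, 2**32)


def count_keyword(c_kindex, docs, N):
    """Cloud side: return ([f_ij for each document j], n_i) for one keyword."""
    freqs = [sum(1 for c_i in doc if (c_i - c_kindex) % N == 0)
             for doc in docs]
    n_i = sum(1 for f in freqs if f > 0)
    return freqs, n_i


plain_docs = (["cloud", "data", "cloud"], ["data"], ["key"])
docs = [[encrypt_word(w) for w in words] for words in plain_docs]

cloud_freqs, cloud_n = count_keyword(encrypt_word("cloud"), docs, N)
data_freqs, data_n = count_keyword(encrypt_word("data"), docs, N)
```

Repeating `count_keyword` for every distinct keyword gives the entries of an inverted index table: keyword, list of documents with their frequencies, and document count.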
(3) Generating the weight vector set
After the inverted index table has been updated, the weight vector set corresponding to the document set is generated from the counted term frequencies and inverse document frequencies of the keywords. The weighting scheme used is TF-IDF, where TF is the term frequency of a keyword and IDF is its inverse document frequency. The weights are computed as shown in formula (7):
Figure FDA0002546784510000032
where $w_{ij}$ is the TF-IDF weight of keyword $k_i$ for document $d_j$; $f_{ij}$ is the frequency of keyword $k_i$ in document $d_j$, i.e. its number of occurrences; $N$ is the total number of documents in the document set (here $N$ denotes the document count, not the modulus $pq$); $n_i$ is the number of documents in the document set that contain keyword $k_i$; and $N/n_i$ is the inverse document frequency of keyword $k_i$;
Let ciphertext document $d_j$ contain $t$ mutually independent ciphertext keywords $k_i$, and define $d_j$ as a vector in $t$-dimensional space. The weight of each ciphertext keyword $k_i$ in $d_j$ is obtained from the formula above, giving the plaintext weight vector of the ciphertext document; the weight of each ciphertext keyword $k_i$ is a plaintext value. $d_j$ is then as shown in formula (8):
Figure FDA0002546784510000041
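A weight vector of this kind could be computed as sketched below. Formula (7) is only available as an image in this text, so the exact expression is not reproduced here; the sketch assumes the common TF-IDF form $w_{ij} = f_{ij} \cdot \log(N / n_i)$, and the logarithm in particular is an assumption. The function names and the sample counts are hypothetical.

```python
import math


def tfidf_weight(f_ij, total_docs, n_i):
    """Assumed TF-IDF form: w_ij = f_ij * log(N / n_i)."""
    if f_ij == 0 or n_i == 0:
        return 0.0
    return f_ij * math.log(total_docs / n_i)


def document_vector(freqs_j, doc_counts, total_docs):
    """Weight vector of one document over t keywords.

    freqs_j[i]    -- f_ij, occurrences of keyword k_i in document d_j
    doc_counts[i] -- n_i, number of documents containing keyword k_i
    total_docs    -- N, number of documents (not the modulus pq)
    """
    return [tfidf_weight(f, total_docs, n)
            for f, n in zip(freqs_j, doc_counts)]


# Three keywords in a set of three documents.
doc_counts = [1, 2, 3]   # n_i for k_1, k_2, k_3
freqs_d0 = [2, 1, 1]     # f_i0 for the first document
w = document_vector(freqs_d0, doc_counts, total_docs=3)
```

The inputs are the plaintext counts produced by the inverted index stage, so the resulting weights are plaintext values even though the keywords themselves stay encrypted.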
Step 3: Retrieval process
The client encrypts the plaintext search term and uploads it to the cloud. The cloud preprocesses the ciphertext search term in the same way as the ciphertext document set, applying word segmentation and filtering. Each component is the plaintext weight of the corresponding ciphertext keyword in the ciphertext search term, computed as shown in formula (9):
Figure FDA0002546784510000042
Let the ciphertext search term $q$ contain $t$ mutually independent ciphertext keywords $k_i$. The weight of each ciphertext keyword $k_i$ in $q$ is obtained from the formula above, giving the plaintext weight vector of the ciphertext search term; the weight of each ciphertext keyword $k_i$ is a plaintext value. $q$ is then as shown in formula (10):
Figure FDA0002546784510000043
In the retrieval process, the similarity between the plaintext weight vector of the ciphertext search term and the plaintext weight vector of each ciphertext document in the ciphertext document set is computed as follows:
Let the plaintext weight vector of the ciphertext document be
Figure FDA0002546784510000044
The plaintext weight vector of the ciphertext search term is
Figure FDA0002546784510000045
According to the definition of the vector space model, the similarity between the two is set as
Figure FDA0002546784510000046
Then there is
Figure FDA0002546784510000047
where $|d_j|$ and $|q|$ are the norms of the plaintext vector of the ciphertext document and of the ciphertext search term, respectively, computed at the cloud. The cloud uses formula (11) to compute the similarity between the ciphertext search term and each ciphertext document and sorts the results by similarity, so that documents with higher similarity, i.e. higher relevance to the query, are ranked first, which makes lookup easier for the user. Finally, the cloud returns the similarity ranking to the client for the user to review;
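The ranking step can be sketched with cosine similarity, the usual similarity measure of the vector space model: the dot product of the two weight vectors divided by the product of their norms $|d_j|$ and $|q|$. Formula (11) is only available as an image in this text, so treating it as cosine similarity is an assumption, as are the function names and the sample vectors.

```python
import math


def cosine_similarity(d_j, q):
    """Dot product of d_j and q divided by |d_j| * |q| (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(d_j, q))
    norm = math.sqrt(sum(a * a for a in d_j)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def rank_documents(doc_vectors, q):
    """Return (document index, similarity) pairs, most similar first."""
    scores = [(j, cosine_similarity(d_j, q))
              for j, d_j in enumerate(doc_vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


doc_vectors = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]  # plaintext weight vectors
query_vector = [1.0, 0.0]
ranking = rank_documents(doc_vectors, query_vector)
order = [j for j, _ in ranking]
```

Only plaintext weights enter this computation, so the cloud can sort the documents without decrypting any keyword.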
Step 4: Document download
After the desired document has been found in the previous step, it can be selected and downloaded. First, the name of the document to be downloaded is entered at the client, encrypted with formula (12), and uploaded to the cloud for storage; each document name in the document set is encrypted with formula (13) and uploaded to the cloud for storage. Let the ciphertext name of the document to be downloaded be $cf_{index}$ and the ciphertext name of a cloud document be $cf_i$; then
$cf_{index} = mf_{index} + 2pq + pqr_1$    Formula (12)
$cf_i = mf_i + 2pq + pqr_2$    Formula (13)
Similarly, because homomorphic encryption produces different ciphertexts when the same data is encrypted twice, the ciphertext name of the document to be downloaded must be matched against each ciphertext document name in the document set with formula (14). Let the matching result be Retrieval; then
$Retrieval = (cf_i - cf_{index}) \bmod N = ((mf_i - mf_{index}) + pq(r_2 - r_1)) \bmod N$    Formula (14)
The matching operation is performed at the cloud, where the document to be downloaded can be retrieved through formula (14): if Retrieval = 0, then $cf_i - cf_{index} \equiv 0 \pmod{N}$, i.e. the document has been found. When this formula is used, only $N = pq$ needs to be uploaded to the cloud, and the key $p$ cannot be obtained from $N$.
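The download lookup can be sketched as a scan over the stored ciphertext document names that returns the first position where the matching result is 0. The toy primes, the byte-to-integer name encoding, and the names `encrypt_name` and `find_document` are assumptions for illustration; short names are used so that each encoded name stays below the toy key.

```python
import random

P_KEY = 2**127 - 1  # toy secret key p, known to the client only
Q = 2**89 - 1       # toy prime q
N = P_KEY * Q       # shared with the cloud for matching


def encrypt_name(name):
    """Client side: cf = mf + 2pq + pqr, with mf the name encoded as an integer."""
    mf = int.from_bytes(name.encode("utf-8"), "big")
    return mf + 2 * N + N * random.randrange(1, 2**32)


def find_document(cf_index, stored_names, N):
    """Cloud side: index of the first stored name matching cf_index, else None."""
    for i, cf_i in enumerate(stored_names):
        if (cf_i - cf_index) % N == 0:  # Retrieval = 0
            return i
    return None


stored_names = [encrypt_name(n) for n in ("test1", "test2", "test3")]
found = find_document(encrypt_name("test2"), stored_names, N)
missing = find_document(encrypt_name("test9"), stored_names, N)
```

The returned index identifies which ciphertext document the cloud should send back; decryption then happens at the client with `p`.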
CN201710199651.1A 2017-03-30 2017-03-30 Homomorphic encrypted ciphertext retrieval method oriented to cloud computing application Active CN106934063B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710199651.1A CN106934063B (en) 2017-03-30 2017-03-30 Homomorphic encrypted ciphertext retrieval method oriented to cloud computing application

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710199651.1A CN106934063B (en) 2017-03-30 2017-03-30 Homomorphic encrypted ciphertext retrieval method oriented to cloud computing application

Publications (2)

Publication Number Publication Date
CN106934063A CN106934063A (en) 2017-07-07
CN106934063B true CN106934063B (en) 2020-08-07

Family

ID=59424866

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710199651.1A Active CN106934063B (en) 2017-03-30 2017-03-30 Homomorphic encrypted ciphertext retrieval method oriented to cloud computing application

Country Status (1)

Country Link
CN (1) CN106934063B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110019658B (en) 2017-07-31 2023-01-20 腾讯科技(深圳)有限公司 Method and related device for generating search term
CN108282328B (en) * 2018-02-02 2021-03-12 沈阳航空航天大学 Ciphertext statistical method based on homomorphic encryption
CN110737912A (en) * 2018-09-26 2020-01-31 杨思琦 thesis duplicate checking method based on homomorphic encryption
CN110309674B (en) * 2019-07-04 2021-10-01 浙江理工大学 Ordering method based on fully homomorphic encryption
CN111737719B (en) * 2020-07-17 2020-11-24 支付宝(杭州)信息技术有限公司 Privacy-protecting text classification method and device
CN117278216B (en) * 2023-11-23 2024-02-13 三亚学院 Encryption system based on cloud computing virtualization and network storage files

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105323209A (en) * 2014-06-05 2016-02-10 江苏博智软件科技有限公司 Cloud data security protection method adopting fully homomorphic encryption technology and multiple digital watermarking technology
CN105610910A (en) * 2015-12-18 2016-05-25 中南民族大学 Cloud storage oriented ciphertext full-text search method and system based on full homomorphic ciphers

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8837715B2 (en) * 2011-02-17 2014-09-16 Gradiant, Centro Tecnolóxico de Telecomunicacións de Galica Method and apparatus for secure iterative processing and adaptive filtering

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于同态加密的密文检索方案研究;吕文斌 等;《计算机测量与控制》;20160331;第24卷(第3期);第154-158页 *

Also Published As

Publication number Publication date
CN106934063A (en) 2017-07-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant