CN106934063B

CN106934063B - Homomorphic encrypted ciphertext retrieval method oriented to cloud computing application

Info

Publication number: CN106934063B
Application number: CN201710199651.1A
Authority: CN
Inventors: 拱长青; 肖芸; 林娜; 郭振洲; 李席广; 赵亮; 孟庆杰
Original assignee: Shenyang Aerospace University
Current assignee: Shenyang Aerospace University
Priority date: 2017-03-30
Filing date: 2017-03-30
Publication date: 2020-08-07
Anticipated expiration: 2037-03-30
Also published as: CN106934063A

Abstract

The invention provides a homomorphic encrypted ciphertext retrieval method for cloud computing applications, in which the preprocessing stage is carried out at the cloud and the TF-IDF weight vectors of the documents are computed there. Working on the homomorphically encrypted documents, the cloud performs homomorphic operations on the ciphertext to construct a TF-IDF weight vector for each document, and this weight vector exists in plaintext form. For ciphertext retrieval, the client encrypts the search term and uploads it to the cloud; the cloud operates on the ciphertext search term to obtain the corresponding plaintext TF-IDF weight vector, and then calculates the similarity between the search term and each document in the document set to obtain a plaintext ranking result. Compared with schemes that place the preprocessing stage on the client, the method reduces the computing load on the client, makes full use of the computing power and storage capacity of cloud computing to operate on the data, and improves retrieval efficiency.

Description

Homomorphic encrypted ciphertext retrieval method oriented to cloud computing application

The technical field is as follows:

the invention belongs to the technical field of cloud computing application, and particularly relates to a homomorphic encrypted ciphertext retrieval method for cloud computing application.

Background art:

Cloud computing is an innovative service model that lets users obtain nearly unlimited computing power and rich, diverse information services at any time over the internet; it is an evolution of distributed computing, parallel computing and grid computing. Cloud storage also falls under cloud computing, and its use is gradually becoming widespread. With the rapid development of cloud computing, a large amount of sensitive information is concentrated in the cloud. To avoid leaking a client's private data, the data needs to be encrypted before it is stored in the cloud. Once the ciphertext data stored on the cloud server grows to a certain scale, effective retrieval of that ciphertext data becomes an urgent problem.

In existing homomorphic encrypted ciphertext retrieval schemes, most of the work, such as preprocessing the document set, is generally done on the client. In 2014, a paper applied the TF-IDF vector retrieval model to a homomorphic encrypted ciphertext retrieval scheme. The basic idea of that scheme is to select keywords for the document set at the client, calculate the TF-IDF weight of each keyword for every document in the document set, construct the TF-IDF weight vector of each document, and thereby obtain the weight vector set of the document set. The vectors in the weight vector set are then encrypted one by one and uploaded to the cloud for storage. During retrieval, the weight vector of the search term is first calculated at the client, encrypted, and uploaded to the cloud; the cloud then uses the ciphertext weight vector of the search term and the ciphertext weight vector set of the document set to calculate the similarity ciphertext between the search term and each document, and returns it to the user, who decrypts the ranking ciphertext at the client to obtain the plaintext ranking result.

This ciphertext retrieval scheme increases the computing load on the client and does not make full use of the computing capability of the cloud.

In the homomorphic ciphertext retrieval method for cloud computing proposed here, most of the work, such as preprocessing the document set, is moved to the cloud, and the cloud preprocesses the encrypted documents. This makes full use of the cloud's computing capability and achieves efficient homomorphic ciphertext retrieval.

Disclosure of Invention

To overcome defects of existing ciphertext retrieval schemes, such as the cloud's computing power not being fully utilized and the client's computing load therefore being very high, the invention provides a homomorphic encrypted ciphertext retrieval method that makes full use of the cloud's computing power, namely a homomorphic encrypted ciphertext retrieval method oriented to cloud computing applications.

The technical scheme adopted by the invention for solving the technical problems is as follows: a homomorphic encrypted ciphertext retrieval method for cloud computing application is characterized by comprising the following steps:

Step 1: encrypt the document set at the client and upload it to the cloud

At a client, a user encrypts each document in a document set by adopting an integer homomorphic encryption algorithm, and uploads the documents to a cloud for storage.

(1) Encryption algorithm

KeyGen: randomly generate a $P$-bit secure large prime $p$ as the key, where $p \in [2^{P-1}, 2^{P}]$;

Encrypt(M): randomly select a $Q$-bit secure large prime $q$, where $q \in [2^{Q-1}, 2^{Q}]$ and $P > Q >$ the plaintext block length; randomly select a random number $r$ within a predetermined time interval; divide the plaintext $M$ into blocks $M = m_1 m_2 m_3 \ldots m_t$ (each $m_i$ of length $L$); and compute the ciphertext $c_i = m_i + 2pq + pqr$, which yields the ciphertext;

Decrypt(c): compute $m_i = c_i \bmod p$ to obtain the plaintext message $M = m_1 m_2 m_3 \ldots m_t$, i.e. the decrypted plaintext;
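As a minimal sketch of the three operations above (not the patented implementation), the following Python code encrypts and decrypts integer plaintext blocks. The function names, the toy value q = 127, and the range used for the random number r are assumptions made for illustration; p = 131 is the key value used in the embodiment, and a real deployment would need large primes.

```python
import secrets

def encrypt_block(m: int, p: int, q: int, r: int) -> int:
    """Compute c = m + 2pq + pqr for one plaintext block m."""
    if not 0 <= m < p:
        raise ValueError("plaintext block must satisfy 0 <= m < p")
    return m + 2 * p * q + p * q * r

def decrypt_block(c: int, p: int) -> int:
    """Recover m = c mod p; the terms 2pq and pqr are multiples of p."""
    return c % p

def encrypt_message(blocks, p, q, max_r=2**16):
    # A fresh r is drawn per block; the range of r is an assumption.
    return [encrypt_block(m, p, q, secrets.randbelow(max_r) + 1) for m in blocks]

def decrypt_message(cipher_blocks, p):
    return [decrypt_block(c, p) for c in cipher_blocks]

# Toy parameters: p = 131 (from the embodiment), q = 127 (assumed).
P_KEY, Q_PRIME = 131, 127
blocks = [5, 42, 99]            # hypothetical plaintext blocks, each below q
cipher = encrypt_message(blocks, P_KEY, Q_PRIME)
recovered = decrypt_message(cipher, P_KEY)
```

Because a new r is drawn for every block, encrypting the same block twice generally gives two different ciphertexts that still decrypt to the same value.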

Homomorphism analysis: let two plaintexts $m_1$, $m_2$ have the corresponding ciphertexts $c_1$, $c_2$; then

$c_1 = m_1 + 2pq + pqr_1$

$c_2 = m_2 + 2pq + pqr_2$

Additive homomorphism analysis:

$c_1 + c_2 = (m_1 + m_2) + 4pq + pq(r_1 + r_2)$. Because $(c_1 + c_2) \bmod p = m_1 + m_2$, the algorithm satisfies additive homomorphism;

Multiplicative homomorphism analysis:

$c_1 c_2 = m_1 m_2 + 2m_1 pq + m_1 pqr_2 + 2m_2 pq + 4p^2q^2 + 2p^2q^2r_2 + m_2 pqr_1 + 2p^2q^2r_1 + p^2q^2r_1r_2 = m_1 m_2 + 2pq(m_1 + m_2) + pq(m_1r_2 + m_2r_1) + 2p^2q^2(r_2 + r_1) + 4p^2q^2 + p^2q^2r_1r_2$,

Because $(c_1 c_2) \bmod p = m_1 m_2$, the algorithm satisfies multiplicative homomorphism.
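The two identities can be checked numerically. The sketch below uses toy values (q, m₁, m₂, r₁, r₂ are all assumed for illustration) and also verifies the grouped expansion of the product term by term. Note that the results equal m₁ + m₂ and m₁m₂ exactly only while those values stay below p, which holds for the numbers chosen here.

```python
# Toy parameters: p = 131 as in the embodiment, q = 127 assumed.
p, q = 131, 127

def enc(m: int, r: int) -> int:
    """c = m + 2pq + pqr."""
    return m + 2 * p * q + p * q * r

m1, m2 = 7, 9          # hypothetical plaintexts
r1, r2 = 3, 5          # hypothetical random numbers
c1, c2 = enc(m1, r1), enc(m2, r2)

# Additive homomorphism: (c1 + c2) mod p == m1 + m2 (here 16 < p).
sum_ok = (c1 + c2) % p == m1 + m2

# Multiplicative homomorphism: (c1 * c2) mod p == m1 * m2 (here 63 < p).
prod_ok = (c1 * c2) % p == m1 * m2

# Grouped expansion of c1 * c2 from the analysis above.
expansion = (m1 * m2
             + 2 * p * q * (m1 + m2)
             + p * q * (m1 * r2 + m2 * r1)
             + 2 * p**2 * q**2 * (r2 + r1)
             + 4 * p**2 * q**2
             + p**2 * q**2 * r1 * r2)
expansion_ok = expansion == c1 * c2
```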

(2) Query matching algorithm for keyword $m_{index}$

a. At the client, encrypt the keyword with the encryption algorithm above to obtain the corresponding ciphertext $c_{index} = m_{index} + 2pq + pqr$, and upload it to the cloud;

b. After receiving the ciphertext keyword $c_{index}$, the cloud performs the query with a ciphertext matching algorithm, where $N = pq$ and the matching formula is $\mathrm{Retrieval} = (c_i - c_{index}) \bmod N = ((m_i - m_{index}) + pq(r_1 - r_2)) \bmod N$;

If $\mathrm{Retrieval} = 0$, then $m_i - m_{index} = 0$; that is, when $m_i = m_{index}$, $pq(r_1 - r_2) \bmod N = 0$, so $\mathrm{Retrieval} = 0$. During retrieval only $N$ needs to be uploaded to the cloud, where $N = pq$; the cloud cannot derive the user's key $p$ from it, yet the server can perform the query matching operation directly on the user's ciphertext.
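The matching step can be sketched as follows. The cloud-side function receives only ciphertexts and N; the names `enc` and `retrieval` and all numeric values are assumptions for illustration.

```python
def enc(m: int, p: int, q: int, r: int) -> int:
    """Client-side encryption: c = m + 2pq + pqr."""
    return m + 2 * p * q + p * q * r

def retrieval(c_i: int, c_index: int, n: int) -> int:
    """Cloud-side matching value: (c_i - c_index) mod N, with N = pq."""
    return (c_i - c_index) % n

# Client side: toy key p = 131 (embodiment) and assumed q = 127.
p, q = 131, 127
n = p * q                                  # only N is sent to the cloud

c_index = enc(42, p, q, r=11)              # encrypted query keyword
stored = [enc(m, p, q, r) for m, r in [(17, 3), (42, 8), (99, 5)]]

# Cloud side: linear scan; a zero matching value marks an equal plaintext.
matches = [i for i, c_i in enumerate(stored) if retrieval(c_i, c_index, n) == 0]
```

Here the second stored ciphertext matches even though it was produced with a different random number than the query.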

Step 2: preprocessing operation is carried out on ciphertext document set at cloud

The cloud first generates a copy of the ciphertext document set and performs the preprocessing operation on the copy. Preprocessing is divided into three stages: word segmentation and filtering, building the inverted index, and generating the weight vector set of the document set.

(1) Word segmentation filtering

English generally separates entries with spaces, whereas Chinese cannot be split this way; the simplest approach is to treat each single Chinese character as an entry.

After word segmentation, keywords that carry no meaning for retrieval over the document set, namely stop words, need to be removed. For each keyword, it is determined whether the keyword exists in the stop-word list, and if so it is deleted and filtered out of the document. Because the original document set is encrypted at the client, ciphertext stop words are searched for under encryption using a linear matching method; the matching query is performed as follows:

In step 1, the client encrypts the stop words using formula (1) and uploads them to the cloud, which stores the ciphertext stop words; each keyword in the document set is encrypted using formula (2) and uploaded to the cloud for storage;

Assume that a ciphertext keyword in a cloud document is $t_{index}$ and a ciphertext stop word stored in the cloud is $t_i$; then:

$t_i = m_i + 2pq + pqr_2$    formula (1)

$t_{index} = m_{index} + 2pq + pqr_1$    formula (2)

Each keyword in each ciphertext document in the cloud is filtered through formula (3), where the query matching formula is as follows:

$\mathrm{Retrieval} = (t_i - t_{index}) \bmod N = ((m_i - m_{index}) + pq(r_2 - r_1)) \bmod N$    formula (3)

The matching operation is performed in the cloud. Let the result be Retrieval; if Retrieval = 0, then $t_i - t_{index} = 0$. When this formula is used, only $N$ needs to be uploaded to the cloud, where $N = pq$; since the key $p$ cannot be obtained from $N$, the formula is safe and reliable;

Finally, after all ciphertext stop words that were found are deleted and filtered out, a new document set is generated, in which each document is denoted $d_j$.
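The linear matching used for stop-word filtering can be sketched as below. This is an illustration under stated assumptions: the byte-based keyword encoding, the helper names, and the two Mersenne primes standing in for p and q are all hypothetical, and q is held fixed so that a single N = pq can be used for matching. Decryption appears only to check the result on the client side.

```python
import secrets

# Assumed toy parameters: two Mersenne primes stand in for the secret key p
# and the prime q; only N = p * q would be given to the cloud.
P_KEY = 2**107 - 1
Q_PRIME = 2**89 - 1
N = P_KEY * Q_PRIME

def encode(word: str) -> int:
    """Map a keyword to an integer below q (hypothetical encoding)."""
    m = int.from_bytes(word.encode("ascii"), "big")
    if m >= Q_PRIME:
        raise ValueError("keyword too long for this toy block size")
    return m

def encrypt_word(word: str) -> int:
    r = secrets.randbelow(2**32) + 1        # range of r is an assumption
    return encode(word) + 2 * N + N * r

def decrypt_word(c: int) -> str:
    m = c % P_KEY
    return m.to_bytes((m.bit_length() + 7) // 8, "big").decode("ascii")

def is_stop_word(t_index: int, stop_ciphertexts, n: int) -> bool:
    """Cloud side: formula (3) applied against every stored stop word."""
    return any((t_i - t_index) % n == 0 for t_i in stop_ciphertexts)

def filter_stop_words(doc, stop_ciphertexts, n: int):
    return [t for t in doc if not is_stop_word(t, stop_ciphertexts, n)]

stop_ciphertexts = [encrypt_word(w) for w in ["on", "the", "was", "is", "a"]]
test1 = [encrypt_word(w) for w in "I happiness on the others my hands".split()]
d1 = filter_stop_words(test1, stop_ciphertexts, N)
kept = [decrypt_word(c) for c in d1]        # client-side check only
```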

(2) Establishing inverted index

In the invention, the inverted index is built on the server side: under encryption, the term frequency $TF_i$ and the inverse document frequency $IDF_i$ of each keyword $k_i$ are counted, and the inverted index table of the ciphertext document set is finally generated;

Let the ciphertext keyword to be counted at the cloud be $c_{kindex}$ and a ciphertext keyword stored in the cloud be $c_i$; then

$c_{kindex} = m_{kindex} + 2pq + pqr_1$    formula (4)

$c_i = m_i + 2pq + pqr_2$    formula (5)

Let

$\mathrm{Retrieval} = (c_i - c_{kindex}) \bmod N = ((m_i - m_{kindex}) + pq(r_2 - r_1)) \bmod N$    formula (6)

The matching operation is performed in the cloud. In formula (6), let the result be Retrieval; if Retrieval = 0, then $c_i - c_{kindex} = 0$, which means that the ciphertext keyword $c_{kindex}$ to be counted has been found in the ciphertext document, i.e. one match, and the term-frequency count $f_{ij}$ is increased by 1. The document count $n_i$ denotes the number of documents in the document set in which the ciphertext keyword $c_{kindex}$ appears. The statistics finally yield $f_{ij}$ and $n_i$ for the ciphertext keyword. When this formula is used, only $N$ needs to be uploaded to the cloud, where $N = pq$; since the key $p$ cannot be obtained from $N$, the formula is safe and reliable.

Recording the data obtained according to the statistics, as shown in table 1;

TABLE 1 inverted index Table

As shown in table 1, the document names and keywords are still encrypted at the cloud, but the document numbers, the occurrence frequencies, and the number of documents are obtained in plaintext at the cloud.
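One possible way to collect the statistics described above is sketched here: the cloud groups ciphertext keywords by the matching test and records, per keyword, the plaintext counts f_ij and n_i. The data layout, the function names, the integer keyword codes, and the toy parameters are assumptions; the table layout of the original is not reproduced.

```python
import random

# Toy parameters: p = 131 (embodiment), q = 127 (assumed); N = pq.
P_KEY, Q_PRIME = 131, 127
N = P_KEY * Q_PRIME

def encrypt(m: int) -> int:
    """c = m + 2pq + pqr with a fresh random r (range assumed)."""
    return m + 2 * N + N * random.randint(1, 1000)

def same_keyword(c_a: int, c_b: int) -> bool:
    """Cloud-side matching in the style of formula (6)."""
    return (c_a - c_b) % N == 0

def build_inverted_index(cipher_docs):
    """Return a list of (representative ciphertext, {doc number: f_ij})."""
    index = []
    for doc_no, doc in enumerate(cipher_docs, start=1):
        for token in doc:
            for rep, postings in index:
                if same_keyword(token, rep):
                    postings[doc_no] = postings.get(doc_no, 0) + 1
                    break
            else:
                # No existing entry matched: start a new keyword entry.
                index.append((token, {doc_no: 1}))
    return index

# Hypothetical keyword codes (integers below q) for three documents.
plain_docs = [[10, 20, 30], [10, 10, 40], [20, 50]]
cipher_docs = [[encrypt(m) for m in doc] for doc in plain_docs]
index = build_inverted_index(cipher_docs)

# Client-side check only: decrypt each representative to label the entry.
# Each value is (postings with f_ij per document, document count n_i).
summary = {rep % P_KEY: (postings, len(postings)) for rep, postings in index}
```

The keywords stay encrypted in `index`, while document numbers, occurrence counts, and document counts are plain integers, which matches the description of the inverted index table.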

(3) Generating a set of file vectors

When a user searches keywords, a document set to be searched corresponds to a weight vector set of the document set, and after the updating of the inverted index table is completed, a weight vector set corresponding to the document set is generated according to the counted keyword word frequency and the counted inverse document frequency; the weight calculation framework used by the invention is a TF-IDF framework, wherein TF represents the word frequency of the keyword, and IDF represents the inverse document frequency of the keyword; the value of the calculation weight vector adopted by the invention is shown in formula (7):

where $w_{ij}$ denotes the TF-IDF weight of keyword $k_i$ for document $d_j$, $f_{ij}$ denotes the frequency, i.e. the number of occurrences, of keyword $k_i$ in document $d_j$, $N$ denotes the total number of documents in the document set, $n_i$ denotes the number of documents in the document set that contain keyword $k_i$, and $N/n_i$ represents the inverse document frequency of keyword $k_i$;

Let the ciphertext document $d_j$ contain $t$ ciphertext keywords $k_i$ that are independent of each other, and define the ciphertext document $d_j$ as a vector in $t$-dimensional space. The weight value of each ciphertext keyword $k_i$ in the ciphertext document $d_j$ is obtained according to the above formula, generating the plaintext weight vector of the ciphertext document, in which the weight value obtained for each ciphertext keyword $k_i$ is plaintext; the value of $d_j$ is then as shown in formula (8):
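For orientation, a form of formulas (7) and (8) that is consistent with the symbol definitions above and with the standard TF-IDF model is given below. This is an assumed reconstruction, not the text of the original formulas; in particular the logarithm and its base are assumptions.

```latex
% Assumed form of formula (7): term frequency times log inverse document frequency
w_{ij} = f_{ij} \times \log \frac{N}{n_i}

% Assumed form of formula (8): the document as a t-dimensional weight vector
d_j = (w_{1j}, w_{2j}, \ldots, w_{tj})
```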

Step 3: retrieval process

The client encrypts the plaintext search term and uploads it to the cloud, where the ciphertext search term is preprocessed in the same way as the ciphertext document set. The ciphertext search term is segmented and filtered, following the same procedure as the segmentation and ciphertext stop-word filtering of the ciphertext documents at the cloud; an inverted index of the ciphertext search term is then built, and finally the plaintext weight vector of the ciphertext search term is generated, in which each item is the plaintext weight value of the corresponding ciphertext keyword in the ciphertext search term. The plaintext weight value is calculated as shown in formula (9):

Let the ciphertext search term $q$ contain $t$ ciphertext keywords $k_i$ that are independent of each other. The weight value of each ciphertext keyword $k_i$ in the ciphertext search term $q$ is obtained according to the above formula, generating the plaintext weight vector of the ciphertext search term, in which the weight value obtained for each ciphertext keyword $k_i$ is plaintext; the value of $q$ is then as shown in formula (10):

During retrieval, the similarity between the plaintext weight vector of the ciphertext search term and the plaintext weight vector of each ciphertext document in the ciphertext document set is calculated; the similarity is calculated as follows:

let the plaintext weight vector of the ciphertext document be

The plaintext weight vector of the ciphertext search term is

According to the definition of the vector space model, the similarity between the two is set as

Then there is

where $|d_j|$ and $|q|$ are, respectively, the norm of the plaintext vector of the ciphertext document and the norm of the plaintext vector of the ciphertext search term, both calculated at the cloud. Using formula (11), the cloud calculates the similarity between the ciphertext keyword and each ciphertext document and sorts the results by similarity, so that documents with high similarity, i.e. documents highly relevant to the query, are placed first, which makes searching easier for the user. Finally, the cloud returns the similarity ranking result to the client for the user to view.
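Since the similarity is defined through the vector space model and the norms |d_j| and |q|, a cosine-similarity ranking is a natural reading of formula (11). The sketch below assumes that reading; the function names and the toy vectors are illustrative only.

```python
import math

def cosine_similarity(d, q):
    """sim(d, q) = (d . q) / (|d| |q|); returns 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(d, q))
    norm_d = math.sqrt(sum(a * a for a in d))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_d == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_d * norm_q)

def rank_documents(doc_vectors, q):
    """Return (document number, similarity) pairs, most similar first."""
    scores = [(j, cosine_similarity(d, q))
              for j, d in enumerate(doc_vectors, start=1)]
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Hypothetical plaintext weight vectors over three keywords.
doc_vectors = [
    [1.0, 0.0, 2.0],   # d1
    [0.0, 2.0, 0.0],   # d2
    [2.0, 1.0, 0.0],   # d3
]
query = [1.0, 0.0, 0.0]
ranking = rank_documents(doc_vectors, query)
order = [j for j, _ in ranking]
```

Because all weight vectors are plaintext at the cloud, this step needs no homomorphic operations.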

Step 4: document download

After the desired document has been found in the previous step, the selected document can be downloaded. First, the name of the document to be downloaded is entered at the client, encrypted using formula (12), and uploaded to the cloud for storage; each document name in the document set is encrypted using formula (13) and uploaded to the cloud for storage. Suppose the ciphertext name of the document to be downloaded, stored in the cloud, is $c_{findex}$ and the ciphertext name of a cloud document is $c_{fi}$; then

$c_{findex} = m_{findex} + 2pq + pqr_1$    formula (12)

$c_{fi} = m_{fi} + 2pq + pqr_2$    formula (13)

Similarly, because encrypting the same data twice with the homomorphic encryption technique yields different ciphertexts, the ciphertext name of the document to be downloaded must be matched against each ciphertext document name in the document set using formula (14). Let the matching result be Retrieval:

$\mathrm{Retrieval} = (c_{fi} - c_{findex}) \bmod N = ((m_{fi} - m_{findex}) + pq(r_2 - r_1)) \bmod N$    formula (14)

The matching operation is performed at the cloud, and the document to be downloaded can be found at the cloud through formula (14): if Retrieval = 0, then $c_{fi} - c_{findex} = 0$. When this formula is used, only $N$ needs to be uploaded to the cloud, where $N = pq$, so the key $p$ cannot be obtained from $N$.

The invention has the technical characteristics and beneficial effects that:

In the field of cloud computing ciphertext retrieval, homomorphic encrypted ciphertext retrieval based on the TF-IDF vector retrieval model has advantages. Most existing schemes, however, place most of the work, such as preprocessing the document set, on the client, so that the client handles most of the work and the cloud only a small part. The drawback is that the load on the client increases and the strong computing capability of the cloud goes unused.

In the invention, the preprocessing stage is put in the cloud for implementation, and compared with the scheme that the preprocessing stage is put in the client for processing, the method has the advantages that: the computing pressure of the client is reduced, the strong computing capacity and storage capacity of cloud computing can be fully utilized to operate data, and the retrieval work efficiency is improved.

The innovation points of the invention can be summarized as follows:

(1) a great deal of work such as preprocessing of a document set and the like is migrated to the cloud as much as possible, and TF-IDF weight vectors of the documents are obtained through cloud computing;

(2) the cloud end can perform homomorphic operation and calculation on the ciphertext based on the homomorphic encrypted documents to construct a TF-IDF weight vector of each document, and the TF-IDF weight vector exists in a plaintext mode;

(3) when ciphertext retrieval is carried out, the client encrypts the retrieval item and uploads the retrieval item to the cloud, the cloud operates the ciphertext retrieval item to obtain corresponding TF-IDF weight vector plaintext, and then the similarity between the retrieval item and each document in the document set is calculated, and a plaintext sequencing result is obtained.

Drawings

FIG. 1 is a flow chart of the general steps of the proposed method;

FIG. 2 is a detailed flowchart of steps 1 and 2 of the invention, i.e. document encryption and upload and the preprocessing operation;

FIG. 3 is a flowchart showing the operation of steps 1, 2 and 3 of the invention;

FIG. 4 is a detailed flowchart of the step 4 document download of the invention.

The specific implementation mode is as follows:

the invention provides a homomorphic ciphertext retrieval method facing cloud computing, which is characterized in that a large amount of work such as preprocessing of a document set is moved to a cloud end for preprocessing, the cloud end preprocesses an encrypted document, the computing capability of the cloud end can be fully utilized, and efficient homomorphic ciphertext retrieval is further realized. The detailed description is given below with reference to the accompanying drawings and embodiments.

The flow of the method provided by the invention is shown in figure 1, and comprises the following steps:

(1) Encryption algorithm

Randomly selecting a key p as 131;

q and the random number r are randomly selected within a specified time interval.

Selecting four documents as a document set, wherein 7 keywords are randomly selected from each document, and the document contents are as follows:

test1＝{I happiness on the others my hands}

test2＝{I was father wind happiness I is}

test3＝{my father is a player player birthday}

test4＝{happiness father others was of my others}

the four documents are respectively encrypted, and the obtained ciphertext documents are as follows:

test1＝{dc47e285afe8b229b2463e12cfe69f1049ea107b84f51c034c4653ca01142196c9bd11a559d61541049eb2e90112cf2bc6f11300d8a78d53e129fb50cb3f56320459fe510dec9ba2a7}

test2＝{56e1d11107f444633216d92f822262c9ad76c9b1444676c9bbfb50afb4fc6981447609adc10888653213df64837ddf4d9598252359fea4d95edf6214d9542b75d}

test3＝{4d9586fb6512cf29ba295980a7ccc6ed2fb1e59902540d3e12fb70f21049ec114217285a811d70d123a3f13327b285b7e8b2d8ba081173c7954e3af8bb70f37903e94f0211a56534c3b320756e3511d70d}

test4＝{f8971173af126bedb3f5e53c9a123a48d61543af8c412d2f895df6394a7bc4d95344467dc4a759fe6444764d95363abdc4a7285ba8ebc11300c7c6ff2b759f51b3c692daaa7ebd4464761959fdfa15811300d872d02}

and uploading the encrypted document set to the cloud end in a ciphertext form for storage.

(1) Word segmentation filtering

English generally separates entries with spaces. After word segmentation, keywords that carry no meaning for retrieval over the document set, called stop words, need to be removed. StandardAnalyzer is used through its parameterless constructor StandardAnalyzer(); in the parameterless constructor of the StandardAnalyzer class a filter character array STOP_WORDS is specified, and STOP_WORDS needs to be encrypted with the homomorphic encryption algorithm and uploaded to the cloud for storage.

In this embodiment the stop words are {on, the, was, is, a}. Because the original document set is encrypted at the client, ciphertext stop words must be searched for under encryption using a linear matching method; the matching query is performed as follows:

Assume that a ciphertext keyword in a cloud document is $t_{index}$ and a ciphertext stop word stored in the cloud is $t_i$. According to formula (3), ciphertext stop-word matching can be performed on the ciphertext keywords of each document in the ciphertext document set; if Retrieval = 0, then $t_i - t_{index} = 0$, i.e. $t_i = t_{index}$, which shows that the keyword $t_{index}$ is a stop word; it is then found, deleted, and filtered out.

Finally, after all ciphertext stop words that were found are deleted and filtered out, a new document set is generated, in which each document is denoted $d_j$;

(2) establishing inverted index

Let c be the ciphertext keyword to be counted at the cloud end_windEach ciphertext keyword stored in the cloud ciphertext document test1 ═ dc47e285afe8b229b2463e12cfe69f1049ea107b84f51c034c4653ca01142196c9bd11a559d61541049eb2e90112cf2bc6f11300d8a78d53e129fb50cb3f56320459fe510dec9ba2a 84 } is labeled as c7 ═ 107b965d18369814bd43b }_i。

Retrieval is then calculated according to formula (6). If Retrieval = 0, then $c_i - c_{wind} = 0$, i.e. $c_i = c_{wind}$, which means that the ciphertext keyword $c_{wind}$ to be counted has been found in the ciphertext document test1, i.e. one match, and the term-frequency count $f_{ij}$ is increased by 1. The document count $n_i$ denotes the number of documents in the document set in which the counted ciphertext keyword $c_{wind}$ appears; after matching against the four ciphertext documents, the statistics finally give $n_i = 1$ for the ciphertext keyword $c_{wind}$;

The data obtained by statistics are shown in table 2;

TABLE 2 inverted index Table

As shown in table 2, the document names and the keywords are still encrypted at the cloud, but the document numbers, the occurrence frequencies, and the number of documents are obtained in plaintext at the cloud.

(3) Generating a set of file vectors

Obtained according to equation (7):

d₁＝{i＝1,happiness＝0,others＝1,my＝0,hands＝2,father＝0,wind＝0,player＝0,birthday＝0}

d₂＝{i＝2,happiness＝0,others＝0,my＝0,hands＝0,father＝0.6,wind＝2,player＝0,birthday＝0}

d₃＝{i＝0,happiness＝0,others＝0,my＝0,hands＝0,father＝0.3,wind＝0,player＝4,birthday＝2}

d₄＝{i＝0,happiness＝0,others＝2,my＝0,hands＝0,father＝0.3,wind＝0,player＝0,birthday＝0}
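As a rough cross-check of these vectors, the sketch below recomputes TF-IDF weights for the four example documents, assuming the weight is f_ij × log₂(N / n_i). The base-2 logarithm is an assumption, and the counts are taken from the plaintext documents for brevity, whereas the method obtains them by ciphertext matching. Under this assumption the sketch reproduces the listed entries for keywords that occur in one or two documents (for example hands = 2 in d₁ and player = 4 in d₃); the listed values for keywords that occur in three documents (happiness, my, father) are not reproduced, so the exact form of formula (7) may differ.

```python
import math

STOP_WORDS = {"on", "the", "was", "is", "a"}
KEYWORDS = ["i", "happiness", "others", "my", "hands",
            "father", "wind", "player", "birthday"]
DOCS = [
    "I happiness on the others my hands",
    "I was father wind happiness I is",
    "my father is a player player birthday",
    "happiness father others was of my others",
]

def tokenize(text):
    """Lower-case, split on spaces, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def tf_idf_vectors(docs, keywords):
    tokens = [tokenize(d) for d in docs]
    total = len(tokens)                               # N
    doc_freq = {k: sum(1 for t in tokens if k in t)   # n_i
                for k in keywords}
    vectors = []
    for t in tokens:
        vectors.append({
            k: (t.count(k) * math.log2(total / doc_freq[k])
                if doc_freq[k] else 0.0)
            for k in keywords
        })
    return vectors

vectors = tf_idf_vectors(DOCS, KEYWORDS)
d1, d2, d3, d4 = vectors
```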

the weight vector set of the document set is finally obtained according to the formula (8) as follows:

Step 3: retrieval process

For example, for the search term "father", the weight vector obtained according to steps 1 and 2 is

And finally, calculating the similarity of the weight vectors of the retrieval items and the document set according to a formula (11), and returning the similarity to the client side from large to small, wherein the obtained result is as follows:

Step 4: document download

According to the retrieval in the previous step, the similarity between the document 2 and the retrieval item is highest, and if the document 2 is selected for downloading, the specific implementation flow is as follows:

The name "test2" of the document to be downloaded is entered at the client, homomorphically encrypted there, and uploaded to the cloud, where the matching operation is performed.

The set of ciphertext document names stored in the cloud is {d2fc02540950b013217a7893ebcd83e12110184cc700e27ad4d95f25409d2fbf60331130099c9ad7d2fb1aaa784d95f11a525}, and the ciphertext of the name of the document to be downloaded is {18d8c3e121129d93222756fb1e}. The document to be downloaded can be found at the cloud through formula (14): if Retrieval = 0, then $c_i - c_{index} = 0$, which indicates that the ciphertext document has been found in the ciphertext document set, and the user may then confirm the download to the client.

Step 4: downloaded document decryption

The client receives the downloaded ciphertext document test2＝{56e1d11107f444633216d92f822262c9ad76c9b1444676c9bbfb50afb4fc6981447609adc10888653213df64837ddf4d9598252359fea4d95edf6214d9542b75d} and then decrypts it, where the decryption algorithm is as follows:

Decrypt(c): compute $m_i = c_i \bmod p$ to obtain the plaintext message $M = m_1 m_2 m_3 \ldots m_t$, giving the decrypted plaintext document test2＝{I was father wind happiness I is}; that is, the ciphertext document has been correctly downloaded and decrypted.

Claims

1. A homomorphic encrypted ciphertext retrieval method for cloud computing application is characterized by comprising the following steps:

At a client, a user encrypts each document in a document set by adopting an integer fully homomorphic encryption algorithm, and uploads the documents to a cloud for storage;

(1) encryption algorithm

KeyGen: randomly generate a $P$-bit secure large prime $p$ as the key, where $p \in [2^{P-1}, 2^{P}]$;

Encrypt(M): randomly select a $Q$-bit secure large prime $q$, where $q \in [2^{Q-1}, 2^{Q}]$ and $P > Q >$ the plaintext block length; randomly generate a random number $r$; divide the plaintext $M$ into blocks $M = m_1 m_2 m_3 \ldots m_t$ (each $m_i$ of length $L$); and compute the ciphertext $c_i = m_i + 2pq + pqr$, which yields the ciphertext;

$c_1 = m_1 + 2pq + pqr_1$

$c_2 = m_2 + 2pq + pqr_2$

Additive homomorphism analysis:

Multiplicative homomorphism analysis:

Because $(c_1 c_2) \bmod p = m_1 m_2$, the algorithm satisfies multiplicative homomorphism;

(2) Query matching algorithm for keyword $m_{index}$

If $\mathrm{Retrieval} = 0$, then $m_i - m_{index} = 0$; that is, when $m_i = m_{index}$, $pq(r_1 - r_2) \bmod N = 0$, so $\mathrm{Retrieval} = 0$. During retrieval only $N$ is uploaded to the cloud, where $N = pq$; the cloud cannot derive the user's key $p$ from it, and the server can perform the query matching operation directly on the user's ciphertext;

The cloud first generates a copy of the ciphertext document set and performs the preprocessing operation on the copy; preprocessing is divided into three stages: word segmentation and filtering, building the inverted index, and generating the weight vector set of the document set;

(1) word segmentation filtering

Selecting keywords and retrieving a document set;

for each keyword, judging whether the keyword exists in a stop word list or not, and if so, deleting and filtering the keyword from the document; because the original document set is encrypted at the client, the ciphertext keywords can be searched by adopting a linear matching method when the original document set is encrypted; the method for performing matching query is as follows:

$t_i = m_i + 2pq + pqr_2$    formula (1)

$t_{index} = m_{index} + 2pq + pqr_1$    formula (2)

(2) establishing inverted index

The inverted index is built on the server side: under encryption, the term frequency $TF_i$ and the inverse document frequency $IDF_i$ of each keyword $k_i$ are counted, and the inverted index table of the ciphertext document set is finally generated;

$c_{kindex} = m_{kindex} + 2pq + pqr_1$    formula (4)

$c_i = m_i + 2pq + pqr_2$    formula (5)

Let

$\mathrm{Retrieval} = (c_i - c_{kindex}) \bmod N = ((m_i - m_{kindex}) + pq(r_2 - r_1)) \bmod N$    formula (6)

The matching operation is performed in the cloud. In formula (6), let the result be Retrieval; if Retrieval = 0, then $c_i - c_{kindex} = 0$, which means that the ciphertext keyword $c_{kindex}$ to be counted has been found in the ciphertext document, i.e. one match, and the term-frequency count $f_{ij}$ is increased by 1. The document count $n_i$ denotes the number of documents in the document set in which the counted ciphertext keyword $c_{kindex}$ appears; the statistics finally yield $f_{ij}$ and $n_i$;

(3) Generating a set of file vectors

After the inverted index table has been updated, the weight vector set corresponding to the document set is generated from the counted keyword term frequencies and inverse document frequencies; the weight calculation framework used is the TF-IDF framework, where TF denotes the term frequency of a keyword and IDF denotes the inverse document frequency of a keyword; the weight vector values are calculated as shown in formula (7):

Step 3: retrieval process

The client encrypts the plaintext search term and uploads it to the cloud; the cloud preprocesses the ciphertext search term in the same way as the ciphertext document set, performing word segmentation and filtering on it, wherein each item is the plaintext weight value of the corresponding ciphertext keyword in the ciphertext search term, and the plaintext weight value is calculated as shown in formula (9):

let the plaintext weight vector of the ciphertext document be

The plaintext weight vector of the ciphertext search term is

Then there is

where $|d_j|$ and $|q|$ are, respectively, the norm of the plaintext vector of the ciphertext document and the norm of the plaintext vector of the ciphertext search term, both calculated at the cloud; the similarity between the ciphertext keyword and each ciphertext document is calculated at the cloud using formula (11), and the results are sorted by similarity, so that documents with high similarity, i.e. documents highly relevant to the query, are placed first, which makes searching easier for the user; finally, the cloud returns the similarity ranking result to the client for the user to view;

Step 4: document download

After the desired document has been found in the previous step, the selected document can be downloaded. First, the name of the document to be downloaded is entered at the client, encrypted using formula (12), and uploaded to the cloud for storage; each document name in the document set is encrypted using formula (13) and uploaded to the cloud for storage. Suppose the ciphertext name of the document to be downloaded, stored in the cloud, is $c_{findex}$ and the ciphertext name of a cloud document is $c_{fi}$; then

$c_{findex} = m_{findex} + 2pq + pqr_1$    formula (12)

$c_{fi} = m_{fi} + 2pq + pqr_2$    formula (13)