CN111859421A - Multi-keyword ciphertext storage and retrieval method and system based on word vector - Google Patents
- Publication number
- CN111859421A
- Authority
- CN
- China
- Prior art keywords
- index
- plaintext
- query
- keyword
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/14—Details of searching files based on file metadata
- G06F16/148—File search processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/21—Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/2107—File encryption
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Bioethics (AREA)
- Databases & Information Systems (AREA)
- Library & Information Science (AREA)
- Data Mining & Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a multi-keyword ciphertext storage and retrieval method and system based on word vectors, comprising the following steps: the data owner represents the keywords of each plaintext document as (n+1)-dimensional word vectors, computes the ciphertext index and the encrypted document corresponding to the plaintext document and uploads them to the cloud server, and sends an index key, a decryption private key and model parameters to a data user; the data user generates a trapdoor from a query keyword set, obtains the several encrypted documents with the highest relevance from the cloud server, and decrypts them to obtain the corresponding plaintext documents. Word vectors capture the implicit semantics of words accurately, which improves query accuracy; by borrowing the idea of the MRSE method, the invention ensures the security of ciphertext queries and resists background-knowledge attacks by adversaries.
Description
Technical Field
The invention belongs to the field of ciphertext retrieval, and particularly relates to a multi-keyword ciphertext storage and retrieval method and system based on word vectors.
Background
Driven by the rapid development of Internet applications, users' demand for storage capacity keeps growing, so more and more enterprises and individuals (i.e., data owners) choose to store data on cloud servers to save local storage space. To ensure the security of the data, the data owner first encrypts the data and then uploads it to the cloud server. Encrypted data, however, loses flexibility: a user who wants specific data from a large volume of encrypted data must first download and decrypt all of the data on the cloud server. Obtaining the relevant data this way is very inefficient, and researchers have proposed ciphertext retrieval technology to solve the problem.
Ranked ciphertext retrieval is a continuation of ciphertext retrieval technology and improves the accuracy of ciphertext retrieval on the basis of fuzzy retrieval. In terms of the number of query keywords, ranked ciphertext retrieval methods can be divided into single-keyword ranked search methods and multi-keyword ranked search methods.
The MRSE method (Multi-keyword Ranked Search over Encrypted cloud data) is one of the classic multi-keyword ranked ciphertext search methods. Building on MRSE, later researchers expanded the query keywords with related terms through techniques such as query expansion and personalized recommendation in order to enrich the semantic information of the query. As the number of expanded terms grows, however, query semantic drift occurs, which reduces retrieval accuracy.
Although Chinese patent application CN109271485A discloses a semantics-aware ranked retrieval method for encrypted documents in a cloud environment, the LDA topic model it adopts only uses the probability distribution of keywords under specific topics to represent each keyword's latent contribution to topic semantics. It cannot fully and directly mine the semantic relationships among keywords, so that application's improvement in ciphertext retrieval accuracy remains limited.
Disclosure of Invention
In order to solve the problems, the invention provides a multi-keyword ciphertext storage and retrieval method and system based on word vectors.
In order to achieve the purpose, the technical scheme of the invention is as follows:
A multi-keyword ciphertext storage method based on word vectors comprises the following steps:
1) according to a security parameter n (n ≥ 10), a data owner randomly generates an (n+1)-dimensional binary segmentation vector s and two (n+1) × (n+1) invertible matrices, a first invertible matrix $M_1$ and a second invertible matrix $M_2$, which form the index key $SK = (s, M_1, M_2)$;
2) extracting m keywords from each plaintext document of a plaintext document set, and inputting the m keywords of each plaintext document into a model to obtain m n-dimensional keyword word vectors for each plaintext document, wherein the method for obtaining the model comprises inputting a sample keyword set into the word2vec tool for training;
3) expanding the dimensionality of each keyword word vector from n dimensions to n+1 dimensions to obtain a plaintext index of each plaintext document;
4) calculating the ciphertext index of each plaintext document according to the index key and each plaintext index, and obtaining the encrypted document of each plaintext document by generating a public/private key pair for encryption and decryption;
5) uploading each ciphertext index and each encrypted document to a cloud server, and sending the index key, the decryption private key and the trained model parameters to a data user.
Further, the dimension of each keyword word vector is expanded from n dimension to n +1 dimension by the following strategy:
1) the first n dimensions of each keyword word vector are kept unchanged;
2) compute the (n+1)-th dimension of the keyword word vector as $cw'_j[n+1] = -0.5\,\|cw_j\|^2$, where $cw_j$ is the j-th keyword word vector, j ∈ {1,2,…,m}.
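The expansion rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the patent's implementation: the function name `extend_keyword_vector` and the toy 2-dimensional input are assumptions made here (the text itself requires n ≥ 10).

```python
from typing import List


def extend_keyword_vector(cw: List[float]) -> List[float]:
    """Return the (n+1)-dimensional vector cw' for an n-dimensional cw.

    The first n components are copied unchanged and the last component is
    -0.5 * ||cw||^2, as in the expansion rule for document keywords.
    """
    squared_norm = sum(v * v for v in cw)
    return list(cw) + [-0.5 * squared_norm]


# Toy vector for illustration only; real word vectors would have n >= 10.
print(extend_keyword_vector([3.0, 4.0]))  # prints [3.0, 4.0, -12.5]
```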
Further, the ciphertext index of the plaintext document is calculated by:
1) dividing the plaintext index into a first plaintext index and a second plaintext index by using a binary division vector s;
2) encrypting the first plaintext index and the second plaintext index with the first invertible matrix $M_1$ and the second invertible matrix $M_2$, respectively, to obtain a ciphertext index comprising a first ciphertext index and a second ciphertext index.
Further, the plaintext index is partitioned into a first plaintext index and a second plaintext index by the following strategy:
1) if s[l] = 1, then $d'_i[t][l] + d''_i[t][l] = d_i[t][l]$, where s[l] is the l-th dimension of the binary segmentation vector, $d_i[t][l]$ is the plaintext index of the l-th dimension of the t-th keyword of the i-th plaintext document, $d'_i[t][l]$ is the first plaintext index of the l-th dimension of the t-th keyword of the i-th plaintext document, $d''_i[t][l]$ is the second plaintext index of the l-th dimension of the t-th keyword of the i-th plaintext document, t ∈ {1,2,…,m}, l ∈ {1,2,…,n,n+1}, i ∈ {1,2,…,k}, and k is the number of plaintext documents in the plaintext document set;
2) if s[l] = 0, then $d'_i[t][l] = d''_i[t][l] = d_i[t][l]$.
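The splitting rule for one row of a plaintext index (one keyword of one document) could look like the following sketch. The function name `split_plaintext_row`, the range of the random share, and the use of Python's `random` module are assumptions for illustration; the text only requires that the two parts sum to the original value where s[l] = 1 and equal it where s[l] = 0.

```python
import random
from typing import List, Sequence, Tuple


def split_plaintext_row(
    d_row: Sequence[float], s: Sequence[int], rng: random.Random
) -> Tuple[List[float], List[float]]:
    """Split one plaintext-index row into (d', d'') using the binary vector s."""
    first, second = [], []
    for value, bit in zip(d_row, s):
        if bit == 1:
            # s[l] = 1: random additive shares with d'[l] + d''[l] = d[l].
            share = rng.uniform(-1.0, 1.0)  # share range is an arbitrary choice
            first.append(share)
            second.append(value - share)
        else:
            # s[l] = 0: both parts carry the original value.
            first.append(value)
            second.append(value)
    return first, second


row = [0.5, -1.25, 2.0, -0.75]   # toy row, values made up for the example
s = [1, 0, 1, 0]
d_first, d_second = split_plaintext_row(row, s, random.Random(1))
print(d_first[1] == d_second[1] == row[1])  # prints True
```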
A multi-keyword ciphertext retrieval method based on word vectors comprises the following steps:
1) a data user inputs x query keywords into the model trained by the data owner to obtain x n-dimensional query keyword word vectors;
2) expanding the dimensionality of each query keyword word vector from n dimensions to n+1 dimensions to obtain a query index;
3) generating a trapdoor according to the received index key $SK = (s, M_1, M_2)$ and the query index, and uploading the trapdoor to a cloud server;
4) the cloud server calculates the relevance between the query keywords and each encrypted document according to the trapdoor and the ciphertext index of each encrypted document, and returns the several encrypted documents with the highest relevance to the data user;
5) the data user obtains the corresponding plaintext documents using the decryption private key.
Further, the dimension of each query keyword word vector is expanded from n dimension to n +1 dimension by the following strategy:
1) the first n dimensions of each query keyword word vector are kept unchanged;
2) set the (n+1)-th dimension of the query keyword word vector to $cqw'_s[n+1] = 1$, where $cqw_s$ is the s-th query keyword word vector, s ∈ {1,2,…,x}.
Further, the trapdoor is generated by:
1) generating a query vector $r \times Q_w$ of dimension x × (n+1) from the x query keyword word vectors and a random number r, where $Q_w$ is the query index;
2) dividing the query vector into a first query vector and a second query vector using the binary segmentation vector s, and encrypting the first query vector and the second query vector with the first invertible matrix $M_1$ and the second invertible matrix $M_2$, respectively, to obtain the trapdoor containing the first query vector and the second query vector.
Further, the query vector is partitioned into a first query vector and a second query vector by:
1) if s[l] = 1, then $Q'_w[b][l] = Q''_w[b][l] = r \times Q_w[b][l]$, where s[l] is the l-th dimension of the binary segmentation vector, $Q_w[b][l]$ is the query index of the l-th dimension of the b-th query keyword, $Q'_w[b][l]$ is the first query index of the l-th dimension of the b-th query keyword, $Q''_w[b][l]$ is the second query index of the l-th dimension of the b-th query keyword, b ∈ {1,2,…,x}, l ∈ {1,2,…,n,n+1};
2) if s[l] = 0, then $Q'_w[b][l] + Q''_w[b][l] = r \times Q_w[b][l]$.
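On the query side the roles of the two cases are reversed relative to the document side. A minimal sketch of extending, scaling and splitting one query keyword vector follows; the helper names, the share range and the toy values are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def extend_query_vector(cqw: Sequence[float]) -> List[float]:
    """Append 1 as the (n+1)-th component of a query keyword word vector."""
    return list(cqw) + [1.0]


def split_query_row(
    q_row: Sequence[float], s: Sequence[int], r: float, rng: random.Random
) -> Tuple[List[float], List[float]]:
    """Scale one query row by r, then split it into (Q', Q'') using s."""
    first, second = [], []
    for value, bit in zip(q_row, s):
        scaled = r * value
        if bit == 1:
            # s[l] = 1: both parts carry r * Q[l].
            first.append(scaled)
            second.append(scaled)
        else:
            # s[l] = 0: random additive shares with Q'[l] + Q''[l] = r * Q[l].
            share = rng.uniform(-1.0, 1.0)  # share range is an arbitrary choice
            first.append(share)
            second.append(scaled - share)
    return first, second


q = extend_query_vector([0.5, -2.0])          # toy 2-dimensional query vector
q_first, q_second = split_query_row(q, [1, 0, 1], 3.0, random.Random(7))
print(q_first[0], q_second[0])  # prints 1.5 1.5
```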
Further, the method for calculating the relevance between the query keyword set and each encrypted document comprises the Kuhn-Munkres algorithm.
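The Kuhn-Munkres algorithm finds a maximum-weight matching in a weighted bipartite graph, here between query keywords and document keywords. The sketch below is not Kuhn-Munkres itself: it is an exhaustive search that returns the same optimum on small inputs and is meant only to show what quantity is being computed. The function name, the assumption that there are at least as many document keywords as query keywords, and the toy similarity values are all illustrative.

```python
from itertools import permutations
from typing import Sequence


def max_weight_matching_score(sim: Sequence[Sequence[float]]) -> float:
    """Best total similarity when each query keyword is matched to a
    distinct document keyword.

    sim[b][t] is the similarity between query keyword b and document
    keyword t. Brute force over all assignments; Kuhn-Munkres reaches the
    same optimum in polynomial time.
    """
    x = len(sim)
    m = len(sim[0]) if x else 0
    if x > m:
        raise ValueError("need at least as many document keywords as query keywords")
    best = float("-inf")
    for cols in permutations(range(m), x):
        total = sum(sim[b][cols[b]] for b in range(x))
        best = max(best, total)
    return best


# Both query keywords prefer document keyword 0, so a per-keyword maximum
# would reuse it; the matching has to pick distinct document keywords.
sim = [[0.75, 0.5, 0.0],
       [0.75, 0.25, 0.0]]
print(max_weight_matching_score(sim))  # prints 1.25
```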
A multi-keyword ciphertext retrieval system based on word vectors, comprising:
a data owner, configured to: randomly generate, according to a security parameter n (n ≥ 10), an (n+1)-dimensional binary segmentation vector s and two (n+1) × (n+1) invertible matrices, a first invertible matrix $M_1$ and a second invertible matrix $M_2$, which form the index key $SK = (s, M_1, M_2)$; extract m keywords from each plaintext document of a plaintext document set, and input the m keywords of each plaintext document into a model trained on a sample keyword set to obtain m n-dimensional keyword word vectors for each plaintext document; expand the dimensionality of each keyword word vector from n dimensions to n+1 dimensions to obtain a plaintext index of each plaintext document; calculate the ciphertext index of each plaintext document according to the index key and each plaintext index, and obtain the encrypted document of each plaintext document by generating a public/private key pair for encryption and decryption; and upload each ciphertext index and each encrypted document to a cloud server, and send the index key, the decryption private key and the trained model parameters to a data user;
a data user, configured to: input x query keywords into the model trained by the data owner to obtain x n-dimensional query keyword word vectors; expand the dimensionality of each query keyword word vector from n dimensions to n+1 dimensions to obtain a query index; generate a trapdoor according to the received index key and the query index, and upload the trapdoor to the cloud server; and obtain the corresponding plaintext documents using the decryption private key;
a cloud server, configured to: store the encrypted documents and the corresponding ciphertext indexes; and calculate the relevance between the query keywords and each encrypted document according to the trapdoor and the ciphertext index of each encrypted document, and return the several encrypted documents with the highest relevance to the data user.
Compared with the prior art, the invention has the beneficial effects that:
1) the word vector can be used for accurately acquiring the implicit semantics of the words, so that the query accuracy can be improved;
2) the problem of correlation among topics in Chinese patent application CN109271485A is solved;
3) by using the thought of the MRSE method for reference, the security of ciphertext query is ensured, and the background attack of an adversary is prevented.
Drawings
Fig. 1 is a system architecture diagram of ciphertext retrieval.
Fig. 2 is a diagram illustrating the construction of the ciphertext retrieval scheme.
FIG. 3 is a flow chart of index generation and encryption.
Detailed Description
In order that the objects, principles, aspects and advantages of the present invention will become more apparent, the present invention will be described in detail below with reference to specific embodiments thereof and with reference to the accompanying drawings.
The system model of the scheme is shown in fig. 1, and the cloud service is divided into three entities according to different functions: data owner, cloud server and data consumer. The scheme comprises the following steps as shown in figure 2:
Step one: generate the key, $SK \leftarrow \mathrm{Setup}(1^n)$: given the security parameter n, where n ≥ 10 and the preferred range is [50, 200], the algorithm outputs the index encryption key SK.
Step two: generate the index and ciphertext, $(I, C) \leftarrow \mathrm{GenIndex}(F, SK)$: given a plaintext document set F and the encryption key SK, the algorithm generates the index set I' corresponding to F, and encrypts both F and I' with the encryption key to obtain the ciphertext document set C and the corresponding ciphertext index I.
Step three: generate the trapdoor, $TD \leftarrow \mathrm{GenTrapdoor}(Q_w, SK)$: the algorithm encrypts the query $Q_w$ with the index key SK to obtain the trapdoor TD.
Step four: generate the ciphertext query result $E_k$: the cloud server takes the encrypted index I, the trapdoor TD and the parameter k, uses the algorithm to compute the relevance between the query and the ciphertext set C, and returns the top-k relevant ciphertexts $E_k$ to the data user.
Step five: obtain the plaintext query result, $F_k \leftarrow \mathrm{Dec}(E_k, SK)$: after receiving the ciphertexts $E_k$ returned by the cloud server, the data user decrypts them with the algorithm to obtain the top-k relevant plaintexts $F_k$.
The specific construction method of the first step is as follows:
The data owner sets a security parameter n, namely the dimension of the word vectors, then randomly generates an (n+1)-dimensional binary vector s as the segmentation indication vector and two (n+1) × (n+1) invertible matrices $M_1$ and $M_2$, so that $SK = (s, M_1, M_2)$.
The word2vec tool used in this scheme trains word vectors with a neural network; other deep neural network methods can be substituted for it to obtain the word vectors.
The word vector in this step is a distributed low-dimensional real vector, and the basic idea is to map words to N-dimensional real vectors using a training corpus. The distance between word vectors may represent an implicit semantic relationship between words.
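As a small illustration of distance standing in for semantic relatedness, the sketch below compares Euclidean distances between three hand-written vectors. The vectors are hypothetical toy values chosen for the example, not the output of any trained model, and the function name is an assumption.

```python
import math
from typing import Sequence


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical 3-dimensional "word vectors", written by hand for illustration.
toy_vectors = {
    "cloud": [0.9, 0.1, 0.2],
    "server": [0.8, 0.2, 0.1],
    "banana": [-0.7, 0.9, 0.5],
}

near = euclidean_distance(toy_vectors["cloud"], toy_vectors["server"])
far = euclidean_distance(toy_vectors["cloud"], toy_vectors["banana"])
print(near < far)  # prints True
```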
The binary vector in this step means that each dimension in the vector takes a value of 0 or 1.
The specific construction of step two, $(I, C) \leftarrow \mathrm{GenIndex}(F, SK)$, is as follows:
As shown in FIG. 3, the data owner uses the gensim package in Python to train on the plaintext document set $F = \{f_1, f_2, \dots, f_k\}$ and obtain a trained model. The trained model yields the word vectors $\{cw_1, cw_2, \dots, cw_m\}$ for the m keywords of each document, where m is the number of keywords in the index. The word vector of each keyword is then expanded from n dimensions to n+1 dimensions to form the expanded word vectors $\{cw'_1, cw'_2, \dots, cw'_m\}$: the first n dimensions of each keyword word vector remain unchanged, and $cw'_j[n+1] = -0.5\,\|cw_j\|^2$, j ∈ {1,2,…,m}. The word vectors $cw'_j$ are then used to generate, for each document $f_i$, an m × (n+1) document matrix $d_i$ (i ∈ {1,2,…,k}) as the plaintext index of the document. The plaintext index $d_i$ is divided into $d'_i$ and $d''_i$ using the segmentation indication vector s: if s[l] = 1, $d'_i[t][l] + d''_i[t][l] = d_i[t][l]$; otherwise $d'_i[t][l] = d''_i[t][l] = d_i[t][l]$, where t ∈ {1,2,…,m} and l ∈ {1,2,…,n,n+1}. Then $d'_i$ and $d''_i$ are encrypted with the two matrices $M_1$ and $M_2$ to obtain the encrypted index $I_i$. Any secure and reliable encryption algorithm can be used to encrypt the plaintext document set F to obtain the ciphertext document set C. Finally, the data owner uploads the encrypted index $I_i$ and the ciphertext C to the cloud server, and sends the index key $SK = (s, M_1, M_2)$, the decryption key of the encryption algorithm and the parameters of the trained model to the data user.
The specific construction of step three, $TD \leftarrow \mathrm{GenTrapdoor}(Q_w, SK)$, is as follows:
The data user inputs the query keyword set $\{qw_1, qw_2, \dots, qw_x\}$ into the model trained by the data owner to obtain the word vectors $\{cqw_1, cqw_2, \dots, cqw_x\}$ of the x query keywords, and then expands each query keyword word vector from n dimensions to n+1 dimensions to form the expanded word vectors $\{cqw'_1, cqw'_2, \dots, cqw'_x\}$: the first n dimensions of each query keyword word vector remain unchanged, and $cqw'_s[n+1] = 1$, s ∈ {1,2,…,x}. The expanded word vectors form the query $Q_w$, from which a query vector $r \times Q_w$ of dimension x × (n+1) is generated, where r is a random number and x is the number of query keywords. The query $r \times Q_w$ is divided into $Q'_w$ and $Q''_w$ using the segmentation indication vector s: if s[l] = 1, $Q'_w[b][l] = Q''_w[b][l] = r \times Q_w[b][l]$; otherwise $Q'_w[b][l] + Q''_w[b][l] = r \times Q_w[b][l]$, where b ∈ {1,2,…,x} and l ∈ {1,2,…,n,n+1}. Then $Q'_w$ and $Q''_w$ are encrypted with the two matrices $M_1$ and $M_2$ to obtain the trapdoor TD. Finally, the data user uploads the trapdoor TD to the cloud server.
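The exact matrix formulas for the encrypted index and the trapdoor are not legible in this text. The sketch below therefore assumes the construction commonly used in MRSE-style schemes: the two index parts are multiplied by $M_1^T$ and $M_2^T$, the two trapdoor parts by $M_1^{-1}$ and $M_2^{-1}$, so that the server-side inner products recover the plaintext inner product without revealing either vector. All function names, the 3-dimensional toy matrices and the integer share range are illustrative assumptions; exact `Fraction` arithmetic is used so the equality can be checked without rounding.

```python
from fractions import Fraction
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(a, v):
    return [dot(row, v) for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inverse(a):
    """Gauss-Jordan inverse over exact Fractions; raises if singular."""
    size = len(a)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(size)]
           for i, row in enumerate(a)]
    for col in range(size):
        pivot = next((r for r in range(col, size) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(size):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def split(vec, s, split_bit, rng):
    """Additive random shares where s[l] == split_bit, plain copies elsewhere."""
    first, second = [], []
    for value, bit in zip(vec, s):
        share = Fraction(rng.randint(-9, 9)) if bit == split_bit else value
        first.append(share)
        second.append(value - share if bit == split_bit else value)
    return first, second


def encrypt_index_row(d, s, m1, m2, rng):
    d1, d2 = split(d, s, 1, rng)  # document side splits where s[l] == 1
    return mat_vec(transpose(m1), d1), mat_vec(transpose(m2), d2)


def encrypt_query_row(q, s, m1, m2, rng):
    q1, q2 = split(q, s, 0, rng)  # query side splits where s[l] == 0
    return mat_vec(inverse(m1), q1), mat_vec(inverse(m2), q2)


def score(index_row, trapdoor_row):
    return dot(index_row[0], trapdoor_row[0]) + dot(index_row[1], trapdoor_row[1])


# Toy 3-dimensional key and vectors (made-up values; both matrices are invertible).
s = [1, 0, 1]
m1 = [[Fraction(v) for v in row] for row in [[2, 1, 0], [0, 1, 1], [1, 0, 1]]]
m2 = [[Fraction(v) for v in row] for row in [[1, 2, 0], [0, 1, 0], [3, 0, 1]]]
d = [Fraction(1), Fraction(2), Fraction(-3)]
q = [Fraction(4), Fraction(-1), Fraction(2)]
rng = random.Random(0)
result = score(encrypt_index_row(d, s, m1, m2, rng),
               encrypt_query_row(q, s, m1, m2, rng))
print(result == dot(d, q))  # prints True
```

Whatever random shares are drawn, the score equals the plaintext inner product, which is the property the relevance computation relies on.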
Step four: the cloud server computes $Rscore = KM(sim(I_i, TD))$ to obtain the relevance between the query and each ciphertext, and returns the top-k ciphertexts to the data user.
The relevance between the index $I_i$ and the trapdoor TD is calculated using the following equation:
The function KM() in the above equation denotes the KM algorithm, whose full name is Kuhn-Munkres, a maximum-weight matching algorithm for weighted bipartite graphs. The KM algorithm yields the maximum total weight of the matching edges in the bipartite graph. $dis(d_i, Q_w)$ denotes the Euclidean distance between the query matrix and the document matrix.
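The link between the inner product of the expanded vectors and the Euclidean distance can be worked out from the two expansion rules. The following is a derivation from those definitions for a single document keyword vector $cw_t$ and a single query keyword vector $cqw_b$, not a reproduction of the patent's own equation, and it assumes the random number r is positive:

```latex
\begin{aligned}
cw'_t &= \bigl(cw_t,\; -\tfrac{1}{2}\lVert cw_t\rVert^2\bigr),
\qquad cqw'_b = \bigl(cqw_b,\; 1\bigr),\\
r\,\langle cw'_t,\, cqw'_b\rangle
  &= r\bigl(\langle cw_t,\, cqw_b\rangle - \tfrac{1}{2}\lVert cw_t\rVert^2\bigr)\\
  &= -\tfrac{r}{2}\bigl(\lVert cw_t - cqw_b\rVert^2 - \lVert cqw_b\rVert^2\bigr).
\end{aligned}
```

For a fixed query keyword the term $\lVert cqw_b\rVert^2$ is constant, so under the assumption r > 0 a larger inner product corresponds to a smaller Euclidean distance between the two keyword vectors.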
Step five: after obtaining the ciphertexts $E_k$, the data user uses the decryption key to obtain the associated plaintexts.
The invention has the following analysis on retrieval accuracy:
the scheme utilizes the original information of a data user and calculates the inner product between the query keyword vector and the document keyword vector to obtain the semantic relation between the query keyword and the document keyword. According to the scheme, the matching degree of the query keywords and the document keywords can be improved by utilizing the KM algorithm and the word semantic correlation degree, so that the accuracy of ciphertext retrieval is improved.
The invention has the following safety analysis:
Under the known-ciphertext model, the adversary can obtain the corresponding ciphertext information, including the encrypted document vectors, the query vectors and so on, but the encryption key remains secret. The encryption key of the scheme consists of two parts: the (n+1)-dimensional segmentation indication vector s and the (n+1) × (n+1) invertible matrices $M_1$ and $M_2$. Because the ciphertext index and the trapdoor are produced with these matrices, the invertible matrices might be obtained through matrix operations under the known-ciphertext condition, but the segmentation indication vector cannot be deduced. Computing the segmentation indication vector s requires $2^n$ computations, which places extremely high demands on the computing performance of the server, so the scheme is secure under the known-ciphertext model.
Under the known-background model, the cloud server not only knows the ciphertext index information but can also record query results, analyze the query process, and infer relationships among different trapdoors, statistical knowledge of the database, and so on. Query unlinkability: the scheme borrows the idea of the MRSE method to obtain the relevance between the query keywords and the document keywords, and the correspondence between the query matrix and the document matrix cannot be obtained. Keyword security: the keyword encryption of the scheme adopts the security-enhanced inner product computation, and because that computation satisfies keyword security, the encryption proposed by the scheme also satisfies keyword security.
The word-vector-based multi-keyword ranked ciphertext retrieval method can improve the accuracy of ciphertext retrieval while preserving the security of ciphertext retrieval.
It should be understood that the above embodiments are described in some detail and with some particularity, but should not be construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the claims.
Claims (10)
1. A multi-keyword ciphertext storage method based on word vectors comprises the following steps:
1) according to a security parameter n (n ≥ 10), a data owner randomly generates an (n+1)-dimensional binary segmentation vector s and two (n+1) × (n+1) invertible matrices, a first invertible matrix $M_1$ and a second invertible matrix $M_2$, which form the index key $SK = (s, M_1, M_2)$;
2) extracting m keywords from each plaintext document of a plaintext document set, and inputting the m keywords of each plaintext document into a model to obtain m n-dimensional keyword word vectors for each plaintext document, wherein the method for obtaining the model comprises inputting a sample keyword set into the word2vec tool for training;
3) expanding the dimensionality of each keyword word vector from n dimensions to n+1 dimensions to obtain a plaintext index of each plaintext document;
4) calculating the ciphertext index of each plaintext document according to the index key and each plaintext index, and obtaining the encrypted document of each plaintext document by generating a public/private key pair for encryption and decryption;
5) uploading each ciphertext index and each encrypted document to a cloud server, and sending the index key, the decryption private key and the trained model parameters to a data user.
2. The method of claim 1, wherein the dimension of each keyword word vector is expanded from n-dimension to n + 1-dimension by the following strategy:
1) the first n dimensions of each keyword word vector are kept unchanged;
2) compute the (n+1)-th dimension of the keyword word vector as $cw'_j[n+1] = -0.5\,\|cw_j\|^2$, where $cw_j$ is the j-th keyword word vector, j ∈ {1,2,…,m}.
3. The method of claim 2, wherein the ciphertext index of the plaintext document is computed by:
1) dividing the plaintext index into a first plaintext index and a second plaintext index by using a binary division vector s;
2) encrypting the first plaintext index and the second plaintext index with the first invertible matrix $M_1$ and the second invertible matrix $M_2$, respectively, to obtain a ciphertext index comprising a first ciphertext index and a second ciphertext index.
4. The method of claim 3, wherein the plaintext index is partitioned into a first plaintext index and a second plaintext index by:
1) if s[l] = 1, then $d'_i[t][l] + d''_i[t][l] = d_i[t][l]$, where s[l] is the l-th dimension of the binary segmentation vector, $d_i[t][l]$ is the plaintext index of the l-th dimension of the t-th keyword of the i-th plaintext document, $d'_i[t][l]$ is the first plaintext index of the l-th dimension of the t-th keyword of the i-th plaintext document, $d''_i[t][l]$ is the second plaintext index of the l-th dimension of the t-th keyword of the i-th plaintext document, t ∈ {1,2,…,m}, l ∈ {1,2,…,n,n+1}, i ∈ {1,2,…,k}, and k is the number of plaintext documents in the plaintext document set;
2) if s[l] = 0, then $d'_i[t][l] = d''_i[t][l] = d_i[t][l]$.
5. A multi-keyword ciphertext retrieval method based on word vectors comprises the following steps:
1) a data user inputs x query keywords into the model trained by the data owner to obtain x n-dimensional query keyword word vectors;
2) expanding the dimensionality of each query keyword word vector from n dimensions to n+1 dimensions to obtain a query index;
3) generating a trapdoor according to the received index key $SK = (s, M_1, M_2)$ and the query index, and uploading the trapdoor to a cloud server;
4) the cloud server calculates the relevance between the query keywords and each encrypted document according to the trapdoor and the ciphertext index of each encrypted document obtained by the method of any one of claims 1 to 5, and returns the several encrypted documents with the highest relevance to the data user;
5) the data user obtains the corresponding plaintext documents using the decryption private key.
6. The method of claim 5, wherein the dimension of each query keyword word vector is expanded from n-dimensions to n + 1-dimensions by the following strategy:
1) the first n dimensions of each query keyword word vector are kept unchanged;
2) set the (n+1)-th dimension of the query keyword word vector to $cqw'_s[n+1] = 1$, where $cqw_s$ is the s-th query keyword word vector, s ∈ {1,2,…,x}.
7. The method of claim 6, wherein the trapdoor is created by:
1) generating a query vector $r \times Q_w$ of dimension x × (n+1) from the x query keyword word vectors and a random number r, where $Q_w$ is the query index;
2) dividing the query vector into a first query vector and a second query vector using the binary segmentation vector s, and encrypting the first query vector and the second query vector with the first invertible matrix $M_1$ and the second invertible matrix $M_2$, respectively, to obtain the trapdoor containing the first query vector and the second query vector.
8. The method of claim 7, wherein the query vector is partitioned into a first query vector and a second query vector by:
1) if s[l] = 1, then $Q'_w[b][l] = Q''_w[b][l] = r \times Q_w[b][l]$, where s[l] is the l-th dimension of the binary segmentation vector, $Q_w[b][l]$ is the query index of the l-th dimension of the b-th query keyword, $Q'_w[b][l]$ is the first query index of the l-th dimension of the b-th query keyword, $Q''_w[b][l]$ is the second query index of the l-th dimension of the b-th query keyword, b ∈ {1,2,…,x}, l ∈ {1,2,…,n,n+1};
2) if s[l] = 0, then $Q'_w[b][l] + Q''_w[b][l] = r \times Q_w[b][l]$.
9. The method of claim 5, wherein the method for calculating the relevance between the query keyword set and each encrypted document comprises the Kuhn-Munkres algorithm.
10. A multi-keyword ciphertext retrieval system based on word vectors, comprising:
a data owner, configured to: randomly generate, according to a security parameter n, an (n+1)-dimensional binary segmentation vector s and two (n+1) × (n+1) invertible matrices, namely a first invertible matrix M1 and a second invertible matrix M2, to obtain an index key SK = (s, M1, M2), where n ≥ 10; extract m keywords from each plaintext document of a plaintext document set, and input the m keywords of each plaintext document into a model trained on a sample keyword set to obtain m n-dimensional keyword word vectors for each plaintext document; expand each keyword word vector from n dimensions to n+1 dimensions to obtain a plaintext index of each plaintext document; calculate the ciphertext index of each plaintext document from the index key and the plaintext index, and obtain the encrypted document of each plaintext document by generating a public/private key pair for encryption and decryption; and upload each ciphertext index and each encrypted document to a cloud server, and send the index key, the decryption private key, and the trained model parameters to a data user;
the data user, configured to: input x query keywords into the model trained by the data owner to obtain x n-dimensional query keyword word vectors; expand each query keyword word vector from n dimensions to n+1 dimensions to obtain a query index; generate a trapdoor from the received index key and the query index, and upload the trapdoor to the cloud server; and obtain the corresponding plaintext documents using the decryption private key;
the cloud server, configured to: store the encrypted documents and the corresponding ciphertext indexes; calculate the relevance of the query keywords to each encrypted document from the trapdoor and the ciphertext index of each encrypted document; and return the several encrypted documents with the highest relevance to the data user.
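The interplay of the index key SK = (s, M1, M2), the ciphertext index, and the trapdoor can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented construction: it assumes the plaintext index is split in the complementary way to claim 8 (additive shares where s[l] = 1, duplicated where s[l] = 0), that the index shares are multiplied by the transposed matrices and the query shares by the inverse matrices (as in the standard secure kNN construction), and it uses exact fractions and small integer vectors instead of real word vectors. All function names are hypothetical, and the expansion from n to n+1 dimensions is taken as already done.

```python
import random
from fractions import Fraction


def mat_vec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def inverse(M):
    """Gauss-Jordan inverse over exact fractions; ValueError if singular."""
    k = len(M)
    A = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(k)]
         for i, row in enumerate(M)]
    for col in range(k):
        pivot = next((r for r in range(col, k) if A[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [x / pv for x in A[col]]
        for r in range(k):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [row[k:] for row in A]


def random_invertible(k, rng):
    """Draw small random integer matrices until one is invertible."""
    while True:
        M = [[rng.randint(-5, 5) for _ in range(k)] for _ in range(k)]
        try:
            return M, inverse(M)
        except ValueError:
            continue


def keygen(k, rng):
    """Index key for k = n + 1 dimensions: (s, (M1, M1^-1), (M2, M2^-1))."""
    s = [rng.randint(0, 1) for _ in range(k)]
    return s, random_invertible(k, rng), random_invertible(k, rng)


def split(vec, s, split_when, rng):
    """Additively share coordinates where s[l] == split_when; copy the rest."""
    first, second = [], []
    for x, bit in zip(vec, s):
        if bit == split_when:
            share = Fraction(rng.randint(-100, 100), 7)
            first.append(share)
            second.append(x - share)
        else:
            first.append(x)
            second.append(x)
    return first, second


def encrypt_index(p, key, rng):
    s, (M1, _), (M2, _) = key
    p1, p2 = split(p, s, 1, rng)          # assumed complementary split
    return mat_vec(transpose(M1), p1), mat_vec(transpose(M2), p2)


def trapdoor(q, key, r, rng):
    s, (_, M1_inv), (_, M2_inv) = key
    q1, q2 = split([r * x for x in q], s, 0, rng)   # split as in claim 8
    return mat_vec(M1_inv, q1), mat_vec(M2_inv, q2)


def score(index, trap):
    """What the cloud server can compute: equals r * (p . q) in this sketch."""
    return sum(a * b for half_i, half_t in zip(index, trap)
               for a, b in zip(half_i, half_t))
```

Under these assumptions the server obtains r times the inner product of the plaintext index and the query index without seeing either vector, which is enough to rank documents for a fixed query.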
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010651620.7A CN111859421A (en) | 2020-07-08 | 2020-07-08 | Multi-keyword ciphertext storage and retrieval method and system based on word vector |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111859421A (en) | 2020-10-30 |
Family
ID=73152405
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010651620.7A Pending CN111859421A (en) | 2020-07-08 | 2020-07-08 | Multi-keyword ciphertext storage and retrieval method and system based on word vector |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111859421A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20130024144A (en) * | 2011-08-30 | 2013-03-08 | 고려대학교 산학협력단 | Weighted keyword searching method for perserving privacy, and apparatus thereof |
US20180357434A1 (en) * | 2017-06-08 | 2018-12-13 | The Government Of The United States, As Represented By The Secretary Of The Army | Secure Generalized Bloom Filter |
CN108632248A (en) * | 2018-03-22 | 2018-10-09 | 平安科技(深圳)有限公司 | Data ciphering method, data query method, apparatus, equipment and storage medium |
CN108563732A (en) * | 2018-04-08 | 2018-09-21 | 浙江理工大学 | Multiple-fault diagnosis sorted search method for encrypted cloud data in a cloud network |
CN109063509A (en) * | 2018-08-07 | 2018-12-21 | 上海海事大学 | Searchable encryption method based on keyword semantic sorting |
CN110069944A (en) * | 2019-04-03 | 2019-07-30 | 南方电网科学研究院有限责任公司 | Searchable-encryption data retrieval method and system |
Non-Patent Citations (1)
Title |
---|
张楠; 陈兰香: "Research on an efficient keyword searchable encryption system supporting ranking" (一种高效的支持排序的关键词可搜索加密系统研究), 信息网络安全, no. 02 *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112528064A (en) * | 2020-12-10 | 2021-03-19 | 西安电子科技大学 | Privacy-protecting encrypted image retrieval method and system |
CN117235121A (en) * | 2023-11-15 | 2023-12-15 | 华北电力大学 | Energy big data query method and system |
CN117235121B (en) * | 2023-11-15 | 2024-02-20 | 华北电力大学 | Energy big data query method and system |
CN117574435A (en) * | 2024-01-12 | 2024-02-20 | 云阵(杭州)互联网技术有限公司 | Multi-keyword trace query method, device and system based on homomorphic encryption |
CN117574435B (en) * | 2024-01-12 | 2024-04-23 | 云阵(杭州)互联网技术有限公司 | Multi-keyword trace query method, device and system based on homomorphic encryption |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Zhang et al. | PIC: Enable large-scale privacy preserving content-based image search on cloud | |
Zhang et al. | SE-PPFM: A searchable encryption scheme supporting privacy-preserving fuzzy multikeyword in cloud systems | |
CN108388807B (en) | Efficient and verifiable multi-keyword sequencing searchable encryption method supporting preference search and logic search | |
CN107480163B (en) | Efficient ciphertext image retrieval method supporting privacy protection in cloud environment | |
CN108712366A (en) | Searchable encryption method and system supporting morphological and semantic fuzzy search in a cloud environment | |
CN108647529A (en) | Semantic-based multi-keyword sorted search privacy protection system and method | |
CN111859421A (en) | Multi-keyword ciphertext storage and retrieval method and system based on word vector | |
CN109885640B (en) | Multi-keyword ciphertext sorting and searching method based on alpha-fork index tree | |
CN109471964B (en) | Synonym set-based fuzzy multi-keyword searchable encryption method | |
CN110659379B (en) | Searchable encrypted image retrieval method based on deep convolution network characteristics | |
CN109063509A (en) | Searchable encryption method based on keyword semantic sorting | |
CN109992995B (en) | Searchable encryption method supporting location protection and privacy inquiry | |
CN109992978B (en) | Information transmission method and device and storage medium | |
WO2022099495A1 (en) | Ciphertext search method, system, and device in cloud computing environment | |
CN111026788A (en) | Homomorphic encryption-based multi-keyword ciphertext sorting and retrieving method in hybrid cloud | |
CN112332979B (en) | Ciphertext search method, system and equipment in cloud computing environment | |
CN109885650B (en) | Outsourcing cloud environment privacy protection ciphertext sorting retrieval method | |
CN115314295B (en) | Block chain-based searchable encryption technical method | |
CN111797409A (en) | Big data Chinese text carrier-free information hiding method | |
CN109739945B (en) | Multi-keyword ciphertext sorting and searching method based on mixed index | |
CN116109372B (en) | Cold chain logistics product federal recommendation method and device based on multi-level block chain | |
CN112257455A (en) | Semantic-understanding ciphertext space keyword retrieval method and system | |
CN109255244B (en) | Data encryption method and device and data encryption retrieval system | |
CN115310125A (en) | Encrypted data retrieval system, method, computer equipment and storage medium | |
Wang et al. | An efficient and privacy-preserving range query over encrypted cloud data | |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |