CN111859421A - Multi-keyword ciphertext storage and retrieval method and system based on word vector - Google Patents

Multi-keyword ciphertext storage and retrieval method and system based on word vector

Info

Publication number: CN111859421A
Authority: CN (China)
Prior art keywords: index, plaintext, query, keyword, vector
Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN202010651620.7A
Other languages: Chinese (zh)
Inventors: 韩光, 田宝松, 许彩云, 杨杨, 兰静, 哈兰, 崔永进
Current Assignee: China National Software & Service Co ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: China National Software & Service Co ltd
Priority date: 2020-07-08 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2020-07-08
Publication date: 2020-10-30
Application filed by: China National Software & Service Co ltd
Priority to: CN202010651620.7A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2107 File encryption

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multi-keyword ciphertext storage and retrieval method and system based on word vectors. The data owner represents the keywords of each plaintext document as (n+1)-dimensional word vectors, computes the ciphertext index and the encrypted document corresponding to each plaintext document and uploads them to the cloud server, and sends the index key, the decryption private key and the model parameters to the data user. The data user generates a trapdoor from the query keyword set, obtains the several encrypted documents with the highest relevance from the cloud server, and decrypts them to obtain the corresponding plaintext documents. Word vectors capture the implicit semantics of words accurately, which improves query accuracy; by drawing on the idea of the MRSE method, the invention secures ciphertext queries and resists background-knowledge attacks by adversaries.

Description

Multi-keyword ciphertext storage and retrieval method and system based on word vector
Technical Field
The invention belongs to the field of ciphertext retrieval, and particularly relates to a multi-keyword ciphertext storage and retrieval method and system based on word vectors.
Background
Driven by the rapid development of Internet applications, users' storage requirements keep growing, so more and more enterprises and individuals (i.e., data owners) choose to store data on cloud servers to save local storage space. To keep the data secure, the data owner encrypts it before uploading it to the cloud server. Encrypted data, however, loses flexibility: a user who wants particular data from a large encrypted collection can obtain it only after downloading and decrypting all the data on the cloud server. Because obtaining the relevant data this way is very inefficient, researchers proposed ciphertext retrieval technology to solve the problem.
Ranked ciphertext retrieval is a continuation of ciphertext retrieval that improves retrieval accuracy on the basis of fuzzy retrieval. By the number of query keywords, ranked ciphertext retrieval methods are divided into single-keyword ranked search methods and multi-keyword ranked search methods (Multi-Keyword Ranked Search).
The MRSE method (Multi-keyword Ranked Search over Encrypted cloud data) is one of the classic multi-keyword ranked ciphertext search methods. Building on MRSE, later researchers expanded the query keywords with related terms through query expansion, personalized recommendation and similar techniques in order to enrich the semantic information of the query. As more related terms are added, however, query semantic drift occurs, which reduces retrieval accuracy.
Chinese patent application CN109271485A discloses a semantics-aware ranked retrieval method for encrypted documents in a cloud environment. The LDA topic model it adopts, however, only uses the probability distribution of keywords under specific topics to represent each keyword's potential contribution to the topic semantics; it cannot fully and directly mine the semantic relationships among keywords, so that application offers only limited improvement in ciphertext retrieval accuracy.
Disclosure of Invention
To solve the above problems, the invention provides a multi-keyword ciphertext storage and retrieval method and system based on word vectors.
To achieve this purpose, the technical scheme of the invention is as follows:
A multi-keyword ciphertext storage method based on word vectors comprises the following steps:
1) the data owner, according to a security parameter n, randomly generates an (n+1)-dimensional binary segmentation vector s and two (n+1) × (n+1) invertible matrices, a first invertible matrix M_1 and a second invertible matrix M_2, giving the index key SK = (s, M_1, M_2), where n ≥ 10;
2) m keywords are extracted from each plaintext document of a plaintext document set, and the m keywords of each plaintext document are input into a model to obtain m n-dimensional keyword word vectors for that document, wherein the model is obtained by a method that includes inputting a sample keyword set into the word2vec tool for training;
3) each keyword word vector is expanded from n dimensions to n+1 dimensions to obtain the plaintext index of each plaintext document;
4) the ciphertext index of each plaintext document is calculated from the index key and the plaintext index, and an encryption/decryption public-private key pair is generated and used to obtain the encrypted document of each plaintext document;
5) each ciphertext index and each encrypted document are uploaded to the cloud server, and the index key, the decryption private key and the model parameters obtained by training are sent to the data user.
Further, each keyword word vector is expanded from n dimensions to n+1 dimensions by the following strategy:
1) the first n dimensions of each keyword word vector are kept unchanged;
2) the (n+1)-th dimension of the keyword word vector is computed as cw'_j[n+1] = -0.5·||cw_j||^2, where cw_j is the keyword word vector and j ∈ {1,2,…,m}.
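The expansion strategy above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function names `extend_keyword_vector` and `build_plaintext_index` are hypothetical, NumPy is an assumed dependency, and the trailing "2" in the formula is read as a square (the squared Euclidean norm), which is the reading under which the later Euclidean-distance relevance makes sense.

```python
import numpy as np

def extend_keyword_vector(cw):
    """Keep the first n dimensions and append -0.5 * ||cw||^2 as dimension n+1."""
    cw = np.asarray(cw, dtype=float)
    return np.append(cw, -0.5 * np.dot(cw, cw))

def build_plaintext_index(keyword_vectors):
    """Stack the m extended keyword vectors into an m x (n+1) plaintext index."""
    return np.vstack([extend_keyword_vector(v) for v in keyword_vectors])

# Two toy 3-dimensional keyword vectors (n = 3, m = 2); real vectors would come
# from the trained word-vector model.
d_i = build_plaintext_index([[1.0, 2.0, 2.0], [0.0, 3.0, 4.0]])
print(d_i)
```

Each row of `d_i` corresponds to one keyword of the document, so the result has the m × (n+1) shape that the later splitting and encryption steps operate on.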
Further, the ciphertext index of the plaintext document is calculated by:
1) dividing the plaintext index into a first plaintext index and a second plaintext index using the binary segmentation vector s;
2) encrypting the first plaintext index and the second plaintext index with the first invertible matrix M_1 and the second invertible matrix M_2 respectively, to obtain a ciphertext index comprising a first ciphertext index and a second ciphertext index.
Further, the plaintext index is partitioned into a first plaintext index and a second plaintext index by the following strategy:
1) if s[l] = 1, then d'_i[t][l] + d''_i[t][l] = d_i[t][l], where s[l] is the l-th dimension of the binary segmentation vector, d_i[t][l] is the l-th dimension of the plaintext index of the t-th keyword of the i-th plaintext document, d'_i[t][l] is the l-th dimension of the first plaintext index of the t-th keyword of the i-th plaintext document, d''_i[t][l] is the l-th dimension of the second plaintext index of the t-th keyword of the i-th plaintext document, t ∈ {1,2,…,m}, l ∈ {1,2,…,n,n+1}, i ∈ {1,2,…,k}, and k is the number of plaintext documents in the plaintext document set;
2) if s[l] = 0, then d'_i[t][l] = d''_i[t][l] = d_i[t][l].
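As a rough sketch of this partitioning rule, the following Python splits an m × (n+1) plaintext index column by column. The function name `split_plaintext_index` is hypothetical, and drawing the first share uniformly from [-1, 1] is an assumption: the text only requires that the two shares sum to the original value wherever s[l] = 1.

```python
import numpy as np

def split_plaintext_index(d, s, rng):
    """Split d into (d1, d2): additive random shares where s[l] == 1, copies where s[l] == 0."""
    d = np.asarray(d, dtype=float)
    mask = np.asarray(s) == 1                 # one flag per dimension, broadcast over rows
    share = rng.uniform(-1.0, 1.0, size=d.shape)
    d1 = np.where(mask, share, d)             # random share, or a plain copy
    d2 = np.where(mask, d - share, d)         # complementary share, or a plain copy
    return d1, d2

rng = np.random.default_rng(0)
d_i = np.array([[1.0, 2.0, 2.0, -4.5],
                [0.0, 3.0, 4.0, -12.5]])
s = np.array([1, 0, 1, 0])
d1, d2 = split_plaintext_index(d_i, s, rng)
print(d1 + d2)  # equals d_i in the split columns and 2 * d_i in the copied columns
```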
A multi-keyword ciphertext retrieval method based on word vectors comprises the following steps:
1) the data user inputs x query keywords into the model trained by the data owner to obtain x n-dimensional query keyword word vectors;
2) each query keyword word vector is expanded from n dimensions to n+1 dimensions to obtain a query index;
3) a trapdoor is generated from the received index key SK = (s, M_1, M_2) and the query index, and the trapdoor is uploaded to the cloud server;
4) the cloud server calculates the relevance between the query keywords and each encrypted document from the trapdoor and the ciphertext index of each encrypted document, and returns the several encrypted documents with the highest relevance to the data user;
5) the data user obtains the corresponding plaintext documents using the decryption private key.
Further, each query keyword word vector is expanded from n dimensions to n+1 dimensions by the following strategy:
1) the first n dimensions of each query keyword word vector are kept unchanged;
2) the (n+1)-th dimension of the query keyword word vector is set as cqw'_s[n+1] = 1, where cqw_s is the query keyword word vector and s ∈ {1,2,…,x}.
Further, the trapdoor is generated by:
1) generating an x × (n+1)-dimensional query vector r × Q_w from the x query keyword word vectors and a random number r, where Q_w is the query index;
2) dividing the query vector into a first query vector and a second query vector using the binary segmentation vector s, and encrypting the first query vector and the second query vector with the first invertible matrix M_1 and the second invertible matrix M_2 respectively, to obtain the trapdoor comprising the encrypted first query vector and the encrypted second query vector.
Further, the query vector is partitioned into a first query vector and a second query vector by:
1) if s[l] = 1, then Q'_w[b][l] = Q''_w[b][l] = r × Q_w[b][l], where s[l] is the l-th dimension of the binary segmentation vector, Q_w[b][l] is the l-th dimension of the query index of the b-th query keyword, Q'_w[b][l] is the l-th dimension of the first query index of the b-th query keyword, Q''_w[b][l] is the l-th dimension of the second query index of the b-th query keyword, b ∈ {1,2,…,x}, and l ∈ {1,2,…,n,n+1};
2) if s[l] = 0, then Q'_w[b][l] + Q''_w[b][l] = r × Q_w[b][l].
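The following Python sketch puts the document-side and query-side splitting rules together and checks numerically that the encrypted shares still reveal only the scaled inner product. Several points are assumptions rather than facts from the text: the concrete matrix form (index rows multiplied by M_1 and M_2, query rows multiplied by the transposed inverses) follows the usual secure inner-product convention because the equation images are not available; r is drawn as a positive number; the random shares are uniform; and all function names are hypothetical.

```python
import numpy as np

def split(mat, s, rng, split_when):
    """Additively split the columns where s[l] == split_when; copy the other columns."""
    mask = np.asarray(s) == split_when
    share = rng.uniform(-1.0, 1.0, size=mat.shape)
    return np.where(mask, share, mat), np.where(mask, mat - share, mat)

def encrypt_index(d, s, M1, M2, rng):
    """Document side: split where s[l] == 1, then multiply by M1 and M2 (assumed form)."""
    d1, d2 = split(np.asarray(d, dtype=float), s, rng, split_when=1)
    return d1 @ M1, d2 @ M2

def gen_trapdoor(query_vectors, s, M1, M2, rng):
    """Query side: extend with 1, scale by r, split where s[l] == 0, encrypt with inverses."""
    q = np.asarray(query_vectors, dtype=float)
    Qw = np.hstack([q, np.ones((q.shape[0], 1))])   # (n+1)-th dimension set to 1
    r = rng.uniform(1.0, 10.0)                      # assumed to be positive
    Q1, Q2 = split(r * Qw, s, rng, split_when=0)
    T1 = Q1 @ np.linalg.inv(M1).T                   # assumed form of the encryption
    T2 = Q2 @ np.linalg.inv(M2).T
    return (T1, T2), r, Qw

rng = np.random.default_rng(0)
n, m, x = 10, 4, 2
s = rng.integers(0, 2, size=n + 1)
M1 = rng.standard_normal((n + 1, n + 1))
M2 = rng.standard_normal((n + 1, n + 1))

cw = rng.standard_normal((m, n))                                   # toy keyword vectors
d = np.hstack([cw, -0.5 * np.sum(cw**2, axis=1, keepdims=True)])   # plaintext index
I1, I2 = encrypt_index(d, s, M1, M2, rng)
(T1, T2), r, Qw = gen_trapdoor(rng.standard_normal((x, n)), s, M1, M2, rng)

# The server can only compute this sum, which equals r * (d @ Qw.T).
sim = I1 @ T1.T + I2 @ T2.T
print(np.allclose(sim, r * d @ Qw.T, atol=1e-6))
```

Because the split is additive on one side and a plain copy on the other for every dimension, the two partial products always add up to the plaintext inner product scaled by r, regardless of the value of s[l].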
Further, the relevance between the query keyword set and each encrypted document is calculated by a method that includes the Kuhn-Munkres algorithm.
A multi-keyword ciphertext retrieval system based on word vectors comprises:
a data owner, configured to: randomly generate, according to a security parameter n, an (n+1)-dimensional binary segmentation vector s and two (n+1) × (n+1) invertible matrices, a first invertible matrix M_1 and a second invertible matrix M_2, giving the index key SK = (s, M_1, M_2), where n ≥ 10; extract m keywords from each plaintext document of a plaintext document set, and input the m keywords of each plaintext document into a model trained on a sample keyword set to obtain m n-dimensional keyword word vectors for that document; expand each keyword word vector from n dimensions to n+1 dimensions to obtain the plaintext index of each plaintext document; calculate the ciphertext index of each plaintext document from the index key and the plaintext index, and generate an encryption/decryption public-private key pair to obtain the encrypted document of each plaintext document; upload each ciphertext index and each encrypted document to the cloud server, and send the index key, the decryption private key and the model parameters obtained by training to the data user;
a data user, configured to: input x query keywords into the model trained by the data owner to obtain x n-dimensional query keyword word vectors; expand each query keyword word vector from n dimensions to n+1 dimensions to obtain a query index; generate a trapdoor from the received index key and the query index, and upload the trapdoor to the cloud server; and obtain the corresponding plaintext documents using the decryption private key;
a cloud server, configured to: store the encrypted documents and the corresponding ciphertext indexes; calculate the relevance between the query keywords and each encrypted document from the trapdoor and the ciphertext index of each encrypted document; and return the several encrypted documents with the highest relevance to the data user.
Compared with the prior art, the invention has the following beneficial effects:
1) word vectors capture the implicit semantics of words accurately, which improves query accuracy;
2) the problem of correlation among topics in Chinese patent application CN109271485A is resolved;
3) by drawing on the idea of the MRSE method, the security of ciphertext queries is ensured and background-knowledge attacks by an adversary are prevented.
Drawings
Fig. 1 is a system architecture diagram of ciphertext retrieval.
Fig. 2 is a diagram illustrating the construction of the ciphertext retrieval scheme.
FIG. 3 is a flow chart of index generation and encryption.
Detailed Description
To make the objects, principles, technical solutions and advantages of the present invention clearer, the invention is described in detail below with reference to specific embodiments and the accompanying drawings.
The system model of the scheme is shown in FIG. 1. The cloud service is divided into three entities by function: the data owner, the cloud server and the data user. As shown in FIG. 2, the scheme comprises the following steps:
Step one: generate the key, SK ← Setup(1^n): given the security parameter n, where n ≥ 10 and the preferred value range is [50, 200], the algorithm outputs the index encryption key SK.
Step two: generate the index, (I, C) ← GenIndex(F, SK): given a plaintext document set F and the encryption key SK, the algorithm generates the index set I' corresponding to F, and encrypts the plaintext document set F and the index set I' with the encryption key to obtain the ciphertext document set C and the corresponding ciphertext index I.
Step three: generate the trapdoor, TD ← GenTrapdoor(Q_w, SK): the algorithm encrypts the query Q_w with the index key SK to obtain the trapdoor TD.
Step four: generate the ciphertext query result E_k: the cloud server takes the encrypted index I, the trapdoor TD and the parameter k, uses this algorithm to calculate the relevance between the query and the ciphertext set C, and returns the top-k relevant ciphertexts E_k to the data user.
Step five: obtain the plaintext query result, F_k ← Dec(E_k, SK): after receiving the ciphertexts E_k returned by the cloud server, the data user decrypts them with this algorithm to obtain the top-k relevant plaintexts F_k.
The specific construction of step one is as follows:
The data owner sets the security parameter n, which is the dimension of the word vectors, then randomly generates an (n+1)-dimensional binary vector s as the segmentation indication vector and two (n+1) × (n+1) invertible matrices M_1 and M_2, so that SK = (s, M_1, M_2).
The word2vec tool used in this scheme is a method for training word vectors with a neural network; other deep neural network methods may be substituted to obtain the word vectors.
The word vector in this step is a distributed low-dimensional real-valued vector. The basic idea is to use a training corpus to map words to N-dimensional real vectors, so that the distance between word vectors can represent the implicit semantic relationship between words.
The binary vector in this step is a vector in which every dimension takes the value 0 or 1.
The specific construction of step two, (I, C) ← GenIndex(F, SK), is as follows:
As shown in FIG. 3, the data owner uses the gensim package in Python to train on the plaintext document set F = {f_1, f_2, …, f_k} and obtain a trained model. Using the trained model, the data owner obtains the word vectors {cw_1, cw_2, …, cw_m} corresponding to the m keywords of each document, where m is the number of keywords in the index. Each keyword word vector is then expanded from n dimensions to n+1 dimensions to form the expanded word vectors {cw'_1, cw'_2, …, cw'_m}: the first n dimensions of each keyword word vector remain unchanged, and cw'_j[n+1] = -0.5·||cw_j||^2, j ∈ {1,2,…,m}. The word vectors cw'_j are then used to generate, for each document f_i, an m × (n+1) document matrix d_i (i ∈ {1,2,…,k}) as the plaintext index of the document. The segmentation indication vector s is used to split the plaintext index d_i into d'_i and d''_i: if s[l] = 1, then d'_i[t][l] + d''_i[t][l] = d_i[t][l]; otherwise d'_i[t][l] = d''_i[t][l] = d_i[t][l], where t ∈ {1,2,…,m} and l ∈ {1,2,…,n,n+1}. d'_i and d''_i are then encrypted with the two matrices M_1 and M_2 to obtain
[Equation image not reproduced in this text: the ciphertext index I_i]
Any safe and reliable encryption algorithm can be used to encrypt the plaintext document set F into the ciphertext document set C. Finally, the data owner uploads the encrypted index I_i and the ciphertext set C to the cloud server, and sends the index key SK = (s, M_1, M_2), the decryption key of the encryption algorithm and the parameters of the trained model to the data user.
The specific construction of step three, TD ← GenTrapdoor(Q_w, SK), is as follows:
The data user inputs the query keyword set {qw_1, qw_2, …, qw_x} into the model trained by the data owner to obtain the word vectors {cqw_1, cqw_2, …, cqw_x} corresponding to the x query keywords. Each query keyword word vector is then expanded from n dimensions to n+1 dimensions to form the expanded word vectors {cqw'_1, cqw'_2, …, cqw'_x}: the first n dimensions of each query keyword word vector remain unchanged, and cqw'_s[n+1] = 1, s ∈ {1,2,…,x}. The word-vector query Q_w is used to generate an x × (n+1)-dimensional query vector r × Q_w, where x is the number of query keywords. The segmentation indication vector s is used to split the query r × Q_w into Q'_w and Q''_w: if s[l] = 1, then Q'_w[b][l] = Q''_w[b][l] = r × Q_w[b][l]; otherwise Q'_w[b][l] + Q''_w[b][l] = r × Q_w[b][l], where b ∈ {1,2,…,x} and l ∈ {1,2,…,n,n+1}. Q'_w and Q''_w are then encrypted with the two matrices M_1 and M_2 to obtain
[Equation image not reproduced in this text: the trapdoor TD]
Finally, the data user uploads the trapdoor TD to the cloud server.
Step four: the cloud server computes Rscore = KM(sim(I_i, TD)) to obtain the relevance between the query and each ciphertext, and returns the top-k ciphertexts to the data user.
The relevance between the index I_i and the trapdoor TD is calculated using the following equation:
[Equation image not reproduced in this text: sim(I_i, TD), expressed in terms of dis(d_i, Q_w)]
The function KM() in the above equation denotes the KM algorithm, whose full name is the Kuhn-Munkres algorithm; it is a maximum-weight matching algorithm for weighted bipartite graphs. The KM algorithm yields the maximum total weight of the matched edges in the bipartite graph. dis(d_i, Q_w) denotes the Euclidean distance between the query matrix and the document matrix.
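To illustrate what KM() computes, the sketch below finds the maximum-weight matching between query keywords and document keywords and then ranks documents by that score. It deliberately swaps in an exhaustive search over assignments instead of the Kuhn-Munkres algorithm itself: the optimum is the same, but the search is only practical for the small toy sizes used here. The function names, the integer similarity values and the assumption that x ≤ m are all hypothetical.

```python
from itertools import permutations

def max_weight_matching(sim):
    """sim[b][t] is the similarity of query keyword b and document keyword t.

    Returns (best total weight, tuple giving the matched t for each b).
    Exhaustive stand-in for Kuhn-Munkres; assumes x <= m.
    """
    x, m = len(sim), len(sim[0])
    if x > m:
        raise ValueError("this sketch assumes no more query keywords than document keywords")
    best_score, best_assignment = float("-inf"), None
    for cols in permutations(range(m), x):
        score = sum(sim[b][cols[b]] for b in range(x))
        if score > best_score:
            best_score, best_assignment = score, cols
    return best_score, best_assignment

def top_k(scores, k):
    """Indices of the k documents with the highest relevance score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# A greedy choice would pair 9 with 1 (total 10); the optimal matching pairs 8 with 8.
doc_sims = [
    [[9, 8], [8, 1]],          # document 0: 2 query keywords x 2 document keywords
    [[2, 2, 2], [1, 5, 1]],    # document 1: 2 query keywords x 3 document keywords
]
scores = [max_weight_matching(sim)[0] for sim in doc_sims]
print(scores, top_k(scores, 1))
```

The matching constraint is what distinguishes this score from a plain sum of per-keyword maxima: each document keyword can be credited to at most one query keyword.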
Step five: after obtaining the ciphertexts E_k, the data user obtains the associated plaintexts using the decryption key.
The retrieval accuracy of the invention is analyzed as follows:
The scheme uses the original information of the data user and calculates the inner product between the query keyword vectors and the document keyword vectors to obtain the semantic relationship between the query keywords and the document keywords. By combining the KM algorithm with word-level semantic relevance, the scheme improves the degree of matching between query keywords and document keywords, and thereby the accuracy of ciphertext retrieval.
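The link between this inner product and the Euclidean distance can be written out explicitly. The derivation below is a reconstruction from the two expansion rules, cw'_j[n+1] = -0.5·||cw_j||^2 and cqw'_s[n+1] = 1; it assumes that the norm in the first rule is squared and that the random number r is positive, neither of which is stated outright in the text.

```latex
\begin{aligned}
cw'_j \cdot \left( r \, cqw'_s \right)
  &= r \left( cw_j \cdot cqw_s - \tfrac{1}{2} \lVert cw_j \rVert^2 \right) \\
  &= -\tfrac{r}{2} \left( \lVert cw_j - cqw_s \rVert^2 - \lVert cqw_s \rVert^2 \right)
\end{aligned}
```

For a fixed query keyword cqw_s and a fixed r > 0, the term ||cqw_s||^2 is constant, so a larger inner product corresponds to a smaller Euclidean distance between the document keyword vector and the query keyword vector. Under these assumptions the ordering of document keywords by distance is preserved even though the server never sees the plaintext vectors.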
The security of the invention is analyzed as follows:
Under the known-ciphertext model, the adversary can obtain the corresponding ciphertext information, including the encrypted document vectors, the query vectors and so on, but the encryption key is kept secret. The encryption key of the scheme consists of two parts: the (n+1)-dimensional segmentation indication vector s and the (n+1) × (n+1) invertible matrices M_1 and M_2. Given
[Equation image not reproduced in this text]
and
[Equation image not reproduced in this text]
a matrix operation under the known-ciphertext condition yields the invertible matrix
[Equation image not reproduced in this text]
but the segmentation indication vector cannot be deduced. Determining the segmentation indication vector s requires 2^n computations, which places extremely high demands on the computing performance of the server, so the scheme is secure under the known-ciphertext model.
Under the known-background-information model, the cloud server not only knows the ciphertext index information but can also record query results, analyze the query process, and infer relationships among different trapdoors as well as statistical knowledge about the database. Query unlinkability: drawing on the idea of the MRSE method, the scheme obtains the relevance between query keywords and document keywords, while the correspondence between the query matrix and the document matrix cannot be obtained. Keyword security: the keyword encryption of the scheme adopts the security-enhanced inner product computation; since that computation satisfies keyword security, the encryption proposed by the scheme also satisfies keyword security.
The multi-keyword ranked ciphertext retrieval method based on word vectors therefore improves the accuracy of ciphertext retrieval while ensuring its security.
It should be understood that the above embodiments are described in some detail, but they should not be construed as limiting the scope of the invention. A person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of the present invention. Therefore, the protection scope of this patent shall be subject to the claims.

Claims (10)

1. A multi-keyword ciphertext storage method based on word vectors comprises the following steps:
1) the data owner randomly generates an n +1 dimensional binary segmentation vector s and two (n +1) templates according to the security parameter nA first invertible matrix M of dimension (n +1)1And a second invertible matrix M2The index key SK ═ (s, M)1,M2),n≥10;
2) Extracting m keywords from each plaintext document of a plaintext document set respectively, inputting the m keywords of each plaintext document into a model to obtain m n-dimensional keyword word vectors of each plaintext document, wherein the method for obtaining the model comprises the step of inputting a sample keyword set into a word2vec tool for training;
3) Expanding the dimensionality of each keyword word vector from n dimensionality to n +1 dimensionality to obtain a plaintext index of each plaintext document;
4) calculating the ciphertext index of each plaintext document according to the index key and each plaintext index, and obtaining the encrypted document of each plaintext document through generating an encrypted and decrypted public and private key pair;
5) and uploading each ciphertext index and each encrypted document to a cloud server, and sending an index key, a decryption private key and model parameters obtained by training to a data user.
2. The method of claim 1, wherein the dimension of each keyword word vector is expanded from n-dimension to n + 1-dimension by the following strategy:
1) the first n dimensions of each keyword word vector are kept unchanged;
2) compute n +1 dimensions, cw, of the keyword word vector'j[n+1]=-0.5||cwj||2Cw is the keyword word vector, j ∈ {1,2, …, m }.
3. The method of claim 2, wherein the ciphertext index of the plaintext document is computed by:
1) dividing the plaintext index into a first plaintext index and a second plaintext index by using a binary division vector s;
2) by means of a first invertible matrix M1And a second invertible matrix M2And respectively encrypting the first plaintext index and the second plaintext index to obtain a ciphertext index comprising the first ciphertext index and the second ciphertext index.
4. The method of claim 3, wherein the plaintext index is partitioned into a first plaintext index and a second plaintext index by:
1) if s [ l]1, then d'i[t][l]+d″i[t][l]=di[t][l],s[l]Is a binary division vector of the l dimension, di[t][l]Is a plaintext index of the ith dimension of the ith keyword of the ith plaintext document, d'i[t][l]A first plaintext index, d ″, of the ith dimension of the t keyword of the ith plaintext documenti[t][l]The second plaintext index is the ith keyword and the ith dimension of the ith plaintext document, t belongs to {1,2, …, m }, l belongs to {1,2, …, n, n +1}, i belongs to {1,2, …, k }, and k is the number of plaintext documents in the plaintext document set;
2) if s [ l]0, then d'i[t][l]=d″i[t][l]=di[t][l]。
5. A multi-keyword ciphertext retrieval method based on word vectors comprises the following steps:
1) inputting x query keywords into a model trained by a data owner by a data user to obtain x n-dimensional query keyword word vectors;
2) expanding the dimensionality of each query keyword word vector from n dimensionality to n +1 dimensionality to obtain a query index;
3) according to the received index key SK ═ (s, M)1,M2) Inquiring the index, generating a trapdoor, and uploading the trapdoor to a cloud server;
4) the cloud server calculates the correlation degree of the query key words and each encrypted document according to the trapdoors and the ciphertext index of each encrypted document obtained by the method of any one of claims 1 to 5, and returns a plurality of encrypted documents with the highest correlation degree to the data user;
5) And the data user obtains a corresponding plaintext document according to the decryption private key.
6. The method of claim 5, wherein the dimension of each query keyword word vector is expanded from n-dimensions to n + 1-dimensions by the following strategy:
1) the first n dimensions of each query keyword word vector are kept unchanged;
2) calculating n +1 dimension, cqw 'of query keyword word vector's[n+1]1, cqw is the keyword word vector, s ∈ {1,2, …, x }.
7. The method of claim 6, wherein the trapdoor is created by:
1) generating a query vector r x Q of x (n +1) dimensions by using x query keyword vectors and a random number rw,QwIndexing for queries;
2) dividing the query vector into a first query vector and a second query vector using a binary division vector s, and passing through a first invertible matrix M1And a second invertible matrix M2And respectively encrypting the first query vector and the second query vector to obtain the trapdoor containing the first query vector and the second query vector.
8. The method of claim 7, wherein the query vector is partitioned into a first query vector and a second query vector by:
1) if s [ l]=1,Q′w[b][l]=Q″w[b][l]=r×Qw[b][l],s[l]Is a binary division vector of the l dimension, Qw[b][l]Query index, Q ', in the l dimension of the b-th query keyword' w[b][l]Is the first query index, Q ″, of the l dimension of the b-th keywordw[b][l]A second query index for the ith dimension of the kth keyword, b ∈ {1,2, …, x }, l ∈ {1,2, …, n, n +1 };
2)s[l]=0,Q′w[b][l]+Q″w[b][l]=r×Qw[b][l]。
9. the method of claim 5, wherein the method of calculating the relevance of the query keyword set to each encrypted document comprises: the Kuhn-Munkres algorithm.
10. A multi-keyword ciphertext retrieval system based on word vectors, comprising:
a data owner, configured to: randomly generate, according to a security parameter n, an (n+1)-dimensional binary segmentation vector s and two (n+1) × (n+1) invertible matrices, namely a first invertible matrix $M_1$ and a second invertible matrix $M_2$, to obtain the index key $SK = (s, M_1, M_2)$, where n ≥ 10; extract m keywords from each plaintext document of a plaintext document set, and input the m keywords of each plaintext document into a model trained on a sample keyword set to obtain m n-dimensional keyword word vectors for each plaintext document; expand each keyword word vector from n dimensions to n+1 dimensions to obtain a plaintext index of each plaintext document; calculate the ciphertext index of each plaintext document according to the index key and each plaintext index, and obtain the encrypted document of each plaintext document by generating a public/private key pair for encryption and decryption; upload each ciphertext index and each encrypted document to a cloud server; and send the index key, the decryption private key, and the model parameters obtained by training to a data user;
a data user, configured to: input x query keywords into the model trained by the data owner to obtain x n-dimensional query keyword word vectors; expand each query keyword word vector from n dimensions to n+1 dimensions to obtain a query index; generate a trapdoor according to the received index key and the query index, and upload the trapdoor to the cloud server; and obtain the corresponding plaintext documents according to the decryption private key;
a cloud server, configured to: store the encrypted documents and the corresponding ciphertext indexes; calculate the relevance of the query keywords to each encrypted document according to the trapdoor and the ciphertext index of each encrypted document; and return the several encrypted documents with the highest relevance to the data user.
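The interplay of index key, ciphertext index, and trapdoor in claim 10 can be illustrated with a small sketch in the style of secure inner-product (secure kNN) encryption. The claim only says that the ciphertext index and trapdoor are computed "according to the index key", so the concrete choices here are assumptions: index shares are multiplied by the transposed matrices, trapdoor shares by the inverse matrices, and the index is split with the rule complementary to the query split. Vectors are taken to be already expanded to n+1 dimensions, exact `Fraction` arithmetic is used so that the check is exact, and all function names are hypothetical.

```python
import random
from fractions import Fraction

def invert(m):
    """Gauss-Jordan inverse with exact fractions; returns None if singular."""
    dim = len(m)
    aug = [row[:] + [Fraction(int(i == j)) for j in range(dim)]
           for i, row in enumerate(m)]
    for col in range(dim):
        pivot = next((k for k in range(col, dim) if aug[k][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for k in range(dim):
            if k != col and aug[k][col] != 0:
                f = aug[k][col]
                aug[k] = [a - f * b for a, b in zip(aug[k], aug[col])]
    return [row[dim:] for row in aug]

def random_invertible(dim, rng):
    while True:  # retry until the random matrix is invertible
        m = [[Fraction(rng.randint(-9, 9)) for _ in range(dim)] for _ in range(dim)]
        inv = invert(m)
        if inv is not None:
            return m, inv

def keygen(n, rng):
    """SK = (s, M1, M2); the inverses are cached here only for convenience."""
    dim = n + 1
    s = [rng.randint(0, 1) for _ in range(dim)]
    m1, m1_inv = random_invertible(dim, rng)
    m2, m2_inv = random_invertible(dim, rng)
    return {"s": s, "M1": m1, "M2": m2, "M1_inv": m1_inv, "M2_inv": m2_inv}

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def split(vec, s, equal_when, rng):
    """Copy a component into both shares where s[l] == equal_when; otherwise
    split it into two random summands."""
    a, b = [], []
    for x, s_l in zip(vec, s):
        if s_l == equal_when:
            a.append(x)
            b.append(x)
        else:
            share = Fraction(rng.randint(-50, 50))
            a.append(share)
            b.append(x - share)
    return a, b

def encrypt_index(p, key, rng):
    p1, p2 = split(p, key["s"], 0, rng)  # assumed complementary index split
    return mat_vec(transpose(key["M1"]), p1), mat_vec(transpose(key["M2"]), p2)

def make_trapdoor(q, key, r, rng):
    q1, q2 = split([r * x for x in q], key["s"], 1, rng)  # rule of claim 8
    return mat_vec(key["M1_inv"], q1), mat_vec(key["M2_inv"], q2)

def server_score(cipher_index, trapdoor):
    """What the cloud server can compute without seeing p, q, or the key."""
    return sum(a * b for part in zip(cipher_index, trapdoor) for a, b in zip(*part))
```

Under these assumptions the server's score equals r times the plaintext inner product of the index vector and the query vector, so ranking by score preserves ranking by plaintext similarity for a fixed query.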
CN202010651620.7A 2020-07-08 2020-07-08 Multi-keyword ciphertext storage and retrieval method and system based on word vector Pending CN111859421A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010651620.7A CN111859421A (en) 2020-07-08 2020-07-08 Multi-keyword ciphertext storage and retrieval method and system based on word vector

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010651620.7A CN111859421A (en) 2020-07-08 2020-07-08 Multi-keyword ciphertext storage and retrieval method and system based on word vector

Publications (1)

Publication Number Publication Date
CN111859421A true CN111859421A (en) 2020-10-30

Family

ID=73152405

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010651620.7A Pending CN111859421A (en) 2020-07-08 2020-07-08 Multi-keyword ciphertext storage and retrieval method and system based on word vector

Country Status (1)

Country Link
CN (1) CN111859421A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528064A (en) * 2020-12-10 2021-03-19 西安电子科技大学 Privacy-protecting encrypted image retrieval method and system
CN117235121A (en) * 2023-11-15 2023-12-15 华北电力大学 Energy big data query method and system
CN117574435A (en) * 2024-01-12 2024-02-20 云阵(杭州)互联网技术有限公司 Multi-keyword trace query method, device and system based on homomorphic encryption

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130024144A (en) * 2011-08-30 2013-03-08 고려대학교 산학협력단 Weighted keyword searching method for perserving privacy, and apparatus thereof
CN108563732A (en) * 2018-04-08 2018-09-21 浙江理工大学 Towards encryption cloud data multiple-fault diagnosis sorted search method in a kind of cloud network
CN108632248A (en) * 2018-03-22 2018-10-09 平安科技(深圳)有限公司 Data ciphering method, data query method, apparatus, equipment and storage medium
US20180357434A1 (en) * 2017-06-08 2018-12-13 The Government Of The United States, As Represented By The Secretary Of The Army Secure Generalized Bloom Filter
CN109063509A (en) * 2018-08-07 2018-12-21 上海海事大学 It is a kind of that encryption method can search for based on keywords semantics sequence
CN110069944A (en) * 2019-04-03 2019-07-30 南方电网科学研究院有限责任公司 It is a kind of can search for encryption data retrieval method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Nan; Chen Lanxiang: "Research on an efficient keyword searchable encryption system supporting ranking", Netinfo Security (信息网络安全), no. 02 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528064A (en) * 2020-12-10 2021-03-19 西安电子科技大学 Privacy-protecting encrypted image retrieval method and system
CN117235121A (en) * 2023-11-15 2023-12-15 华北电力大学 Energy big data query method and system
CN117235121B (en) * 2023-11-15 2024-02-20 华北电力大学 Energy big data query method and system
CN117574435A (en) * 2024-01-12 2024-02-20 云阵(杭州)互联网技术有限公司 Multi-keyword trace query method, device and system based on homomorphic encryption
CN117574435B (en) * 2024-01-12 2024-04-23 云阵(杭州)互联网技术有限公司 Multi-keyword trace query method, device and system based on homomorphic encryption

Similar Documents

Publication Publication Date Title
Zhang et al. PIC: Enable large-scale privacy preserving content-based image search on cloud
Zhang et al. SE-PPFM: A searchable encryption scheme supporting privacy-preserving fuzzy multikeyword in cloud systems
CN108388807B (en) Efficient and verifiable multi-keyword sequencing searchable encryption method supporting preference search and logic search
CN107480163B (en) Efficient ciphertext image retrieval method supporting privacy protection in cloud environment
CN108712366A (en) That morphology meaning of a word fuzzy search is supported in cloud environment can search for encryption method and system
CN108647529A (en) A kind of semantic-based multi-key word sorted search intimacy protection system and method
CN111859421A (en) Multi-keyword ciphertext storage and retrieval method and system based on word vector
CN109885640B (en) Multi-keyword ciphertext sorting and searching method based on alpha-fork index tree
CN109471964B (en) Synonym set-based fuzzy multi-keyword searchable encryption method
CN110659379B (en) Searchable encrypted image retrieval method based on deep convolution network characteristics
CN109063509A (en) It is a kind of that encryption method can search for based on keywords semantics sequence
CN109992995B (en) Searchable encryption method supporting location protection and privacy inquiry
CN109992978B (en) Information transmission method and device and storage medium
WO2022099495A1 (en) Ciphertext search method, system, and device in cloud computing environment
CN111026788A (en) Homomorphic encryption-based multi-keyword ciphertext sorting and retrieving method in hybrid cloud
CN112332979B (en) Ciphertext search method, system and equipment in cloud computing environment
CN109885650B (en) Outsourcing cloud environment privacy protection ciphertext sorting retrieval method
CN115314295B (en) Block chain-based searchable encryption technical method
CN111797409A (en) Big data Chinese text carrier-free information hiding method
CN109739945B (en) Multi-keyword ciphertext sorting and searching method based on mixed index
CN116109372B (en) Cold chain logistics product federal recommendation method and device based on multi-level block chain
CN112257455A (en) Semantic-understanding ciphertext space keyword retrieval method and system
CN109255244B (en) Data encryption method and device and data encryption retrieval system
CN115310125A (en) Encrypted data retrieval system, method, computer equipment and storage medium
Wang et al. An efficient and privacy-preserving range query over encrypted cloud data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination