CN112257455A - Semantic-understanding ciphertext space keyword retrieval method and system - Google Patents
Semantic-understanding ciphertext space keyword retrieval method and system
- Publication number
- CN112257455A (application number CN202011135390.5A)
- Authority
- CN
- China
- Prior art keywords
- query
- probability distribution
- text
- ciphertext
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6227—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Bioethics (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Medical Informatics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a semantic-understanding ciphertext spatial keyword retrieval method and system. By extracting the semantic features of the spatial objects and of the user query, the method lets a user retrieve, over ciphertext, spatial objects that match the user's query intent and are close to the user's location. The ciphertext index construction method adopted by the invention improves the precision of ciphertext spatial keyword retrieval while meeting the user's query requirements for both distance and text. In addition, the ciphertext query algorithm of the scheme improves query efficiency while ensuring the security and privacy of the spatial object data and of the user's retrieval information.
Description
Technical Field
The invention relates to the technical field of searchable encryption, and in particular to a semantic-understanding ciphertext spatial keyword retrieval method and system.
Background
Spatial keyword retrieval involves a large amount of spatial object data, and the query process incurs a large computing overhead. The data owner therefore often chooses to outsource the spatial object data to a cloud server, which stores the data and performs the computation. However, because the data owner loses direct control over the outsourced data, data security and privacy cannot be guaranteed. In particular, spatial object data includes sensitive information such as position coordinates, so it is essential to encrypt it.
Traditional spatial keyword query algorithms target only plaintext data, so a user cannot directly query ciphertext data stored on a cloud server. In addition, traditional spatial keyword query algorithms cannot extract semantic information from the keywords a user searches for, so the user's query intention cannot be obtained and query accuracy is low.
Qianxinghu proposed a semantic-understanding-based spatial keyword query method in the published paper "Semantic understanding-based spatial keyword query" (Suzhou University, 2018). The method adds extraction of semantic information from the text descriptions of spatial objects to traditional spatial keyword query. Specifically, a latent Dirichlet allocation (LDA) topic model is used to extract the semantic features of the text. However, the method supports only plaintext queries and cannot protect the privacy of the data owner.
The patent document "A data ciphertext query method based on fine-grained sequencing in a single user environment", filed by Xi'an University of Electronic Science and Technology (Xidian University), discloses a data ciphertext query method based on fine-grained ranking in a single-user environment. The method has the following defects: semantic information in the data user's query cannot be extracted, so search accuracy is limited; and the document index in the scheme is a vector with the same length as the dictionary, so its dimension is large, which makes the computation cost high and the query efficiency low.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a semantic-understanding ciphertext spatial keyword retrieval method.
The invention is realized by the following technical scheme:
A semantic-understanding ciphertext spatial keyword retrieval method comprises the following steps:
Step 1: generating an AES key and a searchable encryption algorithm key;
Step 2: extracting the text-topic probability distribution vector of each spatial object's text description and the word-topic probability distribution vector of each word over the topics, determining the text set-topic probability distribution vector from the text-topic probability distribution vectors, and determining the word-text set probability distribution vector of the words over the text set from the text-topic probability distribution vectors and the word-topic probability distribution vectors;
Step 3: constructing a plaintext index for each spatial object from the text-topic probability distribution vector of the spatial object and the corresponding spatial position coordinates, and encrypting the plaintext index to form a ciphertext index;
Step 4: extracting the query-topic probability distribution vector of the keywords in the query statement from the text set-topic probability distribution vector, the word-topic probability distribution vectors and the word-text set probability distribution vector obtained in step 2, combining the spatial position coordinates of the data user with the query-topic probability distribution vector to generate a query vector, and encrypting the query vector with the searchable encryption algorithm key to obtain a query trapdoor;
Step 5: determining the mixed similarity between each spatial object and the query statement from the query trapdoor and the ciphertext indexes, ranking the results, and sending the encrypted data of the spatial objects corresponding to the top k ciphertext indexes to the data user, who decrypts the encrypted data with the AES (Advanced Encryption Standard) key.
Preferably, step 2 uses a natural language processing model to extract the text-topic probability distribution vector $V_D$ of each spatial object's text description over the topics and the word-topic probability distribution vector $V_K$ of each word over the topics.
Preferably, the method for determining the word-text set probability distribution vector in step 2 is as follows:
The data owner sums the text-topic probability distribution vectors $V_D$ of the text descriptions of all objects and divides by the number of objects to obtain the text set-topic probability distribution vector $P_t$, which reflects how likely each topic is to appear in the text set;
The data owner calculates the word-text set probability distribution vector $P_\omega$ of each word appearing in the text set from the text set-topic probability distribution vector $P_t$ and the word-topic probability distribution vector $V_K$ of each word.
Preferably, the method for constructing the ciphertext index in step 3 is as follows:
The spatial object position coordinates are appended to the text-topic probability distribution vector $V_D$ to form the plaintext index $D_i$; the dimensionality of the plaintext index is extended, and the extended plaintext index $\hat{D}_i$ is encrypted with the searchable encryption algorithm key SK to obtain the ciphertext index $I_i$ of the spatial object.
Preferably, when the extended plaintext index $\hat{D}_i$ is encrypted, it is first split into $\hat{D}_i'$ and $\hat{D}_i''$, which are then encrypted separately;
The splitting rule is as follows: if the jth bit of the binary vector S in the searchable encryption algorithm key SK is 0, $\hat{D}_i'[j]$ and $\hat{D}_i''[j]$ are both set to $\hat{D}_i[j]$; if the jth bit of S is 1, $\hat{D}_i'[j]$ and $\hat{D}_i''[j]$ are set to two random numbers whose sum is $\hat{D}_i[j]$.
The encryption process is as follows: using $\{M_1, M_2\}$ in the searchable encryption algorithm key SK, matrix products are computed with $\hat{D}_i'$ and $\hat{D}_i''$ respectively to obtain the ciphertext index $I_i$ of each spatial object $o_i$.
Preferably, the query-topic probability distribution vector $Q_w$ of the keywords in the query statement in step 4 is determined as follows:
wherein $P_t$ is the text set-topic probability distribution vector; $P_\omega$ is the word-text set probability distribution vector; $M_K$ is the word-topic probability distribution matrix; $Q_d$ is the set of query keywords and $|Q_d|$ is the number of query keywords; the symbol $\circ$ denotes the Hadamard product between vectors.
Preferably, the encryption method of the query vector Q in step 4 is as follows:
First, the dimensionality of the query vector Q is extended to obtain the extended query vector $\hat{Q}$; then $\hat{Q}$ is split to obtain the split query vectors $\hat{Q}'$ and $\hat{Q}''$; finally, the split query vectors are encrypted separately to obtain the query trapdoor;
The splitting rule is as follows: if the ith bit of the binary vector S in the searchable encryption algorithm key SK is 1, $\hat{Q}'[i]$ and $\hat{Q}''[i]$ are both set to $\hat{Q}[i]$; if the ith bit of S is 0, $\hat{Q}'[i]$ and $\hat{Q}''[i]$ are set to two random numbers whose sum is $\hat{Q}[i]$.
The encryption process is as follows: using $\{M_1, M_2\}$ in the searchable encryption algorithm key SK, matrix products are computed with $\hat{Q}'$ and $\hat{Q}''$ respectively to obtain the query trapdoor $T$.
Preferably, the mixed similarity in step 5 is calculated as follows:
wherein: $T$ is the query trapdoor; $I_i$ is the ciphertext index of the ith spatial object; $M_1$ and $M_2$ are the two invertible matrices in the searchable encryption algorithm key SK.
$V_D^{i} \cdot Q_w$ equals the semantic relatedness between the keywords in the query statement and the ith spatial object, $(\|\lambda_i\|^2 - 2\lambda_i\lambda_q + \|\lambda_q\|^2)$ equals the square of the Euclidean distance between the data user's query location and the spatial object's coordinates, and $\delta$ is the query weight.
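The way the two terms arise can be checked by a short derivation. It assumes the extension described in the detailed description (the index is extended with $-0.5\|\lambda_i\|^2$ and 1, the query with 1 and $-0.5\|\lambda_q\|^2$) and writes $V_D^{i}$ for the text-topic vector of the ith object, which is a notational assumption. The weight $\delta$ is left out because its exact placement is not recoverable from this text.

```latex
\begin{aligned}
\hat{D}_i &= \left(V_D^{i},\; x_i,\; y_i,\; -\tfrac{1}{2}\|\lambda_i\|^2,\; 1\right), \qquad
\hat{Q} = \left(Q_w,\; x_q,\; y_q,\; 1,\; -\tfrac{1}{2}\|\lambda_q\|^2\right) \\
\hat{D}_i \cdot \hat{Q}
  &= V_D^{i} \cdot Q_w + \lambda_i \lambda_q - \tfrac{1}{2}\|\lambda_i\|^2 - \tfrac{1}{2}\|\lambda_q\|^2 \\
  &= V_D^{i} \cdot Q_w - \tfrac{1}{2}\left(\|\lambda_i\|^2 - 2\lambda_i\lambda_q + \|\lambda_q\|^2\right) \\
  &= V_D^{i} \cdot Q_w - \tfrac{1}{2}\|\lambda_i - \lambda_q\|^2
\end{aligned}
```

A larger inner product therefore means higher semantic relatedness and a smaller distance, which is why ranking by the largest value returns objects that are both relevant and nearby.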
Preferably, the ciphertext data is obtained by AES-encrypting the name, geographic position coordinates and text description data of each spatial object with the AES key.
A system implementing the semantic-understanding ciphertext spatial keyword retrieval method comprises a key module, a semantic information extraction module, an encryption index construction module, a trapdoor generation module and a query module;
the key module is used by the data owner to generate the AES key and the searchable encryption algorithm key and to send the searchable encryption algorithm key to the data user;
the semantic information extraction module is used to extract, with a natural language processing model, the text-topic probability distribution vector of each spatial object's text over the topics and the word-topic probability distribution vector of each word over the topics, to calculate the text set-topic probability distribution vector from the text-topic probability distribution vectors, and to determine the word-text set probability distribution vector of each word in the text set from the text set-topic probability distribution vector and the word-topic probability distribution vectors;
the encryption index construction module is used by the data owner to construct a plaintext index for each spatial object from the object's text-topic probability distribution vector and spatial position coordinates, to encrypt the plaintext index with the searchable encryption algorithm, to AES-encrypt the data of each spatial object, and finally to send the resulting ciphertext indexes and ciphertext data to the cloud server;
the trapdoor generation module is used by the data user to extract the query-topic probability distribution vector $Q_w$ of the query statement from the text set-topic probability distribution vector, the word-topic probability distribution vectors and the word-text set probability distribution vector, to combine the data user's spatial position coordinates with $Q_w$ to generate the query vector Q, to encrypt the query vector with the searchable encryption algorithm key to obtain the query trapdoor, and to send the query trapdoor to the cloud server;
and the query module is used by the cloud server to calculate the mixed similarity between the query trapdoor and the ciphertext indexes of all spatial objects, to rank the results, and to send the encrypted files of the top k objects to the data user.
Compared with the prior art, the invention has the following beneficial technical effects:
The semantic-understanding ciphertext spatial keyword retrieval method provided by the invention constructs ciphertext indexes with a searchable encryption algorithm from the text-topic probability distribution vector and the corresponding spatial position coordinates of each spatial object, which ensures the security and privacy of the text descriptions and position coordinates of the spatial objects. It combines the data user's spatial position coordinates with the query-topic probability distribution vector to generate a query vector and encrypts the query vector to generate a query trapdoor, thereby protecting the query information. The whole scheme meets the privacy protection requirements of outsourced data. Because the spatial object index is constructed from the topic probability distribution of the text, the method has a lower computation cost and higher query accuracy than prior-art dictionary-based keyword ciphertext retrieval, and it supports semantic awareness. By extracting the semantic features of the spatial objects' text descriptions and of the user's query statements, the method lets the user retrieve, over ciphertext, spatial objects that match the query intent and are close to the user; it improves the precision of ciphertext spatial keyword retrieval, ensures the security and privacy of the spatial text data and of the user's retrieval information, and improves query efficiency.
Drawings
FIG. 1 is a flow chart of a retrieval method of the present invention;
FIG. 2 is a flow chart of spatial object index generation according to the present invention;
FIG. 3 is a flow chart of query trapdoor generation according to the present invention.
Detailed Description
The present invention will now be described in further detail with reference to the attached drawings, which are illustrative, but not limiting, of the present invention.
Referring to fig. 1, a ciphertext space keyword retrieval method for semantic understanding includes the following steps:
Step 1: the data owner generates an AES key $sk_{AES}$ and a searchable encryption algorithm key SK, and sends the searchable encryption algorithm key SK to the data user;
The AES key $sk_{AES}$ is used for AES encryption of the spatial objects, and the searchable encryption algorithm key SK is used for building the ciphertext indexes and the query trapdoors.
The searchable encryption algorithm key SK comprises a randomly generated binary vector S and two invertible matrices $M_1$ and $M_2$, where the vector S has length n+4 and $M_1$ and $M_2$ are both (n+4) × (n+4) square matrices.
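The structure of SK can be illustrated with a minimal sketch. It assumes small integer matrices and a seeded `random.Random` for reproducibility; the function names (`generate_sk`, `random_invertible`, `is_invertible`) and the entry range are hypothetical, and a real deployment would draw its randomness from a cryptographically secure source such as `random.SystemRandom`.

```python
import random
from fractions import Fraction

def is_invertible(m):
    """Return True if the square matrix m has full rank (exact arithmetic)."""
    size = len(m)
    a = [[Fraction(v) for v in row] for row in m]
    for col in range(size):
        # Look for a non-zero pivot in this column, at or below the diagonal.
        pivot = next((r for r in range(col, size) if a[r][col] != 0), None)
        if pivot is None:
            return False
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return True

def random_invertible(size, rng):
    """Draw small random integer matrices until one is invertible."""
    while True:
        m = [[rng.randint(-5, 5) for _ in range(size)] for _ in range(size)]
        if is_invertible(m):
            return m

def generate_sk(n, rng):
    """Return (S, M1, M2) for n topics: S has n + 4 bits, M1 and M2 are (n + 4) x (n + 4)."""
    size = n + 4
    s = [rng.randint(0, 1) for _ in range(size)]
    return s, random_invertible(size, rng), random_invertible(size, rng)

# Illustration only: a seeded generator is NOT suitable for real key material.
S, M1, M2 = generate_sk(3, random.Random(0))
```

With n = 3 topics the sketch yields a 7-bit vector S and two 7 × 7 matrices, matching the n+4 dimensions stated above.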
Step 2: the data owner uses an LDA topic model, a natural language processing model, to extract the semantic features of the spatial objects' text descriptions, obtaining the text-topic probability distribution vector $V_D$ of each spatial object's text description and the word-topic probability distribution vector $V_K$ of each word over the topics; obtains the text set-topic probability distribution vector $P_t$ from the vectors $V_D$; and determines the word-text set probability distribution vector $P_\omega$ of the words in the text set from $P_t$ and the vectors $V_K$.
Referring to fig. 2, the specific process is as follows:
S2.1 The data owner preprocesses the text description of each spatial object: punctuation marks, special characters and meaningless words are first removed from the text, then the words are stemmed, and finally the text description is vectorized;
S2.2 The data owner trains an LDA (Latent Dirichlet Allocation) topic model on the preprocessed text; the LDA topic model outputs the probability distribution vector of each spatial object's text description over the topics (text-topic probability distribution vector) $V_D$ and the probability distribution vector of each word over the topics (word-topic probability distribution vector) $V_K$; $V_D$ and $V_K$ are n-dimensional;
S2.3 The data owner constructs the text-topic probability distribution matrix $M_D$ with the text-topic probability distribution vectors $V_D$ as row vectors, and the word-topic probability distribution matrix $M_K$ with the word-topic probability distribution vectors $V_K$ as row vectors;
S2.4 The data owner sums the topic probability distribution vectors $V_D$ of the text descriptions of all objects and divides by the number of objects to obtain the probability vector of each topic appearing in the text set (text set-topic probability distribution vector) $P_t$;
S2.5 The data owner calculates the probability vector of each word appearing in the text set (word-text set probability distribution vector) $P_\omega$ from the probability vector $P_t$ of the topics appearing in the text set and the word-topic probability distribution matrix $M_K$:
$P_\omega = P_t \cdot M_K^{T}$
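Steps S2.4 and S2.5 can be sketched in a few lines. The sketch assumes the LDA outputs are already available as plain lists, reads entry t of a row of $M_K$ as the probability of that word under topic t (an interpretation, not stated in the text), and uses hypothetical function names and toy numbers.

```python
def text_set_topic_vector(m_d):
    """P_t: average the rows of M_D (one text-topic vector V_D per object)."""
    count = len(m_d)
    n = len(m_d[0])
    return [sum(row[t] for row in m_d) / count for t in range(n)]

def word_text_set_vector(p_t, m_k):
    """P_omega = P_t . M_K^T: one entry per word (rows of M_K are the V_K vectors)."""
    return [sum(p * v for p, v in zip(p_t, v_k)) for v_k in m_k]

# Toy data: two objects, two topics, two words (values chosen for illustration).
M_D = [[0.5, 0.5], [1.0, 0.0]]
M_K = [[0.5, 0.25], [0.5, 0.75]]

P_t = text_set_topic_vector(M_D)          # [0.75, 0.25]
P_omega = word_text_set_vector(P_t, M_K)  # [0.4375, 0.5625]
```

Under this reading, $P_\omega$ sums to 1 whenever each column of $M_K$ does, so it behaves as a distribution over the vocabulary.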
Step 3: a plaintext index is constructed for each spatial object from the object's text-topic probability distribution vector $V_D$ and its spatial position coordinates (x, y), and the index is split and encrypted with the searchable encryption algorithm key.
Specifically, the spatial object position coordinates are appended to the topic probability distribution vector $V_D$ to form the plaintext index, the merged vector is extended to n+4 dimensions, and the extended vector is encrypted with the searchable encryption algorithm key SK to obtain the ciphertext index of the spatial object.
The specific process is as follows (taking the ith spatial object as an example):
S3.1 The data owner combines the text-topic probability distribution vector $V_D^{i}$ of the ith spatial object $o_i$ with its spatial position coordinates $\lambda_i = (x, y)$ into one vector $D_i$ and uses it as the plaintext index of the spatial object;
S3.2 The dimension of the plaintext index $D_i$ is extended, i.e. $-0.5\|\lambda_i\|^2$ and 1 are appended in sequence to the tail of $D_i$; the extended vector is denoted $\hat{D}_i$;
S3.3 The data owner uses the searchable encryption algorithm key SK to split and encrypt the data vector $\hat{D}_i$ of the spatial object, generating its ciphertext index.
The splitting rule is as follows: if the jth bit of S is 0, $\hat{D}_i'[j]$ and $\hat{D}_i''[j]$ are both set to $\hat{D}_i[j]$; if the jth bit of S is 1, $\hat{D}_i'[j]$ and $\hat{D}_i''[j]$ are set to two random numbers whose sum is $\hat{D}_i[j]$.
The encryption process is as follows: the data owner uses $M_1$ and $M_2$ in the searchable encryption algorithm key SK to encrypt $\hat{D}_i'$ and $\hat{D}_i''$ respectively, obtaining the ciphertext index $I_i$ of each spatial object $o_i$.
S3.5 The data owner AES-encrypts the name, geographic position coordinates and text description data of each spatial object with the AES key to form the encrypted data;
S3.6 The data owner uploads the encrypted data and the ciphertext indexes to the cloud server.
S3.7 The data owner sends the word-topic probability distribution vectors, the text set-topic probability distribution vector and the word-text set probability distribution vector to the data user as auxiliary vectors for subsequent calculations.
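The extension and splitting of steps S3.2 and S3.3 can be sketched as follows. The function names, the sample key bits and the range of the random share are assumptions for illustration; the matrix encryption with $M_1$ and $M_2$ is left out here.

```python
import random

def extend_index(v_d, x, y):
    """Build D_i = (V_D, x, y) and append -0.5 * ||lambda_i||^2 and 1 (n + 4 entries)."""
    return list(v_d) + [x, y, -0.5 * (x * x + y * y), 1.0]

def split_index(d_hat, s, rng):
    """Split the extended index into two shares according to the bits of S."""
    first, second = [], []
    for value, bit in zip(d_hat, s):
        if bit == 0:
            # Bit 0: both shares keep the original entry.
            first.append(value)
            second.append(value)
        else:
            # Bit 1: two random numbers whose sum is the original entry.
            r = rng.uniform(-1.0, 1.0)
            first.append(r)
            second.append(value - r)
    return first, second

S = [0, 1, 1, 0, 1, 0]  # hypothetical key bits for n = 2 topics
d_hat = extend_index([0.75, 0.25], 3.0, 4.0)
d1, d2 = split_index(d_hat, S, random.Random(1))
```

For the object at (3, 4) the extended index ends in -12.5 and 1, and every position of the two shares either repeats the entry or sums back to it, depending on the corresponding bit of S.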
Step 4: the data user uses the training results of the LDA topic model (namely the auxiliary vectors) to extract the semantic features of the query statement and derive the probability vector of the query keywords over the topics (query-topic probability distribution vector) $Q_w$, combines the user's coordinates with $Q_w$ to generate the query vector Q, and encrypts the query vector with the searchable encryption algorithm key SK to obtain the query trapdoor.
Referring to fig. 3, the specific process is as follows:
S4.1 The data user computes the query-topic probability distribution vector $Q_w$ of the keywords in the query statement from the word-text set probability distribution vector $P_\omega$ of the words over the text set;
wherein $P_t$ is the text set-topic probability distribution vector; $P_\omega$ is the word-text set probability distribution vector; $M_K$ is the word-topic probability distribution matrix; $Q_d$ is the set of query keywords and $|Q_d|$ is the number of query keywords; the symbol $\circ$ denotes the Hadamard product between vectors.
S4.2 The data user combines the query-topic probability distribution vector $Q_w$ with the geographic position coordinates $\lambda_q = (x_q, y_q)$ of the query point to generate the query vector Q;
S4.3 The data user extends the query vector Q to n+4 dimensions; the extended query vector is denoted $\hat{Q}$, where position n+3 of $\hat{Q}$ is 1 and position n+4 is $-0.5\|\lambda_q\|^2$;
S4.4 The data user sets the query weight $\delta$ according to the query preference (emphasizing spatial distance or emphasizing text similarity), thereby adjusting the query result.
S4.5 The data user uses the binary vector S in the searchable encryption algorithm key SK to split the query vector $\hat{Q}$ into two (n+4)-dimensional random vectors $\hat{Q}'$ and $\hat{Q}''$.
The splitting rule is as follows: if the jth bit of S is 1, $\hat{Q}'[j]$ and $\hat{Q}''[j]$ are both set to $\hat{Q}[j]$; if the jth bit of S is 0, $\hat{Q}'[j]$ and $\hat{Q}''[j]$ are set to two random numbers whose sum is $\hat{Q}[j]$.
S4.6 The data user encrypts the random vectors $\hat{Q}'$ and $\hat{Q}''$ with the invertible matrices $\{M_1, M_2\}$ in the searchable encryption algorithm key SK to obtain the trapdoor $T$, and sends the generated trapdoor T to the cloud server.
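The formula for $Q_w$ in S4.1 is not reproduced in this text. One reading that uses exactly the listed symbols is Bayes' rule: for each query keyword w, the Hadamard product of $P_t$ with the row of $M_K$ for w, divided by $P_\omega[w]$, gives the probability of each topic given the word, and $Q_w$ is the average over the $|Q_d|$ keywords. The sketch below implements that assumed reading together with the extension of S4.3; the function names, integer word ids and toy numbers are hypothetical.

```python
def query_topic_vector(query_words, p_t, p_omega, m_k):
    """Average P(topic | word) over the query keywords (assumed Bayes-rule reading)."""
    n = len(p_t)
    q_w = [0.0] * n
    for w in query_words:
        # Hadamard product of P_t and the row of M_K for word w, normalised by P_omega[w].
        posterior = [p_t[t] * m_k[w][t] / p_omega[w] for t in range(n)]
        q_w = [acc + p for acc, p in zip(q_w, posterior)]
    return [v / len(query_words) for v in q_w]

def extend_query(q_w, x_q, y_q):
    """Q = (Q_w, x_q, y_q); position n + 3 is 1, position n + 4 is -0.5 * ||lambda_q||^2."""
    return list(q_w) + [x_q, y_q, 1.0, -0.5 * (x_q * x_q + y_q * y_q)]

# Toy auxiliary vectors (two topics, two words), chosen for illustration.
P_t = [0.75, 0.25]
M_K = [[0.5, 0.25], [0.5, 0.75]]
P_omega = [0.4375, 0.5625]

Q_w = query_topic_vector([0, 1], P_t, P_omega, M_K)  # query uses word ids 0 and 1
Q_hat = extend_query(Q_w, 1.0, 2.0)
```

Under this reading $Q_w$ is itself a probability distribution over the n topics, which makes it directly comparable with the text-topic vectors stored in the index.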
Step 5: the cloud server calculates the inner product of the ciphertext index of each spatial object and the query trapdoor, and sends the encrypted data of the spatial objects corresponding to the k ciphertext indexes with the largest results to the data user. Specifically, the cloud server determines the mixed similarity between each spatial object and the query statement from the query trapdoor T and the ciphertext indexes I, sorts the ciphertext indexes by mixed similarity from largest to smallest, and returns the encrypted spatial object data corresponding to the top k ciphertext indexes to the data user; the data user decrypts the k received ciphertexts with the AES key $sk_{AES}$ to obtain the plaintext information of the corresponding spatial objects, namely their names, geographic positions and text descriptions.
The mixed similarity of an index is calculated as follows:
wherein $V_D^{i} \cdot Q_w$ represents the semantic relevance between the keywords in the query statement and the ith spatial object, and $(\|\lambda_i\|^2 - 2\lambda_i\lambda_q + \|\lambda_q\|^2)$ represents the square of the Euclidean distance between the data user's query location and the spatial object's coordinates. After the weight $\delta$ is applied, the cloud server obtains the mixed relevance between the user query and the spatial object.
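The whole pipeline can be checked end to end with exact arithmetic. The sketch below makes two assumptions that this text does not spell out: index shares are multiplied by $M_1^{T}$ and $M_2^{T}$ and query shares by $M_1^{-1}$ and $M_2^{-1}$ (the usual secure inner-product construction), and the weight $\delta$ is omitted. All function names and toy values are hypothetical.

```python
import random
from fractions import Fraction

def inverse(m):
    """Gauss-Jordan inverse over Fractions; raises ValueError if m is singular."""
    size = len(m)
    a = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(size)]
         for i, row in enumerate(m)]
    for col in range(size):
        pivot = next((r for r in range(col, size) if a[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        lead = a[col][col]
        a[col] = [v / lead for v in a[col]]
        for r in range(size):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [row[size:] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def random_invertible(size, rng):
    """Draw small random integer matrices until one is invertible."""
    while True:
        m = [[rng.randint(-3, 3) for _ in range(size)] for _ in range(size)]
        try:
            inverse(m)
            return m
        except ValueError:
            continue

def split(vec, s, share_bit, rng):
    """Positions where S equals share_bit become two random shares; others are copied."""
    first, second = [], []
    for value, bit in zip(vec, s):
        if bit == share_bit:
            r = Fraction(rng.randint(-9, 9))
            first.append(r)
            second.append(value - r)
        else:
            first.append(value)
            second.append(value)
    return first, second

def encrypt_index(d_hat, s, m1, m2, rng):
    d1, d2 = split(d_hat, s, 1, rng)  # index: random shares where S[j] = 1
    return mat_vec(transpose(m1), d1), mat_vec(transpose(m2), d2)

def make_trapdoor(q_hat, s, m1, m2, rng):
    q1, q2 = split(q_hat, s, 0, rng)  # query: random shares where S[j] = 0
    return mat_vec(inverse(m1), q1), mat_vec(inverse(m2), q2)

def score(index, trapdoor):
    """Inner product of ciphertext index and trapdoor; equals D_hat . Q_hat."""
    return dot(index[0], trapdoor[0]) + dot(index[1], trapdoor[1])

half = Fraction(1, 2)

def extend_index(v_d, x, y):
    return list(v_d) + [Fraction(x), Fraction(y), -half * (x * x + y * y), Fraction(1)]

def extend_query(q_w, x, y):
    return list(q_w) + [Fraction(x), Fraction(y), Fraction(1), -half * (x * x + y * y)]

rng = random.Random(7)  # seeded for reproducibility; not suitable for real keys
n = 2
S = [rng.randint(0, 1) for _ in range(n + 4)]
M1, M2 = random_invertible(n + 4, rng), random_invertible(n + 4, rng)

objects = [([Fraction(3, 4), Fraction(1, 4)], 3, 4),
           ([Fraction(1, 4), Fraction(3, 4)], 1, 2)]
indexes = [encrypt_index(extend_index(*o), S, M1, M2, rng) for o in objects]
T = make_trapdoor(extend_query([half, half], 1, 1), S, M1, M2, rng)

scores = [score(i, T) for i in indexes]
k = 1
top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

Both toy objects have semantic relevance 1/2 to the query; the first lies at squared distance 13 from the query point and the second at squared distance 1, so the scores are 1/2 - 13/2 = -6 and 1/2 - 1/2 = 0 and the nearer object ranks first. The random shares and the matrices cancel exactly, so the cloud server obtains this value without seeing the plaintext vectors.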
A semantic-understanding ciphertext spatial keyword retrieval system comprises a key module, a semantic information extraction module, an encryption index construction module, a trapdoor generation module and a query module, which are connected in sequence.
The key module is used by the data owner to generate the AES key $sk_{AES}$ and the searchable encryption algorithm key SK; the data owner sends the searchable encryption algorithm key to the data user.
The semantic information extraction module is used to extract, with a natural language processing model, the probability of each spatial object's text under each topic (text-topic probability distribution vector) $V_D$ and the probability of each word under each topic (word-topic probability distribution vector) $V_K$, to calculate the text set-topic probability distribution vector $P_t$ from the text-topic probability distribution vectors, and to calculate the probability vector of each word appearing in the text set (word-text set probability distribution vector) $P_\omega$ from $P_t$ and $V_K$.
The encryption index construction module is used by the data owner to construct a plaintext index for each spatial object from the object's text-topic probability distribution vector and spatial position coordinates, to encrypt the plaintext index with the searchable encryption algorithm, to AES-encrypt the data of each spatial object, and finally to send the resulting ciphertext indexes and ciphertext data to the cloud server;
the trapdoor generation module is used by the data user to extract the query-topic probability distribution vector $Q_w$ of the query statement from the text set-topic probability distribution vector, the word-topic probability distribution vectors and the word-text set probability distribution vector, to combine the data user's spatial position coordinates with $Q_w$ to generate the query vector Q, to encrypt the query vector with the searchable encryption algorithm key to obtain the query trapdoor, and to send the query trapdoor to the cloud server;
and the query module is used by the cloud server to calculate the mixed similarity between the query trapdoor and the ciphertext indexes of the spatial objects, to rank the similarities, and to send the encrypted files of the top k objects to the data user, who decrypts the received ciphertext data.
In the semantic-understanding ciphertext spatial keyword retrieval method provided by the invention, the data owner encrypts the spatial data and its indexes before outsourcing them to the cloud server, which ensures the security and privacy of the text descriptions and position coordinates of the spatial objects. The data user generates the query trapdoor by encryption before sending the query statement, thereby protecting the query information. The whole scheme meets the privacy protection requirements of outsourced data.
Secondly, the invention extracts the semantic information of the spatial objects and of the query statement with the LDA topic model and combines it with the position coordinates to obtain ciphertext indexes and query trapdoors that support mixed queries, and it returns the spatial objects that match the user's query intent and are close to the user's position. Compared with prior-art dictionary-based keyword ciphertext retrieval, constructing the ciphertext index of a spatial object from the topic probability distribution has a low computation cost and high query efficiency and supports semantic awareness.
The method can be used in a cloud storage setting for a user to perform semantically aware mixed queries over the ciphertext indexes of spatial objects on a cloud server. It meets the user's query requirements for both distance and text and, while protecting data security and privacy, lets the user adjust the returned results between fully matching the search intent and being close to the user's position.
The above merely illustrates the technical idea of the present invention and does not limit its scope of protection; any modification made on the basis of this technical idea falls within the scope of protection of the claims of the present invention.
Claims (10)
1. A semantic-understanding ciphertext space keyword retrieval method, characterized by comprising the following steps:
step 1, generating an AES key and a searchable encryption algorithm key;
step 2, extracting the text-topic probability distribution vector of each spatial object's text description and the word-topic probability distribution vector of each word over the topics, determining the text set-topic probability distribution vector from the text-topic probability distribution vectors, and determining the word-text set probability distribution vector of each word over the text set from the text-topic probability distribution vectors and the word-topic probability distribution vectors;
step 3, constructing a plaintext index for each spatial object from the object's text-topic probability distribution vector and its spatial position coordinates, and encrypting the plaintext index to form a ciphertext index;
step 4, extracting the query-topic probability distribution vector of the keywords in the query statement from the text set-topic probability distribution vector, the word-topic probability distribution vectors and the word-text set probability distribution vectors obtained in step 2, combining the spatial position coordinates of the data user with the query-topic probability distribution vector to generate a query vector, and encrypting the query vector with the searchable encryption algorithm key to obtain a query trapdoor;
step 5, determining the hybrid similarity between each spatial object and the query statement from the query trapdoor and the ciphertext indexes, ranking the spatial objects by hybrid similarity, sending the encrypted data of the spatial objects corresponding to the top k ciphertext indexes to the data user, and having the data user decrypt the encrypted data with the AES key.
2. The semantic-understanding ciphertext space keyword retrieval method according to claim 1, wherein step 2 uses a natural language processing model to extract the text-topic probability distribution vector V_D of each spatial object's text description over the topics and the word-topic probability distribution vector V_K of each word over the topics.
3. The semantic-understanding ciphertext space keyword retrieval method according to claim 1, wherein the word-text set probability distribution vector in step 2 is determined as follows:
the data owner sums the text-topic probability distribution vectors V_D of the text descriptions of all objects and divides the sum by the number of objects, obtaining the text set-topic probability distribution vector P_t, which reflects how often each topic appears in the text set;
the data owner computes, from the text set-topic probability distribution vector P_t and the word-topic probability distribution vector V_K of each word, the word-text set probability distribution vector P_w of each word appearing in the text set.
4. The semantic-understanding ciphertext space keyword retrieval method according to claim 1, wherein the ciphertext index in step 3 is constructed as follows:
the position coordinates of the spatial object are combined with its text-topic probability distribution vector to form a plaintext index D_i, the dimensionality of the plaintext index is expanded, and the expanded plaintext index is encrypted with the searchable encryption algorithm key SK to obtain the ciphertext index I_i of the spatial object.
5. The semantic-understanding ciphertext space keyword retrieval method according to claim 4, wherein, when the expanded plaintext index is encrypted, it is first split into two sub-vectors and the two sub-vectors are then encrypted separately;
the splitting rule is as follows: if the j-th bit of the binary vector S in the searchable encryption algorithm key SK is 0, the j-th elements of both sub-vectors are set equal to the j-th element of the expanded plaintext index; if the j-th bit of S is 1, the j-th elements of the two sub-vectors are set to two random numbers whose sum equals the j-th element of the expanded plaintext index.
6. The semantic-understanding ciphertext space keyword retrieval method according to claim 1, wherein the query-topic probability distribution vector Q_w of the keywords in the query statement in step 4 is determined as follows:
wherein P_t is the text set-topic probability distribution vector; P_w is the word-text set probability distribution vector; M_K is the word-topic probability distribution matrix; Q_d is the set of query keywords; |Q_d| is the number of query keywords; and the symbol ∘ denotes the Hadamard product between vectors.
7. The semantic-understanding ciphertext space keyword retrieval method according to claim 6, wherein the query vector Q in step 4 is encrypted as follows:
first, the dimensionality of the query vector Q is expanded to obtain an expanded query vector; the expanded query vector is then split into two sub-vectors; finally, the two sub-vectors are encrypted separately to obtain the query trapdoor;
the splitting rule is as follows: if the i-th bit of the binary vector S in the searchable encryption algorithm key SK is 1, the i-th elements of both sub-vectors are set equal to the i-th element of the expanded query vector; if the i-th bit of S is 0, the i-th elements of the two sub-vectors are set to two random numbers whose sum equals the i-th element of the expanded query vector.
8. The semantic-understanding ciphertext space keyword retrieval method according to claim 7, wherein the hybrid similarity in step 5 is calculated as follows:
wherein the quantities involved are the query trapdoor, the ciphertext index of the i-th spatial object, and M_1 and M_2, the two invertible matrices in the searchable encryption algorithm key SK.
9. The semantic-understanding ciphertext space keyword retrieval method according to claim 1, wherein the encrypted data is formed by AES-encrypting the name, geographic position coordinates and text description data of each spatial object with the AES key.
10. A semantic-understanding ciphertext space keyword retrieval system, characterized by comprising a key module, a semantic information extraction module, an encryption index construction module, a trapdoor generation module and a query module;
the key module is used by the data owner to generate an AES key and a searchable encryption algorithm key and to send the searchable encryption algorithm key to the data user;
the semantic information extraction module is used to extract, with a natural language processing model, the text-topic probability distribution vector of each spatial object's text over the topics and the word-topic probability distribution vector of each word over the topics, to calculate the text set-topic probability distribution vector from the text-topic probability distribution vectors, and to determine the word-text set probability distribution vector of each word appearing in the text set from the text set-topic probability distribution vector and the word-topic probability distribution vectors;
the encryption index construction module is used by the data owner to construct a plaintext index for each spatial object from the object's text-topic probability distribution vector and spatial position coordinates, to encrypt the plaintext index with the searchable encryption algorithm, to AES-encrypt the data of each spatial object, and finally to send the resulting ciphertext indexes and ciphertext data to the cloud server;
the trapdoor generation module is used by the data user to extract the query-topic probability distribution vector Q_w of the query statement from the text set-topic probability distribution vector, the word-topic probability distribution vectors and the word-text set probability distribution vectors, to combine the spatial position coordinates of the data user with the query-topic probability distribution vector Q_w to generate a query vector Q, to encrypt the query vector with the searchable encryption algorithm key to obtain a query trapdoor, and to send the query trapdoor to the cloud server;
and the query module is used by the cloud server to compute the hybrid similarity between the query trapdoor and the ciphertext indexes of all spatial objects, rank the objects by similarity, and send the encrypted files of the top k objects to the data user.
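The split-then-encrypt steps recited in claims 5, 7 and 8 resemble the classic secure kNN (ASPE) construction, and the sketch below shows why such a scheme lets the server score ciphertexts. It is an illustration under stated assumptions, not the patented formulas, which are not reproduced in this text: applying the transposed matrices to the index shares and the inverse matrices to the query shares is assumed from the classic construction, the vectors are treated as already dimension-expanded, and all names, sizes and values are hypothetical. Exact fractions are used so the equality check is free of rounding error.

```python
import random
from fractions import Fraction


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mat_vec(m, v):
    return [dot(row, v) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def inverse(m):
    """Exact Gauss-Jordan inverse; raises ValueError if m is singular."""
    n = len(m)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def keygen(n, rng):
    """SK = (S, M1, M2): a binary vector and two invertible n x n matrices."""
    def invertible():
        while True:
            m = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
            try:
                inverse(m)
                return m
            except ValueError:
                pass  # singular draw, try again
    return [rng.randint(0, 1) for _ in range(n)], invertible(), invertible()


def split(vec, s, random_bit, rng):
    """Copy element j into both shares, except where s[j] == random_bit:
    there, use two random shares that sum to the element."""
    a, b = [], []
    for x, bit in zip(vec, s):
        if bit == random_bit:
            r = Fraction(rng.randint(-100, 100), 7)
            a.append(r)
            b.append(x - r)
        else:
            a.append(x)
            b.append(x)
    return a, b


def encrypt_index(p, sk, rng):
    s, m1, m2 = sk
    a, b = split(p, s, 1, rng)   # claim 5: random split where S[j] == 1
    return mat_vec(transpose(m1), a), mat_vec(transpose(m2), b)  # assumed form


def make_trapdoor(q, sk, rng):
    s, m1, m2 = sk
    a, b = split(q, s, 0, rng)   # claim 7: random split where S[i] == 0
    return mat_vec(inverse(m1), a), mat_vec(inverse(m2), b)      # assumed form


def similarity(index, trapdoor):
    return dot(index[0], trapdoor[0]) + dot(index[1], trapdoor[1])


rng = random.Random(0)  # not cryptographically secure; illustration only
p = [Fraction(3, 10), Fraction(7, 10), Fraction(2), Fraction(5)]   # toy index
q = [Fraction(1, 2), Fraction(1, 2), Fraction(-1), Fraction(1)]    # toy query
sk = keygen(len(p), rng)
score = similarity(encrypt_index(p, sk, rng), make_trapdoor(q, sk, rng))
print(score == dot(p, q))
```

The two splitting rules are complementary on purpose: at every position exactly one of the two vectors is randomly split, so the cross terms cancel and the ciphertext score equals the plaintext inner product without the server learning either vector.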
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011135390.5A CN112257455B (en) | 2020-10-21 | 2020-10-21 | Semantic understanding ciphertext space keyword retrieval method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112257455A (en) | 2021-01-22 |
CN112257455B (en) | 2024-04-30 |
Family ID: 74264582
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011135390.5A Active CN112257455B (en) | 2020-10-21 | 2020-10-21 | Semantic understanding ciphertext space keyword retrieval method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112257455B (en) |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006209649A (en) * | 2005-01-31 | 2006-08-10 | Nec Corp | Confidential document retrieval system, confidential document retrieval method and confidential document retrieval program |
CN105743888A (en) * | 2016-01-22 | 2016-07-06 | 河南理工大学 | Agent re-encryption scheme based on keyword research |
CN106326360A (en) * | 2016-08-10 | 2017-01-11 | 武汉科技大学 | Fuzzy multi-keyword retrieval method of encrypted data in cloud environment |
US20170078251A1 (en) * | 2015-09-11 | 2017-03-16 | Skyhigh Networks, Inc. | Wildcard search in encrypted text using order preserving encryption |
US9679155B1 (en) * | 2015-06-12 | 2017-06-13 | Skyhigh Networks, Inc. | Prefix search in encrypted text |
CN108228849A (en) * | 2018-01-10 | 2018-06-29 | 浙江理工大学 | Ciphertext sorted search method based on classification packet index in cloud network |
CN108647529A (en) * | 2018-05-09 | 2018-10-12 | 上海海事大学 | A kind of semantic-based multi-key word sorted search intimacy protection system and method |
CN109063509A (en) * | 2018-08-07 | 2018-12-21 | 上海海事大学 | It is a kind of that encryption method can search for based on keywords semantics sequence |
CN109271485A (en) * | 2018-09-19 | 2019-01-25 | 南京邮电大学 | It is a kind of to support semantic cloud environment encrypted document ordering searching method |
CN109471964A (en) * | 2018-10-23 | 2019-03-15 | 哈尔滨工程大学 | A kind of fuzzy multi-key word based on synset can search for encryption method |
CN109739945A (en) * | 2018-12-13 | 2019-05-10 | 南京邮电大学 | A kind of multi-key word ciphertext ordering searching method based on hybrid index |
CN109992995A (en) * | 2019-03-05 | 2019-07-09 | 华南理工大学 | A kind of protection of support position and inquiry privacy can search for encryption method |
CN110222012A (en) * | 2019-06-08 | 2019-09-10 | 西安电子科技大学 | Data cryptogram search method based on fine granularity sequence under sole user's environment |
CN110222081A (en) * | 2019-06-08 | 2019-09-10 | 西安电子科技大学 | Data cryptogram search method based on fine granularity sequence under multi-user environment |
CN110727951A (en) * | 2019-10-14 | 2020-01-24 | 桂林电子科技大学 | Lightweight outsourcing file multi-keyword retrieval method and system with privacy protection function |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113158087A (en) * | 2021-04-09 | 2021-07-23 | 深圳前海微众银行股份有限公司 | Query method and device for space text |
CN113254743A (en) * | 2021-05-31 | 2021-08-13 | 西安电子科技大学 | Secure semantic perception search method for dynamic spatial data in Internet of vehicles |
CN113254743B (en) * | 2021-05-31 | 2022-12-09 | 西安电子科技大学 | Security semantic perception searching method for dynamic spatial data in Internet of vehicles |
CN113434895A (en) * | 2021-08-27 | 2021-09-24 | 平安科技(深圳)有限公司 | Text decryption method, device, equipment and storage medium |
CN113434895B (en) * | 2021-08-27 | 2021-11-23 | 平安科技(深圳)有限公司 | Text decryption method, device, equipment and storage medium |
WO2023065477A1 (en) * | 2021-10-18 | 2023-04-27 | 深圳前海微众银行股份有限公司 | Spatial text query method and apparatus |
CN114398660A (en) * | 2021-11-29 | 2022-04-26 | 北京航空航天大学 | High-efficiency fuzzy searchable encryption method based on Word2vec and ASPE |
CN118264482A (en) * | 2024-05-24 | 2024-06-28 | 杭州宇泛智能科技股份有限公司 | File semantic information fusion one-text one-secret security encryption method and device |
CN118264482B (en) * | 2024-05-24 | 2024-07-26 | 杭州宇泛智能科技股份有限公司 | File semantic information fusion one-text one-secret security encryption method and device |
Also Published As
Publication number | Publication date |
---|---|
CN112257455B (en) | 2024-04-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112257455B (en) | Semantic understanding ciphertext space keyword retrieval method and system | |
CN107220343B (en) | Chinese multi-keyword fuzzy sorting ciphertext searching method based on locality sensitive hashing | |
CN108712366B (en) | Searchable encryption method and system supporting word form and word meaning fuzzy retrieval in cloud environment | |
CN106951411B (en) | The quick multi-key word Semantic Ranking searching method of data-privacy is protected in a kind of cloud computing | |
CN107480163B (en) | Efficient ciphertext image retrieval method supporting privacy protection in cloud environment | |
Zhang et al. | SE-PPFM: A searchable encryption scheme supporting privacy-preserving fuzzy multikeyword in cloud systems | |
CN111797409B (en) | Carrier-free information hiding method for big data Chinese text | |
CN108647529A (en) | A kind of semantic-based multi-key word sorted search intimacy protection system and method | |
CN109992995B (en) | Searchable encryption method supporting location protection and privacy inquiry | |
CN109739945B (en) | Multi-keyword ciphertext sorting and searching method based on mixed index | |
CN103927340A (en) | Ciphertext retrieval method | |
CN116881739B (en) | Ciphertext security retrieval method oriented to similarity of spatial keywords | |
CN109255244B (en) | Data encryption method and device and data encryption retrieval system | |
CN111859421B (en) | Word vector-based multi-keyword ciphertext storage and retrieval method and system | |
Long et al. | Coverless information hiding method based on web text | |
Han et al. | Unified neural topic model via contrastive learning and term weighting | |
CN108829714A (en) | A kind of ciphertext data multi-key word searches for method generally | |
KR102526055B1 (en) | Device and method for embedding relational table | |
CN116821965A (en) | Personalized retrieval method | |
CN109271485B (en) | Cloud environment encrypted document sequencing and searching method supporting semantics | |
CN109165520B (en) | Data encryption method and device and data encryption retrieval system | |
CN114528370B (en) | Dynamic multi-keyword fuzzy ordering searching method and system | |
CN111966778B (en) | Multi-keyword ciphertext sorting and searching method based on keyword grouping reverse index | |
CN116244453A (en) | Efficient encrypted image retrieval method based on neural network | |
CN114398660A (en) | High-efficiency fuzzy searchable encryption method based on Word2vec and ASPE |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||