CN112257455A - Semantic-understanding ciphertext space keyword retrieval method and system - Google Patents


Info

Publication number
CN112257455A
Authority
CN
China
Prior art keywords
query
probability distribution
text
ciphertext
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011135390.5A
Other languages
Chinese (zh)
Other versions
CN112257455B (en)
Inventor
马建峰
李佳忆
苗银宾
杨帆
李颖莹
马卓然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN202011135390.5A
Publication of CN112257455A
Application granted
Publication of CN112257455B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6227 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Bioethics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a semantic-understanding spatial keyword ciphertext retrieval method and system. By extracting the semantic features of the spatial objects and of the user query, the user can retrieve, over ciphertext, spatial objects that match the user's query intent and are close to the user's location. The ciphertext index construction method adopted by the invention improves the precision of ciphertext spatial keyword retrieval while satisfying the user's query requirements for both distance and text. In addition, the ciphertext query algorithm of the scheme improves query efficiency while ensuring the security and privacy of the spatial object data and of the user's retrieval information.

Description

Semantic-understanding ciphertext space keyword retrieval method and system
Technical Field
The invention relates to the technical field of searchable encryption, and in particular to a semantic-understanding ciphertext spatial keyword retrieval method and system.
Background
Spatial keyword retrieval involves a large amount of spatial object data, and the query process incurs a large computing overhead. The data owner therefore often chooses to outsource the spatial object data to a cloud server, which stores the data and performs the computation. However, since the data owner loses direct control over the outsourced data, data security and privacy cannot be guaranteed. In particular, the spatial object data include sensitive information such as position coordinates, so encrypting the spatial object data is essential.
Traditional spatial keyword query algorithms target plaintext data only, so a user cannot directly query ciphertext data stored on a cloud server. Moreover, traditional spatial keyword query algorithms cannot extract the semantic information of the keywords a user retrieves, so the user's query intent cannot be captured and the query accuracy is low.
Qianxinghu proposed a semantic-understanding-based spatial keyword query method in the published paper "Semantic understanding-based spatial keyword query" (Suzhou University, 2018). The method adds semantic information extraction for the text descriptions of spatial objects to traditional spatial keyword query. Specifically, for text semantics, a Latent Dirichlet Allocation topic model (LDA model) is used to extract text semantic features. However, the method supports only plaintext queries and cannot protect the privacy of the data owner.
The patent document "A data ciphertext query method based on fine-grained sequencing in a single user environment", filed by Xidian University, discloses a data ciphertext query method that has the following defects: semantic information in the data user's query cannot be extracted, so search accuracy is limited; and the document index in that scheme is a vector of the same length as the dictionary, whose large dimension leads to high computation cost and low query efficiency.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a ciphertext space keyword retrieval method for semantic understanding.
The invention is realized by the following technical scheme:
A ciphertext spatial keyword retrieval method for semantic understanding comprises the following steps:
step 1, generating an AES key and a searchable encryption algorithm key;
step 2, extracting the text-topic probability distribution vectors of the spatial object text descriptions and the word-topic probability distribution vector of each word over the topics, determining a text set-topic probability distribution vector from the text-topic probability distribution vectors, and determining the word-text set probability distribution vector of the words over the text set from the text-topic probability distribution vectors and the word-topic probability distribution vectors;
step 3, constructing a plaintext index for each spatial object from the text-topic probability distribution vector of the spatial object and the corresponding spatial position coordinates, and encrypting the plaintext index to form a ciphertext index;
step 4, extracting the query-topic probability distribution vector of the keywords in the query statement from the text set-topic probability distribution vector, the word-topic probability distribution vectors and the word-text set probability distribution vector obtained in step 2, combining the spatial position coordinates of the data user with the query-topic probability distribution vector to generate a query vector, and encrypting the query vector with the searchable encryption algorithm key to obtain a query trapdoor;
step 5, determining the mixed similarity between each spatial object and the query statement from the query trapdoor and the ciphertext indexes, sorting by it, and sending the encrypted data of the spatial objects corresponding to the top k ranked ciphertext indexes to the data user, who decrypts the encrypted data with the AES (Advanced Encryption Standard) key.
Preferably, step 2 uses a natural language processing model to extract the text-topic probability distribution vector $V_D$ of the text description of each spatial object over the topics and the word-topic probability distribution vector $V_K$ of each word over the topics.
Preferably, the method for determining the probability distribution vector of the word-text set in step 2 is as follows:
the data owner sums the text-topic probability distribution vectors $V_D$ of the text descriptions of all objects and divides by the number of objects, obtaining the text set-topic probability distribution vector $P_t$, which reflects how often each topic appears in the text set;
the data owner then calculates, from the text set-topic probability distribution vector $P_t$ and the word-topic probability distribution vector $V_K$ of each word, the word-text set probability distribution vector $P_\omega$ of each word appearing in the text set.
Preferably, the method for constructing the ciphertext index in step 3 is as follows:
the position coordinates of the spatial object are appended to its text-topic probability distribution vector $V_{D_i}$ and combined into a plaintext index $D_i$; the dimensionality of the plaintext index is extended, and the extended plaintext index $\hat{D}_i$ is encrypted with the searchable encryption algorithm key SK to obtain the ciphertext index $I_i$ of the spatial object.
Preferably, when the extended plaintext index $\hat{D}_i$ is encrypted, it is first split into two vectors $\hat{D}_i'$ and $\hat{D}_i''$, which are then encrypted separately;
the splitting rule is as follows: if the j-th bit of the binary vector S in the searchable encryption algorithm key SK is 0, the entries $\hat{D}_i'[j]$ and $\hat{D}_i''[j]$ of the two split vectors are both set to the entry $\hat{D}_i[j]$ of the extended plaintext index; if the j-th bit of S is 1, $\hat{D}_i'[j]$ and $\hat{D}_i''[j]$ are set to two random numbers whose sum is $\hat{D}_i[j]$.
The encryption process is as follows: using $\{M_1, M_2\}$ in the searchable encryption algorithm key SK, the products of the two split index vectors with the key matrices are computed separately (equation image not reproduced), and the resulting pair forms the ciphertext index $I_i$ of each spatial object $o_i$.
Preferably, the query-topic probability distribution vector $Q_w$ of the keywords in the query statement in step 4 is determined as follows:
[Equation image not reproduced: $Q_w$ expressed in terms of $P_t$, $P_\omega$, $M_K$ and $Q_d$]
wherein $P_t$ is the text set-topic probability distribution vector; $P_\omega$ is the word-text set probability distribution vector; $M_K$ is the word-topic probability distribution matrix; $Q_d$ is the set of query keywords; $|Q_d|$ is the number of query keywords; and the symbol $\circ$ denotes the Hadamard product between vectors.
Preferably, the query vector Q in step 4 is encrypted as follows:
first, the dimensionality of the query vector Q is extended to obtain the extended query vector $\hat{Q}$; then $\hat{Q}$ is split into the two query vectors $\hat{Q}'$ and $\hat{Q}''$; finally, the split query vectors are encrypted separately to obtain the query trapdoor;
the splitting rule is as follows: if the i-th bit of the binary vector S in the searchable encryption algorithm key SK is 1, the entries $\hat{Q}'[i]$ and $\hat{Q}''[i]$ of the two split vectors are both set to the entry $\hat{Q}[i]$ of the extended query vector; if the i-th bit of S is 0, $\hat{Q}'[i]$ and $\hat{Q}''[i]$ are set to two random numbers whose sum is $\hat{Q}[i]$.
The encryption process is as follows: using $\{M_1, M_2\}$ in the searchable encryption algorithm key SK, the products of the two split query vectors with the key matrices are computed separately (equation image not reproduced), and the resulting pair forms the query trapdoor $T$.
Preferably, the method for calculating the mixed similarity in step 5 is as follows:
[Equation image not reproduced: mixed similarity computed from the ciphertext index $I_i$ and the query trapdoor $T$]
wherein $T$ is the query trapdoor; $I_i$ is the ciphertext index of the i-th spatial object; and $M_1$ and $M_2$ are the two invertible matrices in the searchable encryption algorithm key SK. The semantic term of the formula equals the semantic relatedness between the keywords in the query statement and the i-th spatial object, $(\|\lambda_i\|^2 - 2\lambda_i\lambda_q + \|\lambda_q\|^2)$ equals the square of the Euclidean distance between the data user's query location and the spatial object coordinates, and $\delta$ is the query weight.
Preferably, the ciphertext data are obtained by AES-encrypting the name, the geographic position coordinates and the text description data of each spatial object with the AES key.
A system of a ciphertext space keyword retrieval method for semantic understanding comprises a key module, a semantic information extraction module, an encryption index construction module, a trapdoor generation module and a query module;
the key module is used for generating an AES key and a searchable encryption algorithm key by a data owner and sending the searchable encryption algorithm key to a data user;
the semantic information extraction module is used for extracting a text-theme probability distribution vector of each space object text appearing under each theme and a word-theme probability distribution vector of each word appearing on each theme by using a natural language processing model, calculating a text set-theme probability distribution vector according to the text-theme probability distribution vector, and determining the word-text set probability distribution vector of each word appearing in the text set according to the text set-theme probability distribution vector and the word-theme probability distribution vector;
the encryption index construction module is used for constructing a plaintext index for each space object by a data owner according to the text-theme probability distribution vector and the space position coordinates of the space object, encrypting the plaintext index by utilizing a searchable encryption algorithm, simultaneously carrying out AES (advanced encryption standard) encryption on data of each space object, and finally sending a ciphertext index and ciphertext data formed by encryption to the cloud server;
the trapdoor generation module is used by the data user to extract the query-topic probability distribution vector $Q_w$ of the query statement from the text set-topic probability distribution vector, the word-topic probability distribution vectors and the word-text set probability distribution vector, to combine the spatial position coordinates of the data user with the query-topic probability distribution vector $Q_w$ into a query vector Q, to encrypt the query vector with the searchable encryption algorithm key to obtain a query trapdoor, and to send the query trapdoor to the cloud server;
and the query module is used by the cloud server to calculate the mixed similarity between the query trapdoor and the ciphertext indexes of all the spatial objects, sort the results, and send the encrypted files of the top k objects to the data user.
Compared with the prior art, the invention has the following beneficial technical effects:
The ciphertext spatial keyword retrieval method for semantic understanding provided by the invention constructs ciphertext indexes with a searchable encryption algorithm from the text-topic probability distribution vector and the corresponding spatial position coordinates of each spatial object, which ensures the security and privacy of the text descriptions and position coordinates of the spatial objects. The spatial position coordinates of the data user are combined with the query-topic probability distribution vector to generate a query vector, and the query vector is encrypted to generate a query trapdoor, which protects the query information. The whole scheme meets the requirement of privacy protection for outsourced data. Because the spatial object index is constructed from the topic probability distribution of the text, the method has, compared with the prior art that realizes keyword ciphertext retrieval based on a dictionary, the advantages of low computation cost, high query accuracy and support for semantic awareness. By extracting the semantic features of the spatial object text descriptions and of the user's query statements, the user can retrieve, over ciphertext, spatial objects that match the user's query intent and are close to the user; the efficiency of ciphertext spatial keyword retrieval is improved while the security and privacy of the spatial text data and of the user's retrieval information are ensured.
Drawings
FIG. 1 is a flow chart of a retrieval method of the present invention;
FIG. 2 is a flow chart of spatial object index generation according to the present invention;
FIG. 3 is a flow chart of query trapdoor generation according to the present invention.
Detailed Description
The present invention will now be described in further detail with reference to the attached drawings, which are illustrative, but not limiting, of the present invention.
Referring to fig. 1, a ciphertext space keyword retrieval method for semantic understanding includes the following steps:
Step 1, the data owner generates an AES key $sk_{AES}$ and a searchable encryption algorithm key SK, and sends the searchable encryption algorithm key SK to the data user;
The AES key $sk_{AES}$ is used for AES encryption of the spatial objects, and the searchable encryption algorithm key SK is used for building the ciphertext indexes and generating the query trapdoors.
The searchable encryption algorithm key SK comprises a randomly generated binary vector S and two invertible matrices $M_1$ and $M_2$, where the vector S has length n+4 and $M_1$ and $M_2$ are both (n+4) × (n+4) square matrices.
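The structure of the key SK described above can be sketched in a few lines. This is an illustrative sketch only: the function names `random_invertible` and `generate_sk` are hypothetical, the patent does not say how the invertible matrices are sampled, and building each matrix as a product of unit-triangular integer matrices is an assumption chosen here because it guarantees invertibility (determinant 1). Python's `random` module is not a cryptographically secure source and is used only for demonstration.

```python
import random
from fractions import Fraction


def random_invertible(n, rng):
    """Toy invertible n x n integer matrix: unit lower-triangular times unit upper-triangular."""
    lower = [[0] * n for _ in range(n)]
    upper = [[0] * n for _ in range(n)]
    for r in range(n):
        lower[r][r] = upper[r][r] = 1
        for c in range(r):
            lower[r][c] = rng.randint(-3, 3)  # entry below the diagonal
            upper[c][r] = rng.randint(-3, 3)  # entry above the diagonal
    return [[sum(lower[r][k] * upper[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def determinant(matrix):
    """Exact determinant by Gaussian elimination over fractions (used only as a check)."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n, det = len(m), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return det


def generate_sk(n_topics, seed=None):
    """SK = (S, M1, M2): S has n+4 bits, M1 and M2 are (n+4) x (n+4) invertible matrices."""
    rng = random.Random(seed)
    dim = n_topics + 4
    return {
        "S": [rng.randint(0, 1) for _ in range(dim)],
        "M1": random_invertible(dim, rng),
        "M2": random_invertible(dim, rng),
    }


sk = generate_sk(n_topics=3, seed=1)
```

A production system would draw the key material from a cryptographically secure generator and would generate $sk_{AES}$ with a vetted AES library, which is outside the scope of this sketch.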
Step 2, the data owner uses an LDA topic model, which is a natural language processing model, to extract the semantic features of the spatial object text descriptions, obtaining the text-topic probability distribution vector $V_D$ of each spatial object text description and the word-topic probability distribution vector $V_K$ of each word over the topics; the text set-topic distribution vector $P_t$ is obtained from the topic probability distribution vectors $V_D$, and $P_t$ together with the word probability distribution vectors $V_K$ determines the word-text set probability distribution vector $P_\omega$ of the words appearing in the text set.
Referring to fig. 2, the specific process is as follows:
S2.1 The data owner preprocesses the text description of each spatial object: punctuation marks, special characters and meaningless words are first removed from the text, then the stems of the words are extracted, and finally the text description is vectorized;
S2.2 The data owner trains an LDA (Latent Dirichlet Allocation) topic model on the preprocessed text; the model outputs the probability distribution vector $V_D$ of each spatial object's text description over the topics (text-topic probability distribution vector) and the probability distribution vector $V_K$ of each word over the topics (word-topic probability distribution vector); $V_D$ and $V_K$ are n-dimensional;
S2.3 The data owner constructs the text-topic probability distribution matrix $M_D$ with the text-topic probability distribution vectors $V_D$ as row vectors, and the word-topic probability distribution matrix $M_K$ with the word-topic probability distribution vectors $V_K$ as row vectors;
S2.4 The data owner sums the topic probability distribution vectors $V_D$ of the text descriptions of all objects and divides by the number of objects, obtaining the probability vector $P_t$ of each topic appearing in the text set (text set-topic probability distribution vector);
S2.5 From the probability vector $P_t$ of the topics appearing in the text set and the word-topic probability distribution matrix $M_K$, the data owner calculates the probability vector $P_\omega$ of each word appearing in the text set (word-text set probability distribution vector):
$P_\omega = P_t \cdot M_K^{T}$
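Steps S2.4 and S2.5 amount to a column average followed by a matrix-vector product. The sketch below illustrates them on toy numbers; the function names and the toy matrices are assumptions made for illustration, and the LDA training of S2.2 is not shown (the rows of `M_D` and `M_K` stand in for its output).

```python
def text_set_topic_vector(doc_topic):
    """P_t: average of the per-object text-topic vectors V_D (step S2.4)."""
    count = len(doc_topic)
    n_topics = len(doc_topic[0])
    return [sum(row[t] for row in doc_topic) / count for t in range(n_topics)]


def word_text_set_vector(p_t, word_topic):
    """P_omega = P_t . M_K^T: one probability per word (step S2.5)."""
    return [sum(p * w for p, w in zip(p_t, row)) for row in word_topic]


# Toy stand-ins for LDA output with n = 2 topics.
M_D = [[0.8, 0.2],   # V_D of object 1
       [0.4, 0.6]]   # V_D of object 2
M_K = [[0.5, 0.1],   # V_K of word 0
       [0.3, 0.2],   # V_K of word 1
       [0.2, 0.7]]   # V_K of word 2

P_t = text_set_topic_vector(M_D)       # approximately [0.6, 0.4]
P_w = word_text_set_vector(P_t, M_K)   # approximately [0.34, 0.26, 0.40]
```

In the toy data each column of `M_K` sums to 1, so the entries of `P_w` also sum to 1, which is what one would expect of a probability distribution over the vocabulary.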
Step 3, a plaintext index is constructed for each spatial object from its text-topic probability distribution vector $V_D$ and its coordinates (x, y), and the index is split and encrypted with the searchable encryption algorithm key.
Specifically, the spatial object position coordinates are appended to the topic probability distribution vector $V_D$ to form the plaintext index; the combined vector is extended to n+4 dimensions, and the extended vector is encrypted with the searchable encryption algorithm key SK to obtain the ciphertext index of the spatial object.
The specific process is as follows (taking the ith space object as an example):
S3.1 The data owner combines the text-topic probability distribution vector $V_{D_i}$ of the i-th spatial object $o_i$ with its spatial position coordinates $\lambda_i = (x, y)$ into one vector $D_i = (V_{D_i}, x, y)$, which serves as the plaintext index of the spatial object;
S3.2 The plaintext index $D_i$ is extended, i.e. $-0.5\|\lambda_i\|^2$ and 1 are appended in sequence at the tail of $D_i$; the extended vector is denoted $\hat{D}_i = (V_{D_i}, x, y, -0.5\|\lambda_i\|^2, 1)$.
S3.3 The data owner uses the searchable encryption algorithm key SK to split and encrypt the extended vector $\hat{D}_i$ of the spatial object, generating its ciphertext index.
The splitting rule is as follows: if the j-th bit of S is 0, the entries $\hat{D}_i'[j]$ and $\hat{D}_i''[j]$ of the two split vectors are both set to $\hat{D}_i[j]$; if the j-th bit of S is 1, $\hat{D}_i'[j]$ and $\hat{D}_i''[j]$ are set to two random numbers whose sum is $\hat{D}_i[j]$.
The encryption process is as follows: the data owner uses $M_1$ and $M_2$ in the searchable encryption algorithm key SK to encrypt the two split vectors of the extended index separately (equation image not reproduced), obtaining the ciphertext index $I_i$ of each spatial object $o_i$, which consists of the two encrypted vectors.
S3.5 The data owner AES-encrypts the name, the geographic position coordinates and the text description data of each spatial object with the AES key to form the encrypted data;
S3.6 The data owner uploads the encrypted data and the ciphertext indexes to the cloud server;
S3.7 The data owner sends the word-topic probability distribution vectors, the text set-topic probability distribution vector and the word-text set probability distribution vector to the data user as auxiliary vectors for subsequent calculations.
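The index construction of S3.1 to S3.3 can be sketched as follows, up to but not including the matrix encryption. The helper names `build_extended_index` and `split_index` are hypothetical, the range of the random shares is an arbitrary choice, and Python's `random` module stands in for whatever random source an implementation would actually use.

```python
import random


def build_extended_index(v_d, location):
    """Append (x, y), then -0.5*||lambda_i||^2 and 1, to the text-topic vector (S3.1, S3.2)."""
    x, y = location
    return list(v_d) + [x, y, -0.5 * (x * x + y * y), 1.0]


def split_index(extended, s_bits, rng):
    """Split per S3.3: copy the entry where the bit of S is 0, share it randomly where it is 1."""
    first, second = [], []
    for value, bit in zip(extended, s_bits):
        if bit == 0:
            first.append(value)
            second.append(value)
        else:
            share = rng.uniform(-1.0, 1.0)  # arbitrary range, assumption of this sketch
            first.append(share)
            second.append(value - share)
    return first, second


rng = random.Random(42)
S = [0, 1, 0, 1, 1, 0]                       # toy binary vector for n = 2 topics
extended = build_extended_index([0.7, 0.3], (3.0, 4.0))
part_a, part_b = split_index(extended, S, rng)
```

For the toy object at (3, 4), the extended index is `[0.7, 0.3, 3.0, 4.0, -12.5, 1.0]`, and at every position the two shares either both equal the original entry or add up to it.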
Step 4, the data user uses the training results of the LDA topic model (i.e. the auxiliary vectors) to extract the semantic features of the query statement, obtaining the probability vector $Q_w$ of the query keywords over the topics (query-topic probability distribution vector); the user's coordinates are combined with the query-topic probability distribution vector $Q_w$ to generate the query vector Q, and the query vector is encrypted with the searchable encryption algorithm key SK to obtain the query trapdoor.
Referring to fig. 3, the specific process is as follows:
S4.1 From the word-text set probability distribution vector $P_\omega$ of the words over the text set, the query-topic probability distribution vector $Q_w$ of the keywords in the query statement is computed:
[Equation image not reproduced: $Q_w$ expressed in terms of $P_t$, $P_\omega$, $M_K$ and $Q_d$]
where $P_t$ is the text set-topic probability distribution vector; $P_\omega$ is the word-text set probability distribution vector; $M_K$ is the word-topic probability distribution matrix; $Q_d$ is the set of query keywords; $|Q_d|$ is the number of query keywords; and the symbol $\circ$ denotes the Hadamard product between vectors.
S4.2 The data user combines the query-topic probability distribution vector $Q_w$ of the keywords with the geographic position coordinates $\lambda_q = (x_q, y_q)$ of the query point to generate the query vector Q;
S4.3 The data user extends the query vector Q to n+4 dimensions; the extended query vector is denoted $\hat{Q}$, where the (n+3)-th entry of $\hat{Q}$ is 1 and the (n+4)-th entry is $-0.5\|\lambda_q\|^2$.
S4.4 The data user sets the query weight $\delta$ according to the query preference (emphasizing spatial distance or emphasizing text similarity), thereby adjusting the query result.
S4.5 The data user uses the binary vector S in the searchable encryption algorithm key SK to split the extended query vector $\hat{Q}$ into two (n+4)-dimensional random vectors $\hat{Q}'$ and $\hat{Q}''$.
The splitting rule is as follows: if the j-th bit of S is 1, $\hat{Q}'[j]$ and $\hat{Q}''[j]$ are both set to $\hat{Q}[j]$; if the j-th bit of S is 0, $\hat{Q}'[j]$ and $\hat{Q}''[j]$ are set to two random numbers whose sum is $\hat{Q}[j]$.
S4.6 The data user encrypts the two random vectors obtained by splitting the extended query vector with the invertible matrices $\{M_1, M_2\}$ in the searchable encryption algorithm key SK to obtain the trapdoor $T$, and sends the generated trapdoor $T$ to the cloud server.
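The query side of step 4 can be sketched under two explicit assumptions, since the corresponding equation images are not available in this text. First, $Q_w$ is read here as the average over the query keywords of the Bayes posterior $(P_t \circ M_K[w]) / P_\omega[w]$, which uses exactly the symbols listed in S4.1 but is only one plausible reading. Second, the weight $\delta$ is assumed to scale the semantic part of the query vector; the source does not say where $\delta$ enters. All function names and toy values are hypothetical.

```python
def query_topic_vector(keywords, vocab, p_t, m_k, p_w):
    """Assumed reading of S4.1: average over keywords of (P_t * M_K[w]) / P_omega[w]."""
    totals = [0.0] * len(p_t)
    for word in keywords:
        w = vocab[word]
        for t, prior in enumerate(p_t):
            totals[t] += prior * m_k[w][t] / p_w[w]
    return [value / len(keywords) for value in totals]


def extend_query(q_w, location, delta=1.0):
    """S4.2, S4.3: append (x_q, y_q), then 1 and -0.5*||lambda_q||^2.

    Scaling q_w by delta is an assumption of this sketch.
    """
    x_q, y_q = location
    return [delta * p for p in q_w] + [x_q, y_q, 1.0, -0.5 * (x_q ** 2 + y_q ** 2)]


# Toy auxiliary vectors (n = 2 topics, 3 words), as a data owner might send in S3.7.
vocab = {"cafe": 0, "park": 1, "museum": 2}
P_t = [0.6, 0.4]
M_K = [[0.5, 0.1], [0.3, 0.2], [0.2, 0.7]]
P_w = [0.34, 0.26, 0.40]

Q_w = query_topic_vector(["cafe", "park"], vocab, P_t, M_K, P_w)
Q_hat = extend_query(Q_w, (1.0, 2.0))
```

Under this reading each keyword contributes a distribution over topics, so `Q_w` sums to 1. The splitting of `Q_hat` in S4.5 mirrors the index splitting with the roles of the bits 0 and 1 exchanged.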
Step 5, the cloud server computes the inner product of the ciphertext index of each spatial object and the query trapdoor, and sends the encrypted data of the spatial objects corresponding to the k ciphertext indexes with the largest results to the data user. Specifically, the cloud server determines the mixed similarity between each spatial object and the query statement from the query trapdoor T and the ciphertext indexes I, sorts the ciphertext indexes by mixed similarity in descending order, and returns the encrypted data of the spatial objects corresponding to the top k ranked ciphertext indexes to the data user; the data user uses the AES key $sk_{AES}$ to decrypt the k received ciphertexts, obtaining the plaintext information of the corresponding spatial objects, namely the name, geographic position and text description of each spatial object.
The index mixed similarity is calculated as follows:
[Equation images not reproduced: mixed similarity computed from the ciphertext index $I_i$ and the query trapdoor $T$]
wherein the semantic term represents the semantic relevance between the keywords in the query statement and the i-th spatial object, and $(\|\lambda_i\|^2 - 2\lambda_i\lambda_q + \|\lambda_q\|^2)$ represents the square of the Euclidean distance between the data user's query location and the spatial object coordinates. After adjustment by the weight $\delta$, the cloud server obtains the mixed relevance between the user query and the spatial object.
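The following end-to-end sketch illustrates why an inner product over ciphertext can recover the mixed similarity. It assumes the conventional secure-kNN arrangement, in which the index shares are multiplied by $M_1^{T}$ and $M_2^{T}$ and the trapdoor shares by $M_1^{-1}$ and $M_2^{-1}$; the patent's own encryption formulas are not reproduced in this text, so this assignment is an assumption, as are all function names, the toy key and the toy data. The weight $\delta$ is taken as 1. Exact fractions are used so that the identity can be checked without rounding error.

```python
from fractions import Fraction as F
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(m, v):
    return [dot(row, v) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def inverse(matrix):
    """Gauss-Jordan inverse over exact fractions (the toy matrices are invertible)."""
    n = len(matrix)
    aug = [[F(v) for v in row] + [F(int(r == c)) for c in range(n)]
           for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def split(vec, s_bits, copy_bit, rng):
    """Copy the entry where the bit equals copy_bit, otherwise share it randomly."""
    a, b = [], []
    for value, bit in zip(vec, s_bits):
        share = value if bit == copy_bit else F(rng.randint(-50, 50), 7)
        a.append(share)
        b.append(value if bit == copy_bit else value - share)
    return a, b


def encrypt_index(d_hat, sk, rng):      # index side copies where the bit of S is 0
    d1, d2 = split(d_hat, sk["S"], 0, rng)
    return mat_vec(transpose(sk["M1"]), d1), mat_vec(transpose(sk["M2"]), d2)


def make_trapdoor(q_hat, sk, rng):      # query side copies where the bit of S is 1
    q1, q2 = split(q_hat, sk["S"], 1, rng)
    return mat_vec(inverse(sk["M1"]), q1), mat_vec(inverse(sk["M2"]), q2)


def score(index, trapdoor):
    return dot(index[0], trapdoor[0]) + dot(index[1], trapdoor[1])


def toy_matrix(n, rng):                 # unit upper-triangular, hence invertible
    return [[1 if r == c else (rng.randint(1, 5) if c > r else 0) for c in range(n)]
            for r in range(n)]


rng = random.Random(0)
n = 2
sk = {"S": [rng.randint(0, 1) for _ in range(n + 4)],
      "M1": toy_matrix(n + 4, rng), "M2": toy_matrix(n + 4, rng)}

v_d = [F(7, 10), F(3, 10)]
objects = {"near": (1, 3), "far": (3, 4)}
d_hats = {name: v_d + [F(x), F(y), F(-1, 2) * (x * x + y * y), F(1)]
          for name, (x, y) in objects.items()}
q_hat = [F(9, 10), F(1, 10), F(1), F(2), F(1), F(-1, 2) * (1 * 1 + 2 * 2)]

trapdoor = make_trapdoor(q_hat, sk, rng)
scores = {name: score(encrypt_index(d, sk, rng), trapdoor) for name, d in d_hats.items()}
top_k = sorted(scores, key=scores.get, reverse=True)[:1]
```

With this arrangement the server-side score equals the plaintext inner product $\hat{D}_i \cdot \hat{Q}$, which for the extended vectors is the semantic inner product minus half the squared Euclidean distance, so the nearer of two objects with identical text-topic vectors ranks first.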
A ciphertext space keyword retrieval system for semantic understanding comprises a key module, a semantic information extraction module, an encryption index construction module, a trapdoor generation module and a query module which are sequentially connected.
The key module is used by the data owner to generate the AES key $sk_{AES}$ and the searchable encryption algorithm key SK; the data owner sends the searchable encryption algorithm key to the data user.
The semantic information extraction module uses a natural language processing model to extract the probability $V_D$ of each spatial object text appearing under each topic (text-topic probability distribution vector) and the probability $V_K$ of each word appearing under each topic (word-topic probability distribution vector), calculates the text set-topic probability distribution vector $P_t$ from the text-topic probability distribution vectors, and calculates from $P_t$ and $V_K$ the probability vector $P_\omega$ of each word appearing in the text set (word-text set probability distribution vector).
The encryption index construction module is used for constructing a plaintext index for each space object by a data owner according to the text-theme probability distribution vector and the space position coordinates of the space object, encrypting the plaintext index by utilizing a searchable encryption algorithm, simultaneously carrying out AES (advanced encryption standard) encryption on data of each space object, and finally sending a ciphertext index and ciphertext data formed by encryption to the cloud server;
The trapdoor generation module is used by the data user to extract the query-topic probability distribution vector $Q_w$ of the query statement from the text set-topic probability distribution vector, the word-topic probability distribution vectors and the word-text set probability distribution vector, to combine the spatial position coordinates of the data user with the query-topic probability distribution vector $Q_w$ into a query vector Q, to encrypt the query vector with the searchable encryption algorithm key to obtain a query trapdoor, and to send the query trapdoor to the cloud server;
The query module is used by the cloud server to calculate the mixed similarity between the query trapdoor and the ciphertext index of each spatial object, sort the similarities, and send the encrypted files of the top k objects to the data user, who decrypts the received ciphertext data.
According to the ciphertext space keyword retrieval method with semantic understanding, provided by the invention, a data owner encrypts the space data and indexes thereof before outsourcing the space data and the indexes thereof to a cloud server, so that the security and privacy of the text description and position coordinates of the space object are ensured. The data user generates the query trapdoor through encryption before sending the query statement, thereby protecting the query information. The whole scheme meets the requirement of outsourcing data privacy protection.
Secondly, the invention uses the LDA topic model to extract semantic information from the spatial objects and the query statement and combines it with position coordinates to obtain ciphertext indexes and query trapdoors that support hybrid queries, so that the spatial objects returned match the user's query intent and are close to the user's location. Compared with prior-art dictionary-based keyword ciphertext retrieval, constructing the ciphertext index of a spatial object from topic probability distributions has low computational cost, high query efficiency, and support for semantic awareness.
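The topic-based quantities used by the scheme can be illustrated with a short sketch. It assumes the per-object text-topic vectors V_D and the word-topic matrix M_K (rows are words, columns are topics, entries are p(word | topic)) have already been produced by an LDA model. Averaging V_D to obtain P_t follows the claims; the formulas used here for P_w (total probability) and Q_w (Bayes' rule averaged over the query keywords) are assumptions, since the patent's own formula is only available as an image.

```python
import numpy as np

def collection_topic_distribution(V_D):
    """P_t: sum the text-topic vectors of all objects and divide by the object count."""
    V_D = np.asarray(V_D, dtype=float)
    return V_D.sum(axis=0) / V_D.shape[0]

def word_collection_distribution(M_K, P_t):
    """P_w (assumed form): p(w) = sum over topics of p(w | t) * p(t)."""
    return np.asarray(M_K, dtype=float) @ P_t

def query_topic_distribution(query_word_ids, M_K, P_t, P_w):
    """Q_w (assumed form): average of p(t | w) = p(w | t) * p(t) / p(w) over query words."""
    M_K = np.asarray(M_K, dtype=float)
    posteriors = [(M_K[w] * P_t) / P_w[w] for w in query_word_ids]
    return np.mean(posteriors, axis=0)

# Toy data: two objects, two topics, three vocabulary words (illustrative values).
V_D = [[0.8, 0.2], [0.4, 0.6]]
M_K = [[0.5, 0.1], [0.3, 0.2], [0.2, 0.7]]
P_t = collection_topic_distribution(V_D)
P_w = word_collection_distribution(M_K, P_t)
Q_w = query_topic_distribution([0, 2], M_K, P_t, P_w)
```

The element-wise product `M_K[w] * P_t` plays the role of the Hadamard product mentioned in the claims; because each posterior sums to one, the averaged Q_w is itself a probability distribution over topics.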
In a cloud storage setting, the method lets a user run semantically aware hybrid queries over the ciphertext indexes of spatial objects on the cloud server. It satisfies the user's query requirements on both distance and text and, while protecting data security and privacy, lets the user adjust the balance of the returned results between matching the search intent and being close to the user's location.
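A plaintext sketch may help show how a distance requirement and a text requirement can be folded into one inner product, which is what makes the hybrid query compatible with inner-product-preserving encryption. The score used here, delta * (semantic relatedness) - (1 - delta) * (squared Euclidean distance), and the layout of the extended vectors are assumptions chosen for illustration; the patent's exact similarity formula is only available as an image.

```python
import numpy as np

def build_index(topic_vec, loc):
    """Extended plaintext index (assumed layout): [V_D, loc, ||loc||^2, 1]."""
    topic_vec = np.asarray(topic_vec, dtype=float)
    loc = np.asarray(loc, dtype=float)
    return np.concatenate([topic_vec, loc, [loc @ loc], [1.0]])

def build_query(query_topic, loc, delta):
    """Extended query vector whose inner product with an index gives the hybrid score."""
    q = np.asarray(query_topic, dtype=float)
    loc = np.asarray(loc, dtype=float)
    w = 1.0 - delta
    return np.concatenate([delta * q, 2.0 * w * loc, [-w], [-w * (loc @ loc)]])

def hybrid_score(topic_vec, obj_loc, query_topic, user_loc, delta):
    """Reference score computed directly, without the extended vectors."""
    sem = np.dot(topic_vec, query_topic)
    diff = np.asarray(obj_loc, dtype=float) - np.asarray(user_loc, dtype=float)
    return delta * sem - (1.0 - delta) * (diff @ diff)

def top_k(indexes, query, k):
    """Rank objects by descending inner-product score and return the first k positions."""
    scores = np.array([d @ query for d in indexes])
    return np.argsort(-scores)[:k].tolist()

# Three toy objects: (topic vector, location). Values are illustrative only.
objects = [([0.9, 0.1], (0.0, 0.0)), ([0.1, 0.9], (0.0, 0.0)), ([0.9, 0.1], (3.0, 4.0))]
indexes = [build_index(t, p) for t, p in objects]
query = build_query([1.0, 0.0], (0.0, 0.0), delta=0.5)
ranking = top_k(indexes, query, k=2)
```

Raising delta favors objects whose topics match the query, while lowering it favors nearby objects, which corresponds to the adjustable query weight described in the text.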
The above merely illustrates the technical idea of the invention and does not limit its scope of protection; any modification made on the basis of this technical idea falls within the protection scope of the claims of the invention.

Claims (10)

1. A semantic-understanding ciphertext space keyword retrieval method, characterized by comprising the following steps:
step 1, generating an AES key and a searchable encryption algorithm key;
step 2, extracting the text-topic probability distribution vector of each spatial object's text description and the word-topic probability distribution vector of each word over the topics, determining the text set-topic probability distribution vector from the text-topic probability distribution vectors, and determining the word-text set probability distribution vector of the words over the text set from the text-topic probability distribution vectors and the word-topic probability distribution vectors;
step 3, constructing a plaintext index for each spatial object from the object's text-topic probability distribution vector and its spatial position coordinates, and encrypting the plaintext index to form a ciphertext index;
step 4, extracting the query-topic probability distribution vector of the keywords in the query statement from the text set-topic probability distribution vector, the word-topic probability distribution vectors and the word-text set probability distribution vector obtained in step 2, combining the data user's spatial position coordinates with the query-topic probability distribution vector to generate a query vector, and encrypting the query vector with the searchable encryption algorithm key to obtain a query trapdoor;
step 5, determining the hybrid similarity between each spatial object and the query statement from the query trapdoor and the ciphertext indexes, ranking the results, and sending the encrypted data of the spatial objects corresponding to the top k ciphertext indexes to the data user, who decrypts it with the AES key.
2. The semantic-understanding ciphertext space keyword retrieval method according to claim 1, wherein step 2 uses a natural language processing model to extract the text-topic probability distribution vector V_D of each spatial object's text description over the topics and the word-topic probability distribution vector V_K of each word over the topics.
3. The semantic-understanding ciphertext space keyword retrieval method according to claim 1, wherein the word-text set probability distribution vector in step 2 is determined as follows:
the data owner adds up the text-topic probability distribution vectors V_D of the text descriptions of all objects and divides the sum by the number of objects to obtain the text set-topic probability distribution vector P_t, which reflects how often each topic appears in the text set;
the data owner calculates, from the text set-topic probability distribution vector P_t and the word-topic probability distribution vector V_K of each word, the word-text set probability distribution vector P_w of each word appearing in the text set.
4. The semantic-understanding ciphertext space keyword retrieval method according to claim 1, wherein the ciphertext index in step 3 is constructed as follows:
the position coordinates of the spatial object are combined with the text-topic probability distribution vector [formula image FDA0002736366350000021] to form a plaintext index D_i; the dimensionality of the plaintext index is expanded, and the expanded plaintext index [formula image FDA0002736366350000022] is encrypted with the searchable encryption algorithm key SK to obtain the ciphertext index I_i of the spatial object.
5. The semantic-understanding ciphertext space keyword retrieval method according to claim 4, wherein, when the expanded plaintext index [formula image FDA0002736366350000023] is encrypted, it is first split into [formula image FDA0002736366350000024] and [formula image FDA0002736366350000025], which are then encrypted separately;
the splitting rule is as follows: if the j-th bit of the binary vector S in the searchable encryption algorithm key SK is 0, [formula image FDA0002736366350000026] and [formula image FDA0002736366350000027] are both set to [formula image FDA0002736366350000028]; if the j-th bit of S is 1, [formula image FDA0002736366350000029] and [formula image FDA00027363663500000210] are set to two random numbers whose sum is [formula image FDA00027363663500000211];
the encryption process is as follows: using {M_1, M_2} in the searchable encryption algorithm key SK, the dot products [formula image FDA00027363663500000212] are computed respectively to obtain the ciphertext index [formula image FDA00027363663500000213] of each spatial object o_i.
6. The semantic-understanding ciphertext space keyword retrieval method according to claim 1, wherein the query-topic probability distribution vector Q_w of the keywords in the query statement in step 4 is determined as follows:
[formula image FDA0002736366350000031]
wherein P_t is the text set-topic probability distribution vector; P_w is the word-text set probability distribution vector; M_K is the word-topic probability distribution matrix; Q_d is the set of query keywords; |Q_d| is the number of query keywords; and the symbol ∘ denotes the Hadamard product between vectors.
7. The semantic-understanding ciphertext space keyword retrieval method according to claim 6, wherein the query vector Q in step 4 is encrypted as follows:
first, the dimensionality of the query vector Q is expanded to obtain the expanded query vector [formula image FDA0002736366350000032]; then the expanded query vector [formula image FDA0002736366350000033] is split into the query vectors [formula image FDA0002736366350000034] and [formula image FDA0002736366350000035]; finally, the split query vectors are encrypted separately to obtain the query trapdoor;
the splitting rule is as follows: if the i-th bit of the binary vector S in the searchable encryption algorithm key SK is 1, [formula image FDA0002736366350000036] and [formula image FDA0002736366350000037] are both set to [formula image FDA0002736366350000038]; if the i-th bit of S is 0, [formula image FDA0002736366350000039] and [formula image FDA00027363663500000310] are set to two random numbers whose sum is [formula image FDA00027363663500000311];
the encryption process is as follows: using {M_1, M_2} in the searchable encryption algorithm key SK, the dot products [formula image FDA00027363663500000312] are computed respectively to obtain the query trapdoor [formula image FDA00027363663500000313].
8. The semantic-understanding ciphertext space keyword retrieval method according to claim 7, wherein the hybrid similarity in step 5 is calculated as follows:
[formula image FDA00027363663500000314]
wherein [formula image FDA00027363663500000315] is the query trapdoor; [formula image FDA00027363663500000316] is the ciphertext index of the i-th spatial object; M_1 and M_2 are the two invertible matrices in the searchable encryption algorithm key SK; [formula image FDA0002736366350000041] equals the semantic relatedness between the keywords in the query statement and the i-th spatial object; (||λ_i||^2 - 2λ_i·λ_q + ||λ_q||^2) equals the square of the Euclidean distance between the data user's query location and the coordinates of the spatial object; and δ is the query weight.
9. The semantic-understanding ciphertext space keyword retrieval method according to claim 1, wherein the ciphertext data is formed by AES-encrypting the name, geographic position coordinates and text description data of each spatial object with the AES key.
10. A semantic-understanding ciphertext space keyword retrieval system, characterized by comprising a key module, a semantic information extraction module, an encrypted index construction module, a trapdoor generation module and a query module;
the key module is used by the data owner to generate an AES key and a searchable encryption algorithm key and to send the searchable encryption algorithm key to the data user;
the semantic information extraction module is used to extract, with a natural language processing model, the text-topic probability distribution vector of each spatial object's text over the topics and the word-topic probability distribution vector of each word over the topics, to calculate the text set-topic probability distribution vector from the text-topic probability distribution vectors, and to determine the word-text set probability distribution vector of each word appearing in the text set from the text set-topic probability distribution vector and the word-topic probability distribution vectors;
the encrypted index construction module is used by the data owner to construct a plaintext index for each spatial object from the object's text-topic probability distribution vector and spatial position coordinates, to encrypt the plaintext index with the searchable encryption algorithm, to AES-encrypt the data of each spatial object, and finally to send the resulting ciphertext indexes and ciphertext data to the cloud server;
the trapdoor generation module is used by the data user to extract the query-topic probability distribution vector Q_w of the query statement from the text set-topic probability distribution vector, the word-topic probability distribution vector and the word-text set probability distribution vector, to combine the data user's spatial position coordinates with Q_w into a query vector Q, to encrypt the query vector with the searchable encryption algorithm key to obtain a query trapdoor, and to send the query trapdoor to the cloud server;
and the query module is used by the cloud server to compute the hybrid similarity between the query trapdoor and the ciphertext indexes of all spatial objects, to rank the results, and to send the encrypted files of the top k objects to the data user.
CN202011135390.5A 2020-10-21 2020-10-21 Semantic understanding ciphertext space keyword retrieval method and system Active CN112257455B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011135390.5A CN112257455B (en) 2020-10-21 2020-10-21 Semantic understanding ciphertext space keyword retrieval method and system

Publications (2)

Publication Number Publication Date
CN112257455A true CN112257455A (en) 2021-01-22
CN112257455B CN112257455B (en) 2024-04-30

Family

ID=74264582

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011135390.5A Active CN112257455B (en) 2020-10-21 2020-10-21 Semantic understanding ciphertext space keyword retrieval method and system

Country Status (1)

Country Link
CN (1) CN112257455B (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006209649A (en) * 2005-01-31 2006-08-10 Nec Corp Confidential document retrieval system, confidential document retrieval method and confidential document retrieval program
CN105743888A (en) * 2016-01-22 2016-07-06 河南理工大学 Agent re-encryption scheme based on keyword research
CN106326360A (en) * 2016-08-10 2017-01-11 武汉科技大学 Fuzzy multi-keyword retrieval method of encrypted data in cloud environment
US20170078251A1 (en) * 2015-09-11 2017-03-16 Skyhigh Networks, Inc. Wildcard search in encrypted text using order preserving encryption
US9679155B1 (en) * 2015-06-12 2017-06-13 Skyhigh Networks, Inc. Prefix search in encrypted text
CN108228849A (en) * 2018-01-10 2018-06-29 浙江理工大学 Ciphertext sorted search method based on classification packet index in cloud network
CN108647529A (en) * 2018-05-09 2018-10-12 上海海事大学 A kind of semantic-based multi-key word sorted search intimacy protection system and method
CN109063509A (en) * 2018-08-07 2018-12-21 上海海事大学 It is a kind of that encryption method can search for based on keywords semantics sequence
CN109271485A (en) * 2018-09-19 2019-01-25 南京邮电大学 It is a kind of to support semantic cloud environment encrypted document ordering searching method
CN109471964A (en) * 2018-10-23 2019-03-15 哈尔滨工程大学 A kind of fuzzy multi-key word based on synset can search for encryption method
CN109739945A (en) * 2018-12-13 2019-05-10 南京邮电大学 A kind of multi-key word ciphertext ordering searching method based on hybrid index
CN109992995A (en) * 2019-03-05 2019-07-09 华南理工大学 A kind of protection of support position and inquiry privacy can search for encryption method
CN110222012A (en) * 2019-06-08 2019-09-10 西安电子科技大学 Data cryptogram search method based on fine granularity sequence under sole user's environment
CN110222081A (en) * 2019-06-08 2019-09-10 西安电子科技大学 Data cryptogram search method based on fine granularity sequence under multi-user environment
CN110727951A (en) * 2019-10-14 2020-01-24 桂林电子科技大学 Lightweight outsourcing file multi-keyword retrieval method and system with privacy protection function

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113158087A (en) * 2021-04-09 2021-07-23 深圳前海微众银行股份有限公司 Query method and device for space text
CN113254743A (en) * 2021-05-31 2021-08-13 西安电子科技大学 Secure semantic perception search method for dynamic spatial data in Internet of vehicles
CN113254743B (en) * 2021-05-31 2022-12-09 西安电子科技大学 Security semantic perception searching method for dynamic spatial data in Internet of vehicles
CN113434895A (en) * 2021-08-27 2021-09-24 平安科技(深圳)有限公司 Text decryption method, device, equipment and storage medium
CN113434895B (en) * 2021-08-27 2021-11-23 平安科技(深圳)有限公司 Text decryption method, device, equipment and storage medium
WO2023065477A1 (en) * 2021-10-18 2023-04-27 深圳前海微众银行股份有限公司 Spatial text query method and apparatus
CN114398660A (en) * 2021-11-29 2022-04-26 北京航空航天大学 High-efficiency fuzzy searchable encryption method based on Word2vec and ASPE
CN118264482A (en) * 2024-05-24 2024-06-28 杭州宇泛智能科技股份有限公司 File semantic information fusion one-text one-secret security encryption method and device
CN118264482B (en) * 2024-05-24 2024-07-26 杭州宇泛智能科技股份有限公司 File semantic information fusion one-text one-secret security encryption method and device

Also Published As

Publication number Publication date
CN112257455B (en) 2024-04-30

Similar Documents

Publication Publication Date Title
CN112257455B (en) Semantic understanding ciphertext space keyword retrieval method and system
CN107220343B (en) Chinese multi-keyword fuzzy sorting ciphertext searching method based on locality sensitive hashing
CN108712366B (en) Searchable encryption method and system supporting word form and word meaning fuzzy retrieval in cloud environment
CN106951411B (en) The quick multi-key word Semantic Ranking searching method of data-privacy is protected in a kind of cloud computing
CN107480163B (en) Efficient ciphertext image retrieval method supporting privacy protection in cloud environment
Zhang et al. SE-PPFM: A searchable encryption scheme supporting privacy-preserving fuzzy multikeyword in cloud systems
CN111797409B (en) Carrier-free information hiding method for big data Chinese text
CN108647529A (en) A kind of semantic-based multi-key word sorted search intimacy protection system and method
CN109992995B (en) Searchable encryption method supporting location protection and privacy inquiry
CN109739945B (en) Multi-keyword ciphertext sorting and searching method based on mixed index
CN103927340A (en) Ciphertext retrieval method
CN116881739B (en) Ciphertext security retrieval method oriented to similarity of spatial keywords
CN109255244B (en) Data encryption method and device and data encryption retrieval system
CN111859421B (en) Word vector-based multi-keyword ciphertext storage and retrieval method and system
Long et al. Coverless information hiding method based on web text
Han et al. Unified neural topic model via contrastive learning and term weighting
CN108829714A (en) A kind of ciphertext data multi-key word searches for method generally
KR102526055B1 (en) Device and method for embedding relational table
CN116821965A (en) Personalized retrieval method
CN109271485B (en) Cloud environment encrypted document sequencing and searching method supporting semantics
CN109165520B (en) Data encryption method and device and data encryption retrieval system
CN114528370B (en) Dynamic multi-keyword fuzzy ordering searching method and system
CN111966778B (en) Multi-keyword ciphertext sorting and searching method based on keyword grouping reverse index
CN116244453A (en) Efficient encrypted image retrieval method based on neural network
CN114398660A (en) High-efficiency fuzzy searchable encryption method based on Word2vec and ASPE

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant