CN111552988A - Monte Carlo sampling-based forward safety k neighbor retrieval method and system - Google Patents

Info

Publication number
CN111552988A
Authority
CN
China
Prior art keywords
dictionary
data
key
search
ciphertext
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010319210.2A
Other languages
Chinese (zh)
Other versions
CN111552988B (en
Inventor
彭延国
王腾宇
吕桢
王龙
李鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202010319210.2A priority Critical patent/CN111552988B/en
Publication of CN111552988A publication Critical patent/CN111552988A/en
Application granted granted Critical
Publication of CN111552988B publication Critical patent/CN111552988B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6227Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

The invention discloses a forward-secure k-nearest-neighbor retrieval method and system based on Monte Carlo sampling. The method comprises the following steps: acquiring a data set and preprocessing it to obtain a plurality of complex buckets, and generating a data set dictionary from the complex buckets; encrypting each complex bucket with a proxy re-encryption algorithm to obtain a first key dictionary and a bidirectional dictionary; finding the data corresponding to the point to be searched in the data set dictionary and the first key dictionary, and performing re-encryption processing to obtain a search token and a second key dictionary; searching the bidirectional dictionary according to the search token to obtain ciphertext data; and decrypting the ciphertext data according to the second key dictionary to obtain a plaintext point set. By combining a proxy re-encryption scheme with a bidirectional dictionary that stores the ciphertext data on the server, the method ensures forward security for the ciphertext data that a data provider uploads to the server.

Description

Monte Carlo sampling-based forward safety k neighbor retrieval method and system
Technical Field
The invention belongs to the technical field of data security, and particularly relates to a forward security k nearest neighbor retrieval method and system based on Monte Carlo sampling.
Background
In the era of big data and cloud computing, more and more data providers choose to upload their data to a cloud server for storage and to offer it to users on a paid basis. However, the cloud server is often untrusted, and if data is uploaded directly without encryption, a large amount of the data provider's private data is leaked. Designing secure and efficient encryption techniques that suit scenarios with different requirements and that support dynamic updates of ciphertext data is therefore a focus of current cloud data security research.
k-nearest-neighbor search is an important technique in the big data field with a wide range of applications, such as nearest-neighbor search over spatial data and recommendation systems. Conventional k-nearest-neighbor retrieval techniques, such as nearest-neighbor graphs and locality-sensitive hashing (LSH), operate on plaintext; uploading such data directly to the cloud leaks a large amount of users' private information. The data therefore needs to be encrypted before it is uploaded to the server, while k-nearest-neighbor retrieval and dynamic updating of the ciphertext data remain possible. LSH is an important approach to the k-nearest-neighbor search problem: it reduces the dimensionality of data points through a locality-sensitive hash function and then places them into corresponding hash buckets, such that points that are closer to each other are more likely to fall into the same hash bucket and points that are farther apart are less likely to.
However, although existing k-nearest-neighbor search schemes can encrypt data and support nearest-neighbor queries and dynamic updates, they ignore forward security. A data provider usually allows paying users to use the ciphertext data it has outsourced to a cloud server, but after a user's paid term has expired, the user can still use a previously obtained token to acquire and decrypt data newly added to the server. Furthermore, because data distributions tend to be uneven, the number of points in each hash bucket differs widely when the LSH method is used, which reduces data security.
Disclosure of Invention
In order to solve the above problems in the prior art, the present invention provides a forward secure k-nearest neighbor search method and system based on monte carlo sampling. The technical problem to be solved by the invention is realized by the following technical scheme:
A forward-secure k-nearest-neighbor retrieval method based on Monte Carlo sampling comprises the following steps:
acquiring a data set and preprocessing the data set to obtain a plurality of complex buckets and generating a data set dictionary according to the complex buckets;
encrypting each complex bucket according to a proxy re-encryption algorithm to obtain a first key dictionary and a bidirectional dictionary;
finding data corresponding to the point to be searched in the data set dictionary and the first key dictionary, and performing re-encryption processing to obtain a search token and a second key dictionary;
performing data search in the bidirectional dictionary according to the search token to obtain ciphertext data;
and decrypting the ciphertext data according to the second key dictionary to obtain a plaintext point set.
In one embodiment of the present invention, acquiring a data set and preprocessing the data set to obtain a plurality of complex buckets and generate a data set dictionary includes:
acquiring a data set, and dividing the data set into a plurality of uniform hash buckets according to an LSH function based on Monte Carlo sampling;
greedily merging the uniform hash buckets and adding false points to obtain a plurality of complex buckets with the same data volume;
and generating a corresponding data set dictionary according to the complex buckets.
In an embodiment of the present invention, the encrypting each complex bucket according to a proxy re-encryption algorithm to obtain a first key dictionary and a bidirectional dictionary includes:
generating a first key pair, a second key pair and a corresponding encrypted data set by adopting a proxy re-encryption algorithm for each complex bucket;
generating a first key dictionary according to the first key pairs and the second key pairs corresponding to all the complex buckets;
initializing a bidirectional dictionary and storing the encrypted data sets corresponding to all the complex buckets in the bidirectional dictionary.
In an embodiment of the present invention, finding data corresponding to a point to be searched in the data set dictionary and the first key dictionary, and performing re-encryption processing to obtain a search token and a second key dictionary, includes:
acquiring the point to be searched and the number of neighboring points to be searched;
finding a complex bucket corresponding to the point to be searched in the data set dictionary, and generating a third key pair and a corresponding third ciphertext;
finding a first key pair and a second key pair corresponding to the point to be searched in the first key dictionary, and generating a corresponding first ciphertext and a corresponding second ciphertext;
re-encrypting the first key pair and the second key pair according to the third key pair to generate a first re-encryption key;
generating a search token according to the first ciphertext, the second ciphertext, the third ciphertext and the first re-encryption key;
and forming a second key dictionary according to the third key pair and the corresponding third ciphertext.
In an embodiment of the present invention, after forming a second key dictionary according to the third key pair and a corresponding third ciphertext thereof, the method further includes:
updating the first key dictionary.
In an embodiment of the present invention, performing data search in the bidirectional dictionary according to the search token to obtain ciphertext data includes:
generating a temporary search dictionary according to the search token;
searching data in the bidirectional dictionary according to the search token, and re-encrypting a search result to obtain a search result data set;
and inserting the search result data set into the temporary search dictionary to obtain ciphertext data.
In an embodiment of the present invention, after searching data in the bidirectional dictionary according to the search token and re-encrypting the search result to obtain a search result data set, the method further includes:
and updating the bidirectional dictionary.
In an embodiment of the present invention, decrypting the ciphertext data according to the second key dictionary to obtain a plaintext point set includes:
finding a third key pair corresponding to the complex bucket of the points to be searched in the second key dictionary;
decrypting the ciphertext data according to the third key pair to obtain a plaintext point set to be screened;
if the number of points in the plaintext point set to be screened is smaller than a preset search number k, expanding the search range to 2r-1 complex buckets and repeating the search in the bidirectional dictionary until the number of points in the plaintext point set to be screened is greater than or equal to k; wherein r and k are positive integers;
and selecting the first k nearest points as a final plaintext point set.
Another embodiment of the present invention further provides a forward secure k-nearest neighbor search method based on monte carlo sampling, including:
acquiring a data set and respectively preprocessing the data set according to a plurality of LSH functions to obtain a plurality of data set dictionaries;
performing proxy re-encryption on the plurality of data set dictionaries respectively to obtain a plurality of bidirectional dictionaries;
respectively carrying out data search on each bidirectional dictionary to obtain a plurality of groups of ciphertext data;
carrying out re-encryption processing on the plurality of groups of ciphertext data to obtain ciphertext data with a plurality of unified keys;
taking intersection from the ciphertext data with the plurality of unified keys to obtain final ciphertext data;
and decrypting the final ciphertext data to obtain a plaintext point set.
Yet another embodiment of the present invention further provides a forward secure k-nearest neighbor retrieval system based on monte carlo sampling, comprising: data set supplying means, searching means, and decrypting means, wherein,
the data set supplying apparatus includes:
the data acquisition module is used for acquiring a data set and preprocessing the data set to obtain a plurality of complex buckets and generate a data set dictionary;
the initialization module is used for encrypting each complex bucket according to a proxy re-encryption algorithm to obtain a first key dictionary and a bidirectional dictionary and transmitting the bidirectional dictionary to the searching device;
the encryption module is used for finding data corresponding to a point to be searched in the data set dictionary and the first key dictionary, carrying out re-encryption processing to obtain a search token and a second key dictionary, and simultaneously transmitting the search token and the second key dictionary to the decryption device;
the searching device comprises a data searching module, which is used for searching data in the bidirectional dictionary according to the search token to obtain ciphertext data and for transmitting the ciphertext data to the decryption device;
the decryption apparatus includes:
the first data receiving module is used for receiving the search token and uploading the search token to the data searching module so that the data searching module can search the data;
and the second data receiving module is used for receiving the second key dictionary and the ciphertext data and decrypting the ciphertext data according to the second key dictionary to obtain a plaintext point set.
The invention has the beneficial effects that:
1. the Monte Carlo sampling-based forward security k neighbor retrieval method provided by the invention ensures the forward security attribute of the ciphertext data uploaded to the server by a data provider by using an encryption scheme of proxy re-encryption and the ciphertext data in the bidirectional dictionary storage server;
2. the Monte Carlo sampling-based forward-secure k-nearest-neighbor retrieval method provided by the invention is based on the Monte Carlo sampling technique: the hash values obtained by projecting the sampled points of the data set onto a hash vector are divided uniformly, so that for a data set with a known density function the locality-sensitive hash compression is uniform; complex buckets with the same number of data points are then generated by greedy merging and the addition of false points, which further improves data security;
3. the forward safety k neighbor retrieval method based on Monte Carlo sampling provided by the invention can also be applied to multi-dimensional data, and can ensure the safety of the multi-dimensional data.
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Drawings
Fig. 1 is a schematic diagram of a forward secure k-nearest neighbor search method based on monte carlo sampling according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of another forward secure k-nearest neighbor search method based on monte carlo sampling according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a forward secure k-nearest neighbor retrieval system based on monte carlo sampling according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to specific examples, but the embodiments of the present invention are not limited thereto.
Example one
Referring to fig. 1, fig. 1 is a schematic diagram of a forward secure k-nearest neighbor search method based on monte carlo sampling according to an embodiment of the present invention, including:
S1: acquiring a data set and preprocessing the data set to obtain a plurality of complex buckets and generating a data set dictionary according to the complex buckets;
Further, step S1 includes:
S1.1: acquiring a data set, and dividing the data set into a plurality of uniform hash buckets according to an LSH function based on Monte Carlo sampling; the specific operation is as follows:
(1) first, for a given data set whose probability density function $f(x, y)$ is known, select a suitable proposal distribution that is easy to sample from (e.g., a two-dimensional normal distribution) with probability density $p(x, y)$, and a value $M$ large enough that $M\,p(x, y) \ge f(x, y)$;
(2) randomly sample a point $(x_0, y_0)$ from the proposal distribution, and sample a value $u$ from the uniform distribution $U([0, 1])$; if
$$u \le \frac{f(x_0, y_0)}{M\,p(x_0, y_0)}$$
is satisfied, the sampled point is accepted; otherwise it is rejected;
(3) repeat the sampling until the number of accepted points reaches the required amount;
(4) given a code length [formula image BDA0002460711160000072] and a random vector $\alpha$ with $\|\alpha\|_2 = 1$, calculate the thresholds of the locality-sensitive hash function [formula image BDA0002460711160000073]: compute $p \cdot \alpha$ for each point $p$ in the sampled point set to obtain the projection value of each point, and then sort the projection values in ascending order;
(5) the number of thresholds of the locality-sensitive hash function determines the number of intervals, [formula image BDA0002460711160000074]; divide the total number of sampled points by the number of intervals to obtain the number $n$ of points in each interval; then take $n$ points at a time in ascending order of the sorted projection values, so that the maximum value of the previous interval is $c_i$, the maximum value of the current interval is $c_{i+1}$, and so on, until every value in the threshold set is determined;
(6) finally, obtain the LSH function based on Monte Carlo sampling from the thresholds, i.e. $h(p) = \{\, i \mid c_{i-1} < p \cdot \alpha \le c_i \,\}$, where $c_1 = 0$;
(7) generate a plurality of uniform hash buckets with substantially the same data volume according to the LSH function.
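Steps (1) to (7) can be illustrated with a minimal Python sketch. It is not the patented implementation: the target density `4xy` on the unit square, the uniform proposal (used here instead of the two-dimensional normal mentioned above, to keep the bound `M` trivial), the number of intervals `n_intervals`, and all function names are assumptions chosen for illustration, and the last interval is left open-ended.

```python
import bisect
import math
import random

def rejection_sample(f, draw, p, M, n, rng):
    """Collect n points distributed according to density f (steps 1-3)."""
    accepted = []
    while len(accepted) < n:
        x = draw(rng)                      # candidate from the proposal distribution
        u = rng.random()                   # u ~ U([0, 1])
        if u <= f(x) / (M * p(x)):         # accept with probability f / (M p)
            accepted.append(x)
    return accepted

def project(point, alpha):
    """Projection value p . alpha."""
    return sum(a * b for a, b in zip(point, alpha))

def fit_thresholds(samples, alpha, n_intervals):
    """Equal-count thresholds on the sorted projection values (steps 4-5)."""
    proj = sorted(project(pt, alpha) for pt in samples)
    per = len(proj) // n_intervals         # n points per interval
    # upper bound of every interval except the last one
    return [proj[(i + 1) * per - 1] for i in range(n_intervals - 1)]

def lsh(point, alpha, thresholds):
    """0-based index i with thresholds[i-1] < p . alpha <= thresholds[i] (step 6)."""
    return bisect.bisect_left(thresholds, project(point, alpha))

rng = random.Random(0)
f = lambda pt: 4.0 * pt[0] * pt[1]         # assumed skewed target density on [0, 1]^2
p = lambda pt: 1.0                         # assumed uniform proposal density
draw = lambda r: (r.random(), r.random())
samples = rejection_sample(f, draw, p, M=4.0, n=4000, rng=rng)

theta = rng.uniform(0.0, math.pi / 2)
alpha = (math.cos(theta), math.sin(theta)) # random unit vector
thresholds = fit_thresholds(samples, alpha, n_intervals=8)

counts = [0] * 8                           # bucket sizes (step 7)
for pt in samples:
    counts[lsh(pt, alpha, thresholds)] += 1
print(counts)
```

Because the thresholds are quantiles of the sampled projections, the buckets come out close to equal in size even though the underlying density is skewed, which is the property the uniform hash buckets rely on.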
S1.2: greedily merging the uniform hash buckets and adding false points to obtain a plurality of complex buckets with the same data volume, and generating a corresponding data set dictionary; specifically:
(a) first, use the Monte Carlo sampling-based locality-sensitive hash function given above to calculate the hash value of every point in the data set; points with equal hash values are stored in the same hash bucket [formula image BDA0002460711160000081], [formula image BDA0002460711160000082], where $p_i \sim p_j$ if and only if $h_i = h_j$;
(b) greedily merge the hash buckets to generate complex buckets:
Specifically, an LSH mapping dictionary $Dic_H$ is first created to store, for each point, the complex bucket [formula image BDA0002460711160000083] it belongs to after merging; the buckets are initialized as empty sets and arranged in ascending order.
Then, the largest number of data points contained in any hash bucket, Max, is taken as the capacity standard of the complex buckets, $i$ and $j$ are initialized to 1, and the following loop is performed: $B_i$ is initialized as empty; while the number of points in $B_i$ is less than Max, all points in the hash bucket [formula image BDA0002460711160000084] are added to $B_i$, $(h_j, B_i)$ is added to the dictionary $Dic_H$, the value of $j$ is increased (by the value corresponding to the number of points in [formula image BDA0002460711160000085]), and the next check is entered; once the number of points in $B_i$ is greater than or equal to Max, the check loop is exited and $i$ is increased by 1.
When all of the hash buckets [formula image BDA0002460711160000086] have been merged into complex buckets, the loop exits.
After merging is complete, the point count of the fullest complex bucket is taken as the bound, and false points are added to every complex bucket with fewer points until it reaches that count; the false points are easily distinguished from real data points. This yields a plurality of complex buckets B with the same data volume.
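The merging and padding loop can be sketched as follows. This is an illustrative reading of the steps above rather than the patent's code; the `FAKE` marker, the function names, and the representation of `Dic_H` as a map from hash value to complex-bucket index are assumptions.

```python
FAKE = None  # hypothetical marker for a false point, trivially distinguishable from real points

def greedy_merge(hash_buckets):
    """hash_buckets: dict hash value -> list of points.
    Returns (complex_buckets, dic_h), where dic_h maps each hash value
    to the index of the complex bucket that absorbed it."""
    capacity = max(len(pts) for pts in hash_buckets.values())  # Max
    complex_buckets, dic_h = [], {}
    current = []                                # B_i, initially empty
    for h in sorted(hash_buckets):              # hash buckets in ascending order
        if len(current) >= capacity:            # B_i reached Max: move on to B_{i+1}
            complex_buckets.append(current)
            current = []
        current.extend(hash_buckets[h])         # add all points of hash bucket h_j
        dic_h[h] = len(complex_buckets)         # record (h_j, B_i)
    if current:
        complex_buckets.append(current)
    return complex_buckets, dic_h

def pad_with_false_points(complex_buckets):
    """Pad every complex bucket up to the size of the fullest one."""
    target = max(len(b) for b in complex_buckets)
    return [b + [FAKE] * (target - len(b)) for b in complex_buckets]

hash_buckets = {
    0: [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3)],
    1: [(1.0, 1.1)],
    2: [(2.0, 2.2), (2.1, 2.0)],
    3: [(3.0, 3.1), (3.2, 3.0), (3.3, 3.3), (3.4, 3.1)],
    4: [(4.0, 4.0)],
}
merged, dic_h = greedy_merge(hash_buckets)
padded = pad_with_false_points(merged)
print([len(b) for b in merged], [len(b) for b in padded], dic_h)
```

Since a complex bucket is only closed after it reaches Max, merged buckets can exceed Max, which is why the padding target is the fullest complex bucket rather than Max itself.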
S1.3: generating a corresponding data set dictionary according to the complex buckets.
Specifically, the data set dictionary $Dic_H$ is generated from the complex buckets B that contain the whole data set.
The Monte Carlo sampling-based forward-secure k-nearest-neighbor retrieval method provided by this embodiment is based on the Monte Carlo sampling technique: the hash values obtained by projecting the sampled points of the data set onto a hash vector are divided uniformly, so that for a data set with a known density function the locality-sensitive hash compression is uniform and the final hash buckets hold substantially the same number of points; complex buckets with the same number of data points are then generated by greedy merging and adding false points, which improves data security.
S2: encrypting each complex bucket according to a proxy re-encryption algorithm to obtain a first key dictionary and a bidirectional dictionary;
Here, the proxy re-encryption technique and the bidirectional dictionary are briefly described.
The proxy re-encryption technique mainly comprises six parts:
(I) initialization, generating the parameters prm;
(II) generating a key pair $\langle pk_A, sk_A \rangle$;
(III) generating the re-encryption key $rk_{A \to B}$ from key $sk_A$ to key $sk_B$;
(IV) encrypting the plaintext data $m$ with the key $pk_A$ to generate the ciphertext $c_A$;
(V) re-encrypting $c_A$ with the re-encryption key $rk_{A \to B}$ to generate $c_B$;
(VI) decrypting a ciphertext with the corresponding key, i.e. $sk_A$ decrypts $c_A$ and $sk_B$ decrypts $c_B$; a ciphertext can only be decrypted with its corresponding key.
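The six parts can be made concrete with a toy sketch. The text does not say which proxy re-encryption scheme is used, so the code below swaps in a textbook ElGamal-style bidirectional construction (in the style of Blaze, Bleumer and Strauss) purely as an assumption; the group parameters are deliberately tiny and insecure, and the function names are illustrative.

```python
import secrets

# (I) parameters prm: toy safe prime P = 2*Q + 1, far too small for real use
P = 2039
Q = 1019            # prime order of the subgroup generated by G
G = 4               # generator of the order-Q subgroup

def keygen():
    """(II) key pair <pk, sk> with pk = G^sk mod P."""
    sk = secrets.randbelow(Q - 1) + 1            # sk in [1, Q-1]
    return pow(G, sk, P), sk

def rekey(sk_a, sk_b):
    """(III) rk_{A->B} = sk_B / sk_A mod Q."""
    return sk_b * pow(sk_a, -1, Q) % Q

def encrypt(pk, m):
    """(IV) c = (m * G^r, pk^r) for a fresh random r."""
    r = secrets.randbelow(Q - 1) + 1
    return (m * pow(G, r, P) % P, pow(pk, r, P))

def reencrypt(rk, c):
    """(V) turn G^(sk_A * r) into G^(sk_B * r); the message part is untouched."""
    return (c[0], pow(c[1], rk, P))

def decrypt(sk, c):
    """(VI) recover G^r from the second component, then divide it out."""
    g_r = pow(c[1], pow(sk, -1, Q), P)
    return c[0] * pow(g_r, -1, P) % P

pk_a, sk_a = keygen()
pk_b, sk_b = keygen()
m = 1234                                          # message encoded as an element of [1, P-1]
c_a = encrypt(pk_a, m)
c_b = reencrypt(rekey(sk_a, sk_b), c_a)
print(decrypt(sk_a, c_a), decrypt(sk_b, c_b))
```

The proxy holding only `rk` never sees `m` or either secret key, which is what lets an untrusted server move ciphertexts from one key to another.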
The bidirectional dictionary stores tuples $e_i = (\Delta_i^{(B)}, \Delta_i^{(ID)}, data_i)$, whose three elements are, respectively, the number of the complex bucket, the id value of the point, and the coordinate value of the point.
The bidirectional dictionary [formula image BDA0002460711160000091] is mainly composed of two parts, $Dic_I$ and $Dic_F$, which are used for storage and search respectively, as follows:
when storing, a tuple $(\Delta_i^{(B)}, \Delta_i^{(ID)}, data_i)$ is inserted; concretely, $(\Delta_i^{(B)}, data_i)$ is inserted into $Dic_I$ and, at the same time, $(\Delta_i^{(ID)}, \Delta_i^{(B)}, \&(data_i))$ is inserted into $Dic_F$;
when searching, given the number of a complex bucket, the corresponding $(\Delta^{(ID)}, data)$ can be obtained from $Dic_I$; given the id value of a data item, the corresponding $(\Delta^{(B)}, data)$ is returned;
when updating, the corresponding coordinate value is found according to the id value of the data item and updated.
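One way to picture the two-part structure is the small Python class below. It is a sketch under assumptions: the class and method names are invented, and the `&(data)` reference is modeled by a shared one-element list so that an update through the id index is visible through the bucket index.

```python
class BidirectionalDict:
    """Toy two-index store: Dic_I keyed by bucket label, Dic_F keyed by id label."""

    def __init__(self):
        self.dic_i = {}   # bucket label -> {id label: record}
        self.dic_f = {}   # id label -> (bucket label, record)

    def insert(self, bucket_label, id_label, data):
        record = [data]   # one-element list stands in for the reference &(data)
        self.dic_i.setdefault(bucket_label, {})[id_label] = record
        self.dic_f[id_label] = (bucket_label, record)

    def search_bucket(self, bucket_label):
        """Given a bucket label, return all (id label, data) pairs."""
        bucket = self.dic_i.get(bucket_label, {})
        return [(id_label, rec[0]) for id_label, rec in bucket.items()]

    def search_id(self, id_label):
        """Given an id label, return (bucket label, data)."""
        bucket_label, rec = self.dic_f[id_label]
        return bucket_label, rec[0]

    def update(self, id_label, new_data):
        """Locate the record by id and overwrite its data in place."""
        self.dic_f[id_label][1][0] = new_data

store = BidirectionalDict()
store.insert("bucket-A", "id-1", "ct-1")
store.insert("bucket-A", "id-2", "ct-2")
store.insert("bucket-B", "id-3", "ct-3")
store.update("id-2", "ct-2-new")
print(store.search_bucket("bucket-A"), store.search_id("id-3"))
```

Keeping a single shared record per point avoids having to update two copies of the ciphertext when a point changes.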
In this embodiment, step S2 specifically includes:
S2.1: generating a first key pair, a second key pair and a corresponding encrypted data set by adopting a proxy re-encryption algorithm for each complex bucket;
Specifically, a security parameter $1^\lambda$ is first selected, and a hash function $H$, the proxy re-encryption algorithm PRE and a permutation $\pi$ are given; through step S1, the data set has already been stored in the corresponding complex buckets B and the data set dictionary $Dic_H$ has been generated.
For each complex bucket $B_i$, two key pairs $\langle cpk_i, csk_i \rangle$ and $\langle upk_i, usk_i \rangle$, i.e. the first key pair and the second key pair, are generated with the proxy re-encryption algorithm, and all data points are encrypted to generate the corresponding encrypted data set, where the encryption is $\Delta^{(ID)} = \pi(ID)$, $\Delta^{(B)} = H(B, csk_i)$, $data = PRE.Enc(cpk_i, p)$.
S2.2: generating a first key dictionary according to the first key pairs and the second key pairs corresponding to all the complex buckets;
Specifically, a key dictionary $Dic_{key}$ is generated from the two key pairs $\langle cpk_i, csk_i \rangle$ and $\langle upk_i, usk_i \rangle$ to store the keys of each complex bucket as $\{B_i, (\langle cpk_i, csk_i \rangle, \langle upk_i, usk_i \rangle, \langle npk_i = null, nsk_i = null \rangle)\}$; this key dictionary, i.e. the first key dictionary, is stored at the data provider side.
S2.3: initializing a bidirectional dictionary and storing the encrypted data sets corresponding to all the complex buckets in the bidirectional dictionary.
Specifically, a bidirectional dictionary is initialized and all encrypted data sets from step S2.1 are stored in it to obtain the bidirectional dictionary [formula image BDA0002460711160000111], which is uploaded to the server for storage and subsequent searching.
S3: finding data corresponding to the point to be searched in the data set dictionary and the first key dictionary, and performing re-encryption processing to obtain a search token and a second key dictionary; the method specifically comprises the following steps:
S3.1: acquiring the point to be searched and the number of neighboring points to be searched;
Specifically, a value $k$, i.e. the number of neighboring points to be searched, and a point $p$ to be searched are given.
S3.2: finding a complex bucket corresponding to the point to be searched in the data set dictionary, and generating a third key pair and a corresponding third ciphertext;
Specifically, the complex bucket B corresponding to the point $p$ is first found through the data set dictionary $Dic_H$; a key pair $\langle npk, nsk \rangle$, i.e. the third key pair, is then generated, and the third ciphertext $\Delta^{(B)'} = H(B, nsk)$ is generated.
S3.3: finding a first key pair and a second key pair corresponding to the point to be searched in the first key dictionary, and generating a corresponding first ciphertext and a corresponding second ciphertext;
Specifically, the first key pair $\langle cpk, csk \rangle$ and the second key pair $\langle upk, usk \rangle$ corresponding to the complex bucket B are found through the first key dictionary $Dic_{key}$, and the first ciphertext $\Delta_1^{(B)} = H(B, csk)$ and the second ciphertext $\Delta_2^{(B)} = H(B, usk)$ are generated.
S3.4: re-encrypting the first key pair and the second key pair according to the third key pair to generate a first re-encryption key;
Specifically, the keys csk and usk are re-encrypted with nsk to generate the re-encryption keys $rk_{c \to n}$ and $rk_{u \to n}$, i.e. the first re-encryption key.
S3.5: generating a search token according to the first ciphertext, the second ciphertext, the third ciphertext and the first re-encryption key;
Specifically, the set $\{\Delta_1^{(B)}, \Delta_2^{(B)}, \Delta^{(B)'}, rk_{c \to n}, rk_{u \to n}\}$ formed by the first ciphertext, the second ciphertext, the third ciphertext and the first re-encryption key is added to a token to form the search token; the token is transmitted to the user side, and the user side transmits it to the server for data search and decryption.
S3.6: forming a second key dictionary according to the third key pair and a third ciphertext corresponding to the third key pair;
Specifically, the third key pair and its corresponding third ciphertext, $\{B, nsk\}$, are inserted into the dictionary $Dic_{sk}$ to form the second key dictionary, and the second key dictionary $Dic_{sk}$ is stored locally.
Further, after step S3.6, the method also includes updating the first key dictionary. Specifically, after the second key dictionary $Dic_{sk}$ is obtained, the key pair $\langle upk, usk \rangle$ is generated anew and the first key dictionary $Dic_{key}$ is updated to [formula image BDA0002460711160000121].
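The token construction and the rotation of usk can be sketched as below. Several things here are assumptions rather than statements of the patent: HMAC-SHA256 stands in for the hash function H, secret keys are integers modulo a stand-in prime `Q`, re-encryption keys are the quotient of two secret keys (as in an ElGamal-style bidirectional scheme), and only the usk rotation that the text states explicitly is modeled, since the full updated form of the first key dictionary is in a formula image that is not reproduced.

```python
import hashlib
import hmac
import secrets

Q = 2**127 - 1   # stand-in prime modulus for secret keys (assumption)

def new_sk():
    return secrets.randbelow(Q - 1) + 1

def H(bucket_id, sk):
    """Bucket label H(B, sk); HMAC-SHA256 is a stand-in for the unspecified H."""
    key = sk.to_bytes(16, "big")
    return hmac.new(key, str(bucket_id).encode(), hashlib.sha256).hexdigest()

def rekey(sk_from, sk_to):
    """Assumed form of a re-encryption key: sk_to / sk_from mod Q."""
    return sk_to * pow(sk_from, -1, Q) % Q

def make_token(bucket_id, dic_key, dic_sk):
    keys = dic_key[bucket_id]
    nsk = new_sk()                                # secret half of the third key pair
    token = {
        "label_c": H(bucket_id, keys["csk"]),     # first ciphertext
        "label_u": H(bucket_id, keys["usk"]),     # second ciphertext
        "label_n": H(bucket_id, nsk),             # third ciphertext
        "rk_c_n": rekey(keys["csk"], nsk),
        "rk_u_n": rekey(keys["usk"], nsk),
    }
    dic_sk[bucket_id] = nsk                       # second key dictionary, kept locally
    keys["usk"] = new_sk()                        # usk is generated anew after each token
    return token

dic_key = {7: {"csk": new_sk(), "usk": new_sk()}} # first key dictionary for bucket 7
dic_sk = {}
old_usk = dic_key[7]["usk"]
token = make_token(7, dic_key, dic_sk)
label_for_new_inserts = H(7, dic_key[7]["usk"])
print(label_for_new_inserts != token["label_u"])
```

Under these assumptions, points inserted after the token was issued are labeled with the fresh usk, so the labels carried by the old token no longer match them; this is the intuition behind the forward security property described in the background.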
S4: performing data search in the bidirectional dictionary according to the search token to obtain ciphertext data; the method comprises the following steps:
S4.1: generating a temporary search dictionary according to the search token;
Specifically, in this embodiment, the user obtains the search token by providing the information to be searched (the point to be searched and the number of neighboring points) to the data provider, and then uploads the token to the server; the server receives the token and generates a temporary search dictionary $Dic_{can}$.
S4.2: searching data in the bidirectional dictionary according to the search token, and re-encrypting a search result to obtain a search result data set;
Specifically, using the first ciphertext $\Delta_1^{(B)}$ in the token, the corresponding set $\{(\Delta^{(ID)}, data)\}_1$ is found in the bidirectional dictionary [formula image BDA0002460711160000122], and using the second ciphertext $\Delta_2^{(B)}$, the corresponding set $\{(\Delta^{(ID)}, data)\}_2$ is found in the bidirectional dictionary [formula image BDA0002460711160000123]. Each data in $\{(\Delta^{(ID)}, data)\}_1$ is re-encrypted with the first re-encryption key $rk_{c \to n}$ to generate data', and $(\Delta^{(ID)}, data')$ is put into the set CanSet; after all elements have been processed, $\Delta_1^{(B)}$ is replaced by $\Delta^{(B)'}$. The same operation is performed on $\{(\Delta^{(ID)}, data)\}_2$ to obtain the search result data set $(\Delta^{(B)'}, CanSet)$.
S4.3: inserting the search result data set into the temporary search dictionary to obtain ciphertext data;
Specifically, the search result data set $(\Delta^{(B)'}, CanSet)$ is added to the temporary search dictionary $Dic_{can}$ to obtain the final ciphertext data dictionary $Dic_{can}$, which is returned to the user.
In this embodiment, after the search operation is completed, the method further includes updating the bidirectional dictionary. Specifically, after data' has been generated by re-encrypting data with the first re-encryption key, data' is written back to the bidirectional dictionary [formula image BDA0002460711160000131].
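The server-side control flow of S4 can be sketched as follows. The store is simplified to a plain dictionary from bucket label to a list of (id label, ciphertext) pairs, re-encryption is passed in as a callable so that no particular scheme is implied, and the use of $rk_{u \to n}$ for the second set is an assumption, since the text only says that the same operation is performed. The toy `reencrypt` in the usage lines is a placeholder, not cryptography.

```python
def server_search(store, token, reencrypt):
    """store: dict bucket label -> list of (id label, ciphertext).
    Returns the temporary dictionary Dic_can and relabels the stored data."""
    can_set = []
    lookups = (
        (token["label_c"], token["rk_c_n"]),   # first ciphertext with rk_{c->n}
        (token["label_u"], token["rk_u_n"]),   # second ciphertext with rk_{u->n} (assumed)
    )
    for label, rk in lookups:
        for id_label, data in store.pop(label, []):
            can_set.append((id_label, reencrypt(rk, data)))   # data -> data'
    # write data' back under the new label, replacing the old labels
    store.setdefault(token["label_n"], []).extend(can_set)
    return {token["label_n"]: list(can_set)}

# hypothetical toy values; multiplying by rk merely marks an entry as re-encrypted
store = {"L_c": [("id1", 10), ("id2", 20)], "L_u": [("id3", 30)]}
token = {"label_c": "L_c", "label_u": "L_u", "label_n": "L_n",
         "rk_c_n": 2, "rk_u_n": 3}
dic_can = server_search(store, token, lambda rk, c: c * rk)
print(dic_can, store)
```

After the call, the old labels are gone from the store and everything in the bucket sits under the new label, so a second use of the same token finds nothing under its first and second ciphertexts.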
Meanwhile, after the search is completed, the data provider needs to update the data of the server. Specifically, for a data point to be updated, first go through DicHFind its corresponding complex bucket, then perform Δ1 (B)=H(B,csk)、Δ2 (B)H (B, usk) and Δ(ID)Pi (ID), and encrypts data, i.e., data is generated by encrypting the data with cpk and upk, respectively1And data2Then the two values are respectively at
Figure BDA0002460711160000132
And updating, and if the ciphertext of the corresponding complex bucket is found, executing corresponding updating operation.
Further, the data provider may need to add or delete data on the server. Specifically, to add data points, when a new set of points is to be inserted, the corresponding complex bucket is first found by the previous method; then, for each point, $\Delta^{(ID)} = \pi(ID)$, $\Delta^{(B)} = H(B, usk_i)$ and $data = PRE.Enc(upk_i, P)$ are computed and inserted into the bidirectional dictionary on the server. To delete data points, the true points do not need to be physically deleted; an update operation that replaces the original true points with false points is sufficient.
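The insert and delete operations above can be illustrated with a short sketch. It assumes HMAC-SHA256 as the keyed hash $H$, a lookup table as the permutation $\pi$, and an `encrypt` callable standing in for $PRE.Enc(upk_i, \cdot)$; none of these concrete choices are stated in the text, and all function names are hypothetical.

```python
import hashlib
import hmac

def bucket_tag(bucket_id: str, key: bytes) -> bytes:
    # Stand-in for H(B, key): HMAC-SHA256 is an assumed instantiation.
    return hmac.new(key, bucket_id.encode(), hashlib.sha256).digest()

def id_tag(point_id: int, perm: dict) -> int:
    # Stand-in for pi(ID): a secret permutation kept as a lookup table.
    return perm[point_id]

def insert_points(bi_dict, bucket_id, usk, perm, points, encrypt):
    """Insert {point id: point} into the bucket's entry on the server."""
    tag = bucket_tag(bucket_id, usk)
    bucket = bi_dict.setdefault(tag, {})
    for pid, point in points.items():
        bucket[id_tag(pid, perm)] = encrypt(point)
    return tag

def delete_point(bi_dict, bucket_id, usk, perm, pid, fake_point, encrypt):
    # Deletion is an update: the true point is overwritten by a false
    # point, so the bucket keeps the same number of entries.
    tag = bucket_tag(bucket_id, usk)
    bi_dict[tag][id_tag(pid, perm)] = encrypt(fake_point)
```

Overwriting instead of removing keeps every complex bucket at the same size, so the server cannot tell a deletion from an ordinary update.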
S5: decrypting the ciphertext data according to the second key dictionary to obtain a plaintext point set, which specifically comprises:
S5.1: finding a third key pair corresponding to the complex bucket of the point to be searched in the second key dictionary;
S5.2: decrypting the ciphertext data according to the third key pair to obtain a plaintext point set to be screened;
Specifically, after receiving the ciphertext data returned by the server, the client first uses the second key dictionary $Dic_{sk}$ to find the key nsk corresponding to the complex bucket B of the point to be searched, and then decrypts the returned ciphertext $data'$ with nsk to obtain the corresponding plaintext point set, which is then screened and confirmed.
S5.3: if the number of points in the plaintext point set to be screened is smaller than a preset search number k, expanding the search range to $2r-1$ complex buckets and repeating the search in the bidirectional dictionary until the number of points in the plaintext point set to be screened is greater than or equal to k; wherein r and k are positive integers;
S5.4: selecting the first k nearest points as the final plaintext point set.
Specifically, when the number of points obtained is greater than or equal to k, the first k nearest points are obtained directly by screening. When the number of points obtained is smaller than k, the multiple obtained by dividing k by the size of the result set (that is, the number of points in the plaintext point set to be screened) is multiplied by the original search range to obtain the current search range r, and the number of complex buckets to be searched is $2r-1$. (The initial range is assumed to be r = 1, that is, only the complex bucket containing the current search point is searched. When r = 2, the range is extended by one bucket to the left and one to the right of the initial complex bucket, that is, three complex buckets are searched by the above method, and the first k nearest points are taken, and so on.) Finally, the first k nearest points are selected as the required plaintext point set.
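The range-expansion rule can be made concrete with a small sketch. The text gives the new range as the old range multiplied by k divided by the result-set size; rounding up to an integer, widening by one step when the result set is empty, clipping at the ends of the bucket sequence, and the use of Euclidean distance are assumptions added here, as are all function names.

```python
import math

def next_search_range(r: int, k: int, n: int) -> int:
    """Return the new range r' = ceil(r * k / n) for n < k results."""
    if n <= 0:
        return r + 1  # assumption: widen by one step on an empty result
    return (r * k + n - 1) // n  # integer ceiling of r * k / n

def buckets_to_search(center: int, r: int, num_buckets: int) -> list:
    """Indices of the (at most) 2r - 1 buckets centred on `center`."""
    lo = max(0, center - (r - 1))
    hi = min(num_buckets - 1, center + (r - 1))
    return list(range(lo, hi + 1))

def k_nearest(points, query, k: int) -> list:
    """Select the first k points closest to the query point."""
    return sorted(points, key=lambda p: math.dist(p, query))[:k]
```

For example, with r = 1, k = 10 and only 4 points returned, the next range is 3, so 2 * 3 - 1 = 5 complex buckets are searched in the following round.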
The Monte Carlo sampling-based forward secure k-nearest neighbor retrieval method provided by this embodiment uses a proxy re-encryption scheme and stores the ciphertext data on the server in a bidirectional dictionary, thereby ensuring the forward security of the ciphertext data uploaded to the server by the data provider. In addition, the proxy re-encryption algorithm provided by this embodiment can also be applied to other algorithms.
Example two
The forward secure k-nearest neighbor retrieval method based on Monte Carlo sampling provided in the first embodiment uses a single LSH function, which may cause a large error in the returned nearest neighbors. On the basis of the first embodiment, the present embodiment therefore selects multiple LSH functions to achieve a more accurate k-nearest neighbor search.
Referring to fig. 2, which is a schematic diagram of another forward secure k-nearest neighbor retrieval method based on Monte Carlo sampling according to an embodiment of the present invention, the method includes:
step 1: acquiring a data set and respectively preprocessing the data set according to a plurality of LSH functions to obtain a plurality of data set dictionaries;
step 2: performing proxy re-encryption on the plurality of data set dictionaries respectively to obtain a plurality of bidirectional dictionaries;
step 3: respectively carrying out data search on each bidirectional dictionary to obtain a plurality of groups of ciphertext data;
step 4: carrying out re-encryption processing on the plurality of groups of ciphertext data to obtain a plurality of groups of ciphertext data under a unified key;
step 5: taking the intersection of the plurality of groups of ciphertext data under the unified key to obtain final ciphertext data;
step 6: and decrypting the final ciphertext data to obtain a plaintext point set.
Specifically, m LSH functions are selected. Following the method in the first embodiment, only m bidirectional dictionaries need to be maintained at the server, and m data set dictionaries $Dic_H$ and the first key dictionary $Dic_{key}$ need to be maintained at the data provider. When adding or deleting data, as in the method of the first embodiment, the add and delete operations only need to be performed on each of the m instances respectively.
In the k-nearest neighbor search process, the m bidirectional dictionaries are used. Since the encryption keys used by the instances are different, the data ciphertexts corresponding to the same point are also different, and the intersection could not be decrypted normally once taken. For this purpose, an honest deletion cloud is additionally used as middleware (it can be created and maintained by the user). The ciphertext data sets $(\Delta^{(ID)}, data)$ found in each bidirectional dictionary are uploaded to the deletion cloud. Each corresponding nsk first needs to be found in $Dic_{sk}$, and a new key pair $\langle rpk, rsk \rangle$ is generated; re-encryption keys $sk_{n \to r}$ are then generated from every nsk and rsk, and the ciphertext data set found by each instance is re-encrypted with its re-encryption key, so that the generated ciphertexts can all be decrypted with the unified key. Then, to take the intersection of the ciphertexts, only $\pi^{-1}$ of each instance needs to be uploaded to the deletion cloud, which decrypts each $\Delta^{(ID)}$ to obtain the plaintext id value and then screens the data to obtain the intersection.
After the screening is completed, if the number n of points in the obtained intersection is less than k, then, as in the method of the first embodiment, the search range of the complex buckets in each instance is expanded by a factor of k divided by n. (The initial range is r = 1, that is, only the complex bucket containing the current search point is searched. When r = 2, the range is extended by one bucket to the left and one to the right of the initial complex bucket, that is, three complex buckets are searched by this method, and the k nearest points are taken, and so on.) This process is repeated until the number of points in the intersection is greater than or equal to k, at which point the search ends and the final plaintext point set is obtained.
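The intersection step performed by the middleware can be sketched as below. The sketch assumes that each instance's result has already been re-encrypted under the unified key, that each $\pi^{-1}$ is available as a lookup table from id tag to plaintext id, and that the ciphertext from the first instance is the one kept for each common id; the function name and data layout are hypothetical.

```python
def intersect_instances(results: list, inverse_perms: list) -> dict:
    """Intersect the per-instance results by plaintext id.

    results[i]       maps id tag -> ciphertext for LSH instance i.
    inverse_perms[i] maps id tag -> plaintext id (pi_i^{-1}).
    """
    if not results:
        return {}
    # Decode every id tag with the inverse permutation of its instance.
    decoded = [
        {inv[tag]: ct for tag, ct in res.items()}
        for res, inv in zip(results, inverse_perms)
    ]
    # Keep only the ids that every instance returned.
    common = set(decoded[0])
    for d in decoded[1:]:
        common &= set(d)
    return {pid: decoded[0][pid] for pid in sorted(common)}
```

If the returned dictionary holds fewer than k entries, the caller would widen the bucket range of every instance and repeat the search, as described in the paragraph above.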
The forward secure k-nearest neighbor retrieval method based on Monte Carlo sampling provided by this embodiment performs multiple searches with multiple LSH functions and takes their intersection, thereby improving the search accuracy. In addition, the method provided by this embodiment can also be applied to searching higher-dimensional data.
Example three
On the basis of the first embodiment, this embodiment provides a forward secure k-nearest neighbor retrieval system based on Monte Carlo sampling. Referring to fig. 3, which is a schematic structural diagram of the system according to an embodiment of the present invention, the system includes: a data set supplying apparatus 1, a searching device 2 and a decryption apparatus 3, wherein,
the data set supplying apparatus 1 includes:
the data acquisition module 11 is used for acquiring a data set and preprocessing the data set to obtain a plurality of complex buckets and generate a data set dictionary;
the initialization module 12 is configured to encrypt each complex bucket according to a proxy re-encryption algorithm to obtain a first key dictionary and a bidirectional dictionary, and transmit the bidirectional dictionary to the search apparatus 2;
the encryption module 13 is configured to find data corresponding to a point to be searched in the data set dictionary and the first key dictionary, perform re-encryption processing to obtain a search token and a second key dictionary, and transmit the search token and the second key dictionary to the decryption device 3;
the searching device 2 comprises a data searching module 21, which is used for searching data in the bidirectional dictionary according to the search token to obtain ciphertext data and transmitting the ciphertext data to the decryption device 3;
the decryption apparatus 3 includes:
a first data receiving module 31, configured to receive the search token and upload the search token to the data searching module 21 so that the data searching module 21 can perform the data search;
a second data receiving module 32, configured to receive the second key dictionary and the ciphertext data, and decrypt the ciphertext data according to the second key dictionary to obtain a plaintext point set.
In the present embodiment, the data set supplying apparatus is mainly applied to data providers; the searching device is mainly applied to a server, and the decryption device is mainly applied to a user, namely a client.
The system provided in this embodiment can implement the forward secure k-nearest neighbor retrieval method based on Monte Carlo sampling described in the first embodiment; the detailed process is not described again here.
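The message flow between the three devices can be sketched with three small classes. This is a deliberately simplified illustration: the re-encryption step is omitted, a toy XOR cipher replaces the proxy re-encryption algorithm, a bucket is addressed directly by its identifier instead of through LSH, and all class and method names are hypothetical.

```python
import hashlib
import secrets
from dataclasses import dataclass, field

def _xor_cipher(key: bytes, data: bytes) -> bytes:
    # Toy symmetric cipher (encrypt == decrypt); NOT the PRE scheme.
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))

def _tag(key: bytes, bucket_id: str) -> str:
    return hashlib.sha256(key + bucket_id.encode()).hexdigest()

@dataclass
class SearchDevice:
    """Server side (data searching module 21)."""
    bi_dict: dict = field(default_factory=dict)

    def search(self, token: str) -> list:
        return self.bi_dict.get(token, [])

@dataclass
class DataSetSupplier:
    """Data provider side (modules 11 to 13)."""
    buckets: dict                      # bucket id -> list of points (bytes)
    keys: dict = field(default_factory=dict)

    def initialize(self, server: SearchDevice) -> None:
        # Encrypt every bucket under its own key and upload it.
        for bucket_id, points in self.buckets.items():
            key = secrets.token_bytes(32)
            self.keys[bucket_id] = key
            server.bi_dict[_tag(key, bucket_id)] = [
                _xor_cipher(key, p) for p in points
            ]

    def make_token(self, bucket_id: str):
        # Returns the search token and a key dictionary for the client.
        key = self.keys[bucket_id]
        token = _tag(key, bucket_id)
        return token, {token: key}

class DecryptionDevice:
    """Client side (data receiving modules 31 and 32)."""
    def query(self, token: str, key_dict: dict, server: SearchDevice) -> list:
        ciphertexts = server.search(token)        # upload token, get results
        return [_xor_cipher(key_dict[token], c) for c in ciphertexts]
```

In this flow the server only ever sees opaque tags and ciphertexts, while the keys travel from the data provider to the client, mirroring the division of roles in the three devices.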
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (10)

1. A forward secure k-nearest neighbor retrieval method based on Monte Carlo sampling is characterized by comprising the following steps:
acquiring a data set and preprocessing the data set to obtain a plurality of complex buckets and generating a data set dictionary according to the complex buckets;
encrypting each complex bucket according to a proxy re-encryption algorithm to obtain a first key dictionary and a two-way dictionary;
finding data corresponding to the point to be searched in the data set dictionary and the first key dictionary, and performing re-encryption processing to obtain a search token and a second key dictionary;
performing data search in the bidirectional dictionary according to the search token to obtain ciphertext data;
and decrypting the ciphertext data according to the second key dictionary to obtain a plaintext point set.
2. The forward secure k-nearest neighbor retrieval method of claim 1, wherein obtaining a data set and preprocessing the data set to obtain a plurality of complex buckets and generating a data set dictionary from the complex buckets comprises:
acquiring a data set, and generating the data set into a plurality of uniform hash buckets according to an LSH function based on Monte Carlo sampling;
greedy merging is carried out on the uniform hash buckets, and false points are added to obtain a plurality of complex buckets with the same data volume;
and generating a corresponding data set dictionary according to the complex buckets.
3. The forward secure k-nearest neighbor retrieval method of claim 1, wherein said encrypting each of said complex buckets according to a proxy re-encryption algorithm to obtain a first key dictionary and a bi-directional dictionary comprises:
generating a first key pair, a second key pair and a corresponding encrypted data set by adopting a proxy re-encryption algorithm for each complex bucket;
generating a first key dictionary according to the first key pairs and the second key pairs corresponding to all the complex buckets;
initializing a bidirectional dictionary and storing the encrypted data sets corresponding to all the complex buckets in the bidirectional dictionary.
4. The forward secure k-nearest neighbor retrieval method according to claim 1, wherein the step of finding data corresponding to a point to be searched in the data set dictionary and the first key dictionary and performing re-encryption processing to obtain a search token and a second key dictionary comprises:
acquiring a point to be searched and the number of neighboring points to be searched for;
finding a complex bucket corresponding to the point to be searched in the data set dictionary, and generating a third key pair and a corresponding third ciphertext;
finding a first key pair and a second key pair corresponding to the point to be searched in the first key dictionary, and generating a corresponding first ciphertext and a corresponding second ciphertext;
re-encrypting the first key pair and the second key pair according to the third key pair to generate a first re-encryption key;
generating a search token according to the first ciphertext, the second ciphertext, the third ciphertext and the first re-encryption key;
and forming a second key dictionary according to the third key pair and the corresponding third ciphertext.
5. The forward secure k-nearest neighbor retrieval method of claim 4, further comprising, after forming a second key dictionary from said third key pair and its corresponding third ciphertext:
updating the first key dictionary.
6. The forward secure k-nearest neighbor retrieval method of claim 1, wherein performing data search in the bidirectional dictionary according to the search token to obtain ciphertext data comprises:
generating a temporary search dictionary according to the search token;
searching data in the bidirectional dictionary according to the search token, and re-encrypting a search result to obtain a search result data set;
and inserting the search result data set into the temporary search dictionary to obtain ciphertext data.
7. The method of claim 6, wherein after searching the bidirectional dictionary for data according to the search token and re-encrypting the search result to obtain a search result data set, the method further comprises:
and updating the bidirectional dictionary.
8. The forward secure k-nearest neighbor retrieval method of claim 1, wherein decrypting the ciphertext data according to the second key dictionary to obtain a plaintext point set comprises:
finding a third key pair corresponding to the complex bucket of the points to be searched in the second key dictionary;
decrypting the ciphertext data according to the third key to obtain a plaintext point set to be screened;
if the number of points in the plaintext point set to be screened is smaller than a preset search number k, expanding the search range to $2r-1$ complex buckets and repeating the search in the bidirectional dictionary until the number of points in the plaintext point set to be screened is greater than or equal to k; wherein r and k are positive integers;
and selecting the first k nearest points as a final plaintext point set.
9. A forward secure k-nearest neighbor retrieval method based on Monte Carlo sampling is characterized by comprising the following steps:
acquiring a data set and respectively preprocessing the data set according to a plurality of LSH functions to obtain a plurality of data set dictionaries;
performing proxy re-encryption on the plurality of data set dictionaries respectively to obtain a plurality of bidirectional dictionaries;
respectively carrying out data search on each bidirectional dictionary to obtain a plurality of groups of ciphertext data;
carrying out re-encryption processing on the plurality of groups of ciphertext data to obtain a plurality of groups of ciphertext data under a unified key;
taking the intersection of the plurality of groups of ciphertext data under the unified key to obtain final ciphertext data;
and decrypting the final ciphertext data to obtain a plaintext point set.
10. A forward secure k-nearest neighbor retrieval system based on monte carlo sampling, comprising: data set supplying means (1), search means (2) and decryption means (3), wherein,
the data set supplying apparatus (1) includes:
the data acquisition module (11) is used for acquiring a data set and preprocessing the data set to obtain a plurality of complex buckets and generate a data set dictionary;
the initialization module (12) is used for encrypting each complex bucket according to a proxy re-encryption algorithm to obtain a first key dictionary and a bidirectional dictionary and transmitting the bidirectional dictionary to the search device (2);
an encryption module (13) for finding data corresponding to a point to be searched in the data set dictionary and the first key dictionary, performing re-encryption processing to obtain a search token and a second key dictionary, and transmitting the search token and the second key dictionary to the decryption device (3);
the searching device (2) comprises a data searching module (21) which is used for searching data in the bidirectional dictionary according to the search token to obtain ciphertext data and transmitting the ciphertext data to the decrypting device (3);
the decryption device (3) comprises:
a first data receiving module (31) for receiving the search token and uploading the search token to the search module (21) for data search by the search module (21);
and the second data receiving module (32) is used for receiving the second key dictionary and the ciphertext data and decrypting the ciphertext data according to the second key dictionary to obtain a plaintext point set.
CN202010319210.2A 2020-04-21 2020-04-21 Forward safe k neighbor retrieval method and system based on Monte Carlo sampling Active CN111552988B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010319210.2A CN111552988B (en) 2020-04-21 2020-04-21 Forward safe k neighbor retrieval method and system based on Monte Carlo sampling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010319210.2A CN111552988B (en) 2020-04-21 2020-04-21 Forward safe k neighbor retrieval method and system based on Monte Carlo sampling

Publications (2)

Publication Number Publication Date
CN111552988A true CN111552988A (en) 2020-08-18
CN111552988B CN111552988B (en) 2023-05-02

Family

ID=72005827

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010319210.2A Active CN111552988B (en) 2020-04-21 2020-04-21 Forward safe k neighbor retrieval method and system based on Monte Carlo sampling

Country Status (1)

Country Link
CN (1) CN111552988B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105787076A (en) * 2016-03-02 2016-07-20 南京信息工程大学 Monochrome mutual nearest neighbor query processing method for uncertain spatial data
CN108629970A (en) * 2018-04-25 2018-10-09 浙江大学 Intersection signal parameter optimization method based on the search of Monte Carlo tree
CN108959478A (en) * 2018-06-21 2018-12-07 中南林业科技大学 Ciphertext image search method and system under a kind of cloud environment
CN109543061A (en) * 2018-11-16 2019-03-29 西安电子科技大学 A kind of encrypted image search method for supporting multi-key cipher
US20190272344A1 (en) * 2018-03-01 2019-09-05 Yangdi Lu Random draw forest index structure for searching large scale unstructured data
CN110334526A (en) * 2019-05-30 2019-10-15 西安电子科技大学 It is a kind of that the forward secrecy verified is supported to can search for encryption storage system and method
CN110351679A (en) * 2019-04-22 2019-10-18 鲁东大学 A kind of wireless sensor network resource allocation methods based on improvement simulated annealing

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
WANQI LIU 等: "I-LSH: I/O Efficient c-Approximate Nearest Neighbor Search in High-Dimensional Space", 《2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE)》 *
YANGUO PENG 等: "Towards Secure Approximate k -Nearest Neighbor Query Over Encrypted High-Dimensional Data", 《 IEEE ACCESS》 *
吴瑾 等: "基于局部敏感哈希的安全相似性查询方案", 《密码学报》 *
翟建峰: "云环境下安全密文检索技术研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114417073A (en) * 2022-03-28 2022-04-29 之江实验室 Neighbor node query method and device of encryption graph and electronic equipment
CN115733617A (en) * 2022-10-31 2023-03-03 支付宝(杭州)信息技术有限公司 Biological characteristic authentication method and system
CN115733617B (en) * 2022-10-31 2024-01-23 支付宝(杭州)信息技术有限公司 Biological feature authentication method and system

Also Published As

Publication number Publication date
CN111552988B (en) 2023-05-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Peng Yanguo, Wang Tengyu, Hou Yongchao, Lv Zhen, Wang Long, Li Xin

Inventor before: Peng Yanguo, Wang Tengyu, Lv Zhen, Wang Long, Li Xin

GR01 Patent grant