CN111552988A - Monte Carlo sampling-based forward safety k neighbor retrieval method and system - Google Patents
- Publication number
- CN111552988A CN202010319210.2A
- Authority
- CN
- China
- Prior art keywords
- dictionary
- data
- key
- search
- ciphertext
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6227—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Storage Device Security (AREA)
Abstract
The invention discloses a forward-secure k-nearest-neighbor retrieval method and system based on Monte Carlo sampling. The method comprises the following steps: acquiring a data set and preprocessing it to obtain a plurality of complex buckets, and generating a data set dictionary from the complex buckets; encrypting each complex bucket with a proxy re-encryption algorithm to obtain a first key dictionary and a bidirectional dictionary; finding the data corresponding to the point to be searched in the data set dictionary and the first key dictionary, and performing re-encryption processing to obtain a search token and a second key dictionary; searching the bidirectional dictionary according to the search token to obtain ciphertext data; and decrypting the ciphertext data according to the second key dictionary to obtain a plaintext point set. By using a proxy re-encryption scheme and storing the ciphertext data on the server in a bidirectional dictionary, the method ensures forward security for the ciphertext data that a data provider uploads to the server.
Description
Technical Field
The invention belongs to the technical field of data security, and particularly relates to a forward security k nearest neighbor retrieval method and system based on Monte Carlo sampling.
Background
In the era of big data and cloud computing, more and more data providers choose to upload data to a cloud server for storage and to let users access the data for a fee. However, the cloud server is often untrusted, and if the data is uploaded without encryption, a large amount of the data provider's private data is leaked. Designing secure and efficient encryption techniques that suit scenarios with different requirements, while supporting dynamic updates of the ciphertext data, is therefore a focus of current cloud data security research.
k-nearest-neighbor retrieval is an important technique in the big data field with a wide range of applications, such as nearest neighbor search over spatial data and recommendation systems. Conventional k-nearest-neighbor retrieval techniques, such as nearest neighbor graphs and locality-sensitive hashing (LSH), operate on plaintext; if the data is uploaded to the cloud directly, a large amount of the user's private information is leaked. The data therefore needs to be encrypted before it is uploaded to the server, while k-nearest-neighbor retrieval and dynamic updates over the ciphertext data must remain possible. LSH is an important approach to the k-nearest-neighbor problem: it reduces the dimensionality of the data points with a locality-sensitive hash function and then places them into the corresponding hash buckets, such that points that are closer to each other are more likely to fall into the same hash bucket and points that are farther apart are less likely to.
However, although existing k-nearest-neighbor retrieval schemes keep the data encrypted and support nearest neighbor queries and dynamic updates, they ignore forward security. A data provider usually lets paying users use the ciphertext data that the provider has outsourced to a cloud server, but after a user's paid term has expired, data newly added to the server can still be retrieved and decrypted with a token obtained earlier. Furthermore, because data distributions tend to be uneven, the number of points per hash bucket differs greatly when LSH is used, which reduces data security.
Disclosure of Invention
In order to solve the above problems in the prior art, the present invention provides a forward-secure k-nearest-neighbor retrieval method and system based on Monte Carlo sampling. The technical problem to be solved by the invention is addressed by the following technical scheme:
A forward-secure k-nearest-neighbor retrieval method based on Monte Carlo sampling comprises the following steps:
acquiring a data set and preprocessing the data set to obtain a plurality of complex buckets and generating a data set dictionary according to the complex buckets;
encrypting each complex bucket according to a proxy re-encryption algorithm to obtain a first key dictionary and a bidirectional dictionary;
finding data corresponding to the point to be searched in the data set dictionary and the first key dictionary, and performing re-encryption processing to obtain a search token and a second key dictionary;
performing data search in the bidirectional dictionary according to the search token to obtain ciphertext data;
and decrypting the ciphertext data according to the second key dictionary to obtain a plaintext point set.
In one embodiment of the present invention, acquiring a data set and preprocessing the data set to obtain a plurality of complex buckets and generate a data set dictionary includes:
acquiring a data set, and generating the data set into a plurality of uniform hash buckets according to an LSH function based on Monte Carlo sampling;
greedily merging the uniform hash buckets and adding false points to obtain a plurality of complex buckets with the same data volume;
and generating a corresponding data set dictionary according to the complex buckets.
In an embodiment of the present invention, the encrypting each complex bucket according to a proxy re-encryption algorithm to obtain a first key dictionary and a bidirectional dictionary includes:
generating a first key pair, a second key pair and a corresponding encrypted data set by adopting a proxy re-encryption algorithm for each complex bucket;
generating a first key dictionary according to the first key pairs and the second key pairs corresponding to all the complex buckets;
initializing a bidirectional dictionary and storing the encrypted data sets corresponding to all the complex buckets in the bidirectional dictionary.
In an embodiment of the present invention, finding data corresponding to a point to be searched in the data set dictionary and the first key dictionary, and performing re-encryption processing to obtain a search token and a second key dictionary, includes:
acquiring the point to be searched and the number of neighboring points to search for;
finding a complex bucket corresponding to the point to be searched in the data set dictionary, and generating a third key pair and a corresponding third ciphertext;
finding a first key pair and a second key pair corresponding to the point to be searched in the first key dictionary, and generating a corresponding first ciphertext and a corresponding second ciphertext;
re-encrypting the first key pair and the second key pair according to the third key pair to generate a first re-encryption key;
generating a search token according to the first ciphertext, the second ciphertext, the third ciphertext and the first re-encryption key;
and forming a second key dictionary according to the third key pair and the corresponding third ciphertext.
In an embodiment of the present invention, after forming a second key dictionary according to the third key pair and a corresponding third ciphertext thereof, the method further includes:
updating the first key dictionary.
In an embodiment of the present invention, performing data search in the bidirectional dictionary according to the search token to obtain ciphertext data includes:
generating a temporary search dictionary according to the search token;
searching data in the bidirectional dictionary according to the search token, and re-encrypting a search result to obtain a search result data set;
and inserting the search result data set into the temporary search dictionary to obtain ciphertext data.
In an embodiment of the present invention, after searching data in the bidirectional dictionary according to the search token and re-encrypting the search result to obtain a search result data set, the method further includes:
and updating the bidirectional dictionary.
In an embodiment of the present invention, decrypting the ciphertext data according to the second key dictionary to obtain a plaintext point set includes:
finding a third key pair corresponding to the complex bucket of the points to be searched in the second key dictionary;
decrypting the ciphertext data according to the third key to obtain a plaintext point set to be screened;
if the number of points in the plaintext point set to be screened is smaller than a preset search number k, expanding the search range to 2r-1 complex buckets and repeating the search in the bidirectional dictionary until the number of points in the plaintext point set to be screened is greater than or equal to k; wherein r and k are positive integers;
and selecting the first k nearest points as a final plaintext point set.
Another embodiment of the present invention further provides a forward-secure k-nearest-neighbor retrieval method based on Monte Carlo sampling, including:
acquiring a data set and respectively preprocessing the data set according to a plurality of LSH functions to obtain a plurality of data set dictionaries;
performing proxy re-encryption on the plurality of data set dictionaries respectively to obtain a plurality of bidirectional dictionaries;
respectively carrying out data search on each bidirectional dictionary to obtain a plurality of groups of ciphertext data;
carrying out re-encryption processing on the plurality of groups of ciphertext data to obtain ciphertext data with a plurality of unified keys;
taking intersection from the ciphertext data with the plurality of unified keys to obtain final ciphertext data;
and decrypting the final ciphertext data to obtain a plaintext point set.
Yet another embodiment of the present invention further provides a forward-secure k-nearest-neighbor retrieval system based on Monte Carlo sampling, comprising a data set supplying device, a searching device, and a decryption device, wherein:
the data set supplying device includes:
a data acquisition module, used for acquiring a data set and preprocessing it to obtain a plurality of complex buckets and generate a data set dictionary;
an initialization module, used for encrypting each complex bucket according to a proxy re-encryption algorithm to obtain a first key dictionary and a bidirectional dictionary, and transmitting the bidirectional dictionary to the searching device;
an encryption module, used for finding the data corresponding to a point to be searched in the data set dictionary and the first key dictionary, performing re-encryption processing to obtain a search token and a second key dictionary, and transmitting the search token and the second key dictionary to the decryption device;
the searching device includes a data searching module, used for searching data in the bidirectional dictionary according to the search token to obtain ciphertext data, and transmitting the ciphertext data to the decryption device;
the decryption device includes:
a first data receiving module, used for receiving the search token and uploading it to the data searching module so that the data searching module can search the data;
and a second data receiving module, used for receiving the second key dictionary and the ciphertext data, and decrypting the ciphertext data according to the second key dictionary to obtain a plaintext point set.
The invention has the beneficial effects that:
1. The Monte Carlo sampling-based forward-secure k-nearest-neighbor retrieval method provided by the invention ensures forward security for the ciphertext data that a data provider uploads to the server, by using a proxy re-encryption scheme and storing the ciphertext data on the server in a bidirectional dictionary;
2. The method is based on Monte Carlo sampling: the hash values obtained by projecting the sampled points of the data set onto a hash vector are divided evenly, giving a locality-sensitive hash that compresses data with a known density function uniformly; complex buckets with the same number of data points are then generated by greedy merging and the addition of false points, which further improves data security;
3. The method can also be applied to multi-dimensional data and can ensure the security of that data.
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Drawings
Fig. 1 is a schematic diagram of a forward-secure k-nearest-neighbor retrieval method based on Monte Carlo sampling according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of another forward-secure k-nearest-neighbor retrieval method based on Monte Carlo sampling according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a forward-secure k-nearest-neighbor retrieval system based on Monte Carlo sampling according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to specific examples, but the embodiments of the present invention are not limited thereto.
Example one
Referring to Fig. 1, Fig. 1 is a schematic diagram of a forward-secure k-nearest-neighbor retrieval method based on Monte Carlo sampling according to an embodiment of the present invention. The method includes:
S1: acquiring a data set and preprocessing it to obtain a plurality of complex buckets, and generating a data set dictionary from the complex buckets;
Further, step S1 includes:
S1.1: acquiring a data set and dividing it into a plurality of uniform hash buckets according to an LSH function based on Monte Carlo sampling. The specific operation is as follows:
(1) First, for a given data set with known probability density function $f(x, y)$, select a suitable proposal distribution that is easy to sample from (e.g., a two-dimensional normal distribution) with probability density $p(x, y)$, and a value $M$ large enough that $M p(x, y) \ge f(x, y)$;
(2) Randomly sample a point $(x_0, y_0)$ from the proposal distribution, and sample a value $u$ from the uniform distribution $U([0, 1])$. If $u \le f(x_0, y_0) / (M p(x_0, y_0))$, the sampled point is accepted; otherwise it is rejected;
(3) Repeat the sampling as needed until the number of sampled points reaches the required amount;
(4) Given a code length and a random vector $\alpha$ with $\|\alpha\|_2 = 1$, compute the thresholds of the locality-sensitive hash function: compute the projection value $p \cdot \alpha$ of each point $p$ in the sampled point set, then sort the points by projection value in ascending order;
(5) The number of thresholds of the locality-sensitive hash function determines the number of intervals. Divide the total number of sampled points by the number of intervals to obtain the number $n$ of points per interval. Then take $n$ points at a time in ascending order of projection value; the maximum value of the previous interval is $c_i$ and the maximum value of the current interval is $c_{i+1}$, and so on until every value in the threshold set is determined;
(6) Finally, obtain the LSH function based on Monte Carlo sampling from the thresholds, i.e., $h(p) = \{\, i \mid c_{i-1} < p \cdot \alpha \le c_i \,\}$, where $c_1 = 0$;
(7) Generate a plurality of uniform hash buckets with essentially the same amount of data according to the LSH function.
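Steps (1) to (7) can be illustrated with a minimal Python sketch. It is not the patented implementation: the target density $f(x, y) = 4xy$ on the unit square, the uniform proposal with $M = 4$, the fixed direction $\alpha$, the number of intervals, and all function names are assumptions chosen for illustration. The sketch also treats the lowest interval as unbounded below instead of fixing $c_1 = 0$.

```python
import bisect
import math
import random


def rejection_sample(f, draw_proposal, proposal_pdf, M, count, rng):
    """Steps (1)-(3): accept (x0, y0) when u <= f(x0, y0) / (M * p(x0, y0))."""
    accepted = []
    while len(accepted) < count:
        x0, y0 = draw_proposal(rng)
        u = rng.random()  # u ~ U([0, 1])
        if u * M * proposal_pdf(x0, y0) <= f(x0, y0):
            accepted.append((x0, y0))
    return accepted


def project(p, alpha):
    """Projection value p . alpha of a 2-D point."""
    return p[0] * alpha[0] + p[1] * alpha[1]


def equal_count_thresholds(samples, alpha, num_intervals):
    """Steps (4)-(5): upper bound of each interval of n sorted projections."""
    proj = sorted(project(p, alpha) for p in samples)
    n = len(proj) // num_intervals  # points per interval
    thresholds = [proj[(i + 1) * n - 1] for i in range(num_intervals)]
    thresholds[-1] = proj[-1]  # last interval absorbs any remainder
    return thresholds


def lsh_hash(p, alpha, thresholds):
    """Step (6): index i of the interval with c_{i-1} < p . alpha <= c_i."""
    i = bisect.bisect_left(thresholds, project(p, alpha))
    return min(i, len(thresholds) - 1)  # clamp values above the last threshold


# Hypothetical example data: density 4xy on [0, 1]^2, uniform proposal, M = 4.
rng = random.Random(0)
samples = rejection_sample(
    f=lambda x, y: 4.0 * x * y,
    draw_proposal=lambda r: (r.random(), r.random()),
    proposal_pdf=lambda x, y: 1.0,
    M=4.0, count=4000, rng=rng)
alpha = (math.cos(0.3), math.sin(0.3))  # unit vector, ||alpha||_2 = 1
thresholds = equal_count_thresholds(samples, alpha, num_intervals=8)

# Step (7): bucket sizes over the sampled points are essentially equal.
counts = [0] * len(thresholds)
for p in samples:
    counts[lsh_hash(p, alpha, thresholds)] += 1
```

Because the thresholds are cut at equal counts of sorted projections, the bucket sizes stay balanced even though the underlying density is far from uniform.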
S1.2: greedily merging the uniform hash buckets and adding false points to obtain a plurality of complex buckets with the same data volume, and generating the corresponding data set dictionary. The specific steps are as follows:
(a) First, use the Monte Carlo sampling-based locality-sensitive hash function given above to compute the hash value of every point in the data set. Points with equal hash values are stored in the same hash bucket, where $p_i \sim p_j$ if and only if $h_i = h_j$.
(b) Greedily merge the hash buckets to generate complex buckets:
Specifically, first create an LSH mapping dictionary $Dic_H$, which records the complex bucket that holds each point after merging. The complex buckets are initialized as empty sets, and the hash buckets are arranged in ascending order.
Then, take the largest number of data points in any hash bucket, Max, as the capacity standard of a complex bucket. Initialize $i = 1$ and $j = 1$ and perform the following loop: initialize $B_i$ as empty; while the number of points in $B_i$ is less than Max, add all points of the $j$-th hash bucket to $B_i$, add $(h_j, B_i)$ to the dictionary $Dic_H$, advance $j$ to the next hash bucket, and check the number of points in $B_i$ again; once the number of points in $B_i$ is greater than or equal to Max, leave the check and increase $i$ by 1.
The loop ends when all of the hash buckets have been merged into complex buckets.
After merging, take the number of points in the largest complex bucket as the target size, and add false points to every complex bucket with fewer points until it reaches that size. The false points are constructed so that they can easily be told apart from real data points. This yields a plurality of complex buckets B with the same data volume.
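The merging and padding of step S1.2 can be sketched as follows. This is a plain-Python illustration under assumptions: hash buckets are given as a list already ordered by hash value, `dic_h` maps a hash bucket index to a complex bucket index, and the `FALSE_POINT` marker is a hypothetical stand-in for whatever distinguishable false point the scheme actually uses.

```python
FALSE_POINT = "FALSE"  # hypothetical marker, easy to tell apart from real points


def greedy_merge(hash_buckets):
    """Merge consecutive hash buckets until each complex bucket holds >= Max points."""
    capacity = max(len(b) for b in hash_buckets)  # "Max" in the text
    complex_buckets = []
    dic_h = {}      # hash bucket index j -> complex bucket index i
    current = []    # the complex bucket B_i being filled
    for j, bucket in enumerate(hash_buckets):
        current.extend(bucket)
        dic_h[j] = len(complex_buckets)
        if len(current) >= capacity:  # B_i is full, move on to B_{i+1}
            complex_buckets.append(current)
            current = []
    if current:  # leftover hash buckets form the last complex bucket
        complex_buckets.append(current)
    return complex_buckets, dic_h


def pad_with_false_points(complex_buckets):
    """Pad every complex bucket to the size of the largest one."""
    target = max(len(b) for b in complex_buckets)
    return [b + [FALSE_POINT] * (target - len(b)) for b in complex_buckets]


# Hypothetical toy input: six hash buckets of integer "points".
hash_buckets = [[1, 2, 3], [4], [5, 6], [7], [8, 9, 10], [11]]
merged, dic_h = greedy_merge(hash_buckets)
padded = pad_with_false_points(merged)
```

Equal bucket sizes mean that the server cannot learn the data distribution from the number of ciphertexts stored under each bucket label.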
S1.3: generating a corresponding data set dictionary according to the complex buckets.
Specifically, the data set dictionary $Dic_H$ is generated from the complex buckets B, which together contain the whole data set.
The Monte Carlo sampling-based forward-secure k-nearest-neighbor retrieval method provided by this embodiment uses Monte Carlo sampling: the hash values obtained by projecting the sampled points of the data set onto a hash vector are divided evenly, giving a locality-sensitive hash that compresses data with a known density function uniformly, so that the resulting hash buckets hold essentially the same number of points. Complex buckets with the same number of data points are then generated by greedy merging and the addition of false points, which improves data security.
S2: encrypting each complex bucket according to a proxy re-encryption algorithm to obtain a first key dictionary and a bidirectional dictionary;
Here, the proxy re-encryption technique and the bidirectional dictionary are briefly described.
The proxy re-encryption technique mainly comprises six parts:
(I) initializing to generate the parameters prm;
(II) generating a key pair $\langle pk_A, sk_A \rangle$;
(III) generating a re-encryption key $rk_{A \to B}$ from the key $sk_A$ to the key $sk_B$;
(IV) encrypting the plaintext data $m$ with the key $pk_A$ to generate the ciphertext $c_A$;
(V) re-encrypting $c_A$ with the re-encryption key $rk_{A \to B}$ to generate $c_B$;
(VI) decrypting a ciphertext with the corresponding key, i.e., $sk_A$ decrypts $c_A$ and $sk_B$ decrypts $c_B$; a ciphertext can only be decrypted with its corresponding key.
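The six parts can be made concrete with a toy example. The text does not name a specific proxy re-encryption scheme, so the sketch below swaps in the classic ElGamal-style construction of Blaze, Bleumer and Strauss as an assumption; the tiny group parameters and all function names are illustrative only and provide no real security.

```python
import secrets

# (I) Toy parameters prm: safe prime P = 2*Q + 1, G generates the order-Q subgroup.
P = 2039
Q = 1019
G = 4


def keygen():
    """(II) Generate a key pair <pk, sk> with pk = G^sk mod P."""
    sk = secrets.randbelow(Q - 1) + 1  # sk in [1, Q-1]
    return pow(G, sk, P), sk


def rekey(sk_a, sk_b):
    """(III) Re-encryption key rk_{A->B} = sk_b / sk_a mod Q."""
    return (sk_b * pow(sk_a, -1, Q)) % Q


def enc(pk, m):
    """(IV) Encrypt m in [1, P-1]: c = (m * G^r, pk^r)."""
    r = secrets.randbelow(Q - 1) + 1
    return (m * pow(G, r, P)) % P, pow(pk, r, P)


def reenc(rk, c):
    """(V) Turn a ciphertext under key A into one under key B."""
    c1, c2 = c
    return c1, pow(c2, rk, P)  # G^(a*r*b/a) = G^(b*r)


def dec(sk, c):
    """(VI) Decrypt with the matching secret key."""
    c1, c2 = c
    g_r = pow(c2, pow(sk, -1, Q), P)  # recover G^r
    return (c1 * pow(g_r, -1, P)) % P


# Usage: a ciphertext for A is re-encrypted so that B can decrypt it.
pk_a, sk_a = keygen()
pk_b, sk_b = keygen()
message = 1234
c_a = enc(pk_a, message)
c_b = reenc(rekey(sk_a, sk_b), c_a)
```

The proxy (here, the server) only ever sees `rk` and ciphertexts, never the plaintext or either secret key.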
The bidirectional dictionary stores entries $e_i = (\Delta_i^{(B)}, \Delta_i^{(ID)}, data_i)$, whose three elements are the label of the complex bucket, the id value of the point, and the coordinate value of the point, respectively.
The bidirectional dictionary mainly consists of two parts, $Dic_I$ and $Dic_F$, which are used for storage and search respectively. The operations are as follows:
When storing an entry $(\Delta_i^{(B)}, \Delta_i^{(ID)}, data_i)$, $(\Delta_i^{(B)}, data_i)$ is inserted into $Dic_I$ and, at the same time, $(\Delta_i^{(ID)}, \Delta_i^{(B)}, \&(data_i))$ is inserted into $Dic_F$;
When searching, given the label of a complex bucket, the corresponding $(\Delta^{(ID)}, data)$ entries are obtained from $Dic_I$; given the id value of a data item, the corresponding $(\Delta^{(B)}, data)$ is returned;
When updating, the corresponding coordinate value is located by the id value of the data and updated.
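A bidirectional dictionary with these three operations might be sketched as below. The class and method names are hypothetical, and the pointer $\&(data_i)$ is modelled, as an assumption, by looking the data up through the bucket label recorded in `dic_f` rather than by a real memory reference.

```python
from collections import defaultdict


class BiDictionary:
    """Toy bidirectional dictionary over (bucket label, id label, data)."""

    def __init__(self):
        self.dic_i = defaultdict(dict)  # bucket label -> {id label: data}
        self.dic_f = {}                 # id label -> bucket label

    def insert(self, bucket_label, id_label, data):
        """Store one entry in both directions."""
        self.dic_i[bucket_label][id_label] = data
        self.dic_f[id_label] = bucket_label

    def search_by_bucket(self, bucket_label):
        """Return all (id label, data) pairs stored under a bucket label."""
        return list(self.dic_i.get(bucket_label, {}).items())

    def search_by_id(self, id_label):
        """Return (bucket label, data) for one id label."""
        bucket_label = self.dic_f[id_label]
        return bucket_label, self.dic_i[bucket_label][id_label]

    def update(self, id_label, new_data):
        """Locate the entry by id label and overwrite its data."""
        bucket_label = self.dic_f[id_label]
        self.dic_i[bucket_label][id_label] = new_data


# Usage with placeholder labels and ciphertexts.
bd = BiDictionary()
bd.insert("b1", "id1", "ct1")
bd.insert("b1", "id2", "ct2")
bd.insert("b2", "id3", "ct3")
bd.update("id2", "ct2-new")
```

Keeping both directions lets the server answer bucket-level search queries and apply point-level updates without scanning all entries.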
In this embodiment, step S2 specifically includes:
S2.1: generating a first key pair, a second key pair and a corresponding encrypted data set for each complex bucket by using a proxy re-encryption algorithm;
Specifically, first select a security parameter $1^\lambda$, a hash function $H$, a proxy re-encryption algorithm PRE and a permutation $\pi$. Through step S1, the data set has already been stored in the corresponding complex buckets B and the data set dictionary $Dic_H$ has been generated.
For each complex bucket $B_i$, two key pairs $\langle cpk_i, csk_i \rangle$ and $\langle upk_i, usk_i \rangle$, i.e., the first key pair and the second key pair, are generated with the proxy re-encryption algorithm, and all data points are encrypted to generate the corresponding encrypted data set, where $\Delta^{(ID)} = \pi(ID)$, $\Delta^{(B)} = H(B, csk_i)$, and $data = PRE.Enc(cpk_i, p)$.
S2.2: generating a first key dictionary from the first key pairs and second key pairs of all the complex buckets;
Specifically, a key dictionary $Dic_{key}$ is generated from the two key pairs $\langle cpk_i, csk_i \rangle$ and $\langle upk_i, usk_i \rangle$ to store the keys of each complex bucket as $(B_i, (\langle cpk_i, csk_i \rangle, \langle upk_i, usk_i \rangle, \langle npk_i = null, nsk_i = null \rangle))$. This key dictionary, i.e., the first key dictionary, is stored at the data provider side.
S2.3: initializing a bidirectional dictionary and storing the encrypted data sets of all the complex buckets in it.
Specifically, a bidirectional dictionary is initialized and all encrypted data sets from step S2.1 are stored in it; the bidirectional dictionary is then uploaded to the server for storage and subsequent searching.
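The label computations $\Delta^{(B)} = H(B, sk)$ and $\Delta^{(ID)} = \pi(ID)$ can be illustrated with standard-library primitives. The choice of HMAC-SHA-256 for $H$ and of a seeded shuffle for $\pi$, as well as the byte-string keys, are assumptions made only for this sketch; the text does not fix concrete instantiations.

```python
import hashlib
import hmac
import random


def bucket_label(bucket_id, secret_key):
    """Delta^(B) = H(B, sk), with HMAC-SHA-256 standing in for H."""
    return hmac.new(secret_key, str(bucket_id).encode(), hashlib.sha256).hexdigest()


def make_id_permutation(num_ids, seed):
    """A permutation pi over the ids 0..num_ids-1 (seeded shuffle as a stand-in)."""
    perm = list(range(num_ids))
    random.Random(seed).shuffle(perm)
    return perm


# Hypothetical toy keys for one complex bucket B = 3.
csk, usk = b"csk-demo", b"usk-demo"
label_c = bucket_label(3, csk)  # label of data encrypted under cpk
label_u = bucket_label(3, usk)  # label of data encrypted under upk
pi = make_id_permutation(10, seed=42)
id_label = pi[7]                # Delta^(ID) for the point with id 7
```

Because the label depends on the secret key, the same complex bucket is addressed by different labels under different keys, and a label computed from one key reveals nothing usable about the label under another.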
S3: finding data corresponding to the point to be searched in the data set dictionary and the first key dictionary, and performing re-encryption processing to obtain a search token and a second key dictionary; the method specifically comprises the following steps:
S3.1: acquiring the point to be searched and the number of neighboring points to search for;
Specifically, a value $k$, i.e., the number of neighboring points to search for, and a point $p$ to be searched are given.
S3.2: finding the complex bucket corresponding to the point to be searched in the data set dictionary, and generating a third key pair and a corresponding third ciphertext;
Specifically, first find the complex bucket $B$ corresponding to the point $p$ through the data set dictionary $Dic_H$, then generate a key pair $\langle npk, nsk \rangle$, i.e., the third key pair, and generate the third ciphertext $\Delta^{(B)'} = H(B, nsk)$.
S3.3: finding the first key pair and the second key pair corresponding to the point to be searched in the first key dictionary, and generating a corresponding first ciphertext and second ciphertext;
Specifically, find the first key pair $\langle cpk, csk \rangle$ and the second key pair $\langle upk, usk \rangle$ corresponding to the complex bucket $B$ in the first key dictionary $Dic_{key}$, then generate the first ciphertext $\Delta_1^{(B)} = H(B, csk)$ and the second ciphertext $\Delta_2^{(B)} = H(B, usk)$.
S3.4: re-encrypting the first key pair and the second key pair according to the third key pair to generate a first re-encryption key;
Specifically, the re-encryption keys $rk_{c \to n}$ and $rk_{u \to n}$, i.e., the first re-encryption key, are generated from the keys $csk$ and $usk$ to the key $nsk$.
S3.5: generating a search token according to the first ciphertext, the second ciphertext, the third ciphertext and the first re-encryption key;
Specifically, the set $\{\Delta_1^{(B)}, \Delta_2^{(B)}, \Delta^{(B)'}, rk_{c \to n}, rk_{u \to n}\}$, consisting of the first ciphertext, the second ciphertext, the third ciphertext and the first re-encryption key, is added to a token to form the search token. The token is transmitted to the user side, and the user side transmits it to the server for data search and decryption.
S3.6: forming a second key dictionary according to the third key pair and its corresponding third ciphertext;
Specifically, the third key pair and its corresponding third ciphertext are inserted into the dictionary $Dic_{sk}$ as the entry $\{B, nsk\}$ to form the second key dictionary, and the second key dictionary $Dic_{sk}$ is stored locally.
Further, after step S3.6, the first key dictionary is also updated. Specifically, after the second key dictionary $Dic_{sk}$ is obtained, a new key pair $\langle upk, usk \rangle$ is generated and the first key dictionary $Dic_{key}$ is updated.
S4: performing data search in the bidirectional dictionary according to the search token to obtain ciphertext data; the method comprises the following steps:
S4.1: generating a temporary search dictionary according to the search token;
Specifically, in this embodiment, the user obtains the search token by providing the information to be searched (the point to be searched and the number of neighboring points) to the data provider, and then uploads the token to the server. The server receives the token and generates a temporary search dictionary $Dic_{can}$.
S4.2: searching data in the bidirectional dictionary according to the search token, and re-encrypting the search result to obtain a search result data set;
Specifically, the first ciphertext $\Delta_1^{(B)}$ in the token is used to find the corresponding set $\{(\Delta^{(ID)}, data)\}_1$ in the bidirectional dictionary, and the second ciphertext $\Delta_2^{(B)}$ is used to find the corresponding set $\{(\Delta^{(ID)}, data)\}_2$. For $\{(\Delta^{(ID)}, data)\}_1$, each data is re-encrypted with the first re-encryption key $rk_{c \to n}$ to generate $data'$, and $(\Delta^{(ID)}, data')$ is added to the set CanSet; after all elements have been processed, $\Delta_1^{(B)}$ is replaced by $\Delta^{(B)'}$. The same operation is performed on $\{(\Delta^{(ID)}, data)\}_2$, yielding the search result data set $(\Delta^{(B)'}, CanSet)$.
S4.3: inserting the search result data set into the temporary search dictionary to obtain ciphertext data;
Specifically, the search result data set $(\Delta^{(B)'}, CanSet)$ is added to the temporary search dictionary $Dic_{can}$ to obtain the final ciphertext data dictionary $Dic_{can}$, which is returned to the user.
In this embodiment, the bidirectional dictionary is also updated after the search operation is completed. Specifically, after $data'$ is generated by re-encrypting the data with the first re-encryption key, $data'$ is written back to the bidirectional dictionary.
Meanwhile, after the search is completed, the data provider needs to update the data on the server. Specifically, for a data point to be updated, first find its complex bucket through $Dic_H$, then compute $\Delta_1^{(B)} = H(B, csk)$, $\Delta_2^{(B)} = H(B, usk)$ and $\Delta^{(ID)} = \pi(ID)$, and encrypt the data with $cpk$ and $upk$ respectively to generate $data_1$ and $data_2$. Both values are then looked up in the bidirectional dictionary, and if the ciphertext of the corresponding complex bucket is found, the corresponding update operation is executed.
Further, the data provider may need to add or delete data on the server. To add data points, when a new set of points is to be inserted, first find the corresponding complex bucket by the method above, then for each point compute $\Delta^{(ID)} = \pi(ID)$, $\Delta^{(B)} = H(B, usk_i)$ and $data = PRE.Enc(upk_i, p)$, and insert the result into the bidirectional dictionary on the server. To delete data points, real points do not actually need to be removed; an update operation simply replaces the original real point with a false point.
S5: decrypting the ciphertext data according to the second key dictionary to obtain a plaintext point set, which specifically comprises:
S5.1: finding, in the second key dictionary, the third key pair corresponding to the complex bucket of the point to be searched;
S5.2: decrypting the ciphertext data according to the third key pair to obtain a plaintext point set to be screened;
Specifically, after receiving the ciphertext data returned by the server, the client first finds, in the second key dictionary Dic_sk, the key nsk corresponding to the complex bucket B of the point to be searched, and then decrypts the returned ciphertext data' with nsk to obtain the corresponding plaintext point set for screening and confirmation.
S5.3: if the number of points in the plaintext point set to be screened is smaller than a preset search number k, expanding the search range to 2r-1 complex buckets and repeating the search in the bidirectional dictionary until the number of points in the plaintext point set to be screened is greater than or equal to the preset search number k; wherein r and k are positive integers;
S5.4: selecting the first k nearest points as the final plaintext point set.
Specifically, when the number of points in the plaintext point set is greater than or equal to k, the first k nearest points are obtained directly by screening. When the number of points obtained is smaller than k, k is divided by the size of the result set (that is, the number of points in the plaintext point set to be screened) to obtain a multiple, and this multiple is multiplied by the original search range to obtain the new search range r; the number of complex buckets to be searched is then 2r-1. (The initial range is assumed to be r = 1, that is, only the complex bucket containing the current search point is searched. When r = 2, the range extends by one bucket to the left and one bucket to the right of the initial bucket, so three complex buckets are searched in the manner described above, and so on.) Finally, the first k nearest points are selected as the required plaintext point set.
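The range-expansion rule can be sketched in Python as follows. This is a plaintext-only illustration under stated assumptions: the buckets are an ordered list of point lists, the multiple k / n is rounded up, the range is doubled when no point is returned, and the window is clipped at the ends of the bucket list. None of these details is specified in the embodiment, and the function names are hypothetical.

```python
import math


def expand_range(r, k, n):
    """New range = old range * (k / n), rounded up (rounding is an assumption)."""
    if n <= 0:
        return r * 2  # assumption: double the range when nothing was found
    return r * math.ceil(k / n)


def bucket_window(center, r, num_buckets):
    """Indices of the (at most) 2r-1 buckets centred on the query's bucket."""
    lo = max(0, center - (r - 1))
    hi = min(num_buckets - 1, center + (r - 1))
    return list(range(lo, hi + 1))


def knn_with_expansion(query, center, buckets, k):
    """Search the centre bucket first, widen the window until at least
    k candidates are found (or every bucket is covered), then keep the
    k candidates nearest to the query."""
    r = 1
    while True:
        window = bucket_window(center, r, len(buckets))
        candidates = [p for b in window for p in buckets[b]]
        if len(candidates) >= k or len(window) == len(buckets):
            break
        r = expand_range(r, k, len(candidates))
    candidates.sort(key=lambda p: math.dist(p, query))
    return candidates[:k]
```

For example, with k = 3 and two points found in the initial bucket, the multiple is 3 / 2 rounded up to 2, so r becomes 2 and three buckets are searched on the next round.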
The Monte Carlo sampling-based forward secure k-nearest neighbor retrieval method provided by this embodiment ensures forward security for the ciphertext data uploaded to the server by the data provider, by using a proxy re-encryption scheme and storing the ciphertext data on the server in a bidirectional dictionary. In addition, the proxy re-encryption algorithm provided by this embodiment can also be applied in other algorithms.
Example two
Since the forward secure k-nearest neighbor retrieval method based on Monte Carlo sampling provided in the first embodiment uses a single LSH function, which may cause a large error in the returned nearest neighbors, this embodiment builds on the first embodiment and selects multiple LSH functions to achieve a more accurate k-nearest neighbor search.
Referring to fig. 2, fig. 2 is a schematic diagram of another forward secure k-nearest neighbor retrieval method based on Monte Carlo sampling according to an embodiment of the present invention, including:
Step 1: acquiring a data set and preprocessing the data set according to each of a plurality of LSH functions to obtain a plurality of data set dictionaries;
Step 2: performing proxy re-encryption on each of the plurality of data set dictionaries to obtain a plurality of bidirectional dictionaries;
Step 3: performing a data search on each bidirectional dictionary to obtain a plurality of groups of ciphertext data;
Step 4: re-encrypting the plurality of groups of ciphertext data to obtain a plurality of groups of ciphertext data under a unified key;
Step 5: taking the intersection of the plurality of groups of ciphertext data under the unified key to obtain final ciphertext data;
Step 6: decrypting the final ciphertext data to obtain a plaintext point set.
Specifically, m LSH functions are selected. Following the method of the first embodiment, the server only needs to maintain m bidirectional dictionaries, and the data provider maintains m data set dictionaries Dic_H and the corresponding first key dictionaries Dic_key. Data is added and deleted as in the first embodiment, except that the add and delete operations are performed on each of the m instances.
In the k-nearest neighbor search process, m bidirectional dictionaries are used. Since each instance uses different encryption keys, the ciphertexts of the same point differ between instances, and the data cannot be decrypted normally after an intersection is taken. For this purpose, an honest deletion cloud is introduced as an intermediary (it can be created and maintained by users). The ciphertext data set (Δ(ID), data) found in each bidirectional dictionary is uploaded to the deletion cloud. Each corresponding nsk is first found in Dic_sk, and a new key pair <rpk, rsk> is generated; each nsk is then combined with rsk to generate a re-encryption key sk_n→r, and the ciphertext data set returned by each instance is re-encrypted with its re-encryption key, so that all resulting ciphertexts can be decrypted with the unified key. To take the intersection of the ciphertexts, only π^-1 of each instance needs to be uploaded to the deletion cloud, which then decrypts Δ(ID) to recover the plaintext id values and screens the data to obtain the intersection.
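The intersection step can be sketched in Python as follows, under explicit assumptions: each instance's π is modelled as a random bijection stored in a dictionary, the ciphertexts are treated as opaque byte strings that have already been re-encrypted under the unified key, and the helper names `make_permutation` and `intersect_results` are hypothetical. The re-encryption itself is not modelled.

```python
import random


def make_permutation(ids, seed):
    """Assumed stand-in for pi: a random bijection from plaintext ids
    to masked ids, returned together with its inverse."""
    rng = random.Random(seed)
    masked = list(ids)
    rng.shuffle(masked)
    forward = dict(zip(ids, masked))
    inverse = {m: i for i, m in forward.items()}
    return forward, inverse


def intersect_results(results, inverses):
    """results[j] is the list of (masked id, ciphertext) pairs returned by
    instance j; inverses[j] is that instance's inverse permutation.
    Returns the (id, ciphertext) pairs of instance 0 whose ids appear
    in the results of every instance."""
    id_sets = [
        {inv[masked_id] for masked_id, _ in found}
        for found, inv in zip(results, inverses)
    ]
    common = set.intersection(*id_sets)
    inv0 = inverses[0]
    return [(inv0[m], ct) for m, ct in results[0] if inv0[m] in common]
```

Only one ciphertext per common id needs to be kept, since after re-encryption every instance's ciphertext for the same point decrypts under the same key; the sketch keeps the copy from the first instance.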
After the screening is completed, if the number n of points in the obtained intersection is less than k, then, as in the method of the first embodiment, the search range of complex buckets in each instance is expanded by a factor of k divided by n. (The initial range is r = 1, that is, only the complex bucket containing the current search point is searched. When r = 2, the range extends by one bucket to the left and one bucket to the right of the initial bucket, so three complex buckets are searched by this method, and so on.) The process is repeated until the number of points in the intersection is at least k, at which point the search ends and the final plaintext point set, consisting of the k nearest points, is obtained.
The forward secure k-nearest neighbor retrieval method based on Monte Carlo sampling provided by this embodiment performs one search per LSH function and takes the intersection of the results, thereby improving search accuracy. In addition, the method provided by this embodiment can also be applied to searches over higher-dimensional data.
Example three
This embodiment provides, on the basis of the first embodiment, a forward secure k-nearest neighbor retrieval system based on Monte Carlo sampling. Referring to fig. 3, fig. 3 is a schematic structural diagram of a forward secure k-nearest neighbor retrieval system based on Monte Carlo sampling according to an embodiment of the present invention. The system includes: a data set supplying device 1, a search device 2 and a decryption device 3, wherein,
the data set supplying device 1 includes:
the data acquisition module 11, used for acquiring a data set and preprocessing the data set to obtain a plurality of complex buckets and generate a data set dictionary;
the initialization module 12, used for encrypting each complex bucket according to a proxy re-encryption algorithm to obtain a first key dictionary and a bidirectional dictionary, and transmitting the bidirectional dictionary to the search device 2;
the encryption module 13, used for finding data corresponding to a point to be searched in the data set dictionary and the first key dictionary, performing re-encryption processing to obtain a search token and a second key dictionary, and transmitting the search token and the second key dictionary to the decryption device 3;
the search device 2 includes a data search module 21, used for searching data in the bidirectional dictionary according to the search token to obtain ciphertext data and transmitting the ciphertext data to the decryption device 3;
the decryption device 3 includes:
a first data receiving module 31, used for receiving the search token and uploading the search token to the data search module 21 so that the data search module 21 can perform the data search;
and a second data receiving module 32, used for receiving the second key dictionary and the ciphertext data, and decrypting the ciphertext data according to the second key dictionary to obtain a plaintext point set.
In the present embodiment, the data set supplying device is mainly deployed at the data provider, the search device at the server, and the decryption device at the user, that is, the client.
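The data flow between the three devices can be pictured with a toy Python sketch. It is only an illustration under heavy assumptions: `toy_enc`, `toy_reenc` and `toy_dec` are hypothetical placeholders that merely label a value with a key name and provide no security, the token here carries the bucket identifier and key names in the clear (the real token carries ciphertexts and a re-encryption key), and the query is assumed to be a point already present in the data set dictionary.

```python
from dataclasses import dataclass, field


def toy_enc(key, point):
    """Placeholder only, NOT encryption: tags the point with a key name."""
    return ("ct", key, point)


def toy_reenc(ct, old_key, new_key):
    """Placeholder for proxy re-encryption: re-tags a ciphertext."""
    assert ct[1] == old_key
    return ("ct", new_key, ct[2])


def toy_dec(key, ct):
    """Placeholder for decryption: succeeds only with the matching key name."""
    assert ct[1] == key
    return ct[2]


@dataclass
class SearchDevice:
    """Search device 2 (server side); holds the bidirectional dictionary."""
    bidict: dict = field(default_factory=dict)  # bucket -> list of ciphertexts

    def search(self, token):
        bucket, old_key, new_key = token
        return [toy_reenc(ct, old_key, new_key) for ct in self.bidict.get(bucket, [])]


@dataclass
class SupplyDevice:
    """Data set supplying device 1 (data provider side)."""
    dic_h: dict    # point -> bucket (data set dictionary)
    dic_key: dict  # bucket -> key (first key dictionary)

    def initialise(self, server):
        for point, bucket in self.dic_h.items():
            ct = toy_enc(self.dic_key[bucket], point)
            server.bidict.setdefault(bucket, []).append(ct)

    def make_token(self, query, fresh_key):
        bucket = self.dic_h[query]
        token = (bucket, self.dic_key[bucket], fresh_key)
        dic_sk = {bucket: fresh_key}  # second key dictionary
        return token, dic_sk, bucket


class DecryptionDevice:
    """Decryption device 3 (client side)."""

    def run(self, server, token, dic_sk, bucket):
        ciphertexts = server.search(token)
        return [toy_dec(dic_sk[bucket], ct) for ct in ciphertexts]
```

In this sketch the client never sees the provider's bucket key: the server re-tags the stored entries under the fresh key before returning them, which mirrors the role that re-encryption plays in the described system.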
The system provided in this embodiment can implement the forward secure k-nearest neighbor retrieval method based on Monte Carlo sampling described in the first embodiment; the details of the process are not repeated here.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.
Claims (10)
1. A forward secure k-nearest neighbor retrieval method based on Monte Carlo sampling is characterized by comprising the following steps:
acquiring a data set and preprocessing the data set to obtain a plurality of complex buckets and generating a data set dictionary according to the complex buckets;
encrypting each complex bucket according to a proxy re-encryption algorithm to obtain a first key dictionary and a bidirectional dictionary;
finding data corresponding to the point to be searched in the data set dictionary and the first key dictionary, and performing re-encryption processing to obtain a search token and a second key dictionary;
performing data search in the bidirectional dictionary according to the search token to obtain ciphertext data;
and decrypting the ciphertext data according to the second key dictionary to obtain a plaintext point set.
2. The forward secure k-nearest neighbor retrieval method of claim 1, wherein obtaining a data set and preprocessing the data set to obtain a plurality of complex buckets and generating a data set dictionary from the complex buckets comprises:
acquiring a data set, and generating the data set into a plurality of uniform hash buckets according to an LSH function based on Monte Carlo sampling;
greedy merging is carried out on the uniform hash buckets, and false points are added to obtain a plurality of complex buckets with the same data volume;
and generating a corresponding data set dictionary according to the complex buckets.
3. The forward secure k-nearest neighbor retrieval method of claim 1, wherein said encrypting each of said complex buckets according to a proxy re-encryption algorithm to obtain a first key dictionary and a bi-directional dictionary comprises:
generating a first key pair, a second key pair and a corresponding encrypted data set by adopting a proxy re-encryption algorithm for each complex bucket;
generating a first key dictionary according to the first key pairs and the second key pairs corresponding to all the complex buckets;
initializing a bidirectional dictionary and storing the encrypted data sets corresponding to all the complex buckets in the bidirectional dictionary.
4. The forward secure k-nearest neighbor retrieval method according to claim 1, wherein the step of finding data corresponding to a point to be searched in the data set dictionary and the first key dictionary and performing re-encryption processing to obtain a search token and a second key dictionary comprises:
acquiring the point to be searched and the number of neighboring points to be searched for;
finding a complex bucket corresponding to the point to be searched in the data set dictionary, and generating a third key pair and a corresponding third ciphertext;
finding a first key pair and a second key pair corresponding to the point to be searched in the first key dictionary, and generating a corresponding first ciphertext and a corresponding second ciphertext;
re-encrypting the first key pair and the second key pair according to the third key pair to generate a first re-encryption key;
generating a search token according to the first ciphertext, the second ciphertext, the third ciphertext and the first re-encryption key;
and forming a second key dictionary according to the third key pair and the corresponding third ciphertext.
5. The forward secure k-nearest neighbor retrieval method of claim 4, further comprising, after forming a second key dictionary from said third key pair and its corresponding third ciphertext:
updating the first key dictionary.
6. The forward secure k-nearest neighbor retrieval method of claim 1, wherein performing data search in the bidirectional dictionary according to the search token to obtain ciphertext data comprises:
generating a temporary search dictionary according to the search token;
searching data in the bidirectional dictionary according to the search token, and re-encrypting a search result to obtain a search result data set;
and inserting the search result data set into the temporary search dictionary to obtain ciphertext data.
7. The method of claim 6, wherein after searching the bidirectional dictionary for data according to the search token and re-encrypting the search result to obtain a search result data set, the method further comprises:
and updating the bidirectional dictionary.
8. The forward secure k-nearest neighbor retrieval method of claim 1, wherein decrypting the ciphertext data according to the second key dictionary to obtain a plaintext point set comprises:
finding a third key pair corresponding to the complex bucket of the points to be searched in the second key dictionary;
decrypting the ciphertext data according to the third key pair to obtain a plaintext point set to be screened;
if the number of points in the plaintext point set to be screened is judged to be smaller than a preset search number k, expanding the search range to 2r-1 complex buckets, and repeating the search in the bidirectional dictionary until the number of points in the plaintext point set to be screened is greater than or equal to the preset search number k; wherein r and k are positive integers;
and selecting the first k nearest points as a final plaintext point set.
9. A forward secure k-nearest neighbor retrieval method based on Monte Carlo sampling is characterized by comprising the following steps:
acquiring a data set and respectively preprocessing the data set according to a plurality of LSH functions to obtain a plurality of data set dictionaries;
performing proxy re-encryption on the plurality of data set dictionaries respectively to obtain a plurality of bidirectional dictionaries;
respectively carrying out data search on each bidirectional dictionary to obtain a plurality of groups of ciphertext data;
carrying out re-encryption processing on the plurality of groups of ciphertext data to obtain a plurality of groups of ciphertext data under a unified key;
taking the intersection of the plurality of groups of ciphertext data under the unified key to obtain final ciphertext data;
and decrypting the final ciphertext data to obtain a plaintext point set.
10. A forward secure k-nearest neighbor retrieval system based on monte carlo sampling, comprising: data set supplying means (1), search means (2) and decryption means (3), wherein,
the data set supplying apparatus (1) includes:
the data acquisition module (11) is used for acquiring a data set and preprocessing the data set to obtain a plurality of complex buckets and generate a data set dictionary;
the initialization module (12) is used for encrypting each complex bucket according to a proxy re-encryption algorithm to obtain a first key dictionary and a bidirectional dictionary and transmitting the bidirectional dictionary to the search device (2);
an encryption module (13) for finding data corresponding to a point to be searched in the data set dictionary and the first key dictionary, performing re-encryption processing to obtain a search token and a second key dictionary, and transmitting the search token and the second key dictionary to the decryption device (3);
the searching device (2) comprises a data searching module (21) which is used for searching data in the bidirectional dictionary according to the search token to obtain ciphertext data and transmitting the ciphertext data to the decrypting device (3);
the decryption device (3) comprises:
a first data receiving module (31) for receiving the search token and uploading the search token to the data search module (21) for data search by the data search module (21);
and the second data receiving module (32) is used for receiving the second key dictionary and the ciphertext data and decrypting the ciphertext data according to the second key dictionary to obtain a plaintext point set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010319210.2A CN111552988B (en) | 2020-04-21 | 2020-04-21 | Forward safe k neighbor retrieval method and system based on Monte Carlo sampling |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111552988A (en) | 2020-08-18 |
CN111552988B (en) | 2023-05-02 |
Family
ID=72005827
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010319210.2A Active CN111552988B (en) | 2020-04-21 | 2020-04-21 | Forward safe k neighbor retrieval method and system based on Monte Carlo sampling |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111552988B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105787076A (en) * | 2016-03-02 | 2016-07-20 | 南京信息工程大学 | Monochrome mutual nearest neighbor query processing method for uncertain spatial data |
CN108629970A (en) * | 2018-04-25 | 2018-10-09 | 浙江大学 | Intersection signal parameter optimization method based on the search of Monte Carlo tree |
CN108959478A (en) * | 2018-06-21 | 2018-12-07 | 中南林业科技大学 | Ciphertext image search method and system under a kind of cloud environment |
CN109543061A (en) * | 2018-11-16 | 2019-03-29 | 西安电子科技大学 | A kind of encrypted image search method for supporting multi-key cipher |
US20190272344A1 (en) * | 2018-03-01 | 2019-09-05 | Yangdi Lu | Random draw forest index structure for searching large scale unstructured data |
CN110334526A (en) * | 2019-05-30 | 2019-10-15 | 西安电子科技大学 | It is a kind of that the forward secrecy verified is supported to can search for encryption storage system and method |
CN110351679A (en) * | 2019-04-22 | 2019-10-18 | 鲁东大学 | A kind of wireless sensor network resource allocation methods based on improvement simulated annealing |
Non-Patent Citations (4)
Title |
---|
WANQI LIU et al.: "I-LSH: I/O Efficient c-Approximate Nearest Neighbor Search in High-Dimensional Space", 《2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE)》 * |
YANGUO PENG et al.: "Towards Secure Approximate k-Nearest Neighbor Query Over Encrypted High-Dimensional Data", 《IEEE ACCESS》 * |
吴瑾 et al.: "基于局部敏感哈希的安全相似性查询方案", 《密码学报》 * |
翟建峰: "云环境下安全密文检索技术研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114417073A (en) * | 2022-03-28 | 2022-04-29 | 之江实验室 | Neighbor node query method and device of encryption graph and electronic equipment |
CN115733617A (en) * | 2022-10-31 | 2023-03-03 | 支付宝(杭州)信息技术有限公司 | Biological characteristic authentication method and system |
CN115733617B (en) * | 2022-10-31 | 2024-01-23 | 支付宝(杭州)信息技术有限公司 | Biological feature authentication method and system |
Also Published As
Publication number | Publication date |
---|---|
CN111552988B (en) | 2023-05-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB03 | Change of inventor or designer information | Inventor after: Peng Yanguo, Wang Tengyu, Hou Yongchao, Lv Zhen, Wang Long, Li Xin. Inventor before: Peng Yanguo, Wang Tengyu, Lv Zhen, Wang Long, Li Xin |
GR01 | Patent grant | ||