CN102945282B - Method and system for secure nearest neighbor queries based on equal-length data partitioning - Google Patents


Info

Publication number
CN102945282B
Authority
CN
China
Prior art keywords
data
division
border
point
query point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201210465692.8A
Other languages
Chinese (zh)
Other versions
CN102945282A (en)
Inventor
姚斌
李飞飞
肖小奎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN201210465692.8A
Publication of CN102945282A
Application granted
Publication of CN102945282B
Expired - Fee Related
Anticipated expiration


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

The present invention relates to a method and system for secure nearest neighbor queries based on equal-length data partitioning. In the method, the data owner divides the Voronoi diagram of the outsourced database into k partitions, records the border of each partition, adds random bytes to the partitions, builds an index for each border using a preset hash function, sends all encrypted partitions and their corresponding indexes to the server, and sends the borders of all partitions to the data user. The data user sends the server the index of the partition that contains the true query point; the server returns that partition in encrypted form; the data user decrypts it and computes the nearest neighbor. As a result, when a data user runs a nearest neighbor query against an outsourced database stored on a server, the server cannot learn the data in the outsourced database, the query point, or the query result, so data security is ensured.

Description

Method and system for secure nearest neighbor queries based on equal-length data partitioning
Technical field
The present invention relates to the field of secure query processing, and in particular to a method and system for secure nearest neighbor queries based on equal-length data partitioning.
Background art
Existing research in the field of secure query processing covers basic SQL queries over encrypted databases (see document 3: H. Hacigumus, B. R. Iyer, C. Li, and S. Mehrotra. Executing SQL over encrypted data in the database service provider model. In SIGMOD, 2002), aggregation queries (see document 4: H. Hacigumus, B. R. Iyer, and S. Mehrotra. Efficient execution of aggregation queries over encrypted relational databases. In DASFAA, pages 125-136, 2004, and document 5: E. Mykletun and G. Tsudik. Aggregation queries in the database-as-a-service model. In DBSec, 2006), and range queries (see document 6: B. Hore, S. Mehrotra, M. Canim, and M. Kantarcioglu. Secure multidimensional range queries over outsourced data. VLDBJ. To appear, and document 7: E. Shi, J. Bethencourt, H. T.-H. Chan, D. X. Song, and A. Perrig. Multi-dimensional range query over encrypted data. In IEEE Symposium on Security and Privacy, pages 350-364, 2007). As many existing studies show (see document 1: H. Hu, J. Xu, C. Ren, and B. Choi. Processing private queries over untrusted data cloud through privacy homomorphism. In ICDE, pages 601-612, 2011; document 2: W. K. Wong, D. W.-L. Cheung, B. Kao, and N. Mamoulis. Secure kNN computation on encrypted databases. In SIGMOD, pages 139-152, 2009; and documents 6 and 7), more complex query types often require special handling in order to meet a given security requirement while remaining efficient. In particular, a good deal of prior work has addressed the secure nearest neighbor (SNN) problem (see documents 1 and 2), but the proposed solutions have often been shown afterwards to be insecure and easy to attack successfully.
Hacigumus et al. first proposed the "outsourced database" (ODB) model (see document 8: H. Hacigumus, B. R. Iyer, and S. Mehrotra. Providing database as a service. In ICDE, 2002). In this model, the data owner outsources two services, data management and query answering, to an untrusted service provider. Security research on ODB aims to protect the data by encrypting it and to process queries directly over the encrypted data. For example, an order-preserving encryption scheme (OPES, see document 9: R. Agrawal, J. Kiernan, R. Srikant, and Y. Xu. Order preserving encryption for numeric data. In SIGMOD, 2002) applies a function E to an ordinal domain such that, for any pair of values x and y with x < y, E(x) < E(y). In addition, Hacigumus et al. proposed an additively and multiplicatively homomorphic encryption function E, satisfying E(x) + E(y) = E(x + y) and E(x)E(y) = E(xy), to support aggregation queries over encrypted data (see document 4: H. Hacigumus, B. R. Iyer, and S. Mehrotra. Efficient execution of aggregation queries over encrypted relational databases. In DASFAA, pages 125-136, 2004). However, as Mykletun et al. showed, the homomorphic method cannot guarantee even the lowest level of security (see document 5). In short, earlier ODB work considered only simple numeric ranges and SQL operations, and did not study more complex operations such as kNN (k nearest neighbor) queries. Moreover, ODB research typically assumes a single type of attack and does not consider attacks at different levels, so it lacks generality.
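The two properties mentioned above can be checked on toy functions. The sketch below is illustrative only: `ope_encrypt` is a hypothetical strictly increasing map, and `rsa_encrypt` is textbook RSA with tiny parameters, used here only because it is multiplicatively homomorphic. Neither is secure, and neither is the scheme of document 4 or document 9.

```python
# Toy illustration only: both functions are insecure stand-ins.

def ope_encrypt(x: int) -> int:
    """Hypothetical order-preserving map: strictly increasing in x."""
    return 3 * x + 7

# Textbook RSA with tiny parameters (assumed for the demo).
N_MOD = 61 * 53
E_EXP = 17

def rsa_encrypt(x: int) -> int:
    return pow(x, E_EXP, N_MOD)

x, y = 12, 45
# Order preservation: x < y implies E(x) < E(y).
assert ope_encrypt(x) < ope_encrypt(y)
# Multiplicative homomorphism: E(x) * E(y) = E(x * y) (mod n).
assert (rsa_encrypt(x) * rsa_encrypt(y)) % N_MOD == rsa_encrypt((x * y) % N_MOD)
```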
Besides encryption techniques, other data protection methods are used to secure query computation. SQL execution in the ODB model adopts a "coarse index" technique, also known as a "bucket-based index" (see document 3). Tuples are encrypted with a conventional encryption method such as RSA. The domain of each database attribute is divided, and each resulting part (a "partition") is assigned an ID by a hash function. The data owner sends each encrypted tuple to the server together with the ID of the partition it belongs to, and this ID serves as the coarse index. A query is turned into a request for the partitions that contain the target tuples, so the server returns a superset of the query result. A user who holds the key can then decrypt the result and discard the irrelevant tuples in a post-processing step. For advanced queries, the amount of irrelevant data can be very large, which places a heavy burden on the user. For example, the distances between the data points and the query point that kNN computation requires are hard to derive from partition IDs. Applying the coarse index technique directly can therefore lead the server to return the entire database, leaving the user to compute the query result alone. This approach is clearly unsuitable when the user's processing power is limited, for example on a mobile device.
Another secure query processing method relies on special hardware, namely a secure coprocessor (see document 10: E. Mykletun and G. Tsudik. Incorporating a secure coprocessor in the database-as-a-service model. In IWIA, 2005, and document 11: R. Agrawal, D. Asonov, M. Kantarcioglu, and Y. Li. Sovereign joins. In ICDE, 2006). A secure coprocessor is a secure computing unit whose computation and stored data are hidden from every party involved in the query. Using it is simple: the encryption and decryption keys are installed and the application logic is deployed directly. On the other hand, it is slower than an ordinary processor and is therefore unsuitable for complex applications that require a large amount of computation. In addition, the coprocessor must be maintained by the user. For example, if the processor crashes, the user must redeploy it. This clearly contradicts the premise of cloud computing that users do not need to maintain the raw data themselves.
In addition, Sweeney, Li, Machanavajjhala, and others have proposed various data anonymization models, such as k-anonymity, for privacy protection in data publishing (see document 12: L. Sweeney. k-anonymity: A model for protecting privacy. In IJUFKS, 2002). Their basic idea is that each tuple in the database should be indistinguishable from at least k-1 other tuples with respect to its quasi-identifiers. k-anonymity can be achieved by generalizing quasi-identifiers, suppressing tuples, perturbing tuples, and similar methods. However, the k-anonymity model loses information during querying, and the model itself has specific weaknesses. As Machanavajjhala et al. pointed out, a group of tuples with indistinguishable quasi-identifiers may still contain many sensitive values, so an attacker with limited background knowledge can cause information leakage (see document 13: A. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam. l-diversity: Privacy beyond k-anonymity. In ICDE, 2006). Moreover, the generalized value ranges can help a potential attacker make accurate estimates of the raw data or of valuable statistics. It should be noted in particular that privacy protection in data publishing has a different goal from data security in the ODB model: the former tries to prevent published information from exposing particular individuals, whereas the latter focuses on protecting information from unauthorized users.
To better address secure query processing, W. K. Wong et al. studied kNN queries in the SCONEDB model (see document 2). Oliveira et al. had proposed the "distance-preserving transformation" (DPT) as an encryption method (see document 14: S. R. M. Oliveira and O. R. Zaiane. Privacy preserving clustering by data transformation. In SBBD, Manaus, Amazonas, Brazil, 2003). DPT maps a given point x to Nx + t, where N is a d × d orthogonal matrix and t is a d-dimensional vector. The key property of DPT is that distances between points are unchanged by the transformation, that is, d(x, y) = d(E(x), E(y)), where d denotes Euclidean distance and E is the encryption (transformation) function. Because distances do not change, kNN queries are computed correctly. However, Liu et al. proved that DPT is insecure against level-2 and level-3 attacks [8]. For a level-3 attack, W. K. Wong et al. considered a set of points {x_1, x_2, ..., x_m} in DB and their corresponding encrypted values {y_1, y_2, ..., y_m}. These give a set of equations y_i = N x_i + t, which form a linear system in which the d^2 entries of N and the d entries of t are unknown. If m >= d + 1, the system can be solved. For a level-2 attack, the attacker can see a set of points P in DB. Because DPT preserves the correlation between dimensions, Liu et al. use PCA to determine the principal components of the point set P and of the transformed database. By matching the principal components, the attacker can estimate N and t accurately (see document 8).
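The level-3 attack described above can be sketched for the two-dimensional case (d = 2), where d + 1 = 3 known plaintext/ciphertext pairs suffice. This is a minimal illustration under stated assumptions: the function names `dpt` and `recover_key`, the choice of a rotation as the orthogonal matrix N, and the sample key are all hypothetical, and the three known points are assumed not to be collinear.

```python
import math

def dpt(point, angle, t):
    """Distance-preserving transformation y = N x + t, with N a 2-D rotation."""
    c, s = math.cos(angle), math.sin(angle)
    x0, x1 = point
    return (c * x0 - s * x1 + t[0], s * x0 + c * x1 + t[1])

def recover_key(xs, ys):
    """Recover N and t from three known pairs (x_i, y_i) with y_i = N x_i + t."""
    (a0, a1), (b0, b1), (c0, c1) = xs
    (p0, p1), (q0, q1), (r0, r1) = ys
    # Subtracting the first pair cancels t: (y_i - y_0) = N (x_i - x_0).
    dx = ((b0 - a0, c0 - a0), (b1 - a1, c1 - a1))  # columns: x_1 - x_0, x_2 - x_0
    dy = ((q0 - p0, r0 - p0), (q1 - p1, r1 - p1))  # columns: y_1 - y_0, y_2 - y_0
    det = dx[0][0] * dx[1][1] - dx[0][1] * dx[1][0]
    inv = ((dx[1][1] / det, -dx[0][1] / det),
           (-dx[1][0] / det, dx[0][0] / det))
    # N = DY * DX^{-1}
    n = tuple(tuple(sum(dy[i][k] * inv[k][j] for k in range(2))
                    for j in range(2)) for i in range(2))
    t = (p0 - n[0][0] * a0 - n[0][1] * a1,
         p1 - n[1][0] * a0 - n[1][1] * a1)
    return n, t

secret_angle, secret_t = 0.7, (5.0, -2.0)           # the "key" (illustrative values)
known = [(1.0, 2.0), (4.0, 0.5), (-3.0, 6.0)]       # plaintext points seen by the attacker
cipher = [dpt(p, secret_angle, secret_t) for p in known]

# Distances are preserved, which is what makes kNN work on the ciphertext...
assert math.isclose(math.dist(known[0], known[1]), math.dist(cipher[0], cipher[1]))
# ...but three known pairs are enough to recover the key.
n, t = recover_key(known, cipher)
assert math.isclose(n[0][0], math.cos(secret_angle), abs_tol=1e-9)
assert math.isclose(t[1], secret_t[1], abs_tol=1e-9)
```

In general the system has d^2 + d unknowns and each known pair contributes d equations, which is why m >= d + 1 pairs are needed.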
kNN computation on an untrusted platform is also a problem that location-based service (LBS) systems need to consider. In the LBS model, the server holds a set of tuples, also called points of interest (POI). A user submits a query (a range query or a kNN query) to the server to obtain the desired points of interest. The main security goal is to protect the location of the query point, and some models also consider the privacy of the POIs. The k-anonymity model is often used to convert the position of the query point into a spatial region that contains at least k-1 other points, so that the server has difficulty determining the position of the user (the query point) within it. Although this model could be used to solve our problem, it has certain weaknesses. First, anonymized data can expose approximate raw values. Second, in some models (see document 15: G. Ghinita, P. Kalnis, A. Khoshgozaran, C. Shahabi, and K. L. Tan. Private queries in location based services: Anonymizers are not necessary. In SIGMOD, 2008), the database is assumed to be owned by the server, so the server can see the raw data. Third, in some systems (such as coarse index systems), the server usually returns a superset of the query result for the user to post-process, which burdens the user and may be unaffordable for a lightweight client. Khoshgozaran et al. proposed an LBS model that supports encrypted kNN queries (see document 16: A. Khoshgozaran and C. Shahabi. Blind evaluation of nearest neighbor queries using space transformation to preserve location privacy. In SSTD, 2007). Its main idea is to use a Hilbert curve to "encrypt" data points and queries. The Hilbert value of each point is sent to the server, and kNN is then computed in the space obtained after the Hilbert transformation, which yields an approximate result. Besides returning only an approximation, this method has a problem similar to that of DPT: it is easy to attack successfully.
Summary of the invention
The object of the present invention is to provide a method and system for secure nearest neighbor queries based on equal-length data partitioning, such that when a data user runs a nearest neighbor query against an outsourced database stored on a server, the server cannot learn the data in the outsourced database, the data user's query point, or the nearest neighbor query result, and data security is ensured.
To solve the above problem, the invention provides a method for secure nearest neighbor queries based on equal-length data partitioning, comprising:
The data owner generates a Voronoi diagram containing all data points of the outsourced database, wherein every data point has the same number of bytes, and the number of data points in the outsourced database is N, N being a positive integer;
The data user or the data owner specifies a parameter k, and the data owner divides the Voronoi diagram into k partitions according to the parameter k and records the border corresponding to each partition, wherein the partitions are mutually disjoint, the data points contained in different partitions partially overlap or do not overlap at all, and k is greater than or equal to 1 and less than or equal to N;
The data owner takes the byte count of the partition that contains the most data points as the maximum byte length, and adds random bytes to every other partition so that the byte count of each of those partitions equals the maximum byte length;
The data owner builds a corresponding index for each border using a preset hash function, encrypts all partitions using a preset encryption algorithm, and sends all encrypted partitions together with the indexes of their corresponding borders to the server for storage;
The data owner sends the borders corresponding to all partitions, the decryption algorithm corresponding to the encryption algorithm, and the hash function to the data user for storage;
The data user determines a true query point, determines from it the border of the partition that contains the true query point, obtains the index corresponding to that border using the hash function, and sends that index to the server;
The server, according to the received index, sends the data user the encrypted partition that contains the true query point;
The data user decrypts the received encrypted partition using the decryption algorithm to obtain the partition containing the true query point, and obtains from that partition the data point that is the nearest neighbor of the true query point.
Further, in the above method, the outsourced database is a one- to three-dimensional outsourced database.
Further, in the above method, when the outsourced database is one-dimensional, the border of each partition is the perpendicular bisector between two adjacent data points.
Further, in the above method, when the outsourced database is two-dimensional, the border of each partition is a grid cell enclosed by straight lines parallel to the X and Y coordinate axes of the Voronoi diagram.
Further, in the above method, in the step in which the data owner divides the Voronoi diagram into k partitions according to the parameter k, the Voronoi diagram is divided into a grid of k equal-sized squares, where k is a perfect square.
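For the two-dimensional case, locating the grid cell that contains a query point reduces to two integer divisions. The following is a minimal sketch under assumptions that are not in the original: the grid covers a square region described by a hypothetical `bounds` tuple (lower-left corner and side length), cells are addressed as (row, col), and the query point lies inside the region.

```python
import math

def grid_cell(q, bounds, k):
    """Return (row, col) of the equal-sized square cell that contains q.

    bounds = (xmin, ymin, side) describes the square region covered by the
    grid (hypothetical layout). k must be a perfect square, and q is assumed
    to lie inside the region.
    """
    m = math.isqrt(k)
    if m * m != k:
        raise ValueError("k must be a perfect square")
    xmin, ymin, side = bounds
    cell = side / m
    # min(...) keeps points on the upper or right edge inside the last cell.
    col = min(int((q[0] - xmin) // cell), m - 1)
    row = min(int((q[1] - ymin) // cell), m - 1)
    return row, col

# A 10 x 10 region split into k = 4 cells of side 5.
assert grid_cell((7.5, 2.0), (0.0, 0.0, 10.0), 4) == (0, 1)
```

Because every cell has the same shape, the data user only needs `bounds` and k to find the cell, rather than a per-cell description of its border.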
According to another aspect of the present invention, a system for secure nearest neighbor queries based on equal-length data partitioning is provided, comprising:
A data owner, configured to specify the parameter k and to generate a Voronoi diagram containing all data points of the outsourced database, wherein every data point has the same number of bytes, and the number of data points in the outsourced database is N, N being a positive integer; to divide the Voronoi diagram into k partitions according to the parameter k and record the border corresponding to each partition, wherein the partitions are mutually disjoint, the data points contained in different partitions partially overlap or do not overlap at all, and k is greater than or equal to 1 and less than or equal to N; to take the byte count of the partition that contains the most data points as the maximum byte length and add random bytes to every other partition so that the byte count of each of those partitions equals the maximum byte length; to build a corresponding index for each border using a preset hash function, encrypt all partitions using a preset encryption algorithm, and send all encrypted partitions together with the indexes of their corresponding borders to the server for storage; and to send the borders corresponding to all partitions, the decryption algorithm corresponding to the encryption algorithm, and the hash function to the data user for storage;
A data user, configured to specify the parameter k; to determine a true query point, determine from it the border of the partition that contains the true query point, obtain the index corresponding to that border using the hash function, and send that index to the server; and to decrypt the received encrypted partition using the decryption algorithm to obtain the partition containing the true query point, and obtain from that partition the data point that is the nearest neighbor of the true query point;
A server, configured to store the encrypted partitions and the indexes of their corresponding borders received from the data owner, and to send the data user, according to the received index, the encrypted partition that contains the true query point.
Further, in the above system, the outsourced database is a one- to three-dimensional outsourced database.
Further, in the above system, when the outsourced database is one-dimensional, the border of each partition is the perpendicular bisector between two adjacent data points.
Further, in the above system, when the outsourced database is two-dimensional, the border of each partition is a grid cell enclosed by straight lines parallel to the X and Y coordinate axes of the Voronoi diagram.
Further, in the above system, the data owner divides the Voronoi diagram into a grid of k equal-sized squares, where k is a perfect square.
Compared with the prior art, in the present invention the data owner generates a Voronoi diagram containing all data points of the outsourced database, wherein every data point has the same number of bytes, and the number of data points in the outsourced database is N, N being a positive integer. The data user or the data owner specifies a parameter k, and the data owner divides the Voronoi diagram into k partitions according to the parameter k and records the border corresponding to each partition, wherein the partitions are mutually disjoint, the data points contained in different partitions partially overlap or do not overlap at all, and k is greater than or equal to 1 and less than or equal to N. The data owner takes the byte count of the partition that contains the most data points as the maximum byte length, and adds random bytes to every other partition so that the byte count of each of those partitions equals the maximum byte length. The data owner builds a corresponding index for each border using a preset hash function, encrypts all partitions using a preset encryption algorithm, and sends all encrypted partitions together with the indexes of their corresponding borders to the server for storage. The data owner sends the borders corresponding to all partitions, the decryption algorithm corresponding to the encryption algorithm, and the hash function to the data user for storage. The data user determines a true query point, determines from it the border of the partition that contains the true query point, obtains the index corresponding to that border using the hash function, and sends that index to the server. The server, according to the received index, sends the data user the encrypted partition that contains the true query point. The data user decrypts the received encrypted partition using the decryption algorithm to obtain the partition containing the true query point, and obtains from that partition the data point that is the nearest neighbor of the true query point. Thus, when a data user runs a nearest neighbor query against an outsourced database stored on a server, the server cannot learn the data in the outsourced database, the data user's query point, or the nearest neighbor query result, and data security is ensured.
Brief description of the drawings
Fig. 1 is a flow chart of the method for secure nearest neighbor queries based on equal-length data partitioning according to one embodiment of the invention;
Fig. 2 is a schematic diagram of partitioning in the method for secure nearest neighbor queries based on equal-length data partitioning according to one embodiment of the invention;
Fig. 3 is a schematic diagram of partitioning in one-dimensional space according to one embodiment of the invention;
Fig. 4 is a schematic diagram of partitioning with the MinCs method according to one embodiment of the invention;
Fig. 5a shows the effect of k on the partitioning time cost under the MinCs partitioning method according to one embodiment of the invention;
Fig. 5b shows the effect of |D| on the partitioning time cost under the MinCs partitioning method according to one embodiment of the invention;
Fig. 6a shows how the mean, maximum, and minimum partition size vary with the parameter k according to one embodiment of the invention;
Fig. 6b shows how the mean, maximum, and minimum partition size vary with |D| according to one embodiment of the invention;
Fig. 7a shows how the total running time of the MinCs partitioning method varies with the parameter k according to one embodiment of the invention;
Fig. 7b shows how the total running time of the MinCs partitioning method varies with |D| according to one embodiment of the invention;
Fig. 8a shows how |E(D)| varies with k under the MinCs partitioning method according to one embodiment of the invention;
Fig. 8b shows how |E(D)| varies with |D| under the MinCs partitioning method according to one embodiment of the invention;
Fig. 9a shows how the query communication cost varies with k under the MinCs partitioning method according to one embodiment of the invention;
Fig. 9b shows how the query communication cost varies with |D| under the MinCs partitioning method according to one embodiment of the invention;
Fig. 10a shows how the query time varies with |D| under the MinCs partitioning method according to one embodiment of the invention;
Fig. 10b shows how the query time varies with k under the MinCs partitioning method according to one embodiment of the invention;
Fig. 11 is a functional block diagram of the system for secure nearest neighbor queries based on equal-length data partitioning according to one embodiment of the invention.
Detailed description of the embodiments
To make the above objects, features, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.
As the concept of cloud computing and its applications become increasingly widespread, the "secure query" problem over an encrypted data set E(D) stored in the cloud is receiving more and more attention. The present invention studies in depth one such problem, the secure nearest neighbor (SNN) problem. In this problem a data user can send an encrypted query E(q) to the server to obtain the ciphertext of the data point in E(D) nearest to the query point (the "nearest neighbor"), while it must be ensured that the server does not learn the actual content of the data or of the query. Specifically, the SNN problem involves three parties, the data owner, the data user (client), and the server, with the following roles:
(1) Data owner: holds an outsourced database D consisting of d-dimensional Euclidean points or objects, and outsources D to a server that is not fully trusted.
(2) Data user: needs to query the outsourced database D.
(3) Server: not fully trusted; for its own reasons or because of a third party, it may snoop on the data content from the data owner and the query content from the data user.
To let the data user run nearest neighbor (NN) queries on the outsourced database D without letting the server learn the data in D, the data user's true query, or the query result, the data owner must encrypt D with some encryption algorithm E. The ciphertext of D is denoted E(D), and the corresponding decryption algorithm is denoted E^{-1}. Similarly, the data user encrypts its query point q (q denotes one specific query point) to obtain E(q) and sends it to the server. The goal is for the SNN method to be as secure as the encryption algorithm E used by the data owner; that is, the SNN method should be secure under every attack model under which E is proven secure (for example, indistinguishability under chosen-plaintext attack or under chosen-ciphertext attack).
In fact, it can be proven that, given only E(q) and E(D), the server cannot find the exact nearest neighbor of the query point. But what if the server is not required to locate the SNN query result precisely and only needs to provide a "rough location" of the result? Consider a very simple and direct query processing method: the data owner encrypts the outsourced database D as a whole to obtain E(D) and sends it to the server; the server returns the whole of E(D) to the data user as the query result; the data user then decrypts E(D) with E^{-1} (assuming the data owner and the data user share E^{-1}) to obtain D, and computes the SNN query result locally. This simple method can be called the "send D" (Send-D) algorithm. Clearly, this algorithm has the same security as E but is very inefficient. On this basis, as shown in Fig. 1, the present invention provides a more efficient SNN processing method, namely a method for secure nearest neighbor queries based on equal-length data partitioning (secure Voronoi diagram, SVD). It comprises a preprocessing phase, steps S1 to S5, and a query phase, steps S6 to S8. The query-phase steps S6 to S8 can be repeated as often as actual queries require, wherein:
Step S1: the data owner generates a Voronoi diagram containing all data points p_i of the outsourced database D, wherein every data point p_i has the same number of bytes, and the number of data points in the outsourced database is N, N being a positive integer. Because k partitions of the outsourced database D are produced in the subsequent step S2, a key question is how to produce the (possibly overlapping) partitions G(D) = {G_1, ..., G_k} of D. Two goals are considered: (1) the SNN query result (the nearest neighbor of the query point) must be contained in at least one partition, and (2) the data user should be able to determine G_i with as little information as possible, in order to reduce the query cost. It is therefore natural to build G(D) on the Voronoi diagram of the outsourced database D, in which each data point p_i is represented by exactly one Voronoi cell.
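The reason a Voronoi diagram fits goal (1) is that the site of the cell containing q is, by definition, q's nearest neighbor. In one dimension the cells are simply the intervals between the midpoints (perpendicular bisectors) of adjacent points, so the idea can be sketched with a binary search. The helper names below are hypothetical, and the points are assumed to be distinct.

```python
from bisect import bisect_right

def voronoi_1d(points):
    """Return the sorted sites and the bisectors separating adjacent cells."""
    sites = sorted(points)
    bisectors = [(a + b) / 2 for a, b in zip(sites, sites[1:])]
    return sites, bisectors

def nearest_by_cell(q, sites, bisectors):
    """The site whose Voronoi cell contains q is q's nearest neighbor."""
    return sites[bisect_right(bisectors, q)]

sites, bisectors = voronoi_1d([9, 1, 4, 20])
# Cell lookup agrees with a brute-force scan (q = 5 is not on a bisector).
assert nearest_by_cell(5, sites, bisectors) == min(sites, key=lambda p: abs(p - 5))
```

A query point that lies exactly on a bisector is equidistant from two sites; this sketch then returns the right-hand one.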
The present invention needs to address the following four problems:
Problem one: describing the boundaries P(D) with Voronoi cells costs too much space. For example, when the outsourced database D is two-dimensional (d = 2), each Voronoi cell is a convex polygon of arbitrary shape, and these polygons have 6 vertices on average. Representing P(D) this way would therefore take much more storage than D itself.
Problem two: how to index P(D) so that the data user can quickly determine the required original index value i. This problem is especially pronounced when the elements of P(D) have irregular shapes, for example when B_j ∈ P(D) is a Voronoi cell (B_j is then a convex polygon of arbitrary shape).
Problem three: ensuring |E(G_i)| = |E(G_j)| (i ≠ j), so that the server cannot tell partitions apart by the size of their ciphertexts and thereby learn about the partitioning. This must hold for any secure encryption algorithm E, so G(D) must be forced to satisfy |G_i|_b = |G_j|_b (i ≠ j), where |G_i|_b denotes the number of bytes of partition G_i (note the distinction: |G_i| denotes the number of data points in G_i).
Problem four: after the data user locally computes the original index value i satisfying nn(q, D) ∈ G_i, it must not send i directly to the server to obtain E(G_i); only then is the server guaranteed to learn nothing about the partitioning during query processing. If the data user sends only the ciphertext of i to the server, this problem disappears, but the server must then be able to find E(G_i) from the ciphertext of i.
Step S2: the data user or the data owner specifies a parameter k; the data owner splits the Voronoi diagram into k partitions G_1, ..., G_k according to k and records the boundary of each partition, P(D) = {B_1, ..., B_k}. The partition boundaries are mutually disjoint, while the data points contained in different partitions may partly repeat or not repeat at all; k is at least 1 and at most N. Specifically, D is divided into k parts according to the given k, each called a "partition" (partitions may overlap, i.e. they may contain the same data points), giving G(D) = {G_1, ..., G_k}; E(D) = {E(G_1), ..., E(G_k)} is then sent to the server. In this process the "geometric boundaries" P(D) = {B_1, ..., B_k} (where B_i is the geometric boundary of G_i) must also be stored; provided this information suffices to describe the partitioning, the less space it takes the better.
First consider an extreme case: let k = N (N being the number of data points in D, i.e. |D|). Each G_i is then the set consisting of a single data point p_i ∈ D, and P(D) is the set of Voronoi cells of D. Let vc_i be the Voronoi cell of p_i (p_i is contained in vc_i). Given any query point q, the data user looks up the Voronoi cell containing q; assuming that cell is vc_i, the data user requests E(G_i) from the server. Clearly nn(q, D) = p_i and G_i = {p_i}, and the algorithm returns nn(q, E^-1(E(G_i))) = nn(q, {p_i}) = p_i, which is the correct result. The same idea can be generalized to the case k << N.
Step S3: the data owner takes the byte count of the partition containing the most data points as the maximum byte count, and adds random bytes to every other partition so that its byte count equals the maximum byte count. Specifically, suppose the data owner has generated G(D) and P(D). To solve problem three, the data owner only needs to ensure |G_i|_b = |G_j|_b (i ≠ j). Let G_x be the partition in G(D) with the most data points (the number of data points is called the "size" below), i.e. |G_i| ≤ |G_x| for every i ∈ [1, k]. To each partition G_i (i ≠ x), add (|G_x|_b - |G_i|_b) random bytes; here the character * denotes any random byte, to distinguish it from the actual data points in G_i (the characters that * stands for never actually occur in G_i). This process is called the "random padding operation". After it, |G_x|_b = |G_i|_b for every partition G_i. Hence, in step S4, whatever secure encryption algorithm the data owner uses to generate {E(G_1), ..., E(G_k)}, we have |E(G_i)|_b = |E(G_j)|_b (i ≠ j), so the server cannot distinguish any partition in G(D) and thus cannot learn the partitioning.
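The random padding operation can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the implementation of the invention: the names `pad_partitions` and `unpad` are hypothetical, each partition is assumed to be already serialized to bytes, and a 4-byte length prefix (an assumption used here in place of the * marker) lets the data user strip the filler after decryption.

```python
import os
import struct


def pad_partitions(partitions: list[bytes]) -> list[bytes]:
    """Pad every serialized partition to the byte length of the largest one."""
    longest = max(len(p) for p in partitions)
    padded = []
    for p in partitions:
        # Random filler bytes bring the partition up to the maximum length.
        filler = os.urandom(longest - len(p))
        # Assumed framing: a big-endian length prefix records the real size.
        padded.append(struct.pack(">I", len(p)) + p + filler)
    return padded


def unpad(blob: bytes) -> bytes:
    """Recover the original partition bytes from a padded blob."""
    (n,) = struct.unpack(">I", blob[:4])
    return blob[4:4 + n]
```

Every padded blob has the same length, so any encryption whose ciphertext length depends only on the plaintext length produces equal-size ciphertexts.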
Step S4: the data owner builds an index for each boundary with a preset hash function and, using a preset encryption algorithm, sends all encrypted partitions together with the indexes of their boundaries to the server for storage. Specifically, to solve problem four, the data owner applies a secret random hash function g: [1, N] → Z+, finally publishes E(D) = {(E(g(1)), E(G_1)), ..., (E(g(k)), E(G_k))} to the server, and distributes P(D), E^-1 and g to the data user.
Step S5: the data owner sends the boundaries P(D) of all partitions, the decryption algorithm corresponding to the encryption algorithm, and the hash function to the data user for storage. With P(D) stored at the data user, for any query point q the data user can efficiently determine the original index value i of the boundary in P(D) satisfying nn(q, D) ∈ G_i, and then use the hash function to compute the index of that boundary and request E(G_i) from the server. This can be done without the server knowing the original value i, which guarantees that the server learns nothing about the partitioning while answering queries. Finally, the data user easily obtains nn(q, D) = nn(q, E^-1(E(G_i))).
Step S6: the data user determines the true query point q, determines from q the boundary of the partition that contains q, obtains the index of that boundary with the hash function, and sends this index, E(g(j)), to the server, which holds E(D) = {(E(g(1)), E(G_1)), ..., (E(g(k)), E(G_k))}.
Step S7: according to the received index of the boundary of the partition containing the true query point, the server sends the corresponding encrypted partition to the data user. Specifically, once the server has received all encrypted partitions and the indexes of their boundaries, it can directly set up the correspondence from each index to its encrypted partition.
Step S8: the data user decrypts the received encrypted partition with the decryption algorithm to obtain the partition containing the true query point, and obtains from it the data point that is the nearest neighbor of the true query point.
Steps S6 to S8 of the query phase may be repeated as actual queries require, until there are no more query points.
Preferably, the outsourced database is a one- to three-dimensional outsourced database.
Preferably, when the outsourced database is one-dimensional, the boundary of each partition is the perpendicular bisector between two adjacent data points. Specifically, to solve problems one and two, each partition in G(D) is required to have a regular shape. As shown in Fig. 3, the one-dimensional case admits an "optimal scheme" that produces size-balanced, mutually disjoint partitions. This is because the Voronoi diagram of one-dimensional data points consists of consecutive, disjoint intervals; to generate "perfectly balanced" (equal-size) partitions, it suffices to find the 1/k, ..., (k-1)/k quantiles of D and use them to produce G(D). The boundaries of the Voronoi cells corresponding to these quantiles, together with ±∞, determine P(D) = {B_1, ..., B_k}.
Finding the (k-1) quantiles and using them to partition the Voronoi diagram of D can be completed with one linear scan of D once D is sorted. The I/O cost of this algorithm in the external-memory model is therefore O(N log N). Because each partition has size |D|/k = N/k, the size of each encrypted partition is |E(N/k)|, so the size of E(D) is k|E(N/k)|; the size of P(D) is clearly O(k).
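The one-dimensional scheme can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the function names `partition_1d` and `locate` are hypothetical, the data fit in memory, and the cut positions are taken as the integer indices floor(j·N/k), which is one reasonable reading of "the j/k quantile".

```python
from bisect import bisect_right


def partition_1d(points: list[float], k: int):
    """Split sorted 1-D points into k balanced partitions.

    Returns (boundaries, parts), where boundaries holds the k-1 finite
    cut values (the outer limits are -inf and +inf) and parts holds the
    data points of each partition.
    """
    pts = sorted(points)
    n = len(pts)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= N")
    # Index of the first point of each partition after the first one.
    cuts = [(j * n) // k for j in range(1, k)]
    # A cut lies on the Voronoi boundary (midpoint) between the two
    # neighbouring data points on either side of it.
    boundaries = [(pts[c - 1] + pts[c]) / 2 for c in cuts]
    edges = [0] + cuts + [n]
    parts = [pts[edges[j]:edges[j + 1]] for j in range(k)]
    return boundaries, parts


def locate(boundaries: list[float], q: float) -> int:
    """Return the 0-based index of the partition whose interval contains q."""
    return bisect_right(boundaries, q)
```

Because each cut coincides with a Voronoi boundary, the nearest neighbor of q is always among the points of the partition that `locate` returns.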
Preferably, when the outsourced database is two-dimensional, the boundary of each partition is a box enclosed by straight lines parallel to the X and Y coordinate axes of the Voronoi diagram. Specifically, to solve problems one and two, each partition in G(D) is required to have a regular shape: each partition must be confined to a "box" enclosed by sides parallel to the coordinate axes (the X axis or the Y axis). This means, however, that the boundary B_i of a partition G_i may contain or cut through several Voronoi cells, as shown in Fig. 2, where the dashed lines are the partition boundaries B_1 and B_2 and the convex polygons representing the data points p_1 to p_16 are Voronoi cells.
To solve problem two, let vc_i be the Voronoi cell of point p_i in D. To ensure that the data user can build an index on P(D) and easily and efficiently determine the index value i satisfying nn(q, D) ∈ G_i, the following partitioning principles are adopted, where B_i denotes the geometric boundary of G_i:
Principle one: B_i is a "box" enclosed by sides parallel to the X and Y coordinate axes;
Principle two: the boundaries of different partitions in G(D) are mutually disjoint;
Principle three: if B_i completely contains or intersects a set V_i of Voronoi cells, then G_i = {p_j | vc_j ∈ V_i}; different G_i may, however, contain repeated data points. In Fig. 2, for example, the data points inside the dark Voronoi cells are added to both G_1 and G_2, i.e. G_1 = {p_1, p_2, p_3, p_4, p_5, p_6, p_7, p_8, p_10} and G_2 = {p_5, p_6, p_7, p_8, p_10, p_11, p_12, p_13, p_14, p_15, p_16, p_9}.
Lemma 1. As shown in Fig. 2, under the above partitioning principles, if nn(q, D) = p_i and q ∈ B_j, then p_i ∈ G_j.
Proof: if a query point q ∈ B_j, i.e. q lies in the region enclosed by B_j, then q must lie in a Voronoi cell that B_j completely contains or cuts through. By the partitioning principles, this Voronoi cell belongs to V_j, and V_j determines the elements of G_j. Suppose q is contained in vc_i (so nn(q, D) = p_i); then vc_i ∈ V_j, and hence p_i ∈ G_j.
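Principle three and Lemma 1 can be illustrated with a brute-force sketch. The Python code below is an assumption-laden illustration, not the method of the invention: it decides whether the Voronoi cell of a point meets a box by clipping the box against the bisector half-planes of all other points, which costs O(N) per point and is far slower than a Voronoi library. The names `clip`, `cell_meets_box` and `partition_for_box` are hypothetical.

```python
def clip(poly, a, b, c):
    """Keep the part of a convex polygon where a*x + b*y <= c."""
    out = []
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        d1 = a * x1 + b * y1 - c
        d2 = a * x2 + b * y2 - c
        if d1 <= 0:
            out.append((x1, y1))
        if d1 * d2 < 0:  # the edge crosses the boundary line
            t = d1 / (d1 - d2)
            out.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
    return out


def cell_meets_box(p, points, box):
    """True if the Voronoi cell of p contains or intersects the box."""
    x0, y0, x1, y1 = box
    poly = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    for s in points:
        if s == p:
            continue
        # Half-plane of locations at least as close to p as to s.
        a, b = 2 * (s[0] - p[0]), 2 * (s[1] - p[1])
        c = s[0] ** 2 + s[1] ** 2 - p[0] ** 2 - p[1] ** 2
        poly = clip(poly, a, b, c)
        if not poly:
            return False
    return True


def partition_for_box(box, points):
    """G_j: all points whose Voronoi cell is contained in or cut by box B_j."""
    return [p for p in points if cell_meets_box(p, points, box)]
```

With partitions built this way, any query point inside a box finds its nearest neighbor in that box's partition, which is the statement of Lemma 1.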
By Lemma 1, in step S6 the data user's task for any query point q is to find the B_j ∈ P(D) satisfying q ∈ B_j, i.e. to find the box in P(D) that contains q; this is in fact a point location query. Because the boxes in P(D) are pairwise disjoint yet cover the whole space of the Voronoi diagram of D, exactly one box contains q. Moreover, because the sides of these boxes are all parallel to the coordinate axes, the data user can easily build an index on P(D). The data user can then request E(G_j) from the server, since Lemma 1 guarantees that nn(q, D) ∈ G_j whenever q is contained in B_j.
After E(G_j) is returned by the server in step S7, in step S8 the data user decrypts E(G_j) with E^-1 to obtain G_j. By recognizing the random character *, the data user can then easily remove the random byte sequence that the data owner added to G_j by the random padding operation. Finally, the data user obtains nn(q, D) = nn(q, G_j).
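The client-side work after decryption can be sketched as follows. This Python sketch makes several assumptions that are not in the original text: the decrypted partition is framed as a 4-byte big-endian payload length, followed by two-dimensional points packed as pairs of 8-byte floats, followed by random filler; the names `decode_partition` and `nearest` are hypothetical, and decryption itself is out of scope here.

```python
import math
import struct

POINT = struct.Struct(">dd")  # assumed encoding: one (x, y) pair of float64


def decode_partition(blob: bytes) -> list[tuple[float, float]]:
    """Strip the random filler and decode the data points of a partition."""
    (n,) = struct.unpack_from(">I", blob, 0)
    payload = blob[4:4 + n]  # everything after the payload is filler
    return [POINT.unpack_from(payload, off) for off in range(0, n, POINT.size)]


def nearest(q: tuple[float, float], points: list[tuple[float, float]]):
    """Return the point of the partition that is closest to q."""
    return min(points, key=lambda p: math.dist(p, q))
```

A linear scan suffices here because a partition holds only about N/k points.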
Recall, however, that problem four must be solved when the data user requests E(G_j) from the server. In step S4 the data owner builds an index for each boundary with the preset hash function and sends all encrypted partitions and their indexes to the server for storage; in step S6 the data user then sends E(g(j)) to the server. At the server in step S7, the E(g(i)) and E(G_i) sent by the data owner are paired, so after receiving E(D) the server can build a hash table T with k records that maps E(g(i)) to E(G_i). Given a request E(g(j)) from the data user, the server finds E(G_j) by a lookup in T in only O(1) time, so the server can efficiently obtain E(G_j) from E(g(j)).
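The index mechanism can be sketched as follows. This is a minimal Python sketch under stated assumptions: an HMAC-SHA256 keyed with a shared secret stands in for the secret random hash function g (the original text does not specify how g is built), the resulting token plays the role of E(g(i)), the encrypted partitions are treated as opaque byte strings, and the names `g` and `build_server_table` are hypothetical.

```python
import hashlib
import hmac


def g(secret: bytes, i: int) -> int:
    """Assumed stand-in for the secret hash g: maps index i to a positive integer."""
    digest = hmac.new(secret, i.to_bytes(8, "big"), hashlib.sha256).digest()
    # Truncate to 64 bits; collisions are possible in principle but unlikely.
    return int.from_bytes(digest[:8], "big") + 1


def build_server_table(secret: bytes, encrypted_partitions: list[bytes]) -> dict[int, bytes]:
    """Owner side: pair each token with its ciphertext; the server stores the table T."""
    return {g(secret, i): ct for i, ct in enumerate(encrypted_partitions, start=1)}
```

The data user, who also holds the secret, computes `g(secret, j)` for the box containing q and sends only that token; the server answers with `T[token]`, an expected O(1) dictionary lookup, without learning j.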
Further, this embodiment provides a MinCs partitioning method that follows principles one to three above. Because of the random padding operation, the communication cost of this embodiment and the storage cost at the server are determined by the difference between the size of the largest partition and the size of each partition, i.e. |G_x| - |G_i| (or, more precisely, |E(G_x)| - |E(G_i)|). To reduce the communication cost and the server storage cost, a partitioning method should therefore also follow principle four as far as possible: generate partitions whose sizes are as "balanced" (equal or close) as possible.
In the MinCs method, in the step in which the data owner splits the Voronoi diagram into k partitions according to the parameter k, the Voronoi diagram is split into k square cells of equal size, where k is a square number.
MinCs is the simplest partitioning method: it lays a grid over D and its Voronoi diagram, in which every cell has the same geometric size. This partitioning makes every element of P(D) a square box whose sides are all parallel to the X and Y coordinate axes. Once P(D) is given, G(D) can be produced following the approach shown in Fig. 2. Fig. 4 shows an example with k = 4; note that k must be a square number here. In the MinCs method a single number, the side length l of the small squares, suffices to describe P(D) (assuming the bounds of D are known parameters).
To simplify the discussion, a cell can be denoted by a pair of parameters {x, y}, ordered from left to right and from bottom to top. In Fig. 4, for example, the bottom-left cell is C_{1,1} and the top-right cell is C_{2,2}. From the correspondence between each cell C_{i,j} and the geometric boundary B_x of each partition, in this example we obtain $C_{i,j} = B_{(i-1)\cdot\sqrt{k}+j}$.
It is easy to show that, knowing only l and k (k can be computed from l and the bounds of D) and the coordinates of q, the data user can find the cell containing q in O(1) time. Among all partitioning methods, this one therefore has the lowest storage cost at the data user, because P(D) can be represented by the single number l; hence the name "MinCs (Minimum Client Storage) method".
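The O(1) lookup can be sketched as follows. This Python sketch rests on assumptions that should be checked against the actual layout: the function name `mincs_cell` is hypothetical, the lower-left corner (x_min, y_min) of the bounds of D is known, cells are numbered with i counting columns from the left and j counting rows from the bottom, and the partition number is taken as (i-1)·√k + j.

```python
import math


def mincs_cell(q, x_min, y_min, side, k):
    """Return (i, j, index) of the square cell of side length `side` containing q."""
    m = math.isqrt(k)  # number of cells along each axis
    if m * m != k:
        raise ValueError("k must be a square number")
    # Floor division gives the 0-based cell offset; clamping keeps points
    # that lie exactly on the upper or right border inside the grid.
    col = min(max(int((q[0] - x_min) // side), 0), m - 1)
    row = min(max(int((q[1] - y_min) // side), 0), m - 1)
    i, j = col + 1, row + 1
    return i, j, (i - 1) * m + j
```

No index structure is needed: two divisions and one multiplication locate the cell, which is why the client only has to store the side length.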
In more detail, the MinCs partitioning method can be implemented in C++. In the experiments with this implementation, the Qhull library was used to compute the Voronoi diagram of the data set D, and the latest Crypto++ library was used for encryption. The experiments were run on a Linux machine with an Intel Xeon 3.07 GHz CPU and 8 GB of memory.
For the two-dimensional outsourced database D, the experiments used as raw data sets ten million data points sampled from California (CA) and Texas (TX), USA, all taken from the OpenStreetMap project. From each of the CA and TX data sets, 2,000,000 data points were selected at random as the largest experimental data set Dmax, and smaller data sets were derived from Dmax. It is worth noting that, when the data set size is varied to test the scalability of the partitioning method of this embodiment, a smaller data set is always a subset of a larger one; this excludes the effect of changing the specific data points in D and isolates the effect of |D|.
The default parameter settings of the experiments are as follows: |D| = 10^6 and k = 625 (|D| and k being the number of data points and the final number of partitions, respectively); the data points are taken from CA by default; AES is used for encryption, with key size and block size of 256 bits. It is worth noting that, after testing other encryption algorithms, the choice of encryption algorithm was found to have almost no effect on the performance of this embodiment. Any secure public-key or symmetric-key encryption algorithm can therefore be used to implement this embodiment, and the performance with a different algorithm can be inferred from these experiments. Finally, in all experiments, unless otherwise stated, when one parameter is studied as a variable the other parameters take their default values.
The specific experimental results are as follows:
1. Preprocessing phase
In the preprocessing phase, the data owner performs two tasks, partitioning and encryption, both of which are mainly affected by the number of partitions k and the data set size |D|. Figs. 5a and 5b show the effect of k and |D|, respectively, on the running time of the MinCs method. In Fig. 5a, the time cost of MinCs is clearly almost a constant independent of k, because the complexity of the MinCs partitioning process itself does not depend on k. MinCs is in fact the simplest and most direct partitioning method; notably, it splits 1,000,000 points into 1,225 partitions in only 22 seconds.
Fig. 5b shows the effect of the data size |D| (from 250,000 to 2,000,000) on the partitioning time. Clearly, the processing time of MinCs is linear in N = |D|.
Next, consider the sizes of the partitions G(D) = {G_1, ..., G_k} produced by the MinCs method (before the random padding operation). Because the random padding operation fills every partition with random bytes up to the size of the largest partition, the following two values are the most important for assessing the partitioning method of this embodiment: the size |G_x| of the largest partition, and the variance of (|G_x| - |G_i|) for i ∈ [1, k]. |G_x| determines the storage cost at the server and the communication cost of each query; the variance of (|G_x| - |G_i|) determines the cost of the padding operation itself. To present these values simply and intuitively in one plot, Figs. 6a and 6b show how the average partition size (avg partition size), the maximum |G_x| = max_{i ∈ [1, k]} |G_i| and the minimum |G_y| = min_{i ∈ [1, k]} |G_i| vary with the parameters k and |D|, respectively.
As shown in Figs. 6a and 6b, in the MinCs partitioning method both the average and the maximum partition size decrease as k increases.
Figs. 7a and 7b show the total running time of the MinCs method in the preprocessing phase. The total running time comprises the time of the partitioning and encryption steps (the time for computing the Voronoi diagram and for the random padding operation is also included, but it is small relative to the partitioning and encryption time). Figs. 7a and 7b also include the preprocessing time of the Send-D method for reference, which is simply the time to encrypt D as a whole.
In addition, Figs. 7a and 7b show that the total time of the MinCs preprocessing phase grows linearly as k or |D| increases.
Next, consider the size of the final E(D), a key factor in the storage cost at the server and in the communication cost from the data owner to the server. After the random padding operation every partition has the same size as the largest partition, so |E(D)| = k|E(G_x)| = k|E(G_i)| (i ∈ [1, k]).
Figs. 8a and 8b show how |E(D)| (the size of E(D)) varies with k and |D|, respectively, under the MinCs method. As in Figs. 7a and 7b, the size of D and the size of E(D) obtained by encrypting D as a whole (i.e. the cost of Send-D) are included in Figs. 8a and 8b for reference. Clearly, the size of E(D) in the MinCs method grows linearly as k or |D| increases. The size of E(D) for Send-D also grows linearly with |D| but is independent of k. Naturally, compared with directly transmitting the plaintext of D itself, the MinCs method introduces additional communication cost from the data owner to the server and additional storage cost at the server.
The storage cost at the data user depends on the size of P(D), which is O(1) in the MinCs method.
Finally, it was observed in the experiments that the choice of data set (CA or TX) makes almost no noticeable difference to the results, so for simplicity the results on the TX data set are not discussed here.
2. Query processing cost
First, for any query point q, with the method of this embodiment the communication cost from the server to the data user depends only on |E(G_j)|. As in the analysis of Figs. 8a and 8b above, the random padding operation gives every partition the same size, so |E(G_i)| = |E(D)|/k (i ∈ [1, k]). In the Send-D method, by contrast, the communication cost from the server to the data user is the size of the encryption of the whole data set D, i.e. |E(D as one message)|. Therefore, although |E(D as one message)| is much smaller than the |E(D)| generated by the MinCs method (as shown in Figs. 8a and 8b), the query communication cost from the server to the data user in this embodiment is still much smaller than that of Send-D, as shown in Figs. 9a and 9b.
Next, consider the query processing cost at the data user. In each experiment, 100 random queries were issued and the average processing time of the MinCs method was measured, as shown in Figs. 10a and 10b. Fig. 10a shows that the query time of the MinCs method decreases as k increases, evidently because a larger k yields smaller partitions. Compared with Send-D, the MinCs method performs far better. As shown in Fig. 10b, the query time of the MinCs method grows linearly as |D| increases.
The efficiency of the algorithm of this embodiment comprises the time and space cost of the preprocessing phase and the query cost of the query phase; the former comprises time cost and storage cost, and the latter comprises time cost and communication cost:
1. The time cost of preprocessing in the SVD algorithm lies mainly in the following three stages:
(1) computing the Voronoi diagram of D;
(2) partitioning D;
(3) generating E(D).
For one- and two-dimensional outsourced databases, the cost of stage (1) is O(N log N).
In stage (2) (partitioning D), in the one-dimensional case the required quantiles can be obtained by a single traversal after sorting the data, so the cost of stage (2) is also O(N log N); in the two-dimensional case the cost depends on the chosen partitioning method. The cost of the MinCs method is O(N).
The cost of stage (3) is linear in the encryption cost. Let e(m) be the cost of encrypting a message m with the encryption algorithm E. Because the random padding operation raises the size of every partition to the maximum, the time complexity of generating E(D) is O(k·e(|G_x|_b)), where $G_x = \arg\max_{G_i \in G(D)} |G_i|$.
2. Storage cost of the preprocessing phase
The storage cost at the server is |E(D)|, i.e. O(k|E(G_x)|).
The storage cost at the data user is the space taken by P(D) and its index. For most index structures of this kind (kd-tree, R-tree, etc.), the index size is linear in |P(D)|, so the storage cost at the data user is O(|P(D)|). As for P(D), in the one-dimensional case P(D) contains only (k-1) values, so |P(D)| = k-1; in the two-dimensional case |P(D)| is determined by the chosen partitioning method. For the MinCs method, |P(D)| is O(1).
3. Query cost of the query phase
The time cost of the query phase arises mainly at two ends: the data user and the server. The data user uses the index on P(D) to find the B_i ∈ P(D) that contains the query point q, where every B_i is a d-dimensional (one- to three-dimensional) box enclosed by sides parallel to the coordinate axes. Since no two boxes in P(D) intersect and some box must contain q, this search is a typical point location query with output size 1 (exactly one result). In the one- and two-dimensional cases, the cost of this process is only O(log k).
At the server, given a request E(g(j)) from the data user, the server finds E(G_j) by querying the hash table T, which takes O(1) time.
The one-time communication cost of the system is |E(D)| and |P(D)|, which follows directly from the discussion of storage cost above. The communication cost per query is |E(g(j))| + |E(G_j)|, equivalently |E(g(j))| + |E(G_x)| or |E(g(j))| + |E(D)|/k.
Regarding the security of this embodiment: the data owner sends only E(D) = {(E(g(1)), E(G_1)), ..., (E(g(k)), E(G_k))} to the server, and during query processing only E(g(j)) is visible to the server, so the following Theorem 1 can be proved.
Theorem 1. Suppose E is an encryption algorithm proven secure in some standard security model M (e.g. IND-CPA); then in M the SVD algorithm has the same security as E.
Proof: throughout the process the server sees only E(D) from the data owner and the random-looking sequence of E(g(j)) from the data user, so the server can learn only the number of partitions k. Because the random padding operation guarantees |E(G_j)| = |E(G_i)| (i ≠ j), if E is secure in M the server clearly cannot learn any boundary information about any partition G_i. Moreover, because the random hash function g: [1, N] → Z+ is unknown to the server, the server cannot recover the original index value i given only E(g(j)); that is, the server cannot know the original index value i of any pair (E(g(i)), E(G_i)) in E(D).
In summary, when the data user performs a nearest neighbor query on the outsourced database stored at the server, this embodiment ensures that the server cannot learn the data in the outsourced database, the query point of the data user, or the nearest neighbor query result, thereby keeping the data secure.
As shown in Fig. 11, the present invention also provides a system for secure nearest neighbor queries based on equal-length data partitioning, comprising a data owner 1, a data user 2 and a server 3.
Data owner 1 is configured to: specify the parameter k; generate a Voronoi diagram containing all data points of the outsourced database, where every data point has the same number of bytes and the number of data points is N, a positive integer; split the Voronoi diagram into k partitions according to k and record the boundary of each partition, where the partition boundaries are mutually disjoint, the data points contained in different partitions may partly repeat or not repeat at all, and k is at least 1 and at most N; take the byte count of the partition containing the most data points as the maximum byte count and add random bytes to every other partition so that its byte count equals the maximum byte count; build an index for each boundary with a preset hash function and, using a preset encryption algorithm, send all encrypted partitions and the indexes of their boundaries to the server for storage; and send the boundaries of all partitions, the decryption algorithm corresponding to the encryption algorithm, and the hash function to the data user for storage. The parameter k is specified by data owner 1 or data user 2.
Data user 2 is configured to: specify the parameter k; determine the true query point; determine from it the boundary of the partition containing the true query point; obtain the index of that boundary with the hash function and send it to the server; decrypt the received encrypted partition with the decryption algorithm to obtain the partition containing the true query point; and obtain from that partition the data point that is the nearest neighbor of the true query point.
Server 3 is configured to send to the data user, according to the received index of the boundary of the partition containing the true query point, the corresponding encrypted partition containing the true query point.
Preferably, the outsourced database is a one- to three-dimensional outsourced database.
Preferably, when the outsourced database is one-dimensional, the boundary of each partition is the perpendicular bisector between two adjacent data points.
Preferably, when the outsourced database is two-dimensional, the boundary of each partition is a box enclosed by straight lines parallel to the X and Y coordinate axes of the Voronoi diagram.
In this case, the data owner splits the Voronoi diagram into k square cells of equal size, where k is a square number.
In summary, in the present invention the data owner generates a Voronoi diagram containing all data points of the outsourced database, where every data point has the same number of bytes and the number of data points in the outsourced database is N, N being a positive integer. The data user or the data owner specifies a parameter k, and the data owner partitions the Voronoi diagram into k partitions according to the parameter k and records the boundary corresponding to each partition, where the partitions are mutually disjoint, the data points contained in different partitions partially overlap or do not overlap at all, and k is greater than or equal to 1 and less than or equal to N. The data owner takes the byte count of the partition containing the most data points among all partitions as the maximum byte count, and adds random bytes to every other partition so that the byte count of each other partition equals the maximum byte count. The data owner builds a corresponding index for each boundary according to a preset hash function, encrypts the partitions, and sends all encrypted partitions and their corresponding indexes to the server for storage. The data owner sends the boundaries corresponding to all partitions, the decryption algorithm corresponding to the encryption algorithm, and the hash function to the data user for storage. The data user determines a true query point, determines from it the boundary of the partition containing the true query point, obtains the index corresponding to that boundary according to the hash function, and sends that index to the server. According to the received index, the server sends the corresponding encrypted partition containing the true query point to the data user. The data user decrypts the received encrypted partition according to the decryption algorithm, obtains the partition containing the true query point, and obtains from that partition the data point that is the nearest neighbor of the true query point. In this way, when the data user performs a nearest neighbor query on the outsourced database stored on the server, the server cannot learn the data in the outsourced database, the data user's query point, or the nearest neighbor query result, so data security is guaranteed.
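The flow summarized above (pad every partition to the same byte length, index each boundary with a hash, store ciphertexts on the server, then fetch and decrypt one partition) might be sketched as below. Several pieces are assumptions rather than details from the text: SHA-256 stands in for the preset hash function, `toy_cipher` is an insecure XOR keystream used only so the example runs with the standard library, a 4-byte point count is prepended so that padding can be stripped after decryption, and the sample boundaries, points, and key are made up.

```python
import hashlib
import os
import struct

POINT_FMT = "<dd"  # assumption: every point is two 8-byte floats
POINT_SIZE = struct.calcsize(POINT_FMT)


def serialize(points):
    """Prefix the point count so random padding can be ignored later."""
    body = b"".join(struct.pack(POINT_FMT, x, y) for x, y in points)
    return struct.pack("<I", len(points)) + body


def deserialize(blob):
    (count,) = struct.unpack_from("<I", blob, 0)
    return [struct.unpack_from(POINT_FMT, blob, 4 + i * POINT_SIZE)
            for i in range(count)]


def pad_partitions(partitions):
    """Append random bytes so every partition matches the longest one."""
    blobs = [serialize(p) for p in partitions]
    longest = max(len(b) for b in blobs)
    return [b + os.urandom(longest - len(b)) for b in blobs]


def boundary_index(boundary):
    """Hash a boundary to the index under which the server stores it."""
    return hashlib.sha256(repr(boundary).encode()).hexdigest()


def toy_cipher(key, index, blob):
    """XOR with a SHA-256 keystream. Illustration only, NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(blob):
        block = key + index.encode() + struct.pack("<I", counter)
        stream += hashlib.sha256(block).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(blob, stream))


def nearest(points, q):
    return min(points, key=lambda p: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)


# Data owner: two hypothetical partitions that share the point (6.0, 5.0).
boundaries = [(0.0, 0.0, 5.0, 10.0), (5.0, 0.0, 10.0, 10.0)]
partitions = [[(1.0, 1.0), (4.0, 8.0), (6.0, 5.0)], [(6.0, 5.0), (9.0, 2.0)]]
key = b"demo-key"
server_store = {
    boundary_index(b): toy_cipher(key, boundary_index(b), blob)
    for b, blob in zip(boundaries, pad_partitions(partitions))
}

# Data user: locate the boundary containing q and send only its index.
q = (7.0, 4.0)
hit = next(b for b in boundaries
           if b[0] <= q[0] <= b[2] and b[1] <= q[1] <= b[3])
index = boundary_index(hit)
ciphertext = server_store[index]  # server-side lookup by index only
result = nearest(deserialize(toy_cipher(key, index, ciphertext)), q)
```

In this sketch the server sees only a hash value and equal-length ciphertexts, which is the property the padding step is meant to provide; a real deployment would replace `toy_cipher` with a vetted encryption scheme.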
In this specification, the embodiments are described progressively: each embodiment emphasizes its differences from the others, and identical or similar parts of the embodiments may be cross-referenced. Because the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant details, see the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementations should not be considered to exceed the scope of the present invention.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.

Claims (10)

1. A method for secure nearest neighbor queries based on equal-length data partitioning, characterized by comprising:
A data owner generates a Voronoi diagram containing all data points of an outsourced database, wherein every data point has the same number of bytes, the number of data points in the outsourced database is N, and N is a positive integer;
A data user or the data owner specifies a parameter k, and the data owner partitions the Voronoi diagram into k partitions according to the parameter k and records the boundary corresponding to each partition, wherein the partitions are mutually disjoint, the data points contained in different partitions partially overlap or do not overlap at all, and k is greater than or equal to 1 and less than or equal to N;
The data owner takes the byte count of the partition containing the most data points among all partitions as the maximum byte count, and adds random bytes to every other partition so that the byte count of each other partition equals the maximum byte count;
The data owner builds a corresponding index for each boundary according to a preset hash function and, according to a preset encryption algorithm, sends all encrypted partitions and the indexes of their corresponding boundaries to a server for storage;
The data owner sends the boundaries corresponding to all partitions, the decryption algorithm corresponding to the encryption algorithm, and the hash function to the data user for storage;
The data user determines a true query point, determines from the true query point the boundary of the partition containing the true query point, obtains the index corresponding to that boundary according to the hash function, and sends that index to the server;
According to the received index of the boundary of the partition containing the true query point, the server sends the corresponding encrypted partition containing the true query point to the data user;
The data user decrypts the received encrypted partition containing the true query point according to the decryption algorithm, obtains the partition containing the true query point, and obtains from that partition the data point that is the nearest neighbor of the true query point.
2. The method for secure nearest neighbor queries based on equal-length data partitioning according to claim 1, characterized in that the outsourced database is a one- to three-dimensional outsourced database.
3. The method for secure nearest neighbor queries based on equal-length data partitioning according to claim 2, characterized in that, when the outsourced database is a one-dimensional outsourced database, the boundary of each partition is the perpendicular bisector between two adjacent data points.
4. The method for secure nearest neighbor queries based on equal-length data partitioning according to claim 2, characterized in that, when the outsourced database is a two-dimensional outsourced database, the boundary of each partition is a grid cell enclosed by straight lines parallel to the X and Y coordinate axes of the Voronoi diagram.
5. The method for secure nearest neighbor queries based on equal-length data partitioning according to claim 4, characterized in that, in the step in which the data owner partitions the Voronoi diagram into k partitions according to the parameter k, the Voronoi diagram is partitioned into a grid of k equal-sized squares, where k is a perfect square.
6. A system for secure nearest neighbor queries based on equal-length data partitioning, characterized by comprising:
A data owner, configured to specify a parameter k and generate a Voronoi diagram containing all data points of an outsourced database, wherein every data point has the same number of bytes, the number of data points in the outsourced database is N, and N is a positive integer; to partition the Voronoi diagram into k partitions according to the parameter k and record the boundary corresponding to each partition, wherein the partitions are mutually disjoint, the data points contained in different partitions partially overlap or do not overlap at all, and k is greater than or equal to 1 and less than or equal to N; to take the byte count of the partition containing the most data points among all partitions as the maximum byte count, and add random bytes to every other partition so that the byte count of each other partition equals the maximum byte count; to build a corresponding index for each boundary according to a preset hash function and, according to a preset encryption algorithm, send all encrypted partitions and the indexes of their corresponding boundaries to a server for storage; and to send the boundaries corresponding to all partitions, the decryption algorithm corresponding to the encryption algorithm, and the hash function to a data user for storage;
A data user, configured to specify the parameter k, determine a true query point, determine from the true query point the boundary of the partition containing the true query point, obtain the index corresponding to that boundary according to the hash function, and send that index to the server; and to decrypt the received encrypted partition containing the true query point according to the decryption algorithm, obtain the partition containing the true query point, and obtain from that partition the data point that is the nearest neighbor of the true query point;
A server, configured to send, according to the received index of the boundary of the partition containing the true query point, the corresponding encrypted partition containing the true query point to the data user.
7. The system for secure nearest neighbor queries based on equal-length data partitioning according to claim 6, characterized in that the outsourced database is a one- to three-dimensional outsourced database.
8. The system for secure nearest neighbor queries based on equal-length data partitioning according to claim 7, characterized in that, when the outsourced database is a one-dimensional outsourced database, the boundary of each partition is the perpendicular bisector between two adjacent data points.
9. The system for secure nearest neighbor queries based on equal-length data partitioning according to claim 7, characterized in that, when the outsourced database is a two-dimensional outsourced database, the boundary of each partition is a grid cell enclosed by straight lines parallel to the X and Y coordinate axes of the Voronoi diagram.
10. The system for secure nearest neighbor queries based on equal-length data partitioning according to claim 9, characterized in that the data owner partitions the Voronoi diagram into a grid of k equal-sized squares, where k is a perfect square.
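For the one-dimensional case named in the claims, where boundaries are perpendicular bisectors between adjacent data points, a small sketch may help. In one dimension the perpendicular bisector of two adjacent points reduces to their midpoint. The function names and the assumption that the data points are distinct are illustrative only and do not come from the claims.

```python
from bisect import bisect_right


def bisectors(points):
    """Sort the points and return them with the midpoints between neighbors."""
    pts = sorted(points)
    mids = [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    return pts, mids


def nearest_1d(pts, mids, q):
    """Locate the interval containing q; its owner is the nearest neighbor."""
    return pts[bisect_right(mids, q)]


pts, mids = bisectors([8.0, 1.0, 4.0])
```

Here `mids` holds the candidate boundaries, and a query point is answered by finding which pair of consecutive boundaries it falls between. A query that lands exactly on a midpoint is equidistant from both neighbors, and this sketch returns the right-hand one.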
CN201210465692.8A 2012-11-16 2012-11-16 Method and system for secure nearest neighbor queries based on equal-length data partitioning Expired - Fee Related CN102945282B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210465692.8A CN102945282B (en) 2012-11-16 2012-11-16 Based on the method and system of the safe nearest neighbor of isometric Data Placement

Publications (2)

Publication Number Publication Date
CN102945282A CN102945282A (en) 2013-02-27
CN102945282B true CN102945282B (en) 2015-09-16

Family

ID=47728226

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210465692.8A Expired - Fee Related CN102945282B (en) 2012-11-16 2012-11-16 Method and system for secure nearest neighbor queries based on equal-length data partitioning

Country Status (1)

Country Link
CN (1) CN102945282B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106533680B (en) * 2017-01-22 2017-12-12 Anhui University Quantum neighbor query method for protecting position privacy
CN113407799B (en) * 2021-06-22 2024-09-03 Shenzhen University Performance measurement method, device and related equipment for measuring space division boundary

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101692231A (en) * 2009-01-14 2010-04-07 Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences Remote sensing image block sorting and storing method suitable for spatial query

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
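Reverse Furthest Neighbors in Spatial Databases; Bin Yao et al.; www.cs.fsu.edu/~yao/papers/RFN-icde2009.pdf; 2009-12-31; entire document *
Nearest neighbor query algorithm based on space-filling-curve grid partitioning; Xu Hongbo et al.; Computer Science; 2010-01-31; Vol. 37, No. 1; entire document *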

Also Published As

Publication number Publication date
CN102945282A (en) 2013-02-27

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20150916

Termination date: 20181116