CN102945282B

CN102945282B - Method and system for secure nearest neighbor query based on equal-length data division

Info

Publication number: CN102945282B
Application number: CN201210465692.8A
Authority: CN
Inventors: Yao Bin (姚斌); Li Feifei (李飞飞); Xiao Xiaokui (肖小奎)
Original assignee: Shanghai Jiao Tong University
Current assignee: Shanghai Jiao Tong University
Priority date: 2012-11-16
Filing date: 2012-11-16
Publication date: 2015-09-16
Anticipated expiration: 2032-11-16
Also published as: CN102945282A

Abstract

The present invention relates to a method and system for secure nearest neighbor query based on equal-length data division. In the method, the data owner divides the Voronoi diagram of the outsourced database into k partitions, records the boundary of each partition, pads the partitions with random bytes, builds an index for each boundary with a preset hash function, sends all encrypted partitions together with their indexes to the server, and sends the boundaries of all partitions to the data user. The data user sends the server the index of the partition that contains the real query point; the server returns the encrypted partition containing the real query point; and the data user decrypts it and computes the nearest neighbor. Thus, when the data user performs a nearest neighbor query on the outsourced database stored on the server, the server cannot learn the data in the outsourced database, the query point, or the query result, which ensures data security.

Description

Method and system for secure nearest neighbor query based on equal-length data division

Technical Field

The invention relates to the field of secure query processing, and in particular to a method and system for secure nearest neighbor query based on equal-length data division.

Background Art

Existing research on secure query processing covers basic SQL queries over encrypted databases (see Document 3: H. Hacigumus, B. R. Iyer, C. Li, and S. Mehrotra. Executing SQL over encrypted data in the database service provider model. In SIGMOD, 2002), aggregation queries (see Document 4: H. Hacigumus, B. R. Iyer, and S. Mehrotra. Execution of aggregation queries over encrypted relational databases. In DASFAA, pages 125-136, 2004, and Document 5: E. Mykletun and G. Tsudik. Aggregation queries in the database-as-a-service model. In DBSec, 2006), and range queries (see Document 6: B. Hore, S. Mehrotra, M. Canim, and M. Kantarcioglu. Secure multidimensional range queries over outsourced data. VLDBJ, to appear, and Document 7: E. Shi, J. Bethencourt, H. T.-H. Chan, D. X. Song, and A. Perrig. Multi-dimensional range query over encrypted data. In IEEE Symposium on Security and Privacy, pages 350-364, 2007). As many existing studies have shown (see Document 1: H. Hu, J. Xu, C. Ren, and B. Choi. Processing private queries over untrusted data cloud through privacy homomorphism. In ICDE, pages 601-612, 2011; Document 2: W. K. Wong, D. W.-L. Cheung, B. Kao, and N. Mamoulis. Secure kNN computation on encrypted databases. In SIGMOD, pages 139-152, 2009; and Documents 6 and 7), more complex query types often require special handling in order to meet given security requirements while achieving higher efficiency. In particular, a number of earlier works address the secure nearest neighbor (SNN) problem (see Documents 1 and 2), but the solutions they propose have often turned out to be insecure and can be attacked successfully with little effort.

Hacigumus et al. first proposed the outsourced database (ODB) model (see Document 8: H. Hacigumus, B. R. Iyer, and S. Mehrotra. Providing database as a service. In ICDE, 2002). In this model, the data owner outsources two services, data management and query answering, to an untrusted service provider. Security research on ODB aims to keep data secure by encrypting it and processing queries over the encrypted data. For example, an order-preserving encryption scheme (OPES; see Document 9: R. Agrawal, J. Kiernan, R. Srikant, and Y. Xu. Order preserving encryption for numeric data. In SIGMOD, 2002) applies a function E to an ordinal domain such that E(x) < E(y) for every pair of values x, y with x < y. Hacigumus et al. also proposed an additively and multiplicatively homomorphic encryption function E, satisfying E(x) + E(y) = E(x + y) and E(x)E(y) = E(xy), to support aggregation queries over encrypted data (see Document 4: H. Hacigumus, B. R. Iyer, and S. Mehrotra. Execution of aggregation queries over encrypted relational databases. In DASFAA, pages 125-136, 2004). However, as Mykletun et al. demonstrated, this homomorphic approach in practice fails to guarantee even the lowest level of security (see Document 5). In short, earlier ODB models considered only simple numeric domains and SQL operations, not more complex operations such as kNN (k nearest neighbor) queries; moreover, ODB research has always assumed a single type of attack instead of considering attacks at different levels, so its results do not generalize.
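The order property of an OPES can be illustrated with a deliberately weak toy mapping, assuming a small integer domain. This is not the construction of Document 9, and `make_toy_ope` is a hypothetical name; the sketch only shows why a server can compare ciphertexts without holding the key, and why the plaintext order is therefore exposed.

```python
import random

def make_toy_ope(domain_size, seed, max_gap=100):
    """Toy order-preserving mapping of the integers 0..domain_size-1.

    E(x) is a running sum of random positive gaps, so x < y implies
    E(x) < E(y). The seed plays the role of a secret key. Not secure:
    the ciphertexts reveal the full order of the plaintexts.
    """
    rng = random.Random(seed)
    table = []
    total = 0
    for _ in range(domain_size):
        total += rng.randint(1, max_gap)  # a gap of at least 1 keeps the order strict
        table.append(total)
    return lambda x: table[x]

E = make_toy_ope(1000, seed=42)
# A server can compare ciphertexts directly, e.g. to answer a range query.
print(E(10) < E(11) < E(500))  # True
```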

Besides encryption techniques, other data protection methods also keep query computation secure. SQL execution in the ODB model uses a "coarse index" (also called a bucket-based index) (see Document 3). Tuples are encrypted with a conventional scheme such as RSA; each attribute domain of the database is divided into parts, and each part (a "partition") is assigned an ID through a hash function. The data owner sends each encrypted tuple to the server together with the ID of its partition, which serves as the coarse index. A query then becomes a request for the partitions that contain the target tuples, and the server returns a superset of the query result. The user, who holds the key, decrypts the result and filters out the irrelevant tuples through post-processing. For advanced queries the amount of irrelevant data can be enormous, which places a heavy burden on the user. In kNN computation, for example, the distance between a data point and the query point cannot be derived from partition IDs, so applying the coarse index directly forces the server to return the entire database and leaves the user to compute the query result alone. This is clearly unsuitable when the user has limited processing power (for example, when using a mobile device).
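The bucket-based index can be sketched as follows, assuming a single numeric attribute split into fixed-width buckets. The key `SECRET`, the helper names, and the use of HMAC-SHA256 as the ID hash are illustrative assumptions, not details from Document 3; payloads are left unencrypted so the sketch needs only the standard library.

```python
import hashlib
import hmac

SECRET = b"hypothetical-owner-key"  # assumed key for the bucket-ID hash

def bucket_id(value, lo, width):
    """Hashed ID of the bucket [lo + i*width, lo + (i+1)*width) holding value."""
    i = int((value - lo) // width)
    return hmac.new(SECRET, f"bucket:{i}".encode(), hashlib.sha256).hexdigest()

def build_coarse_index(records, lo, width):
    """Group (value, payload) records under their bucket IDs.

    Payloads stand for tuples that would be encrypted (e.g. with RSA) before
    upload; they are left as plain bytes to keep the sketch stdlib-only.
    """
    index = {}
    for value, payload in records:
        index.setdefault(bucket_id(value, lo, width), []).append(payload)
    return index

records = [(3, b"t1"), (7, b"t2"), (12, b"t3"), (18, b"t4")]
index = build_coarse_index(records, lo=0, width=10)
# Asking for value 7 returns the whole bucket [0, 10): a superset of the answer
# that the user must decrypt and filter.
print(index[bucket_id(7, 0, 10)])  # [b't1', b't2']
```

For a kNN query there is no bucket ID that encodes "closest to q", which is why the server ends up returning far more than the answer.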

Another approach to secure query processing uses special hardware, namely a secure coprocessor (see Document 10: E. Mykletun and G. Tsudik. Incorporating a secure coprocessor in the database-as-a-service model. In IWIA, 2005, and Document 11: R. Agrawal, D. Asonov, M. Kantarcioglu, and Y. Li. Sovereign joins. In ICDE, 2006). A secure coprocessor is a secure computing unit whose computation and stored data are hidden from every party involved in the query. It is simple to use: one only installs the encryption and decryption keys and deploys the application logic on it directly. On the other hand, it is slower than an ordinary processor and is therefore unsuitable for complex, computation-heavy applications. In addition, the coprocessor must be maintained by the user; for example, if it fails unexpectedly, the user must redeploy it. This clearly conflicts with cloud computing, in which users should not have to maintain the original data themselves.

In addition, Sweeney, Li, Machanavajjhala et al. proposed various data anonymization models, such as k-anonymity, for privacy protection in data publishing (see Document 12: L. Sweeney. k-anonymity: A model for protecting privacy. In IJUFKS, 2002). Their common idea is to make every tuple in the database indistinguishable, on its quasi-identifiers, from at least k-1 other tuples. k-anonymity can be achieved by generalizing quasi-identifiers, suppressing tuples, or perturbing tuples. However, the k-anonymity model loses information during query processing, and the model itself has specific flaws. As Machanavajjhala et al. pointed out, a group of tuples whose anonymized quasi-identifiers are indistinguishable may still share many sensitive values, so an attacker with limited background knowledge can cause information leakage (see Document 13: A. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam. l-diversity: Privacy beyond k-anonymity. In ICDE, 2006). Moreover, the generalized value ranges make it easier for a potential attacker to estimate the original data, or valuable statistics about it, accurately. Note in particular that privacy protection in data publishing and data security in the ODB model pursue different goals: the former tries to prevent published information from exposing specific individuals, while the latter focuses on protecting information from unauthorized users.
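Generalization and the k-anonymity check can be sketched on a small table, assuming age and ZIP code as quasi-identifiers and a disease column as the sensitive value; the data, the generalization rules, and the function names are all hypothetical.

```python
from collections import Counter

def generalize(record):
    """Coarsen quasi-identifiers: age to a decade, ZIP code to a 3-digit prefix."""
    age, zip_code, disease = record
    decade = (age // 10) * 10
    return (f"{decade}-{decade + 9}", zip_code[:3] + "**"), disease

def is_k_anonymous(records, k):
    """True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(qi for qi, _ in records)
    return all(c >= k for c in counts.values())

raw = [(23, "20011", "flu"), (27, "20015", "flu"), (25, "20013", "cold"),
       (41, "20102", "cancer"), (45, "20107", "cancer")]
table = [generalize(r) for r in raw]
print(is_k_anonymous(table, 2), is_k_anonymous(table, 3))  # True False
```

The table is 2-anonymous, yet both tuples in the ("40-49", "201**") group carry the same sensitive value, which is the kind of leakage Machanavajjhala et al. describe.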

To better address secure query processing, W. K. Wong et al. studied kNN queries in the SCONEDB model (see Document 2). Oliveira et al. had proposed the distance-preserving transformation (DPT) as an encryption method (see Document 14: S. R. M. Oliveira and O. R. Zaiane. Privacy preserving clustering by data transformation. In SBBD, Manaus, Amazonas, Brazil, 2003). DPT maps a point x to Nx + t, where N is a d×d orthogonal matrix and t is a d-dimensional vector. Its key property is that distances between points are unchanged by the transformation, that is, d(x, y) = d(E(x), E(y)), where d denotes Euclidean distance and E is the encryption (transformation) function. Because distances do not change, kNN queries are computed correctly. However, Liu et al. proved that DPT is insecure against level-2 and level-3 attacks (see Document 8). For a level-3 attack, W. K. Wong et al. consider a set of points {x1, x2, ..., xm} in DB together with their encrypted values {y1, y2, ..., ym}, and set up the equations yi = Nxi + t, a system of linear equations in which the d² entries of N and the d entries of t are unknown. Since each pair contributes d equations, the system can be solved when m ≥ d + 1. For a level-2 attack, the attacker sees a set of points P from DB. Because DPT preserves the correlation between dimensions, Liu et al. used principal component analysis (PCA) to determine the principal components of P and of the transformed database; by matching these principal components, the attacker can accurately estimate N and t (see Document 8).
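Both properties of DPT can be checked in two dimensions (d = 2) with a minimal sketch, assuming a hypothetical key made of a rotation and a shift. It confirms that distances survive the transformation and then solves the linear system from d + 1 = 3 known plaintext/ciphertext pairs, as in the level-3 attack described above. The helper names are illustrative.

```python
import math

def dpt(N, t, x):
    """Distance-preserving transformation E(x) = N x + t for 2-D points."""
    return (N[0][0] * x[0] + N[0][1] * x[1] + t[0],
            N[1][0] * x[0] + N[1][1] * x[1] + t[1])

def recover_key(pairs):
    """Level-3 attack: rebuild N and t from d + 1 = 3 known (x, E(x)) pairs.

    Subtracting the first pair cancels t: y_i - y_0 = N (x_i - x_0).
    With two linearly independent differences, N = Y X^-1 and t = y_0 - N x_0.
    """
    (x0, y0), (x1, y1), (x2, y2) = pairs
    # Columns of X and Y are the plaintext and ciphertext difference vectors.
    X = [[x1[0] - x0[0], x2[0] - x0[0]], [x1[1] - x0[1], x2[1] - x0[1]]]
    Y = [[y1[0] - y0[0], y2[0] - y0[0]], [y1[1] - y0[1], y2[1] - y0[1]]]
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    X_inv = [[X[1][1] / det, -X[0][1] / det], [-X[1][0] / det, X[0][0] / det]]
    N = [[sum(Y[r][j] * X_inv[j][c] for j in range(2)) for c in range(2)] for r in range(2)]
    t = (y0[0] - (N[0][0] * x0[0] + N[0][1] * x0[1]),
         y0[1] - (N[1][0] * x0[0] + N[1][1] * x0[1]))
    return N, t

# Hypothetical secret key: a rotation by 0.7 rad plus a shift.
theta = 0.7
N_secret = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
t_secret = (5.0, -2.0)

a, b = (1.0, 2.0), (4.0, 6.0)
# Distances survive the transformation, so kNN over ciphertexts is exact...
print(math.dist(a, b), math.dist(dpt(N_secret, t_secret, a), dpt(N_secret, t_secret, b)))

# ...but three known plaintext/ciphertext pairs reveal the whole key.
known = [(p, dpt(N_secret, t_secret, p)) for p in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]]
N_found, t_found = recover_key(known)
print(N_found, t_found)
```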

kNN computation on untrusted platforms is also an issue for location-based service (LBS) systems. In the LBS model, the server holds a set of tuples (points of interest, POIs). The user submits a query (a range query or a kNN query) to the server to obtain the desired POIs. The main security goal is to protect the location of the query point, and some models also consider the privacy of the POIs. The k-anonymity model is often used to convert the location of the query point into a spatial region that contains at least k-1 other points, so that the server has difficulty determining the user's (query point's) location within it. Although this model could be applied to the problem addressed here, it has certain drawbacks. First, the anonymized data approximately expose the original data values. Second, in particular models (see Document 15: G. Ghinita, P. Kalnis, A. Khoshgozaran, C. Shahabi, and K. L. Tan. Private queries in location based services: Anonymizers are not necessary. In SIGMOD, 2008), the database is assumed to be owned by the server, so the server can see the original data. Third, in some systems (such as coarse-index systems), the server usually returns a superset of the query result for the user to post-process, which increases the user's burden and may even be unaffordable for lightweight clients. Khoshgozaran et al. proposed an LBS model that supports encrypted kNN queries (see Document 16: A. Khoshgozaran and C. Shahabi. Blind evaluation of nearest neighbor queries using space transformation to preserve location privacy. In SSTD, 2007). Its main idea is to use Hilbert curves to "encrypt" data points and queries: the Hilbert value of each point is sent to the server, which then computes kNN in the Hilbert-transformed space and obtains an approximate result. Besides returning only approximate results, this method suffers from problems similar to those of DPT and is easily attacked.
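Why the Hilbert-space answer is only approximate can be seen with a sketch of the standard Hilbert-curve index for an n-by-n grid (n a power of 2). No secret curve parameters are modeled here, and the helper names are hypothetical; the point is that two cells adjacent on the grid can be far apart along the curve.

```python
import math

def _rotate(n, x, y, rx, ry):
    """Rotate/flip a quadrant so the sub-curve has the canonical orientation."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y

def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve of an n-by-n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d

def hilbert_nn(points, q, n):
    """Approximate NN: the point whose Hilbert value is closest to q's."""
    hq = hilbert_index(n, *q)
    return min(points, key=lambda p: abs(hilbert_index(n, *p) - hq))

points = [(2, 1), (0, 3)]
q = (1, 1)
print(hilbert_nn(points, q, 4))                    # (0, 3): closest Hilbert value
print(min(points, key=lambda p: math.dist(p, q)))  # (2, 1): true nearest neighbor
```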

Summary of the Invention

The object of the present invention is to provide a method and system for secure nearest neighbor query based on equal-length data division, such that, when a data user performs a nearest neighbor query on an outsourced database stored on a server, the server cannot learn the data in the outsourced database, the data user's query point, or the nearest neighbor result, thereby ensuring data security.

To solve the above problems, the present invention provides a method for secure nearest neighbor query based on equal-length data division, comprising:

The data owner generates a Voronoi diagram containing all data points of the outsourced database, wherein every data point occupies the same number of bytes, and the outsourced database contains N data points, N being a positive integer;

The data user or the data owner specifies a parameter k, and the data owner divides the Voronoi diagram into k partitions according to the parameter k and records the boundary of each partition, wherein the partitions are mutually disjoint, different partitions may contain some of the same data points or none in common, and k is greater than or equal to 1 and less than or equal to N;

The data owner takes the byte count of the partition containing the most data points as the longest byte count, and adds random bytes to every other partition so that the byte count of each of those partitions equals the longest byte count;

The data owner builds an index for each boundary according to a preset hash function, encrypts all partitions with a preset encryption algorithm, and sends the encrypted partitions together with the indexes of their boundaries to the server for storage;

The data owner sends the boundaries of all partitions, the decryption algorithm corresponding to the encryption algorithm, and the hash function to the data user for storage;

The data user determines the real query point, determines from it the boundary of the partition containing the real query point, computes the index of that boundary with the hash function, and sends the index to the server;

The server, according to the received index, sends the data user the corresponding encrypted partition containing the real query point;

The data user decrypts the received encrypted partition with the decryption algorithm to obtain the partition containing the real query point, and finds the data point nearest to the real query point within that partition.
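The method steps above can be traced end to end in a minimal one-dimensional sketch. The partitions and their boundaries are supplied by hand; the point encoding (a 4-byte count followed by 8-byte doubles), the keyed HMAC used as the "preset hash function", and the SHA-256 counter-mode keystream standing in for the "preset encryption algorithm" are all assumptions for illustration, as are the key values and function names. The keystream is not a vetted cipher; a real deployment would presumably use an authenticated cipher such as AES-GCM from a cryptographic library. The nearest neighbor found inside the returned partition is exact because the partition holds every data point whose Voronoi cell overlaps it, and the query point lies in the cell of its own nearest neighbor.

```python
import hashlib
import hmac
import math
import os
import struct

# Hypothetical key material; the patent names only "a preset hash function"
# and "a preset encryption algorithm".
HASH_KEY = b"hypothetical-index-key"
ENC_KEY = b"hypothetical-encryption-key"

def index_of(boundary):
    """Index of a partition boundary (lo, hi): keyed hash of its encoding."""
    return hmac.new(HASH_KEY, struct.pack("<dd", *boundary), hashlib.sha256).hexdigest()

def _keystream(nonce, length):
    """SHA-256 in counter mode: a stand-in cipher for illustration only."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(ENC_KEY + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(plain):
    nonce = os.urandom(16)
    return nonce + bytes(a ^ b for a, b in zip(plain, _keystream(nonce, len(plain))))

def decrypt(cipher):
    nonce, body = cipher[:16], cipher[16:]
    return bytes(a ^ b for a, b in zip(body, _keystream(nonce, len(body))))

def serialize(points):
    """Assumed layout: 4-byte point count, then one 8-byte double per point."""
    return struct.pack("<I", len(points)) + b"".join(struct.pack("<d", p) for p in points)

def deserialize(blob):
    """Read the count header and ignore any random padding after the points."""
    (count,) = struct.unpack_from("<I", blob, 0)
    return [struct.unpack_from("<d", blob, 4 + 8 * i)[0] for i in range(count)]

def owner_upload(partitions):
    """Data owner: pad to the longest partition, encrypt, and index by boundary.

    partitions: list of ((lo, hi), [points]); returns {index: ciphertext}.
    """
    blobs = [serialize(pts) for _, pts in partitions]
    longest = max(len(b) for b in blobs)
    padded = [b + os.urandom(longest - len(b)) for b in blobs]  # random filler bytes
    return {index_of(bnd): encrypt(blob) for (bnd, _), blob in zip(partitions, padded)}

def user_request(boundaries, q):
    """Data user: find the partition containing q and send only its index."""
    for lo, hi in boundaries:
        if lo <= q < hi:
            return index_of((lo, hi))
    raise ValueError("query point outside all partitions")

def server_answer(store, idx):
    """Server: a plain lookup; it sees only an opaque index and ciphertext."""
    return store[idx]

def user_finish(cipher, q):
    """Data user: decrypt, drop the padding, and scan for the nearest neighbor."""
    return min(deserialize(decrypt(cipher)), key=lambda p: abs(p - q))

partitions = [((-math.inf, 8.0), [1.0, 4.0, 6.0]), ((8.0, math.inf), [10.0, 15.0])]
store = owner_upload(partitions)
boundaries = [b for b, _ in partitions]
q = 7.0
print(user_finish(server_answer(store, user_request(boundaries, q)), q))  # 6.0
```

Padding is what makes every stored ciphertext the same length (44 bytes here), so the server cannot tell partitions apart by size.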

Further, in the above method, the outsourced database is a one- to three-dimensional outsourced database.

Further, in the above method, when the outsourced database is one-dimensional, each partition boundary is the perpendicular bisector between two adjacent data points (in one dimension, their midpoint).
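In one dimension, cutting at the midpoint between adjacent points never splits a Voronoi cell, so each data point falls in exactly one partition. The sketch below assumes the points are spread as evenly as possible over the k partitions, which keeps the random padding small; this split rule and the name `partition_1d` are assumptions, and the MinCs method referred to in the drawings may choose boundaries differently.

```python
def partition_1d(points, k):
    """Split 1-D points into k groups of consecutive Voronoi cells.

    The Voronoi cell of a point is bounded by the midpoints (perpendicular
    bisectors) with its neighbors, so cutting at a midpoint never splits a
    cell. Returns (boundaries, groups) with boundaries as (lo, hi) pairs.
    """
    pts = sorted(points)
    n = len(pts)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= N")
    # Spread the n points as evenly as possible over the k groups.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    groups, boundaries, start = [], [], 0
    for size in sizes:
        end = start + size
        lo = float("-inf") if start == 0 else (pts[start - 1] + pts[start]) / 2
        hi = float("inf") if end == n else (pts[end - 1] + pts[end]) / 2
        groups.append(pts[start:end])
        boundaries.append((lo, hi))
        start = end
    return boundaries, groups

print(partition_1d([10, 1, 6, 15, 4], 2))
# ([(-inf, 8.0), (8.0, inf)], [[1, 4, 6], [10, 15]])
```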

Further, in the above method, when the outsourced database is two-dimensional, the boundary of each partition is a grid cell bounded by straight lines parallel to the X and Y coordinate axes of the Voronoi diagram.

Further, in the above method, in the step in which the data owner divides the Voronoi diagram into k partitions according to the parameter k, the Voronoi diagram is divided into k square grid cells of equal size, where k is a perfect square.
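The equal-square grid can be sketched as below, assuming the Voronoi diagram is clipped to a square domain with lower-left corner (x_min, y_min) and side length side_len; these parameters and the helper names are hypothetical. Because the cells are equal squares, the data user can locate the query's cell arithmetically instead of scanning all boundaries. Assigning data points to a cell (every point whose Voronoi cell intersects the square) needs an actual Voronoi construction, for example `scipy.spatial.Voronoi`, and is not shown.

```python
import math

def grid_boundaries(x_min, y_min, side_len, k):
    """k equal square cells (k a perfect square) as (x0, y0, x1, y1), row-major."""
    m = math.isqrt(k)
    if m * m != k:
        raise ValueError("k must be a perfect square")
    cell = side_len / m
    return [(x_min + i * cell, y_min + j * cell, x_min + (i + 1) * cell, y_min + (j + 1) * cell)
            for j in range(m) for i in range(m)]

def cell_of(x_min, y_min, side_len, k, q):
    """Row-major index of the cell containing q, computed in O(1)."""
    m = math.isqrt(k)
    cell = side_len / m
    i = min(int((q[0] - x_min) // cell), m - 1)  # clamp the outer edge into the last cell
    j = min(int((q[1] - y_min) // cell), m - 1)
    return j * m + i

cells = grid_boundaries(0, 0, 10, 4)
print(cells[cell_of(0, 0, 10, 4, (7, 2))])  # (5.0, 0.0, 10.0, 5.0)
```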

According to another aspect of the present invention, a system for secure nearest neighbor query based on equal-length data division is provided, comprising:

a data owner, configured to: specify the parameter k; generate a Voronoi diagram containing all data points of the outsourced database, wherein every data point occupies the same number of bytes and the outsourced database contains N data points, N being a positive integer; divide the Voronoi diagram into k partitions according to the parameter k and record the boundary of each partition, wherein the partitions are mutually disjoint, different partitions may contain some of the same data points or none in common, and k is greater than or equal to 1 and less than or equal to N; take the byte count of the partition containing the most data points as the longest byte count, and add random bytes to every other partition so that its byte count equals the longest byte count; build an index for each boundary according to a preset hash function, encrypt all partitions with a preset encryption algorithm, and send the encrypted partitions together with the indexes of their boundaries to the server for storage; and send the boundaries of all partitions, the decryption algorithm corresponding to the encryption algorithm, and the hash function to the data user for storage;

a data user, configured to: specify the parameter k; determine the real query point, determine from it the boundary of the partition containing the real query point, compute the index of that boundary with the hash function, and send the index to the server; and decrypt the received encrypted partition containing the real query point with the decryption algorithm, obtain that partition, and find the data point nearest to the real query point within it;

a server, configured to store the encrypted partitions and the indexes of their boundaries sent by the data owner, and to send the data user, according to the received index of the boundary of the partition containing the real query point, the corresponding encrypted partition containing the real query point.

Further, in the above system, the outsourced database is a one- to three-dimensional outsourced database.

Further, in the above system, when the outsourced database is one-dimensional, each partition boundary is the perpendicular bisector between two adjacent data points (in one dimension, their midpoint).

Further, in the above system, when the outsourced database is two-dimensional, the boundary of each partition is a grid cell bounded by straight lines parallel to the X and Y coordinate axes of the Voronoi diagram.

Further, in the above system, the data owner divides the Voronoi diagram into k square grid cells of equal size, where k is a perfect square.

Compared with the prior art, in the present invention the data owner generates a Voronoi diagram containing all N data points of the outsourced database (N a positive integer), every data point occupying the same number of bytes; the data user or the data owner specifies a parameter k, with k greater than or equal to 1 and less than or equal to N, and the data owner divides the Voronoi diagram into k mutually disjoint partitions, which may contain some of the same data points or none in common, and records the boundary of each partition; the data owner takes the byte count of the partition containing the most data points as the longest byte count and adds random bytes to every other partition until its byte count equals the longest byte count; the data owner builds an index for each boundary with a preset hash function, encrypts all partitions with a preset encryption algorithm, and sends the encrypted partitions together with the indexes of their boundaries to the server for storage; the data owner sends the boundaries of all partitions, the decryption algorithm corresponding to the encryption algorithm, and the hash function to the data user for storage; the data user determines the real query point, determines the boundary of the partition containing it, computes the index of that boundary with the hash function, and sends the index to the server; the server sends the data user the encrypted partition corresponding to the received index; and the data user decrypts that partition with the decryption algorithm and finds the nearest neighbor of the real query point within it. As a result, when a data user performs a nearest neighbor query on an outsourced database stored on a server, the server cannot learn the data in the outsourced database, the data user's query point, or the nearest neighbor result, which ensures data security.

Brief Description of the Drawings

Fig. 1 is a flowchart of the method for secure nearest neighbor query based on equal-length data division according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the partitioning used by the method for secure nearest neighbor query based on equal-length data division according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of partitioning in one-dimensional space according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of the partitioning produced by the MinCs method according to an embodiment of the present invention;

Fig. 5a shows the effect of k on the partitioning time under the MinCs partitioning method according to an embodiment of the present invention;

Fig. 5b shows the effect of |D| on the partitioning time under the MinCs partitioning method according to an embodiment of the present invention;

Fig. 6a shows how the average, maximum, and minimum partition sizes vary with the parameter k according to an embodiment of the present invention;

Fig. 6b shows how the average, maximum, and minimum partition sizes vary with |D| according to an embodiment of the present invention;

Fig. 7a shows the total running time of the MinCs partitioning method as a function of the parameter k according to an embodiment of the present invention;

Fig. 7b shows the total running time of the MinCs partitioning method as a function of |D| according to an embodiment of the present invention;

Fig. 8a shows how |E(D)| varies with k under the MinCs partitioning method according to an embodiment of the present invention;

Fig. 8b shows how |E(D)| varies with |D| under the MinCs partitioning method according to an embodiment of the present invention;

Fig. 9a shows how the query communication cost varies with k under the MinCs partitioning method according to an embodiment of the present invention;

Fig. 9b shows how the query communication cost varies with |D| under the MinCs partitioning method according to an embodiment of the present invention;

Fig. 10a shows how the query time varies with |D| under the MinCs partitioning method according to an embodiment of the present invention;

Fig. 10b shows how the query time varies with k under the MinCs partitioning method according to an embodiment of the present invention;

Fig. 11 is a schematic diagram of the functional modules of the system for secure nearest neighbor query based on equal-length data division according to an embodiment of the present invention.

Detailed Description of the Embodiments

To make the above objects, features, and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

With the growing popularity of cloud computing and its applications, the secure query problem over an encrypted data set E(D) stored in the cloud has received increasing attention. The present invention studies one such problem in depth: the secure nearest neighbor (SNN) problem. The data user sends an encrypted query E(q) to the server to obtain the ciphertext of the data point in E(D) that is nearest to the query point (its "nearest neighbor"), and the server must learn nothing about the content of the data or of the query. Specifically, the SNN problem involves three parties, the data owner (data master), the data user (client), and the server, with the following roles:

(1) Data owner: owns an outsourced database D consisting of d-dimensional Euclidean points or objects, and outsources D to a server that is not fully trusted.

(2) Data user: needs to query the outsourced database D.

(3) Server: not fully trusted. For its own reasons or on behalf of a third party, it may try to snoop on the data from the data owner and on the queries from the data user.

The data user must be able to run nearest neighbor (NN) queries on the outsourced database D. The server, however, must not learn the data in D, the data user's actual query, or the query result. To achieve this, the data owner encrypts D with some encryption algorithm E. Let E(D) denote the ciphertext of D and E^-1 the corresponding decryption algorithm. Similarly, the data user encrypts its query point q into E(q) before sending it to the server. The goal is for the SNN method to be exactly as secure as the encryption algorithm E used by the data owner. That is, under any attack model in which E is proven secure (for example, indistinguishability under chosen-plaintext attack, or under chosen-ciphertext attack), the SNN method should be secure as well.

In fact, it can be shown that, given only E(q) and E(D), the server cannot find the exact nearest neighbor of the query point. But what if the server is not required to locate the SNN result exactly, and only needs to return a "rough location" that contains it?

Consider a very simple and direct approach, which we call the "Send-D" (directly send D) algorithm:

- The data owner encrypts the whole outsourced database D as one unit to obtain E(D) and sends it to the server.
- The server returns the entire E(D) as the answer to every query.
- The data user decrypts E(D) with E^-1 to recover D, assuming the data owner shares E^-1 with the data user, and computes the SNN result locally.

Clearly this algorithm is as secure as E, but it is very inefficient.

Building on this idea, and as shown in Fig. 1, the present invention proposes a more efficient SNN processing method: the method for secure nearest neighbor query based on equal-length data division (secure Voronoi diagram, SVD). It consists of a preprocessing phase (steps S1 to S5) and a query phase (steps S6 to S8). Steps S6 to S8 can be repeated as often as actual queries require. The steps are as follows.
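To make the cost of the Send-D baseline concrete, here is a minimal Python sketch (the experiments described later use C++ with Crypto++). The `encrypt`/`decrypt` pair is a toy XOR keystream built on SHA-256. It stands in for E and E^-1 only so that the example runs on the standard library, and it is not a secure cipher. All function names are hypothetical. Note that every query downloads and decrypts all of D, which is exactly the inefficiency that SVD removes.

```python
import hashlib
import math
import os
import struct

# Toy stand-in for E / E^-1 (NOT secure): XOR with a SHA-256 keystream derived
# from a secret key and a random per-ciphertext nonce. The text's experiments use AES.
def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    out = bytearray()
    for off in range(0, len(data), 32):
        block = hashlib.sha256(key + nonce + off.to_bytes(8, "big")).digest()
        out.extend(b ^ s for b, s in zip(data[off:off + 32], block))
    return bytes(out)

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(16)
    return nonce + _keystream_xor(key, nonce, plaintext)

def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    return _keystream_xor(key, ciphertext[:16], ciphertext[16:])

def serialize(points):
    # Every 2-D point takes the same 16 bytes (two little-endian doubles).
    return b"".join(struct.pack("<dd", x, y) for x, y in points)

def deserialize(blob):
    return [struct.unpack_from("<dd", blob, off) for off in range(0, len(blob), 16)]

def send_d_query(key, encrypted_db, q):
    # Server side: return the whole E(D). Data-user side: decrypt, then scan all of D.
    points = deserialize(decrypt(key, encrypted_db))
    return min(points, key=lambda p: math.dist(q, p))

key = os.urandom(32)                        # shared by the data owner and the data user
D = [(0.0, 0.0), (5.0, 5.0), (9.0, 1.0)]
E_D = encrypt(key, serialize(D))            # uploaded once by the data owner
print(send_d_query(key, E_D, (8.0, 2.0)))   # (9.0, 1.0)
```

The server's work is trivial here, but the communication and client decryption costs grow with N on every query.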

Step S1: the data owner generates the Voronoi diagram of all data points p_i of the outsourced database D. Every data point p_i occupies the same number of bytes, and D contains N data points, where N is a positive integer. Because step S2 produces k partitions of D, a key question is how to generate the (possibly overlapping) partitions G(D) = {G_1, ..., G_k}. There are two goals: (1) the SNN result (the nearest neighbor of the query point) must be contained in at least one partition; and (2) the data user must be able to determine the relevant G_i from as little information as possible, to keep the query cost low. A natural choice is therefore to build G(D) from the Voronoi diagram of D, in which each data point p_i is represented by its Voronoi cell.

The present invention must address the following four problems:

Problem 1: describing the boundaries P(D) with Voronoi cells costs too much space. For example, when D is two-dimensional (d = 2), each Voronoi cell is a convex polygon of arbitrary shape, and these polygons have six vertices on average. Representing P(D) this way would therefore take far more storage than D itself.

Problem 2: how to index P(D) so that the data user can quickly determine the required original index value i. This problem is especially pronounced when the elements of P(D) have irregular boundaries, for example when B_j ∈ P(D) is a Voronoi cell (so that B_j is a convex polygon of arbitrary shape).

Problem 3: guaranteeing |E(G_i)| = |E(G_j)| (i ≠ j). Otherwise the server could tell partitions apart by the sizes of their ciphertexts and thereby learn how the data is partitioned. This must hold for any secure encryption algorithm E, so G(D) is subject to the hard constraint |G_i|_b = |G_j|_b (i ≠ j). Here |G_i|_b denotes the number of bytes of partition G_i, as distinct from |G_i|, the number of data points in G_i.

Problem 4: after the data user has computed locally the original index value i satisfying nn(q, D) ∈ G_i, it must not send i itself to the server to fetch E(G_i). Only then does the server learn nothing about the partitioning during query processing. The problem disappears if the data user sends only a ciphertext of i, but this means the server must be able to find E(G_i) from that ciphertext.

Step S2: the data user or the data owner specifies a parameter k. According to k, the data owner divides the Voronoi diagram into k partitions G_1, ..., G_k and records the boundary of each partition, P(D) = {B_1, ..., B_k}, where 1 ≤ k ≤ N. The regions of different partitions do not intersect, but the sets of data points they contain may partially overlap or not overlap at all.

Concretely, D is split into k parts according to the user-specified k. Each part is called a "partition", and partitions may share data points. This gives G(D) = {G_1, ..., G_k}, and E(D) = {E(G_1), ..., E(G_k)} is then transmitted to the server. In the process, the geometric boundary of each partition, P(D) = {B_1, ..., B_k} (B_i being the geometric boundary of G_i), must also be stored. Provided it describes the partitioning adequately, the less space it takes, the better.

Consider first an extreme case: k = N, where N = |D| is the number of data points in D. Then each G_i is the singleton set {p_i} for a data point p_i ∈ D, and P(D) is the set of Voronoi cells of D. Let vc_i be the Voronoi cell of p_i (p_i lies in vc_i). Given any query point q, the data user finds the Voronoi cell containing q. Supposing q ∈ vc_i, the data user requests E(G_i) from the server. Clearly nn(q, D) = p_i and G_i = {p_i}, and the algorithm returns nn(q, E^-1(E(G_i))) = nn(q, {p_i}) = p_i, which is the correct result. The same idea can be generalized to the usual case k << N.

Step S3: the data owner takes the number of bytes of the partition with the most data points as the longest byte count. It then appends random bytes to every other partition so that each one also has the longest byte count.

Specifically, suppose the data owner has generated G(D) and P(D). To solve Problem 3, the data owner must ensure |G_i|_b = |G_j|_b (i ≠ j). Let G_x be the partition in G(D) with the most data points (hereafter its "size"), i.e., |G_i| ≤ |G_x| for every i ∈ [1, k]. Because every data point occupies the same number of bytes, G_x also has the most bytes.

To each partition G_i (i ≠ x), the data owner appends (|G_x|_b - |G_i|_b) random bytes. The character * can denote these random bytes to distinguish them from the real data points in G_i; no byte sequence denoted by * actually occurs in G_i. This process is called the "random padding operation". After random padding, |G_x|_b = |G_i|_b for every partition G_i.

Consequently, whatever secure encryption algorithm the data owner then uses in step S4 to generate {E(G_1), ..., E(G_k)}, |E(G_i)|_b = |E(G_j)|_b (i ≠ j) holds. The server therefore cannot single out any partition in G(D) and learn how the data is partitioned.
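The random padding operation can be sketched as follows, assuming each 2-D point is serialized as two 8-byte doubles. Instead of the * marker, this sketch stores the number of real points in a small header at the start of each plaintext. This header is an assumption of the sketch, not the method as stated. Because it is encrypted together with the partition, it reveals nothing to the server. All names are hypothetical.

```python
import os
import struct

POINT_FMT = "<dd"                         # assumed layout: two little-endian doubles
POINT_SIZE = struct.calcsize(POINT_FMT)   # 16 bytes for every point
HEADER_FMT = "<I"                         # assumed header: number of real points
HEADER_SIZE = struct.calcsize(HEADER_FMT)

def serialize(partition):
    body = b"".join(struct.pack(POINT_FMT, *p) for p in partition)
    return struct.pack(HEADER_FMT, len(partition)) + body

def pad_partitions(partitions):
    """Random padding: append random bytes so every G_i reaches |G_x|_b."""
    blobs = [serialize(g) for g in partitions]
    longest = max(len(b) for b in blobs)   # byte length of the largest partition G_x
    return [b + os.urandom(longest - len(b)) for b in blobs]

def unpad(blob):
    """Data-user side (after decryption): keep the real points, drop the random tail."""
    (count,) = struct.unpack_from(HEADER_FMT, blob, 0)
    return [struct.unpack_from(POINT_FMT, blob, HEADER_SIZE + i * POINT_SIZE)
            for i in range(count)]

G = [[(0.0, 0.0)],
     [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)],
     [(7.0, 8.0), (9.0, 1.0)]]
padded = pad_partitions(G)
print([len(b) for b in padded])   # [52, 52, 52]
```

With a length-preserving encryption scheme, equal plaintext lengths give equal ciphertext lengths. This is the property Problem 3 requires.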

Step S4: the data owner builds an index for each boundary with a preset hash function and encrypts all partitions with a preset encryption algorithm. It then sends the encrypted partitions, together with the indexes of their boundaries, to the server for storage.

Specifically, to solve Problem 4, the data owner applies a secret random hash function g: [1, N] → Z+. It publishes E(D) = {(E(g(1)), E(G_1)), ..., (E(g(k)), E(G_k))} to the server, and gives P(D), E^-1, and g to the data user.
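Below is a minimal sketch of the index construction. It assumes HMAC-SHA256 under a secret key as the "secret random hash function" g, and it uses the HMAC output directly as the opaque index in place of E(g(i)). Both choices are assumptions of this sketch. The essential property is determinism: a data user holding the key must be able to regenerate byte-identical tokens. The shuffle is a further assumption, added so that the position of a pair in the upload does not give away i.

```python
import hashlib
import hmac
import os
import random

def index_token(g_key: bytes, i: int) -> bytes:
    """Hypothetical stand-in for E(g(i)): HMAC-SHA256 of i under a secret key.
    It is deterministic, so a data user holding g_key reproduces the same token."""
    return hmac.new(g_key, i.to_bytes(8, "big"), hashlib.sha256).digest()

def publish(g_key: bytes, encrypted_partitions):
    """Data owner: pair each E(G_i) (i = 1..k) with its token for the server."""
    pairs = [(index_token(g_key, i), ct)
             for i, ct in enumerate(encrypted_partitions, start=1)]
    random.SystemRandom().shuffle(pairs)   # assumption: also hide the upload order
    return pairs

g_key = os.urandom(32)     # the secret behind g; later shared with data users
E_D = publish(g_key, [b"E(G_1)", b"E(G_2)", b"E(G_3)"])
```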

Step S5: the data owner sends the boundaries P(D) of all partitions, the decryption algorithm corresponding to the encryption algorithm, and the hash function to the data user for storage. With P(D) stored on the client, for any query point q the data user can efficiently determine the original index value i of a boundary in P(D) such that nn(q, D) ∈ G_i. It then requests E(G_i) from the server using the index of that boundary obtained with the hash function. This can be done without the server knowing the original value of i, so the server learns nothing about the partitioning while answering queries. Finally, the data user easily obtains nn(q, D) = nn(q, E^-1(E(G_i))).

Step S6: the data user determines the real query point q and determines the boundary of the partition that contains q. It then obtains the index of that boundary with the hash function and sends this index, E(g(j)), to the server. The server already holds E(D) = {(E(g(1)), E(G_1)), ..., (E(g(k)), E(G_k))}, which the data owner uploaded in step S4.

Step S7: based on the received index of the boundary, the server sends the corresponding encrypted partition containing the real query point to the data user. Specifically, once the server has received all encrypted partitions and the indexes of their boundaries, it directly establishes the correspondence between each encrypted partition and its index.

Step S8: the data user decrypts the received encrypted partition with the decryption algorithm, obtains the partition containing the real query point, and finds the nearest-neighbor data point of the real query point within that partition.

Steps S6 to S8 of the query phase can be repeated as actual queries require, until no more query points are input.

Preferably, the outsourced database is a one- to three-dimensional outsourced database.

Preferably, when the outsourced database is one-dimensional, each partition boundary is the perpendicular bisector of two adjacent data points, which in one dimension is simply their midpoint. Specifically, to solve Problems 1 and 2, every partition in G(D) is required to have a regular shape.

As shown in Fig. 3, the one-dimensional case admits an "optimal solution" that yields size-balanced, mutually disjoint partitions. This is because the Voronoi diagram of one-dimensional data points consists of contiguous, non-overlapping intervals. To generate "perfectly balanced" (equal-size) partitions, one only needs to find the 1/k, ..., (k-1)/k quantiles of D and use them to generate G(D). The Voronoi cell boundaries at these quantile points, together with ±∞, determine P(D) = {B_1, ..., B_k}.
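The one-dimensional partitioning can be sketched as follows:

- Splitting the sorted points at positions ⌊jN/k⌋ gives groups of (almost) exactly N/k points.
- Each finite boundary is the midpoint between the last point of one group and the first point of the next. This is the shared edge of their Voronoi cells.
- The data user locates q with a binary search over the k-1 stored boundaries.

Function names are hypothetical.

```python
import bisect

def partition_1d(points, k):
    """Split 1-D data into k contiguous, disjoint groups of about N/k points.
    Returns the groups and the k-1 finite boundaries (P(D) without +/-inf)."""
    pts = sorted(points)
    n = len(pts)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= N")
    cuts = [j * n // k for j in range(k + 1)]          # quantile positions 0 .. N
    groups = [pts[cuts[j]:cuts[j + 1]] for j in range(k)]
    # The Voronoi edge between neighbours pts[c-1] and pts[c] is their midpoint.
    bounds = [(pts[c - 1] + pts[c]) / 2 for c in cuts[1:-1]]
    return groups, bounds

def locate(bounds, q):
    """0-based index of the partition whose interval contains q."""
    return bisect.bisect_left(bounds, q)

D = [3.0, 9.0, 1.0, 7.0, 5.0, 11.0]
groups, bounds = partition_1d(D, 3)
print(groups, bounds)        # [[1.0, 3.0], [5.0, 7.0], [9.0, 11.0]] [4.0, 8.0]
g = groups[locate(bounds, 7.9)]
print(min(g, key=lambda p: abs(p - 7.9)))   # 7.0
```

Because the partitions here are disjoint and equal-sized, random padding adds at most one point's worth of bytes per partition.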

Once D is sorted, a single linear scan over D suffices to find the (k-1) quantile points and divide the Voronoi diagram of D with them. The I/O (input/output) cost of the algorithm in the external-memory model is therefore dominated by sorting, giving O(N log N). Every partition has size |D|/k = N/k, so each E(G_i) has size |E(N/k)|, the ciphertext size of N/k data points. Hence E(D) has size k|E(N/k)|, and P(D) clearly has size O(k).

Preferably, when the outsourced database is two-dimensional, the boundary of each partition is a box bounded by lines parallel to the X and Y axes of the Voronoi diagram. Specifically, to solve Problems 1 and 2, every partition in G(D) is required to have a regular shape: each partition must be confined to a "box" whose sides are parallel to the X axis or the Y axis. However, the boundary B_i of a partition G_i may then contain or cut through several Voronoi cells. Fig. 2 illustrates this: the dashed lines are the partition boundaries B_1 and B_2, and the convex polygons are the Voronoi cells of the data points p_1 to p_16.

To solve Problem 2, let vc_i be the Voronoi cell of point p_i in D. The data user must be able to index P(D) and to determine easily and efficiently the index value i satisfying nn(q, D) ∈ G_i. To ensure this, the following partitioning principles are adopted, where B_i denotes the geometric boundary of G_i:

Principle 1: B_i is a "box" whose sides are parallel to the X and Y axes;

Principle 2: the boundaries of different partitions in G(D) are mutually disjoint;

Principle 3: if B_i fully contains or intersects a set V_i of Voronoi cells, then G_i = {p_j | vc_j ∈ V_i}. Different partitions G_i may therefore contain duplicate data points. In Fig. 2, for example, the data points whose Voronoi cells are shaded are added to both G_1 and G_2: G_1 = {p_1, p_2, p_3, p_4, p_5, p_6, p_7, p_8, p_10} and G_2 = {p_5, p_6, p_7, p_8, p_9, p_10, p_11, p_12, p_13, p_14, p_15, p_16}.
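Principle 3 requires knowing which Voronoi cells meet each box. Below is a small self-contained way to test this without a Voronoi library (the experiments used Qhull instead). The box is clipped against the bisector half-planes of p_j with Sutherland-Hodgman clipping, and vc_j meets the box exactly when something is left. The method is O(N) per point and box, so it is only meant for illustration. Cells that merely touch a box are kept, which errs on the safe side for Lemma 1 below. All names are hypothetical.

```python
EPS = 1e-9  # tolerance: cells that merely touch a box are kept (conservative)

def clip(poly, a, b, c):
    """One Sutherland-Hodgman step: the part of convex `poly` with a*x + b*y <= c."""
    out = []
    for idx in range(len(poly)):
        p, r = poly[idx], poly[(idx + 1) % len(poly)]
        fp = a * p[0] + b * p[1] - c
        fr = a * r[0] + b * r[1] - c
        if fp <= EPS:
            out.append(p)
        if (fp < -EPS and fr > EPS) or (fp > EPS and fr < -EPS):
            t = fp / (fp - fr)
            out.append((p[0] + t * (r[0] - p[0]), p[1] + t * (r[1] - p[1])))
    return out

def cell_meets_box(j, points, box):
    """Does vc_j, the Voronoi cell of points[j], intersect box (x0, y0, x1, y1)?
    vc_j is the intersection of the half-planes |x - p_j| <= |x - p_m|, i.e.
    2(p_m - p_j) . x <= |p_m|^2 - |p_j|^2, so clip the box by each of them."""
    x0, y0, x1, y1 = box
    poly = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    xj, yj = points[j]
    for m, (xm, ym) in enumerate(points):
        if m != j:
            poly = clip(poly, 2 * (xm - xj), 2 * (ym - yj),
                        xm * xm + ym * ym - xj * xj - yj * yj)
            if not poly:
                return False
    return True

def build_partitions(points, boxes):
    """Principle 3: G_i = {p_j | vc_j meets B_i}; a point may land in several G_i."""
    return [[p for j, p in enumerate(points) if cell_meets_box(j, points, box)]
            for box in boxes]

def locate_box(boxes, q):
    """Point location by a linear scan over P(D) (the boxes are disjoint)."""
    return next(i for i, (x0, y0, x1, y1) in enumerate(boxes)
                if x0 <= q[0] <= x1 and y0 <= q[1] <= y1)

D = [(1.0, 1.0), (3.0, 1.0), (1.0, 3.0), (3.0, 3.0), (2.1, 2.0)]
boxes = [(0.0, 0.0, 2.0, 4.0), (2.0, 0.0, 4.0, 4.0)]   # B_1 | B_2, split at x = 2
G = build_partitions(D, boxes)
print((2.1, 2.0) in G[0] and (2.1, 2.0) in G[1])       # True: its cell straddles x = 2
```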

Lemma 1. As shown in Fig. 2, under the above partitioning principles, if nn(q, D) = p_i and q ∈ B_j, then p_i ∈ G_j.

Proof: if q ∈ B_j, i.e., q lies in the region enclosed by B_j, then q lies in some Voronoi cell that is fully contained in or cut by B_j. By the partitioning principles, this Voronoi cell belongs to V_j, and V_j determines the elements of G_j. Suppose q lies in vc_i, so that nn(q, D) = p_i. Then vc_i ∈ V_j, and hence p_i ∈ G_j.

By Lemma 1, in step S6 the data user's task for any query point q is to find B_j ∈ P(D) with q ∈ B_j, i.e., to find the box in P(D) that contains q. This is a "point location query". The boxes in P(D) are mutually disjoint and together cover the whole space of the Voronoi diagram of D, so exactly one box contains q. Moreover, the sides of the boxes are parallel to the X and Y axes, so the data user can easily build an index over P(D). The data user can then request E(G_j) from the server, because by Lemma 1, whenever q is contained in B_j, nn(q, D) ∈ G_j.

In step S7, the server returns E(G_j). In step S8, the data user decrypts E(G_j) with E^-1 to obtain G_j. Next, by recognizing the random characters *, the data user removes the random byte sequence that the data owner appended to G_j during the random padding operation. Finally, the data user obtains nn(q, D) = nn(q, G_j).

Recall, however, that the data user must also solve Problem 4 when requesting E(G_j). In step S4, the data owner built an index for each boundary with the preset hash function and sent all encrypted partitions, together with their indexes, to the server. Accordingly, in step S6 the data user sends E(g(j)) to the server.

In step S7, on the server side, each E(g(i)) sent by the data owner is paired with E(G_i). After receiving E(D) from the data owner, the server builds a hash table T with k records that maps E(g(i)) to E(G_i). Given a request E(g(j)) from the data user, the server looks it up in T and finds E(G_j) in O(1) time. The server can therefore retrieve E(G_j) from E(g(j)) efficiently.
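The server side of this exchange amounts to a dictionary keyed by opaque tokens, as in the sketch below. It again assumes an HMAC-based stand-in for E(g(i)), and `Server` and `index_token` are hypothetical names. The server never sees i itself, only the token and a ciphertext whose length is the same for every partition.

```python
import hashlib
import hmac

def index_token(g_key: bytes, i: int) -> bytes:
    """Hypothetical stand-in for E(g(i)); deterministic so requests match uploads."""
    return hmac.new(g_key, i.to_bytes(8, "big"), hashlib.sha256).digest()

class Server:
    """Untrusted server: sees only opaque tokens and equal-length ciphertexts."""

    def __init__(self, outsourced_pairs):
        # Hash table T with k records: E(g(i)) -> E(G_i).
        self.table = dict(outsourced_pairs)

    def fetch(self, token: bytes) -> bytes:
        return self.table[token]          # expected O(1) dictionary lookup

g_key = bytes(32)                                   # demo key only; use a random secret
uploads = [(index_token(g_key, i), f"E(G_{i})".encode()) for i in range(1, 5)]
server = Server(uploads)
j = 3                                               # box index found by point location
print(server.fetch(index_token(g_key, j)))          # b'E(G_3)'
```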

Further, this embodiment provides a MinCs partitioning method that follows Principles 1 to 3 above. Because of the random padding operation, the communication cost and the server-side storage cost of this embodiment are clearly determined by the gap between the largest partition and each partition. This gap is |G_x| - |G_i|, or more precisely |E(G_x)| - |E(G_i)|. To reduce the communication cost and the server-side storage cost, a partitioning method should therefore also follow Principle 4 as far as possible: generate partitions whose sizes are "balanced" (equal or nearly equal).

In the MinCs method, when the data owner divides the Voronoi diagram into k partitions according to the parameter k, it divides the diagram into k square boxes of equal size, where k is a perfect square.

The MinCs method is the simplest partitioning method. It divides D and its Voronoi diagram into a grid in which every box has the same geometric size. The elements of P(D) are therefore square boxes whose sides are parallel to the X and Y axes. Once P(D) is given, G(D) can be generated as illustrated in Fig. 2. Fig. 4 shows an example with k = 4; note that k must be a perfect square. In MinCs, a single value, the side length l of the small squares, suffices to represent P(D), assuming the boundary values of D are known parameters.

To simplify the discussion, each box is identified by a pair of indices {i, j}, counted from left to right and from bottom to top. In Fig. 4, for example, the bottom-left box is C_{1,1} and the top-right box is C_{2,2}. From the correspondence between each box C_{i,j} and the partition boundaries B_x, in this example we have $C_{i,j} = B_{(i-1)\cdot\sqrt{k} + j}$.

It is easy to show that the data user can find the box containing q in O(1) time. It only needs to know l, k (which can be computed from l and the boundary values of D), and the coordinates of q. Among all partitioning methods, this one therefore has the lowest client-side storage cost, because P(D) is represented by the single value l. Hence it is called the "MinCs (Minimum Client Storage) method".
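MinCs point location can be sketched as below. The sketch makes four assumptions:

- The data domain is a known square (x0, y0, side).
- i numbers columns from left to right and j numbers rows from bottom to top, both 1-based.
- q lies inside the domain.
- Points on the top or right edge are clamped into the last row or column.

The returned index b satisfies C_{i,j} = B_b with b = (i-1)√k + j. Names are hypothetical.

```python
import math

def mincs_side(domain, k):
    """Side length l of each grid square for a square domain (x0, y0, side)."""
    r = math.isqrt(k)
    if r * r != k:
        raise ValueError("MinCs requires k to be a perfect square")
    return domain[2] / r

def mincs_locate(domain, l, q):
    """Return (i, j, b): the 1-based cell C_{i,j} containing q and b with B_b = C_{i,j}."""
    x0, y0, side = domain
    r = round(side / l)                           # sqrt(k), recovered from l and D's bounds
    i = min(int((q[0] - x0) // l), r - 1) + 1     # column, counted left to right
    j = min(int((q[1] - y0) // l), r - 1) + 1     # row, counted bottom to top
    return i, j, (i - 1) * r + j                  # B index (i - 1) * sqrt(k) + j

domain = (0.0, 0.0, 10.0)
l = mincs_side(domain, 4)                         # k = 4 as in Fig. 4, so l = 5.0
print(mincs_locate(domain, l, (7.0, 2.0)))        # (2, 1, 3)
```

The data user only stores l (plus the domain bounds), and each lookup is two floor divisions, which matches the O(1) claim above.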

In more detail, the MinCs partitioning method above was implemented in C++. In the experiments, the Qhull library was used to compute the Voronoi diagram of the data set D, and the latest Crypto++ library (at the time) was used for encryption. The experiments were run on a Linux machine with an Intel Xeon 3.07 GHz CPU and 8 GB of memory.

For the two-dimensional outsourced database D, the experiments used raw data sets of tens of millions of data points sampled from California (CA) and Texas (TX) in the United States, all taken from the OpenStreetMap project. From each of the CA and TX data sets, 2,000,000 data points were randomly selected as the largest experimental data set D_max, and smaller data sets were formed from D_max. When the data set size was varied to test the scalability of the partitioning method, each smaller data set was always a subset of each larger one. This avoids effects caused by changes in the specific data points of D, so that only the effect of |D| is reflected.

The default parameter settings in the experiments are as follows:

- |D| = 10^6 and k = 625, where |D| is the number of data points and k is the number of partitions.
- The data points come from the CA data set by default.
- Encryption uses AES with a key size and block size of 256 bits.

Experiments with other encryption algorithms showed that the choice of algorithm has almost no effect on the performance of this embodiment. Any secure public-key or symmetric-key encryption algorithm can therefore be used to implement it, and its performance with different algorithms can be confirmed experimentally. In all experiments, unless stated otherwise, whenever one parameter is varied, all other parameters take their default values.

The specific experimental results are as follows:

1. Preprocessing stage

In the preprocessing stage, the data master performs two tasks, partitioning and encryption, both of which are mainly affected by the number of partitions k and the data set size |D|. Figures 5a and 5b show the effect of k and of |D|, respectively, on the running time of the MinCs method. Figure 5a shows that the time cost of MinCs is almost constant in k, because the complexity of the MinCs partitioning procedure itself does not depend on k. MinCs is in fact the simplest and most direct partitioning method; notably, it takes only 22 seconds to divide 1,000,000 points into 1,225 partitions.

Figure 5b shows the effect of varying the data size |D| (from 250,000 to 2,000,000) on the partitioning time. The processing time of MinCs grows linearly with N = |D|.

Next, consider the sizes of the partitions G(D) = {G_1, ..., G_k} produced by MinCs before the random padding operation. Because random padding grows every partition to the size of the largest one by appending random bytes, two quantities are critical for evaluating the partitioning method of this embodiment: the size of the largest partition, |G_x|, and the variance of (|G_x| - |G_i|) for i ∈ [1, k]. |G_x| determines the server-side storage cost and the communication cost of each query; the variance of (|G_x| - |G_i|) determines the cost of the padding operation itself. To present these values in a single chart, Figures 6a and 6b show how the average partition size, the maximum |G_x| = max_{i∈[1,k]} |G_i|, and the minimum |G_y| = min_{i∈[1,k]} |G_i| vary with k and with |D|.
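The quantities singled out above can be computed directly from the partition sizes. This is a minimal sketch, assuming the sizes are given as a list of |G_i| values; the function name is hypothetical, and the population variance is an assumption, since the text does not say which variance is meant. The total padding, the sum of (|G_x| - |G_i|), is also reported because it is the amount of random filler that padding must add.

```python
from statistics import mean, pvariance

def partition_size_stats(sizes):
    """Summarize partition sizes |G_1|, ..., |G_k| before random padding."""
    g_x = max(sizes)                       # |G_x|: every partition is padded up to this
    pads = [g_x - s for s in sizes]        # |G_x| - |G_i| for each partition
    return {
        "max": g_x,
        "min": min(sizes),                 # |G_y|
        "avg": mean(sizes),
        "pad_variance": pvariance(pads),   # population variance of the padding amounts
        "total_padding": sum(pads),
    }

stats = partition_size_stats([4, 2, 6])
```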

As Figures 6a and 6b show, for the MinCs method both the average and the maximum partition size decrease as k increases.

Figures 7a and 7b show the total running time of MinCs in the preprocessing stage. The total running time covers the two steps of partitioning and encryption; it also includes the Voronoi diagram time and the random padding time, although these are small compared with partitioning and encryption. For reference, Figures 7a and 7b also include the preprocessing time of the Send-D method, which is the time needed to encrypt D as a whole.

Figures 7a and 7b also show that the total preprocessing time of MinCs grows linearly with both k and |D|.

Next, consider the size of the final E(D), the key factor in the server-side storage cost and in the communication cost from the data master to the server. After random padding, every partition has the same size as the largest one, so |E(D)| = k|E(G_x)| = k|E(G_i)| for every i ∈ [1, k].
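Random padding to a common length can be sketched as below. This is an illustration under stated assumptions, not the patented encoding: partitions are taken as already-serialized byte strings, and a 4-byte big-endian length header is prepended before padding so the data user can strip the random bytes after decryption. The text does not say how the real length is conveyed, so this header is a hypothetical choice. After padding every blob has the same length, which is why, once each blob is encrypted, |E(D)| = k|E(G_x)|.

```python
import os
import struct

HEADER = struct.Struct(">I")  # hypothetical 4-byte length prefix

def pad_partitions(parts):
    """Pad serialized partitions with random bytes so all have equal length."""
    framed = [HEADER.pack(len(p)) + p for p in parts]
    target = max(len(f) for f in framed)   # length of the largest partition, G_x
    # os.urandom supplies the random filler bytes.
    return [f + os.urandom(target - len(f)) for f in framed]

def unpad_partition(blob):
    """Recover the original partition bytes from a padded blob."""
    (n,) = HEADER.unpack_from(blob, 0)
    return blob[HEADER.size:HEADER.size + n]

parts = [b"a" * 10, b"bb" * 3, b""]
padded = pad_partitions(parts)
```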

Figures 8a and 8b show how |E(D)| varies with k and with |D|, respectively, under MinCs. As with Figures 7a and 7b, the size of D and the size of E(D) obtained by encrypting D as a whole (that is, the cost of Send-D) are included for reference. The size of E(D) under MinCs grows linearly with both k and |D|. The size of E(D) under Send-D also grows linearly with |D| but is independent of k. Naturally, compared with transmitting the plaintext of the outsourced database D directly, MinCs introduces additional communication cost from the data master to the server and additional storage cost on the server.

The storage cost on the data user side depends on the size of P(D); for MinCs this cost is O(1).

Finally, the choice of data set (CA or TX) made almost no noticeable difference to the experimental results, so for brevity the results on the TX data set are not discussed here.

2. Query processing cost

First, for any query point q, the communication cost from the server to the data user under the method of this embodiment depends only on |E(G_j)|. As discussed for Figures 8a and 8b, random padding gives every partition the same size, so |E(G_i)| = |E(D)|/k for all i ∈ [1, k]. In contrast, under Send-D the communication cost from the server to the data user is the size of the whole encrypted data set, |E(D as one message)|. Therefore, although |E(D as one message)| is much smaller than the |E(D)| produced by MinCs (Figures 8a and 8b), the query communication cost from the server to the data user under this embodiment is still much smaller than under Send-D, as Figures 9a and 9b show.

Next, consider the query processing cost on the data user side. Each experiment ran 100 random queries, and Figures 10a and 10b report the average processing time of MinCs. Figure 10a shows that the query time of MinCs decreases as k increases, because a larger k yields smaller partitions. MinCs performs much better than Send-D. As Figure 10b shows, the query time of MinCs grows linearly as |D| increases.

The efficiency of the algorithm of this embodiment comprises the time and space cost of the preprocessing stage and the query cost of the query stage. The preprocessing cost consists of time cost and storage cost; the query cost consists of time cost and communication cost:

1. The time cost of preprocessing in the SVD algorithm arises mainly in three stages:

(1) computing the Voronoi diagram of D;

(2) partitioning D;

(3) generating E(D).

For one- and two-dimensional outsourced databases, the cost of stage (1) is O(N log N).

In stage (2), partitioning D, the one-dimensional case is straightforward: after sorting the data, the required quantile points can be found in a single pass, so the cost of stage (2) is also O(N log N). In the two-dimensional case, the cost of this stage depends on the chosen partitioning method; for MinCs it is O(N).
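For the one-dimensional case, the sort-then-single-pass procedure above can be sketched as follows. It is a minimal illustration assuming real-valued points and 1 ≤ k ≤ N; the function names are hypothetical. Each cut is placed at the perpendicular bisector (the midpoint) between two adjacent sorted points, as in the preferred one-dimensional embodiment, so every Voronoi cell falls entirely inside one partition and the nearest neighbor of any query in a partition's interval lies in that partition.

```python
import bisect

def partition_1d(points, k):
    """Split 1-D points into k contiguous partitions of near-equal size.

    Returns (boundaries, partitions): k - 1 cut values, and partition j serves
    the queries q with bisect_right(boundaries, q) == j.
    """
    s = sorted(points)                          # O(N log N)
    n = len(s)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= N")
    cuts = [(j * n) // k for j in range(1, k)]  # start index of partitions 2..k
    # Cut at the midpoint (perpendicular bisector) of two adjacent points,
    # which coincides with a Voronoi cell boundary.
    boundaries = [(s[c - 1] + s[c]) / 2 for c in cuts]
    starts, ends = [0] + cuts, cuts + [n]
    partitions = [s[a:b] for a, b in zip(starts, ends)]
    return boundaries, partitions

def query_1d(boundaries, partitions, q):
    """Locate q's partition by binary search, then scan only that partition."""
    j = bisect.bisect_right(boundaries, q)
    return min(partitions[j], key=lambda p: abs(p - q))

pts = [((i * 37) % 101) * 0.5 for i in range(101)]
bounds, parts = partition_1d(pts, 7)
```

Because the cuts are Voronoi boundaries, no point has to be duplicated across partitions in one dimension; in two dimensions a grid cell generally has to include every point whose Voronoi cell it intersects.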

The cost of stage (3) is linear in the encryption cost. Let e(m) denote the cost of encrypting a message m with the encryption algorithm E. Since random padding enlarges every partition to the maximum size, generating E(D) takes $O(k \cdot e(|G_{x}| b))$ time, where $G_{x} = \arg\max_{G_{i} \in G(D)} |G_{i}|$.

2. Storage cost of the preprocessing stage

The server-side storage cost is |E(D)|, which is O(k|E(G_x)|).

The storage cost on the data user side is the space occupied by P(D) and its index i. For most index structures of this kind (such as kd-trees, R-trees, and segment trees), the size of i is linear in |P(D)|, so the storage cost on the data user side is O(|P(D)|). In the one-dimensional case P(D) contains only k - 1 values, so |P(D)| = k - 1; in the two-dimensional case |P(D)| depends on the chosen partitioning method, and for MinCs |P(D)| is O(1).

3. Query cost of the query stage

The time cost of the query stage arises at two ends, the data user and the server. The data user uses the index over P(D) to find the box B_i ∈ P(D) that contains the query point q, where each B_i is a d-dimensional (d = 1 to 3) box whose sides are parallel to the X and Y coordinate axes. Since no two boxes in P(D) intersect and some box must contain q, this search is a typical point-location query with output size 1 (exactly one result); in the one- and two-dimensional cases it costs only O(log k).
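When the boxes of P(D) happen to form a rectilinear grid, as in the preferred two-dimensional embodiment where boundaries are lines parallel to the X and Y axes, the O(log k) point location above reduces to one binary search per axis. This is a minimal sketch under that assumption; an arbitrary set of disjoint boxes would instead need a structure such as a kd-tree or R-tree, and the function name is hypothetical.

```python
import bisect

def locate_box(q, xcuts, ycuts):
    """Return the flat index of the grid box containing q = (x, y).

    xcuts and ycuts are the sorted interior cut coordinates along each axis;
    boxes are numbered row by row, with len(xcuts) + 1 boxes per row.
    """
    x, y = q
    col = bisect.bisect_right(xcuts, x)   # binary search along X
    row = bisect.bisect_right(ycuts, y)   # binary search along Y
    return row * (len(xcuts) + 1) + col

xcuts, ycuts = [1.0, 5.0], [2.0]   # 3 columns x 2 rows = 6 boxes
```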

On the server side, given a request E(g(j)) from the data user, the server finds E(G_j) by looking it up in the hash table T, which takes O(1) time.
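The server-side step amounts to a dictionary lookup keyed by an opaque token. In this sketch the token for partition j is computed with HMAC-SHA256 under a secret key shared by the data master and the data user; that is an assumed stand-in for E(g(j)), chosen because the lookup needs a deterministic token that the server cannot invert. The class and function names are hypothetical.

```python
import hashlib
import hmac

def partition_token(key: bytes, j: int) -> bytes:
    """Deterministic, keyed token for partition index j (stand-in for E(g(j)))."""
    return hmac.new(key, j.to_bytes(8, "big"), hashlib.sha256).digest()

class PartitionStore:
    """Server-side hash table T: token -> encrypted partition, O(1) average lookup."""

    def __init__(self):
        self._table = {}

    def put(self, token: bytes, ciphertext: bytes) -> None:
        self._table[token] = ciphertext

    def get(self, token: bytes) -> bytes:
        return self._table[token]

key = bytes(32)  # demo key only; a real key would come from os.urandom(32)
store = PartitionStore()
for j in range(1, 6):
    store.put(partition_token(key, j), b"ciphertext-%d" % j)
```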

The one-time communication costs of the system are |E(D)| and |P(D)|, which follow directly from the storage-cost discussion above. The communication cost of a query is |E(g(j))| + |E(G_j)|, which equals |E(g(j))| + |E(G_x)| and also |E(g(j))| + |E(D)|/k.

Regarding security: in this embodiment the data master sends only E(D) = {(E(g(1)), E(G_1)), ..., (E(g(k)), E(G_k))} to the server, and during query processing only E(g(j)) is visible to the server. From this we can prove Theorem 1 below.

Theorem 1. Suppose E is an encryption algorithm proven secure in a standard security model M (e.g., IND-CPA). Then the SVD algorithm is as secure as E in M.

Proof: Throughout the process, the server sees only E(D) from the data master and a random-looking sequence of tokens E(g(j)) from the data user, so the only thing it learns is the number of partitions k. Since random padding guarantees |E(G_j)| = |E(G_i)| for all i ≠ j, if E is secure in M then the server learns no boundary information about any partition G_i. Moreover, because the random hash function g: [1, N] → Z⁺ is unknown to the server, given only E(g(j)) the server cannot recover the original index value; that is, it cannot learn the original index i of any pair (E(g(i)), E(G_i)) in E(D).

In summary, when a data user performs a nearest neighbor query on the outsourced database stored on the server, this embodiment prevents the server from learning the data in the outsourced database, the data user's query point, and the nearest neighbor result, thereby ensuring data security.

As shown in FIG. 11, the present invention also provides a system for secure nearest neighbor queries based on equal-length data partitioning, comprising a data master 1, a data user 2, and a server 3.

The data master 1 is configured to: given the parameter k, generate a Voronoi diagram containing all data points of the outsourced database, where every data point has the same number of bytes and the outsourced database contains N data points, N being a positive integer; divide the Voronoi diagram into k partitions according to the parameter k and record the boundary of each partition, where the partitions do not intersect one another, different partitions may share some data points or none, and 1 ≤ k ≤ N; take the byte count of the partition containing the most data points as the maximum byte count, and append random bytes to every other partition so that its byte count equals the maximum byte count; build an index for each boundary using a preset hash function, and send all partitions encrypted with a preset encryption algorithm, together with the indexes of their boundaries, to the server for storage; and send the boundaries of all partitions, the decryption algorithm corresponding to the encryption algorithm, and the hash function to the data user for storage. The parameter k is given by the data master 1 or the data user 2.

The data user 2 is configured to: given the parameter k, determine the real query point; determine the boundary of the partition containing the real query point; compute, using the hash function, the index corresponding to that boundary and send it to the server; decrypt the received encrypted partition containing the real query point using the decryption algorithm; and obtain from that partition the data point that is the nearest neighbor of the real query point.

The server 3 is configured to send to the data user, according to the received index of the boundary of the partition containing the real query point, the corresponding encrypted partition containing the real query point.
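After decryption, the data user's final step is an ordinary nearest neighbor search restricted to a single partition, which is why query time falls as k grows. This is a minimal linear-scan sketch for 2-D points given as (x, y) tuples; the function name is hypothetical.

```python
import math

def nearest_in_partition(q, partition):
    """Linear scan for the point of the decrypted partition closest to q."""
    if not partition:
        raise ValueError("partition is empty")
    return min(partition, key=lambda p: math.dist(q, p))  # Euclidean distance

nn = nearest_in_partition((0, 0), [(3, 4), (1, 1), (-2, 0)])
```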

Preferably, when the outsourced database is one-dimensional, each partition boundary is the perpendicular bisector between two adjacent data points.

Preferably, when the outsourced database is two-dimensional, each partition boundary is a grid cell bounded by straight lines parallel to the X and Y coordinate axes of the Voronoi diagram.

Specifically, the data master divides the Voronoi diagram into k equal-size square grid cells, where k is a perfect square.
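Locating a cell of such an equal-size grid needs only the grid's bounding box and the side count sqrt(k), which illustrates how P(D) can take O(1) space for this kind of partition. This is a minimal sketch, assuming the data user knows the bounding box (x_min, y_min, x_max, y_max) of the data and that the box is square so the cells are square; the function and parameter names are hypothetical, and cells are numbered row by row.

```python
import math

def grid_cell(q, bbox, k):
    """Return the row-major index of the equal-size square cell containing q."""
    m = math.isqrt(k)
    if m * m != k:
        raise ValueError("k must be a perfect square")
    x_min, y_min, x_max, y_max = bbox
    x, y = q
    if not (x_min <= x <= x_max and y_min <= y <= y_max):
        raise ValueError("query point lies outside the bounding box")
    # Scale into [0, m] and clamp so points on the max edge land in the last cell.
    col = min(int((x - x_min) / (x_max - x_min) * m), m - 1)
    row = min(int((y - y_min) / (y_max - y_min) * m), m - 1)
    return row * m + col
```

With the default k = 625 the grid is 25 x 25, so `grid_cell((0.5, 0.5), (0, 0, 1, 1), 625)` returns cell 12 * 25 + 12 = 312.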

In summary, in the present invention the data master generates a Voronoi diagram containing all data points of the outsourced database, where every data point has the same number of bytes and the outsourced database contains N data points, N being a positive integer. Given a parameter k supplied by the data user or the data master, the data master divides the Voronoi diagram into k partitions and records the boundary of each; the partitions do not intersect one another, different partitions may share some data points or none, and 1 ≤ k ≤ N. The data master takes the byte count of the partition with the most data points as the maximum byte count and appends random bytes to every other partition so that its byte count equals that maximum. It then builds an index for each boundary using a preset hash function, sends all encrypted partitions and their indexes to the server for storage, and sends the boundaries of all partitions, the decryption algorithm corresponding to the encryption algorithm, and the hash function to the data user for storage. The data user determines the real query point and the boundary of the partition containing it, computes the index of that boundary with the hash function, and sends the index to the server. The server returns the corresponding encrypted partition, which the data user decrypts with the decryption algorithm to obtain the partition containing the real query point and, from it, the nearest neighbor of the real query point. Thus, when a data user performs a nearest neighbor query on the outsourced database stored on the server, the server cannot learn the data in the outsourced database, the data user's query point, or the nearest neighbor result, which ensures data security.

The embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments can be cross-referenced. Since the system disclosed in the embodiments corresponds to the disclosed method, its description is relatively brief; for related details, refer to the description of the method.

Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate this interchangeability of hardware and software, the composition and steps of each example have been described above in general terms of their functions. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may implement the described functions in different ways for each specific application, but such implementations should not be regarded as going beyond the scope of the present invention.

Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. Thus, if such modifications and variations fall within the scope of the claims of the present invention and their equivalents, the present invention is intended to include them as well.

Claims

1. A method for secure nearest neighbor queries based on equal-length data partitioning, characterized by comprising:

generating, by the data master, a Voronoi diagram containing all data points of the outsourced database, where every data point has the same number of bytes and the outsourced database contains N data points, N being a positive integer;

giving, by the data user or the data master, a parameter k; dividing, by the data master, the Voronoi diagram into k partitions according to the parameter k and recording the boundary of each partition, where the partitions do not intersect one another, different partitions may share some data points or none, and k is greater than or equal to 1 and less than or equal to N;

obtaining, by the data master, the byte count of the partition containing the most data points among all partitions as the maximum byte count, and appending random bytes to every partition other than the one containing the most data points so that its byte count equals the maximum byte count;

building, by the data master, an index for each boundary according to a preset hash function, and sending all partitions encrypted with a preset encryption algorithm, together with the indexes corresponding to their boundaries, to the server for storage;

sending, by the data master, the boundaries of all partitions, the decryption algorithm corresponding to the encryption algorithm, and the hash function to the data user for storage;

determining, by the data user, a real query point; determining, according to the real query point, the boundary of the partition containing the real query point; obtaining, according to the hash function, the index corresponding to that boundary; and sending that index to the server;

sending, by the server to the data user, the corresponding encrypted partition containing the real query point according to the received index corresponding to the boundary of that partition;

decrypting, by the data user, the received encrypted partition containing the real query point according to the decryption algorithm to obtain the partition containing the real query point, and obtaining from that partition the data point that is the nearest neighbor of the real query point.

2. The method for secure nearest neighbor queries based on equal-length data partitioning according to claim 1, characterized in that the outsourced database is a one- to three-dimensional outsourced database.

3. The method for secure nearest neighbor queries based on equal-length data partitioning according to claim 2, characterized in that, when the outsourced database is a one-dimensional outsourced database, each partition boundary is the perpendicular bisector between two adjacent data points.

4. The method for secure nearest neighbor queries based on equal-length data partitioning according to claim 2, characterized in that, when the outsourced database is a two-dimensional outsourced database, each partition boundary is a grid cell bounded by straight lines parallel to the X and Y coordinate axes of the Voronoi diagram.

5. The method for secure nearest neighbor queries based on equal-length data partitioning according to claim 4, characterized in that, in the step of the data master dividing the Voronoi diagram into k partitions according to the parameter k, the Voronoi diagram is divided into k equal-size square grid cells, where k is a perfect square.

6. A system for secure nearest neighbor queries based on equal-length data partitioning, characterized by comprising:

a data master, configured to: given the parameter k, generate a Voronoi diagram containing all data points of the outsourced database, where every data point has the same number of bytes and the outsourced database contains N data points, N being a positive integer; divide the Voronoi diagram into k partitions according to the parameter k and record the boundary of each partition, where the partitions do not intersect one another, different partitions may share some data points or none, and k is greater than or equal to 1 and less than or equal to N; take the byte count of the partition containing the most data points as the maximum byte count, and append random bytes to every partition other than the one containing the most data points so that its byte count equals the maximum byte count; build an index for each boundary according to a preset hash function, and send all partitions encrypted with a preset encryption algorithm, together with the indexes corresponding to their boundaries, to the server for storage; and send the boundaries of all partitions, the decryption algorithm corresponding to the encryption algorithm, and the hash function to the data user for storage;

a data user, configured to: given the parameter k, determine a real query point; determine, according to the real query point, the boundary of the partition containing it; obtain, according to the hash function, the index corresponding to that boundary and send the index to the server; decrypt the received encrypted partition containing the real query point according to the decryption algorithm to obtain the partition containing the real query point; and obtain from that partition the data point that is the nearest neighbor of the real query point;

a server, configured to send to the data user, according to the received index corresponding to the boundary of the partition containing the real query point, the corresponding encrypted partition containing the real query point.

7. The system for secure nearest neighbor queries based on equal-length data partitioning according to claim 6, characterized in that the outsourced database is a one- to three-dimensional outsourced database.

8. The system for secure nearest neighbor queries based on equal-length data partitioning according to claim 7, characterized in that, when the outsourced database is a one-dimensional outsourced database, each partition boundary is the perpendicular bisector between two adjacent data points.

9. The system for secure nearest neighbor queries based on equal-length data partitioning according to claim 7, characterized in that, when the outsourced database is a two-dimensional outsourced database, each partition boundary is a grid cell bounded by straight lines parallel to the X and Y coordinate axes of the Voronoi diagram.

10. The system for secure nearest neighbor queries based on equal-length data partitioning according to claim 9, characterized in that the data master divides the Voronoi diagram into k equal-size square grid cells, where k is a perfect square.