CN108337085B - Approximate neighbor search construction method supporting dynamic update - Google Patents


Info

Publication number
CN108337085B
Authority
CN
China
Prior art keywords
user
data
tree
rectangle
cloud server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810005649.0A
Other languages
Chinese (zh)
Other versions
CN108337085A (en)
Inventor
陈晓峰
郭晶晶
王剑锋
袁浩然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN201810005649.0A
Publication of CN108337085A
Application granted
Publication of CN108337085B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08 Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0861 Generation of secret information including derivation or calculation of cryptographic keys or passwords
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2272 Management thereof
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of cloud computing and discloses an approximate neighbor search method supporting dynamic update. Because its storage capacity is limited, the client outsources part of its storage and computation and issues a query request when it needs the data; the cloud server provides storage and computing services for the user and searches the outsourced database according to the search request provided by the user. Two cloud server models are covered: a semi-honest cloud server, which executes the user protocol and returns correct retrieval results but snoops on the user's private information, and a malicious cloud server, which may deceive the user in any way the user cannot distinguish. In terms of security, the invention achieves IND-CKA security, which is stronger than the ideal security of order-preserving encryption; in terms of efficiency, it uses only standard AES-CBC encryption and hash functions and is therefore efficient.

Description

Approximate neighbor search construction method supporting dynamic update
Technical Field
The invention belongs to the technical field of cloud computing, and particularly relates to an approximate neighbor retrieval system supporting dynamic updating.
Background
Cloud computing is a pay-as-you-go model that gives convenient, on-demand access to a shared pool of configurable computing resources (networks, storage, application software and services), which can be provisioned rapidly with very little management effort. It is the product of the development and fusion of traditional computing and network technologies such as grid computing, parallel computing, utility computing, distributed computing, network storage, virtualization and load balancing, and it realizes the long-standing dream of "computing as a utility". Cloud computing brings many conveniences, such as fast resource consolidation, on-demand services, location-independent resource pools and data outsourcing, of which data outsourcing is the most common service model. Because users have limited storage and computing resources, data outsourcing lets resource-constrained users shift their storage and computing burden to the cloud server, so that they can use fast storage and computing resources with little administrative effort. Since user data contains private data (for example medical and financial data) and the cloud server is not trusted, the user encrypts the data before uploading it to the cloud server. Encryption prevents the cloud server or an external attacker from obtaining the user's private data. However, conventional encryption schemes destroy the relationships among the data, so that the usual operations on the data space are no longer possible. Approximate neighbor search is one such operation: given a query point Q, find the data point in the data set D that is closest to Q.
Approximate neighbor search is the most common retrieval mode in data space and is applied in many fields, including pattern recognition, data classification, data mining and artificial intelligence. For example, a resident can look for the nearest store, and a researcher can speed up data aggregation with approximate neighbor search. Traditional encryption protects the user's data privacy but also destroys the usability of the data. Fully homomorphic encryption is another conceivable way to secure user data, but existing fully homomorphic schemes are impractical because of excessive ciphertext expansion and low efficiency. Encryption schemes that support approximate neighbor search over ciphertexts have also been proposed, but each has drawbacks in security or efficiency. First, an Asymmetric Scalar-Product-Preserving Encryption scheme was proposed, together with two approximate neighbor search schemes designed against different attacks; however, it cannot resist chosen-plaintext attacks and achieves only linear search complexity. Later work attacked that scheme effectively and constructed an approximate neighbor search scheme with sub-linear search complexity, but it does not support high-dimensional data. Another scheme was built on the existing R-tree data structure to handle approximate neighbor search over high-dimensional data. A further scheme was built on a lightweight order-preserving encryption scheme and achieves sub-linear search in the average case, but it does not consider whether such an order-preserving encryption scheme is feasible.
To date, no order-preserving encryption scheme achieves both non-interactivity and ideal security; mutable ciphertexts are a precondition for an order-preserving encryption scheme to reach ideal security. Other existing encryption schemes suffer from low search efficiency. For the approximate neighbor search problem in Location Based Services (LBS), ciphertext search schemes have been constructed using Private Information Retrieval (PIR). However, these schemes consider only query privacy, not data privacy, and they are based on secure multi-party computation, which is inefficient. Wang et al. designed a secure approximate neighbor search scheme suitable for large data sets based on a lightweight order-preserving encryption scheme. However, no order-preserving encryption scheme constructed so far meets the requirements of the scheme of Wang et al. The existing order-preserving encryption schemes that achieve ideal security are all interactive: they need interaction between the cloud server and the client, which increases the communication burden and requires the user to be online in real time. Reducing the number of interactions comes at the cost of security. A comparison-based encryption scheme that suits the scheme of Wang et al. would reveal the bit values of two different data items at the first bit where they differ. In short, existing order-preserving encryption schemes have various deficiencies in efficiency and security.
In summary, the problems of the prior art are as follows: the existing approximate neighbor search scheme supporting the ciphertext space has low security and low search efficiency.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an approximate neighbor search construction method supporting dynamic updating.
The invention is realized as an approximate neighbor search method supporting dynamic update, comprising the following steps:
(1) key generation KeyGen(k): let DET = (Gen, Enc, Dec) be an IND-CPA secure symmetric encryption algorithm; the user generates the key:
$sk = (sk_1, sk_2)$;
where $sk_1 = K$ is the encryption key of the DET encryption algorithm and $sk_2 = (k_1, k_2, \dots, k_l)$ are $l$ different keys for the hash function $H$;
(2) tree construction BuildTree(D): the user takes the data set D as input and builds an R-tree from it; the tree consists of the data points and the minimum bounding rectangles (MBRs):
$\{d_1, d_2, \dots, d_n, B_1, B_2, \dots, B_m\}$;
(3) data encryption Enc (inputs: the encryption key sk and the R-tree): the user encrypts the data and constructs a plaintext search index over the R-tree; to protect the user's private information, this plaintext index is then privacy-protected to form the ciphertext index;
(4) tag generation GenToken(sk, Q): for each query point $Q = (x, y)$, the user first computes its interval code:
$[Q] = (S([0, x]), S([0, y]))$;
the user then computes the hash matrix of these sets:
Figure BDA0001538532420000041
and submits the hash matrix $M_Q$ to the cloud server;
(5) query Query (inputs: $M_Q$ and the encrypted R-tree): on receiving the search tag $M_Q$, the cloud server searches the encrypted R-tree to find the deepest non-leaf node $B_{deepest}$ containing Q and returns the encrypted data in the leaf nodes under $B_{deepest}$,
Figure BDA0001538532420000044
to the user, who decrypts it and computes the temporary NN closest to the query point Q; the user then derives a temporary rectangle R, computes its tag $M_R$ and submits $M_R$ to the cloud server; the cloud server uses $M_R$ to find the leaf nodes contained in R and returns them to the user, who computes the NN.
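Step (1) above can be illustrated with a minimal Python sketch. It assumes that both parts of the key are uniformly random byte strings; the names `SecretKey`, `keygen`, `key_bytes` and the default `l = 5` are hypothetical and are not taken from the patent, which only states that sk consists of one encryption key and l hash keys.

```python
import secrets
from typing import NamedTuple, Tuple


class SecretKey(NamedTuple):
    sk1: bytes               # K: key for the symmetric cipher DET
    sk2: Tuple[bytes, ...]   # (k_1, ..., k_l): keys for the keyed hash H


def keygen(key_bytes: int = 16, l: int = 5) -> SecretKey:
    """Sample sk = (sk1, sk2) uniformly at random.

    key_bytes and l are illustrative parameters (assumptions); the source
    only says the keys are chosen according to the security parameter k.
    """
    sk1 = secrets.token_bytes(key_bytes)
    sk2 = tuple(secrets.token_bytes(key_bytes) for _ in range(l))
    return SecretKey(sk1, sk2)


sk = keygen()
print(len(sk.sk1), len(sk.sk2))
```

The `secrets` module is used instead of `random` because the keys must be unpredictable to the cloud server.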
Further, taking the encryption key sk and the R-tree as input, the user encrypts the data and constructs a plaintext search index for the R-tree as follows:
(1) data encryption: with the encryption key $sk_1$ and the data set D as input, the user encrypts the data as:
Figure BDA0001538532420000042
wherein
Figure BDA0001538532420000043
(2) a search index is built over the data intervals.
Further, constructing the search index over the data intervals specifically comprises:
(1) the user encodes the data in the R-tree differently from the submitted query point and temporary rectangle R: the data in the R-tree are data-encoded, while the query point Q and the temporary rectangle R are interval-encoded; for data $d_i = (x_i, y_i)$ and MBR $B_i = (x_{i1}, x_{i2}) \times (y_{i1}, y_{i2})$, the user applies data encoding:
$[d_i] = (F(x_i), F(y_i)),\ 1 \le i \le n$;
$[B_i] = (F(x_{i1}), F(x_{i2})) \times (F(y_{i1}), F(y_{i2})),\ 1 \le i \le m$;
after encoding, the user computes a bloom filter for each set, denoted:
$BF_{d_i} = (BF_{x_i}, BF_{y_i}),\ 1 \le i \le n$;
$BF_{B_i} = (BF_{Bx_i}, BF_{By_i}),\ 1 \le i \le m$;
wherein
Figure BDA0001538532420000051
and
Figure BDA0001538532420000052
(2) different data x and y may share elements in their data code sets F(x) and F(y); to hide such shared elements, a random element r is introduced for each bloom filter, and the values
$H(r, H(k_1, P)), H(r, H(k_2, P)), \dots, H(r, H(k_l, P))$
are used to compute the bloom filter;
the encrypted data set and the search index are:
Figure BDA0001538532420000053
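The node randomization in step (2) can be sketched as follows. This is an illustration under stated assumptions, not the patented construction: HMAC-SHA256 stands in for the keyed hash H, P is taken to be one element (prefix string) of a code set, and r is assumed to be stored next to its filter so that the server can apply the outer hash to the values H(k_i, P) it receives in a search tag. All function names are hypothetical.

```python
import hashlib
import hmac


def H(key: bytes, msg: bytes) -> bytes:
    """Keyed hash; HMAC-SHA256 is an illustrative stand-in for H."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def inner_hashes(keys, element: str):
    """H(k_1, P), ..., H(k_l, P) for one element P of a code set."""
    return [H(k, element.encode()) for k in keys]


def position(r: bytes, inner: bytes, m: int) -> int:
    """Map H(r, H(k_i, P)) to a bit position in an m-bit filter."""
    return int.from_bytes(H(r, inner), "big") % m


def build_filter(keys, r: bytes, elements, m: int):
    """Client side: insert every element using the randomized positions."""
    bits = [0] * m
    for element in elements:
        for inner in inner_hashes(keys, element):
            bits[position(r, inner, m)] = 1
    return bits


def might_contain(bits, r: bytes, tag_row, m: int) -> bool:
    """Server side: tag_row holds H(k_i, P); the server only adds r."""
    return all(bits[position(r, inner, m)] for inner in tag_row)


keys = [b"k1", b"k2", b"k3"]
bits = build_filter(keys, b"r-node-1", ["00101", "0010*", "001**"], 512)
print(might_contain(bits, b"r-node-1", inner_hashes(keys, "001**"), 512))  # True
```

Because each filter uses its own r, the same element lands on different positions in different filters, so equal elements in two nodes are not visible from the bit vectors alone.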
Further, the specific algorithm for computing the NN is as follows:
(1) after the deepest non-leaf node $B_{deepest}$ is found, the cloud server returns the data contained in $B_{deepest}$ to the user;
(2) the user determines the circumscribed rectangle $R = (x_{r1}, x_{r2}) \times (y_{r1}, y_{r2})$ of the circle O; the temporary rectangle is encoded as $[R] = S([x_{r1}, x_{r2}]) \times S([y_{r1}, y_{r2}])$, and the hash matrices of $S([x_{r1}, x_{r2}])$ and $S([y_{r1}, y_{r2}])$ are computed to obtain the search tag $M_R$;
(3) the cloud server judges whether the temporary rectangle R intersects an MBR B by excluding the four cases in which they cannot intersect; on reaching the leaf nodes, the cloud server tests whether an element belongs to a set and returns the leaf nodes that lie in the temporary rectangle R to the user, who decrypts them and computes the NN.
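The intersection test in step (3) can be sketched on plaintext coordinates. The four excluded cases are assumed here to be "R lies entirely to the left of, to the right of, below, or above B"; the `Rect` type and the function name are hypothetical. In the scheme itself the comparisons would be carried out through bloom filter membership tests on encoded values rather than on plaintext numbers.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x1: float  # left edge
    x2: float  # right edge
    y1: float  # bottom edge
    y2: float  # top edge


def intersects(r: Rect, b: Rect) -> bool:
    """Return True unless one of the four disjoint cases holds."""
    disjoint = (
        r.x2 < b.x1      # R entirely to the left of B
        or r.x1 > b.x2   # R entirely to the right of B
        or r.y2 < b.y1   # R entirely below B
        or r.y1 > b.y2   # R entirely above B
    )
    return not disjoint


R = Rect(0, 4, 0, 4)
print(intersects(R, Rect(3, 6, 3, 6)))  # True: the corners overlap
print(intersects(R, Rect(5, 6, 0, 4)))  # False: B is to the right of R
```

Ruling out the four disjoint cases is cheaper than enumerating every way two rectangles can overlap, which is why the test is phrased as an exclusion.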
Another object of the present invention is to provide an approximate neighbor search system supporting dynamic update, which includes:
a client for making query requests;
a cloud server that searches the outsourced database according to the search request provided by the user.
Further, the cloud server includes:
a semi-honest cloud server, which executes the user protocol and provides correct retrieval results to the user while snooping on the user's private information;
a malicious cloud server, which may deceive the user in any way the user cannot distinguish.
Another object of the present invention is to provide a pattern recognition system using the approximate neighbor search method supporting dynamic update.
Another object of the present invention is to provide a machine learning system using the approximate neighbor search method supporting dynamic update.
Another object of the present invention is to provide a location-based service control system using the approximate neighbor search method supporting dynamic update.
Another object of the present invention is to provide an artificial intelligence system using the approximate neighbor search method supporting dynamic update.
Compared with existing approximate neighbor search schemes, the invention has several advantages. In terms of security, it achieves index indistinguishability (IND-CKA), which is stronger than the security of the best existing approximate neighbor search scheme, that of Wang et al. [Boyang Wang, Yantian Hou, and Ming Li. 2016. Practical and secure nearest neighbor search on encrypted large-scale data. In 35th Annual IEEE International Conference on Computer Communications, INFOCOM 2016, San Francisco, CA, USA, April 10-14, 2016. 1-9. https://doi.org/10.1109/INFOCOM.2016.2475389]. In terms of interaction, the invention needs only two rounds, whereas the scheme of Wong et al. [Wai Kit Wong, David Wai-Lok Cheung, Ben Kao, and Nikos Mamoulis. 2009. Secure kNN computation on encrypted databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2009, Providence, Rhode Island, USA, June 29-July 2, 2009. 139-152. https://doi.org/10.1145/1559845.1559862] and the scheme of Wang et al. need O(log n) rounds, which requires the user to be online in real time and places a significant burden on the user. In terms of retrieval efficiency, the invention achieves sub-linear search on average, whereas the scheme of Wong et al. and the scheme of Hu et al. [Haibo Hu, Jianliang Xu, Chushi Ren, and Byron Choi. 2011. Processing private queries over untrusted data cloud through privacy homomorphism. In Proceedings of the 27th International Conference on Data Engineering, ICDE 2011, April 11-16, 2011, Hannover, Germany. 601-612. https://doi.org/10.1109/ICDE.2011.5767862] require linear search. Finally, in terms of server requirements, the invention needs only a single server, whereas the scheme of Elmehdwi et al. [Yousef Elmehdwi, Bharath K. Samanthula, and Wei Jiang. 2014. Secure k-nearest neighbor query over encrypted data in outsourced environments. In IEEE 30th International Conference on Data Engineering, ICDE 2014, Chicago, IL, USA, March 31-April 4, 2014. 664-675. https://doi.org/10.1109/ICDE.2014.6816690] needs two non-colluding servers. The comparison is summarized in Table 1. In short, the invention achieves IND-CKA security, which is stronger than the ideal security of order-preserving encryption, and it uses only standard AES-CBC encryption and hash functions, which makes it efficient.
Table 1: Comparison of the SecDNN scheme with existing approximate neighbor search schemes
Scheme          | Interaction rounds | Search efficiency | Single server
Wong et al.     | O(log n)           | Linear            | Yes
Hu et al.       | 1                  | Sub-linear        | Yes
Elmehdwi et al. | 1                  | Linear            | No (2 servers)
Wang et al.     | O(log n)           | Sub-linear        | Yes
SecDNN          | 2                  | Sub-linear        | Yes
Drawings
Fig. 1 is a flowchart of an approximate neighbor search method supporting dynamic update according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a bloom filter provided in an embodiment of the present invention.
FIG. 3 is a schematic diagram of finding a deepest non-leaf node in an R-tree according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of finding a leaf node in R according to the generated temporary rectangle R according to the embodiment of the present invention.
Fig. 5 is a schematic diagram of a system model for approximate neighbor search according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of constructing an index tree according to an embodiment of the present invention.
Fig. 7 is a schematic diagram of the four cases in which an MBR B and the temporary rectangle R do not intersect, according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following examples. It should be understood that the specific examples described herein are intended to be illustrative only and are not intended to be limiting.
Cloud computing can ensure that users with limited resources outsource own storage and computing tasks to the cloud, and the storage and computing burden of the users is reduced. Since the outsourced data contains some private data and the cloud server is not trusted, the user usually encrypts the data and then outsources the encrypted data to the cloud server. However, data encryption will make data processing at a later time very difficult, such as near-neighbor retrieval. The approximate neighbor search has very wide application in various aspects such as pattern recognition, machine learning, location-based service, artificial intelligence and the like, and is a very common data search mode. Therefore, it becomes important how to ensure that the approximate neighbor search is performed efficiently in the ciphertext database.
The following detailed description of the principles of the invention is provided in connection with the accompanying drawings.
As shown in fig. 1, the approximate neighbor search method supporting dynamic update provided by the embodiment of the present invention includes the following steps:
S101: Key generation: the user selects a master key according to the security parameter; the master key consists of two parts: a data encryption key and the keys used to generate the bloom filters;
S102: R-tree construction: the user builds a plaintext R-tree from his or her own data set D; approximate neighbor search is subsequently carried out over this R-tree;
S103: Privacy protection: to protect data privacy, the user encrypts the data in the R-tree; however, encryption destroys the order relationships of the plaintext, so the original approximate neighbor search over the R-tree is no longer possible; the user therefore constructs a search index over the R-tree so that the search can be carried out on the ciphertext R-tree;
S104: Tag generation: when a search is needed, the user generates a search tag for the query point Q and submits it to the cloud server;
S105: Approximate neighbor search: after receiving the search tag, the cloud server searches the ciphertext R-tree to find the required approximate neighbor point;
S106: Data update: the user updates the outsourced encrypted R-tree.
Step S103 includes:
Data privacy. To protect the confidentiality of the data, the user encrypts it.
Index construction. Data encryption ensures the user's data privacy, but retrieval must still be possible in the ciphertext space. Therefore, the invention constructs a search index on top of the existing R-tree.
The user then uploads the encrypted R-tree together with the search index to the cloud server, which reduces the storage and computation burden of the system.
Step S105 includes:
Searching for the MBR B. First, using the search tag for Q provided by the user, the cloud server finds the deepest non-leaf node $B_{deepest}$ containing Q. It then returns the leaf nodes contained in $B_{deepest}$ to the user.
Temporary NN. After receiving the leaf nodes, the user decrypts them and computes the point closest to Q, recorded as the temporary NN. The user then draws a circle O with Q as center and the distance between Q and the temporary NN as radius, finds the circumscribed square R of O, computes the tag of this temporary rectangle R and submits it to the cloud server.
NN. The cloud server searches the encrypted database according to the tag of the temporary rectangle R, finds the leaf nodes inside R and returns them to the user, who computes the NN.
The application of the principles of the present invention will now be described in further detail with reference to the accompanying drawings.
1 Bloom filter
A bloom filter has an m-bit vector and k independent hash functions $h_i: \{0,1\}^* \to [1, m]$, $1 \le i \le k$. It can be used to test whether an element belongs to a set. The working principle of the bloom filter is shown in fig. 2.
In the initialization stage, all m positions of the bloom filter are set to 0. To insert a new element x, compute $h_1(x), h_2(x), \dots, h_k(x)$ and set the corresponding k positions of the bit vector to 1; a position that is already 1 is left unchanged. To query whether an element y belongs to the set, compute the hash values $h_1(y), h_2(y), \dots, h_k(y)$. If any of the corresponding k positions is 0, y does not belong to the set. If all k positions are 1, y is taken to belong to the set, although with small probability it does not; the probability of this misjudgment is called the false positive rate $P_f$. When the number of elements n in the set, the number of hash functions k and the filter length m satisfy
$k = \frac{m}{n} \ln 2$,
the false positive rate reaches its minimum value $P_f = (1 - e^{-kn/m})^k = 0.6185^{m/n}$.
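The insert and query procedure described above can be sketched as a small Python class. This is a minimal illustration, assuming that the k hash functions are derived from SHA-256 by prefixing the hash index; the class and function names are hypothetical. The helper `optimal_k` uses the standard optimum k = (m/n) ln 2 for a bloom filter.

```python
import hashlib
import math


class BloomFilter:
    """m-bit bloom filter with k hash functions (SHA-256 based, illustrative)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [0] * m  # initialization: all m positions are 0

    def _positions(self, item: bytes):
        # h_i(item): hash the index i together with the item, reduce mod m.
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1  # already-set positions stay 1

    def might_contain(self, item: bytes) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))


def optimal_k(m: int, n: int) -> int:
    """Number of hash functions minimizing the false positive rate."""
    return max(1, round(m / n * math.log(2)))


def false_positive_rate(m: int, n: int, k: int) -> float:
    """P_f = (1 - e^{-kn/m})^k."""
    return (1 - math.exp(-k * n / m)) ** k


bf = BloomFilter(1024, optimal_k(1024, 100))
for word in (b"alice", b"bob", b"carol"):
    bf.add(word)
print(bf.might_contain(b"alice"))  # True: inserted elements are never missed
```

A bloom filter never produces false negatives, which is why a query for an inserted element always succeeds; only the "present" answer carries uncertainty.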
1.1 Prefix coding
Prefix coding turns a range query into a test of whether two sets have a common element. First consider how a w-bit value is encoded. Let x be a w-bit value with binary representation $x = b_1 b_2 \dots b_w$. The prefix code of x is the set $F(x) = \{b_1 b_2 \dots b_w,\ b_1 b_2 \dots b_{w-1}*,\ \dots,\ b_1 * \dots *,\ * * \dots *\}$, obtained by replacing the trailing bits with the wildcard * one at a time. The interval code $S([a, b])$ of an interval $[a, b]$ is the minimal set of prefixes that cover the interval. Then
$x \in [a, b] \iff F(x) \cap S([a, b]) \neq \emptyset$.
Taking 5 as a 5-bit value, its prefix code set is $F(5) = \{00101, 0010*, 001**, 00***, 0****, *****\}$; the interval $[0, 7]$ is encoded as $S([0, 7]) = \{00***\}$; since $F(5) \cap S([0, 7]) = \{00***\} \neq \emptyset$, we have $5 \in [0, 7]$.
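The prefix code F(x), the interval code S([a, b]) and the membership test can be sketched in Python as below. The function names are hypothetical, and the interval code is computed with a straightforward recursive descent over the binary trie, which is one way (an assumption, not necessarily the patent's procedure) to obtain the minimal covering set of prefixes.

```python
def prefix_family(x: int, w: int) -> set:
    """F(x): the w+1 prefixes of the w-bit value x, padded with '*'."""
    bits = format(x, f"0{w}b")
    return {bits[:i] + "*" * (w - i) for i in range(w + 1)}


def interval_code(a: int, b: int, w: int) -> set:
    """S([a, b]): minimal set of prefixes whose ranges exactly cover [a, b]."""
    result = set()

    def cover(prefix: str) -> None:
        free = w - len(prefix)                       # number of wildcard bits
        lo = (int(prefix, 2) << free) if prefix else 0
        hi = lo + (1 << free) - 1
        if hi < a or lo > b:                         # prefix range misses [a, b]
            return
        if a <= lo and hi <= b:                      # prefix range inside [a, b]
            result.add(prefix + "*" * free)
            return
        cover(prefix + "0")                          # otherwise split the range
        cover(prefix + "1")

    cover("")
    return result


def in_range(x: int, a: int, b: int, w: int) -> bool:
    """x in [a, b] exactly when F(x) and S([a, b]) share a prefix."""
    return bool(prefix_family(x, w) & interval_code(a, b, w))


print(sorted(prefix_family(5, 5)))
print(interval_code(0, 7, 5))   # {'00***'}
print(in_range(5, 0, 7, 5))     # True
```

Since the test is a set intersection on strings, it can be evaluated on hashed prefixes without revealing x, a or b, which is what makes the encoding useful for searching over ciphertexts.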
1.2 R-tree
Handling the approximate neighbor search problem with an R-tree proceeds as follows:
1) Given the NN query point Q, the cloud server searches downward from the root node to find the deepest non-leaf node $B_{deepest}$ containing Q and returns the leaf nodes contained in $B_{deepest}$ to the user, as shown in fig. 3.
2) On receiving the leaf data, the user computes the point closest to the query point Q and records it as the temporary NN. The user draws a circle with Q as center and the distance between Q and the temporary NN as radius, denoted the temporary circle O, then finds the smallest rectangle circumscribing O, denoted the temporary rectangle R, and submits R to the cloud server.
3) After receiving the temporary rectangle R, the cloud server searches from the root node toward the leaf nodes. If R intersects a non-leaf node B, the search continues in the child nodes of that branch; if they are disjoint, the search stops on that branch. On reaching a leaf node, the cloud server checks whether it lies in R; if so, the leaf node is returned, otherwise it is not, as shown in fig. 4.
Figs. 3 and 4 give a simple example. Fig. 3 shows how the deepest non-leaf node containing the query point Q is found from the root node, and fig. 4 shows how the leaf nodes in the temporary rectangle R are found from the root node. If the returned set is not empty, the user computes the distance of each returned point from Q and takes the point at minimum distance as the NN. If the set contains too many elements, one element of the set is chosen at random as a new temporary NN and steps 2) and 3) are repeated; if the set is empty, the temporary NN is the NN.
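The user-side part of steps 2) and 3) can be sketched on plaintext points. This is a simplified illustration: the server's tree traversal is replaced by a linear scan over `all_points`, the first-round result is passed in as `first_round`, and all function names are hypothetical.

```python
import math


def nearest(points, q):
    """Point in `points` with minimum Euclidean distance to q."""
    return min(points, key=lambda p: math.dist(p, q))


def temporary_rectangle(q, temp_nn):
    """Square circumscribing the circle centered at q through temp_nn."""
    radius = math.dist(q, temp_nn)
    return (q[0] - radius, q[0] + radius, q[1] - radius, q[1] + radius)


def refine(q, first_round, all_points):
    """Two-round search: temporary NN first, then the NN inside R."""
    temp_nn = nearest(first_round, q)                # step 2: temporary NN
    x1, x2, y1, y2 = temporary_rectangle(q, temp_nn)
    # Step 3, simulated: the server would return the leaves lying in R.
    in_rect = [p for p in all_points if x1 <= p[0] <= x2 and y1 <= p[1] <= y2]
    return nearest(in_rect, q) if in_rect else temp_nn


points = [(0, 0), (5, 5), (2, 1), (9, 9)]
print(refine((2, 2), [(5, 5), (9, 9)], points))  # (2, 1)
```

The true nearest neighbor can be no farther from Q than the temporary NN, so it must lie inside the circle O and therefore inside R; this is why a second round restricted to R is sufficient.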
2 Problem description
2.1 System model
The SecDNN scheme constructed by the invention mainly comprises two entities: a client and a cloud server. The architecture of the approximate neighbor search is shown in fig. 5.
A client is an entity that outsources its data in order to save its own storage and computing resources, and that issues query requests when it later needs the data.
A cloud server is an entity that provides storage and computing resources and that searches the outsourced database according to the search request provided by the user.
For clarity of description, the present invention considers only the semi-honest cloud server model.
A semi-honest cloud server. Under this model, the cloud server faithfully executes the user protocol and provides correct retrieval results to the user, but for economic or other reasons it tries to learn information about the outsourced data.
2.2 Security model
IND-CKA security is commonly used in keyword search. It means that for two sets $S_0$ and $S_1$ with the same number of data entries, an attacker A cannot decide from which set $S_b$ an index was built.
Definition 1: with the security parameter k as input, a scheme Π is IND-CKA secure if the following holds for any attacker A:
Figure BDA0001538532420000121
wherein
Figure BDA0001538532420000122
is the advantage of attacker A in the following game.
(1) The challenger C generates a key sk;
(2) the attacker selects two sets $S_0$ and $S_1$ with the same number of elements, i.e. $|S_0| = |S_1|$, and sends $S_0$ and $S_1$ to the challenger;
(3) on receiving the sets $S_0$ and $S_1$, the challenger picks a random bit $b \leftarrow_R \{0, 1\}$, constructs an index I for the set $S_b$ and returns I to the attacker;
(4) from the index I, the attacker A outputs a guess b' of b.
3 invention architecture
The invention assumes that the geometry data dimension d used is 2; pi ═ will be described by the following five sections (KeyGen, BulidTree, Enc, genoken, Query).
(1) KeyGen (k): let DET ═ (Gen, Enc, Dec) be an IND-CPA secure symmetric encryption algorithm. The user generates a key:
sk=(sk1,sk2);
wherein sk1K is the encryption key of the DET encryption algorithm, sk2=(k1,k2,…kl) Is the l different keys of the Hash function H.
(2) BuildTree(D): the user takes the data set D as input and builds an R-tree from it, where the tree consists of the nodes:
$\{d_1, d_2, \dots, d_n, B_1, B_2, \dots, B_m\}$;
(3) Enc(sk,): taking the encryption key sk and the R-tree as input, the user encrypts the data and constructs a retrieval index over the plaintext R-tree, so that the cloud server can search over the ciphertext data. The specific algorithm is as follows:
Data encryption. With the encryption key $sk_1$ and the data set D as input, the user encrypts the data as:
Figure BDA0001538532420000131
where
Figure BDA0001538532420000132
The invention constructs a retrieval index over the data intervals. The specific algorithm is as follows:
Index construction. The user encodes the data in the R-tree differently from the submitted query point and temporary rectangle R: the data in the R-tree are data-encoded, while the query point Q and the temporary rectangle R are interval-encoded. For data $d_i = (x_i, y_i)$ and MBR $B_i = (x_{i1}, x_{i2}) \times (y_{i1}, y_{i2})$, the user applies the data encoding:
$[d_i] = (F(x_i), F(y_i)),\ 1 \le i \le n$;
$[B_i] = (F(x_{i1}), F(x_{i2})) \times (F(y_{i1}), F(y_{i2})),\ 1 \le i \le m$;
After encoding, the user computes a Bloom filter for each set, denoted:
$BF_{d_i} = (BF_{x_i}, BF_{y_i}),\ 1 \le i \le n$;
$BF_{B_i} = (BF_{Bx_i}, BF_{By_i}),\ 1 \le i \le m$;
where
Figure BDA0001538532420000133
and
Figure BDA0001538532420000134
Node randomization. For different data x and y, the corresponding encoding sets F(x) and F(y) may contain identical elements. To hide these common elements, the invention introduces a random element r for each Bloom filter; that is, for each element P of an encoding set, the values
$H(r, H(k_1, P)),\ H(r, H(k_2, P)),\ \dots,\ H(r, H(k_l, P))$
are used to compute the Bloom filter.
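The data encoding and the randomized Bloom filter can be illustrated with a short sketch. Several points here are assumptions rather than details given in the text: F(x) is taken to be the standard prefix family of a w-bit value, H is instantiated with HMAC-SHA256, the filter is modeled as a set of bit positions, and r is assumed to be stored next to its filter. The names `prefix_family`, `position`, and `build_filter`, and the constants `W` and `BF_BITS`, are hypothetical.

```python
import hashlib
import hmac
import secrets

W = 8            # bit length w of a coordinate (illustrative)
BF_BITS = 1024   # Bloom filter size in bits (illustrative)

def prefix_family(x, w=W):
    """F(x): the w+1 prefixes of the w-bit string of x, padded with '*'."""
    bits = format(x, f"0{w}b")
    return {bits[:i] + "*" * (w - i) for i in range(w + 1)}

def H(key, msg):
    """Keyed hash H, instantiated here with HMAC-SHA256 (an assumption)."""
    return hmac.new(key, msg, hashlib.sha256).digest()

def position(r, inner):
    """Bit position derived from H(r, H(k_i, P)), reduced modulo the filter size."""
    return int.from_bytes(H(r, inner), "big") % BF_BITS

def build_filter(elements, keys):
    """Insert every element under every key, using a fresh random r per filter."""
    r = secrets.token_bytes(16)
    bits = {position(r, H(k, p.encode())) for p in elements for k in keys}
    return r, bits

keys = tuple(secrets.token_bytes(16) for _ in range(3))
r1, bf1 = build_filter(prefix_family(5), keys)
r2, bf2 = build_filter(prefix_family(5), keys)
```

Because each filter draws its own r, two filters built from the same value set different bit positions, so equal prefixes in different nodes are not directly linkable from the index alone.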
The user then uploads the encrypted data set and the retrieval index to the cloud server, where the encrypted data set and the retrieval index are:
Figure BDA0001538532420000135
(4) GenToken(sk, Q): for each query point Q = (x, y), the user first computes its interval code:
$[Q] = (S([0, x]), S([0, y]))$;
Then, the user computes a hash matrix for these sets:
Figure BDA0001538532420000141
and submits the hash matrix $M_Q$ to the cloud server.
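A possible reading of the interval code and the hash matrix is sketched below. It assumes that S([a, b]) is the minimal set of prefixes covering the interval, so that a value lies in [a, b] exactly when its prefix family F(x) shares an element with S([a, b]). The matrix layout (one row per prefix, one column per key) is also an assumption, since the figure defining $M_Q$ is not reproduced here. The names `interval_code` and `hash_matrix` are hypothetical, and H is instantiated with HMAC-SHA256.

```python
import hashlib
import hmac

W = 8  # bit length w of a coordinate (illustrative)

def prefix_family(x, w=W):
    """F(x): all prefixes of the w-bit string of x, padded with '*'."""
    bits = format(x, f"0{w}b")
    return {bits[:i] + "*" * (w - i) for i in range(w + 1)}

def interval_code(a, b, w=W):
    """S([a, b]): minimal list of prefixes whose value ranges exactly cover [a, b]."""
    out = []

    def cover(lo, hi, prefix):
        # [lo, hi] is the range of values whose bit string starts with `prefix`
        if b < lo or hi < a:
            return                      # disjoint from [a, b]
        if a <= lo and hi <= b:
            out.append(prefix + "*" * (w - len(prefix)))
            return                      # fully inside: one prefix suffices
        mid = (lo + hi) // 2
        cover(lo, mid, prefix + "0")
        cover(mid + 1, hi, prefix + "1")

    cover(0, 2 ** w - 1, "")
    return out

def hash_matrix(code, keys):
    """Row j, column i holds H(k_i, P_j) for prefix P_j of the interval code."""
    return [[hmac.new(k, p.encode(), hashlib.sha256).digest() for k in keys]
            for p in code]

code_x = interval_code(0, 5)            # S([0, x]) for x = 5
M = hash_matrix(code_x, [b"k1", b"k2", b"k3"])
```

Under these assumptions the server could apply the per-filter random element r to each matrix entry and test the resulting positions against a stored Bloom filter, without seeing the prefixes themselves.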
(5) Query($M_Q$, *): after receiving the retrieval tag $M_Q$, the cloud server first searches the encrypted R-tree to find the deepest non-leaf node $B_{deepest}$ containing Q, and returns the encrypted data in the leaf nodes under $B_{deepest}$,
Figure BDA0001538532420000142
to the user. The user decrypts the data and computes the temporary NN (nearest neighbor), i.e., the returned point closest to the query point Q. The user then obtains a temporary rectangle R and submits its retrieval tag $M_R$ to the cloud server. Using $M_R$, the cloud server finds the leaf nodes contained in R and returns them to the user, who then computes the NN. The specific algorithm is as follows:
Finding $B_{deepest}$: the cloud server needs to determine whether the query point Q = (x, y) belongs to a non-leaf node $B = (x_1, x_2) \times (y_1, y_2)$. A necessary and sufficient condition for the query point Q = (x, y) to belong to the MBR $B = (x_1, x_2) \times (y_1, y_2)$ is $x \in [x_1, x_2]$ and $y \in [y_1, y_2]$; and the condition for $x \in [a, b]$ is $F(x) \cap S([a, b]) \neq \emptyset$. The cloud server can check this condition using the retrieval tag $M_Q$ and the Bloom filters stored in the index. After finding the deepest non-leaf node $B_{deepest}$, the cloud server returns the data contained in $B_{deepest}$ to the user.
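The first-round search can be pictured on a plaintext tree as follows. This is only a sketch: the `Node` class, the MBR tuple layout `(x1, x2, y1, y2)`, and the function names are hypothetical, and the plain comparison in `contains` stands in for the Bloom-filter test that the cloud server would run on the encrypted index.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    mbr: tuple                                      # (x1, x2, y1, y2)
    children: list = field(default_factory=list)    # non-leaf child nodes
    points: list = field(default_factory=list)      # data held in leaf children

def contains(mbr, q):
    """Plaintext stand-in for the encrypted membership test."""
    x1, x2, y1, y2 = mbr
    x, y = q
    return x1 <= x <= x2 and y1 <= y <= y2

def find_deepest(node, q, depth=0):
    """Return (depth, node) for the deepest non-leaf node containing q, or None."""
    if not contains(node.mbr, q):
        return None
    best = (depth, node)
    for child in node.children:
        found = find_deepest(child, q, depth + 1)
        if found is not None and found[0] > best[0]:
            best = found
    return best

a = Node(mbr=(0, 4, 0, 4), points=[(1, 1), (3, 4)])
b = Node(mbr=(6, 10, 5, 9), points=[(6, 5), (10, 9)])
root = Node(mbr=(0, 10, 0, 10), children=[a, b])
depth, node = find_deepest(root, (2, 2))
```

A query point that falls in the root's MBR but in none of its children, such as (5, 5) here, yields the root itself as the deepest containing node.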
Calculating the temporary rectangle R: the user receives the encrypted data contained in $B_{deepest}$,
Figure BDA0001538532420000143
The user then decrypts the data and computes the temporary NN, i.e., the returned point closest to the query point Q. Taking Q as the center and the distance between Q and the temporary NN as the radius, the user draws a circle O and obtains its circumscribed rectangle $R = (x_{r1}, x_{r2}) \times (y_{r1}, y_{r2})$. The temporary rectangle is then encoded as $[R] = S([x_{r1}, x_{r2}]) \times S([y_{r1}, y_{r2}])$, and the hash matrices of $S([x_{r1}, x_{r2}])$ and $S([y_{r1}, y_{r2}])$ are computed to obtain the retrieval tag $M_R$.
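The geometric part of this step, choosing the temporary NN and bounding the circle O, can be sketched on decrypted points as follows. The function name `temporary_rectangle` and the tuple layout `(xr1, xr2, yr1, yr2)` are assumptions for the example.

```python
import math

def temporary_rectangle(q, candidates):
    """Return (temp_nn, (xr1, xr2, yr1, yr2)) for query point q."""
    # Temporary NN: the first-round candidate closest to q
    temp_nn = min(candidates, key=lambda p: math.dist(q, p))
    radius = math.dist(q, temp_nn)
    x, y = q
    # Circumscribed rectangle of the circle centered at q with that radius
    return temp_nn, (x - radius, x + radius, y - radius, y + radius)

temp_nn, rect = temporary_rectangle((2, 2), [(1, 1), (3, 4), (2, 5)])
```

Before interval encoding, the rectangle's end points would presumably need to be rounded outward to integers and clamped to the coordinate domain; that detail is not spelled out in the text and is left out of the sketch.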
Calculating NN: after receiving the retrieval tag $M_R$ of the temporary rectangle R, the cloud server retrieves the data within R. First, the user needs to determine whether the temporary rectangle R intersects an MBR B; this is decided by excluding the four cases shown in Fig. 6. After reaching a leaf node, the cloud server determines whether an element belongs to a set. The leaf nodes that fall within the temporary rectangle R are returned to the user, who decrypts them and computes the NN.
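The intersection test and the final selection can be sketched in plaintext as follows. Fig. 6 is not reproduced here, so the four excluded cases are assumed to be the usual ones: R lies entirely to the left of, to the right of, below, or above B. The function names and the rectangle layout `(x1, x2, y1, y2)` are hypothetical.

```python
import math

def intersects(r, b):
    """Rectangles intersect unless one of the four separating cases holds."""
    rx1, rx2, ry1, ry2 = r
    bx1, bx2, by1, by2 = b
    left = rx2 < bx1     # r entirely to the left of b
    right = rx1 > bx2    # r entirely to the right of b
    below = ry2 < by1    # r entirely below b
    above = ry1 > by2    # r entirely above b
    return not (left or right or below or above)

def nearest_neighbor(q, rect, points):
    """Among the points inside rect, return the one closest to q (or None)."""
    x1, x2, y1, y2 = rect
    inside = [p for p in points if x1 <= p[0] <= x2 and y1 <= p[1] <= y2]
    return min(inside, key=lambda p: math.dist(q, p)) if inside else None

nn = nearest_neighbor((2, 2), (0, 4, 0, 4), [(1, 1), (3, 2), (9, 9)])
```

Subtrees whose MBR fails the intersection test can be skipped entirely, which is what keeps the second round from visiting every leaf.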
The effect of the present invention will be described in detail below with reference to the performance analysis.
1 Complexity analysis
Compared with existing approximate neighbor retrieval schemes, the SecDNN scheme provided by the invention improves on interaction, retrieval efficiency, and the requirements placed on the cloud server. Details are shown in Table 1.
The invention further compares the SecDNN scheme with the scheme of Wang et al. in terms of search efficiency. For simplicity, E/D denote the encryption and decryption processes of the IND-CPA encryption scheme, $E_{OPE}/D_{OPE}$ denote the encryption and decryption processes of the order-preserving encryption scheme, H is a hash function, m is the number of MBRs in the R-tree, n is the number of elements in the data set D, d is the dimension of the elements in D, w is the bit length of the data in D, $l_1$ is the number of data items contained in the first-round search result $B_{deepest}$, and $l_2$ is the number of leaf nodes retrieved within the temporary rectangle R in the second round. A detailed comparison of the SecDNN scheme and the scheme of Wang et al. is shown in Table 2.
Table 2: Performance comparison of the scheme of Wang et al. and the SecDNN scheme
Figure BDA0001538532420000151
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (6)

1. An approximate neighbor search method supporting dynamic update, the approximate neighbor search method supporting dynamic update comprising:
(1) key generation KeyGen(k): let DET = (Gen, Enc, Dec) be an IND-CPA secure symmetric encryption algorithm; the user generates the key:
$sk = (sk_1, sk_2)$;
wherein $sk_1 = k$ is the encryption key of the DET encryption algorithm, and $sk_2 = (k_1, k_2, \dots, k_l)$ are l different keys of the hash function H; Gen, Enc, and Dec are respectively the system setup, data encryption, and data decryption algorithms of the encryption algorithm DET;
(2) tree construction BuildTree(D): the user takes the data set D as input and builds an R-tree from it, wherein the tree consists of the nodes:
$\{d_1, d_2, \dots, d_n, B_1, B_2, \dots, B_m\}$;
$d_1, d_2, \dots, d_n$ denote the two-dimensional data in the data set D, which are stored in the leaf nodes of the tree; since the leaf nodes store the data $d_1, d_2, \dots, d_n$ of the data set D, $d_1, d_2, \dots, d_n$ are also used to denote the leaf nodes of the tree; $B_1, B_2, \dots, B_m$ denote the non-leaf nodes of the tree, each of which is the minimum rectangle covering its child nodes; n denotes the number of data items in the data set D, which equals the number of leaf nodes of the tree, i.e., the number of $d_1, d_2, \dots, d_n$; m denotes the number of non-leaf nodes of the tree, i.e., the number of $B_1, B_2, \dots, B_m$;
(3) data encryption Enc(sk,): taking the encryption key sk and the R-tree as input, the user encrypts the data and constructs a retrieval index I over the plaintext R-tree; once the retrieval index has been constructed, the privacy of the index is protected in consideration of the user's data privacy;
(4) tag generation GenToken(sk, Q): for each query point Q = (x, y), the user first computes its interval code:
$[Q] = (S([0, x]), S([0, y]))$;
wherein x and y denote the abscissa and ordinate, respectively, of the query point Q;
the user computes the hash matrix for these sets:
Figure FDA0002684096780000021
and submits the hash matrix $M_Q$ to the cloud server; after the query point is prefix-encoded, $S([0, x])$ and $S([0, y])$ are each a set, whose elements are denoted respectively as
Figure FDA0002684096780000022
and
Figure FDA0002684096780000023
that is:
Figure FDA0002684096780000024
Figure FDA0002684096780000025
$M_Q$ denotes both the hash matrix and the retrieval tag; "hash matrix" describes a property of $M_Q$, namely that $M_Q$ is a matrix whose elements are hash values;
(5) query Query($M_Q$, *): * denotes the encrypted tree obtained by encrypting the tree; after receiving the retrieval tag $M_Q$, the cloud server searches the encrypted R-tree to find the deepest non-leaf node $B_{deepest}$ containing Q, and returns the encrypted data in the leaf nodes under $B_{deepest}$,
Figure FDA0002684096780000026
to the user; the user decrypts the data and computes the temporary NN closest to the query point Q, where NN denotes the nearest point; the user obtains a temporary rectangle R and submits the retrieval tag $M_R$ of the temporary rectangle R to the cloud server; using the retrieval tag $M_R$, the cloud server finds the leaf nodes contained in R and returns them to the user, and the user computes and obtains the NN.
2. The approximate neighbor search method supporting dynamic update according to claim 1, wherein taking the encryption key sk and the R-tree as input, encrypting the data by the user, and constructing the retrieval index over the plaintext R-tree comprises:
(1) data encryption: with the encryption key $sk_1$ and the data set D as input, the user encrypts the data as:
Figure FDA0002684096780000027
wherein
Figure FDA0002684096780000031
Enc denotes the data encryption algorithm Enc of the symmetric encryption DET, and $x_i$, $y_i$ denote the abscissa and ordinate of the two-dimensional data $d_i$;
(2) a search index is built over the data interval.
3. The approximate neighbor search method supporting dynamic update according to claim 2, wherein said constructing a search index for the data interval specifically comprises:
(1) the user encodes the data in the R-tree differently from the submitted query point and temporary rectangle R: the data in the R-tree are data-encoded, and the query point Q and the temporary rectangle R are interval-encoded; for data $d_i = (x_i, y_i)$ and MBR $B_i = (x_{i1}, x_{i2}) \times (y_{i1}, y_{i2})$, the user applies the data encoding:
$[d_i] = (F(x_i), F(y_i)),\ 1 \le i \le n$;
$[B_i] = (F(x_{i1}), F(x_{i2})) \times (F(y_{i1}), F(y_{i2})),\ 1 \le i \le m$;
after encoding, the user computes a Bloom filter for each set, denoted:
$BF_{d_i} = (BF_{x_i}, BF_{y_i}),\ 1 \le i \le n$;
$BF_{B_i} = (BF_{Bx_i}, BF_{By_i}),\ 1 \le i \le m$;
wherein
Figure FDA0002684096780000032
and
Figure FDA0002684096780000033
F(·) denotes the prefix-encoding function: $F(x_i)$ is the prefix encoding of the data $x_i$ and is a set, whose elements are then inserted into a Bloom filter; that is, the Bloom filter function BF(·) is applied to the set $F(x_i)$ to obtain $BF(F(x_i))$, abbreviated as
Figure FDA0002684096780000034
$F(y_i)$ is the prefix encoding of the data $y_i$;
(2) for different data x and y, the corresponding encoding sets F(x) and F(y) may contain identical elements, so a random element r is introduced for each Bloom filter, and the values
$H(r, H(k_1, P)),\ H(r, H(k_2, P)),\ \dots,\ H(r, H(k_l, P))$
are used to compute the Bloom filter;
the encrypted data set and the search index are:
Figure FDA0002684096780000035
wherein $BF_{d_i}$ denotes the result of applying the Bloom filter function to the encoding sets of the abscissa and ordinate of $d_i$, i.e.
Figure FDA0002684096780000041
4. The approximate neighbor search method supporting dynamic update as claimed in claim 1, wherein the specific algorithm for calculating and obtaining NN is:
(1) after the deepest non-leaf node $B_{deepest}$ is found, the cloud server returns the data contained in $B_{deepest}$ to the user;
(2) the circumscribed rectangle of the circle O is obtained as $R = (x_{r1}, x_{r2}) \times (y_{r1}, y_{r2})$; the temporary rectangle is encoded as $[R] = S([x_{r1}, x_{r2}]) \times S([y_{r1}, y_{r2}])$, and the hash matrices of $S([x_{r1}, x_{r2}])$ and $S([y_{r1}, y_{r2}])$ are computed to obtain the retrieval tag $M_R$; "circumscribed rectangle" describes a property of the rectangle R, namely that R is the rectangle circumscribing the circle O; the end points of the rectangle R are computed as
Figure FDA0002684096780000042
wherein
Figure FDA0002684096780000043
is the coordinate of the lower-left point of the rectangle R, and
Figure FDA0002684096780000044
is the coordinate of the upper right point of the rectangle R;
(3) the user determines whether the temporary rectangle R intersects the MBR B by excluding four cases; after a leaf node is reached, the cloud server determines whether an element belongs to a set; the leaf nodes belonging to the temporary rectangle R are returned to the user, and the user decrypts them and computes the NN; since a non-leaf node B is the minimum bounding rectangle (MBR) covering its child nodes, the rectangle B is referred to as MBR B.
5. An approximate neighbor search system supporting dynamic update that implements the approximate neighbor search method according to claim 1, wherein the approximate neighbor search system supporting dynamic update comprises:
a client for making an inquiry request;
a cloud server for searching the outsourced database according to the search request submitted by the user.
6. The approximate neighbor retrieval system that supports dynamic updating of claim 5, wherein the cloud server comprises:
a semi-honest cloud server for faithfully executing the user protocol and providing correct retrieval results to the user;
a malicious cloud server that will spoof the user in any indistinguishable way.
CN201810005649.0A 2018-01-03 2018-01-03 Approximate neighbor search construction method supporting dynamic update Active CN108337085B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810005649.0A CN108337085B (en) 2018-01-03 2018-01-03 Approximate neighbor search construction method supporting dynamic update

Publications (2)

Publication Number Publication Date
CN108337085A CN108337085A (en) 2018-07-27
CN108337085B true CN108337085B (en) 2020-11-13

Family

ID=62923717

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810005649.0A Active CN108337085B (en) 2018-01-03 2018-01-03 Approximate neighbor search construction method supporting dynamic update

Country Status (1)

Country Link
CN (1) CN108337085B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110069592A (en) * 2019-04-24 2019-07-30 上海交通大学 The searching method that spatial key applied to electronic map is inquired
CN110059148A (en) * 2019-04-24 2019-07-26 上海交通大学 The accurate searching method that spatial key applied to electronic map is inquired
CN113111090B (en) * 2021-04-15 2023-01-06 西安电子科技大学 Multidimensional data query method based on order-preserving encryption

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102930051A (en) * 2012-11-16 2013-02-13 上海交通大学 Safe nearest neighbor search method and system based on isometric partition and random filling
CN102945281A (en) * 2012-11-16 2013-02-27 上海交通大学 Security nearest neighbor querying method and system based on maximum data block division
CN105721485A (en) * 2016-03-04 2016-06-29 安徽大学 Secure nearest neighbor query method oriented to plurality of data owners in outsourcing cloud environment
CN106686010A (en) * 2017-03-08 2017-05-17 河南理工大学 Multi-mechanism attribute-based encryption method supporting strategy dynamic updating
CN107103031A (en) * 2017-03-21 2017-08-29 东莞理工学院 A kind of safe nearest _neighbor retrieval method in cloud computing

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"Practical and secure nearest neighbor search on encrypted large-scale data";Boyang Wang等;《IEEE》;20160728;第1-9页 *
"Secure Approximate Nearest Neighbor Search over Encrypted Data";Yaqian Gao等;《IEEE》;20150122;第578-583页 *
"secure k nearest neighbor query over encrypted data in outsourced environments";Wenzhuo Xue等;《IEEE》;20170817;第4卷(第4期);第586-599页 *
"可搜索加密研究进展";董晓蕾等;《计算机研究与发展》;20170926;第54卷(第10期);第2107-2120页 *

Also Published As

Publication number Publication date
CN108337085A (en) 2018-07-27

Similar Documents

Publication Publication Date Title
Lei et al. SecEQP: A secure and efficient scheme for SkNN query problem over encrypted geodata on cloud
WO2022099495A1 (en) Ciphertext search method, system, and device in cloud computing environment
WO2017151602A1 (en) Efficient encrypted data management system and method
CN108337085B (en) Approximate neighbor search construction method supporting dynamic update
EP3342090A1 (en) Method for providing encrypted data in a database and method for searching on encrypted data
Wang et al. Forward/backward and content private DSSE for spatial keyword queries
Lei et al. Fast and secure knn query processing in cloud computing
Wang et al. An efficient and privacy-preserving range query over encrypted cloud data
Ge et al. Privacy-preserving graph matching query supporting quick subgraph extraction
Wang et al. Fast and secure location-based services in smart cities on outsourced data
CN113836571B (en) Medical data possession terminal position matching method and system based on cloud and blockchain
CN113434739B (en) Forward-safe multi-user dynamic symmetric encryption retrieval method in cloud environment
Wang et al. QuickN: Practical and secure nearest neighbor search on encrypted large-scale data
CN106874379B (en) Ciphertext cloud storage-oriented multi-dimensional interval retrieval method and system
Zhou et al. PPT-LBS: Privacy-preserving top-k query scheme for outsourced data of location-based services
CN116107967B (en) Multi-keyword ciphertext searching method and system based on homomorphic encryption and tree structure
Wang et al. Secure and efficient similarity retrieval in cloud computing based on homomorphic encryption
Xu et al. Dynamic chameleon authentication tree for verifiable data streaming in 5G networks
CN114416720B (en) Efficient, flexible and verifiable multi-attribute range retrieval method and system in cloud environment
Gao et al. Secure approximate nearest neighbor search over encrypted data
Wang et al. Secure string pattern query for open data initiative
Tang et al. An effective encrypted scheme over outsourcing data for query on cloud platform
Kamble et al. A study on fuzzy keywords search techniques and incorporating certificateless cryptography
Zheng et al. An efficient and privacy-preserving $ k $-nn query scheme for ehealthcare data
Tian et al. A Privacy-Preserving Hybrid Range Search Scheme Over Encrypted Electronic Medical Data in IoT Systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant