CN109446436A - Secure range query method and system for multidimensional data based on LSH

Publication number: CN109446436A
Authority: CN (China)
Application number: CN201811095417.5A
Priority application: CN201811095417.5A
Legal status: Granted (active); granted publication CN109446436B
Original language: Chinese (zh)
Inventors: 彭延国, 王龙, 崔江涛, 吕桢, 吴瑾
Original and current assignee: Xidian University
Prior art keywords: data collection, inquiry, data set, obtains, data

Abstract

The present invention relates to an LSH-based secure range query method for multidimensional data, comprising: preprocessing a raw location data set to obtain a preset data set; obtaining and processing an input instruction to obtain the query instruction corresponding to the input instruction; obtaining a query data set from the query instruction and the preset data set; and processing the query data set to obtain the query result. The embodiment applies equal-area processing or vector processing to the raw data set, encrypts the processed data set with a key triple, and takes the intersection of multiple range queries, which makes the query result more accurate, effectively prevents leakage of user privacy, and keeps the data secure.

Description

Secure range query method and system for multidimensional data based on LSH
Technical field
The invention belongs to the field of data processing, and specifically relates to an LSH-based secure range query method and system for multidimensional data.
Background
Locality-sensitive hashing (LSH) is currently the most effective technique for approximate similarity queries and has been extensively studied and applied. It is widely used in many scenarios, such as content-based image retrieval, audio retrieval, video copy detection, and DNA sequence similarity comparison. LSH is a probabilistic method that uses a filter-and-refine framework. In the filtering stage, LSH uses hashing to filter out dissimilar data objects that cannot be part of the result. The data objects that remain form the candidate set, which contains the similar data objects with high probability, and the actual distance or similarity computation is then performed on the candidate set. Because most dissimilar data objects are removed in the filtering stage, the candidate set is much smaller than the raw data set, which greatly shortens query computation time and improves efficiency.
Currently, the convergence of the Internet and the Internet of Things is a popular trend in academia and industry. Large amounts of geographic data (GPS target information, urban points of interest, and so on) are continuously collected, transmitted, stored, and used across the Internet and the Internet of Things. The rapid growth of geographic data poses a severe challenge to private data centers and pushes them to outsource large volumes of geographic data to public cloud platforms, which also reduces investment and maintenance costs. However, unpredictable network intrusions and frequent malicious network attacks severely constrain current range query methods and seriously compromise user privacy and data security.
Summary of the invention
To solve the above problems in the prior art, the present invention provides an LSH-based secure range query method and system for multidimensional data. The technical problem addressed by the invention is solved by the following technical solutions:
An embodiment of the invention provides an LSH-based secure range query method for multidimensional data, comprising:
Preprocessing a raw location data set to obtain a preset data set;
Obtaining and processing an input instruction to obtain the query instruction corresponding to the input instruction;
Obtaining a query data set from the query instruction and the preset data set;
Processing the query data set to obtain the query result.
In one embodiment of the invention, preprocessing the raw location data set to obtain the preset data set comprises:
Obtaining the peak coordinate of the raw location data set;
Processing the raw location data set using the peak coordinate to obtain a preprocessed data set;
Encrypting the preprocessed data set to obtain the preset data set.
In one embodiment of the invention, processing the raw location data set using the peak coordinate to obtain the preprocessed data set comprises:
Dividing the raw location data set by the peak coordinate to obtain a unit-rectangle data set;
Partitioning the unit-rectangle data set into equal-area regions and rotating it to obtain an equal-area data set, where each equal-area region corresponds to one hash value;
Processing the equal-area data set with a greedy merging algorithm to obtain a first merged data set;
Processing the first merged data set by adding fake points to obtain the preprocessed data set.
In one embodiment of the invention, processing the raw location data set using the peak coordinate to obtain the preprocessed data set further comprises:
Obtaining a reference vector;
Processing the raw location data set using the peak coordinate and the reference vector to obtain the hash-value data set corresponding to the raw location data set;
Merging the hash-value data set to obtain a second merged data set;
Processing the second merged data set by adding fake points to obtain the preprocessed data set.
In one embodiment of the invention, obtaining and processing the input instruction to obtain the corresponding query instruction comprises:
Obtaining the boundary range of the input instruction;
Obtaining query coordinate points from the boundary range, the number of query coordinate points being two, where:
If the boundary range is a rectangle, the query coordinate points are the coordinate points of the rectangle's lower-left and upper-right corners;
If the boundary range is a circle, the query coordinate points are the coordinate points on the left and right edges of the circle.
In one embodiment of the invention, obtaining the query data set from the query instruction and the preset data set comprises:
Obtaining the two hash values that correspond to the query coordinate points in the preset data set;
Obtaining multiple region data sets from the two hash values, where a region data set is the data set of the region between the two hash values in the preset data set;
Deduplicating each region data set to obtain the query data set.
In one embodiment of the invention, after obtaining the query data set from the query instruction and the preset data set, the method further comprises:
Updating the data of the preset data set.
In one embodiment of the invention, processing the query data set to obtain the query result comprises:
Obtaining the intersection of two region data sets in the query data set;
Replacing the two region data sets with the intersection;
Traversing the query data set and repeating the above operations to obtain the query result.
In one embodiment of the invention, after preprocessing the raw location data set to obtain the preset data set, the method further comprises:
Adding data to or deleting data from the preset data set.
Another embodiment of the invention provides an LSH-based secure range query system for multidimensional data, comprising an input device and a query device, where:
The input device collects the input instruction and passes it to the query device;
The query device comprises an input instruction processing module, a raw location data set processing module, an encryption module, a query module, and a display module, where:
The input instruction processing module obtains and processes the input instruction to obtain the corresponding query instruction;
The raw location data set processing module processes the raw location data set to obtain and store the preprocessed data set;
The encryption module encrypts the preprocessed data set to obtain the preset data set;
The query module runs the query instruction against the preset data set to obtain the query result;
The raw location data set processing module also adds data to or deletes data from the preset data set;
The display module outputs and displays the query result.
Compared with the prior art, the invention has the following beneficial effects:
The embodiments of the invention apply equal-area processing or vector processing to the raw data set, encrypt the processed data set with a key triple, and take the intersection of multiple range queries. This makes the query result more accurate, effectively prevents leakage of user privacy, and keeps the data secure.
Description of the drawings
Fig. 1 is a flow diagram of an LSH-based secure range query method for multidimensional data provided by the invention;
Fig. 2 is a structural diagram of an LSH-based secure range query system for multidimensional data provided by the invention.
Detailed description
The invention is described in further detail below with reference to specific embodiments, but the embodiments of the invention are not limited to these.
Embodiment one
Refer to Fig. 1 and Fig. 2. Fig. 1 is a flow diagram of an LSH-based secure range query method for multidimensional data provided by the invention; Fig. 2 is a structural diagram of an LSH-based secure range query system for multidimensional data provided by the invention.
As shown in Fig. 1, an LSH-based secure range query method for multidimensional data comprises:
Preprocessing a raw location data set to obtain a preset data set;
Obtaining and processing an input instruction to obtain the query instruction corresponding to the input instruction;
Obtaining a query data set from the query instruction and the preset data set;
Processing the query data set to obtain the query result.
Further, preprocessing the raw location data set to obtain the preset data set comprises:
Obtaining the peak coordinate of the raw location data set;
Specifically, the raw location data set is an existing, publicly available location data set. It contains multiple location point records; each record includes the ID of the location point and its horizontal and vertical coordinates in the geographic information.
The peak coordinate of the raw location data set is obtained as follows: traverse the raw location data set and find the maximum value among the horizontal and vertical coordinates in the data set. The peak coordinate is the point whose horizontal and vertical coordinates both equal this maximum, and it is denoted maxCoornidate.
Processing the raw location data set using the peak coordinate to obtain the preprocessed data set;
Further, processing the raw location data set using the peak coordinate to obtain the preprocessed data set may be performed as follows:
Dividing the raw location data set by the peak coordinate to obtain a unit-rectangle data set;
Specifically, the horizontal and vertical coordinates of every location point record in the raw location data set are divided by the horizontal and vertical coordinates of maxCoornidate, respectively, giving a new coordinate for each record. All location point records of the raw data set then lie within the rectangle whose corners are (0,0), (0,1), (1,0), and (1,1), which gives the unit-rectangle data set.
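By way of illustration only, the normalization step can be sketched in Python as follows. The function name and the (point_id, x, y) record layout are assumptions made for this sketch and do not appear in the original; non-negative coordinates are also assumed, since the text places all points in the range 0 to 1.

```python
def normalize_to_unit_square(points):
    """Scale (point_id, x, y) records into the unit square.

    Assumes non-negative coordinates, as the [0, 1] target range implies.
    """
    # One maximum is taken over all x and y values and used for both axes,
    # matching the description of the peak coordinate (maxCoornidate).
    peak = max(max(x, y) for _, x, y in points)
    if peak <= 0:
        raise ValueError("peak coordinate must be positive")
    scaled = [(point_id, x / peak, y / peak) for point_id, x, y in points]
    return peak, scaled


peak, unit_points = normalize_to_unit_square([(1, 30.0, 60.0), (2, 120.0, 45.0)])
```

Because a single maximum is used for both axes, the aspect ratio of the data is preserved and at least one coordinate of one point equals 1.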
Partitioning the unit-rectangle data set into equal-area regions and rotating it to obtain the equal-area data set, where each equal-area region corresponds to one hash value;
Specifically, a parameter L is chosen, and the unit-rectangle data set is divided evenly into 2^L equal-area regions according to L; each equal-area region is called a Bucket. A rotation angle α is then chosen, which determines the angle by which the unit-rectangle data set is rotated. The value of α depends on the number of rotations to be performed: if n range rotations are performed in total and the current one is the m-th (m less than or equal to n), then (π*(m-1))/(2*n) ≤ α ≤ (π*m)/(2*n), and α is drawn within this interval by a random number function. Preferably, the random function is the pseudo-random function built into a programming language.
Specifically, an ID is generated for each Bucket, denoted BucketID. BucketIDs are generated by random number and assigned in order from left to right. The hash value of the n-th Bucket is n-1, so with 2^L Buckets the hash values range over [0, 2^L-1]. The region of the raw location data set covered by each Bucket is computed until the whole area has been divided. The raw location data set is then traversed: each location point record that falls in an equal-area region is assigned to the corresponding Bucket, and the hash value of the point under the current rotation angle is the hash value of that Bucket. This yields the equal-area data set.
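The choice of rotation angle and the resulting hash-value range can be sketched as follows. This is a minimal sketch covering only the angle interval and the range [0, 2^L-1]; the function names are hypothetical, and the geometric division into equal-area regions is not reproduced because the text does not specify it in enough detail.

```python
import math
import random


def rotation_angle(m, n, rng=random):
    """Draw the angle for the m-th of n rotations (1 <= m <= n)."""
    if not 1 <= m <= n:
        raise ValueError("m must satisfy 1 <= m <= n")
    low = math.pi * (m - 1) / (2 * n)
    high = math.pi * m / (2 * n)
    # A pseudo-random draw inside the interval, as the text suggests.
    return rng.uniform(low, high)


def hash_value_range(L):
    """Return the (min, max) hash values for 2**L Buckets."""
    return 0, 2 ** L - 1


angles = [rotation_angle(m, 4) for m in range(1, 5)]
```

With n rotations the n intervals tile the quarter turn from 0 to π/2, so each rotation uses a direction from a different sector.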
Processing the equal-area data set with a greedy merging algorithm to obtain the first merged data set.
Specifically, after the equal-area data set is obtained, it is traversed to find the Bucket containing the most location point records, and the number of records in that Bucket is recorded as maxnodesOfBuckets. Greedy merging then proceeds using maxnodesOfBuckets: starting from the Bucket with hash value 0, the total number of location point records in a run of consecutive Buckets is accumulated. When the total for the run is close to but does not exceed maxnodesOfBuckets, and adding the next Bucket would push it above maxnodesOfBuckets, the run is merged into one even bucket, denoted EvenBucket. The remaining Buckets are merged in the same way until all Buckets have been merged, giving the set of EvenBuckets. Each EvenBucket in the set is then identified by an EvenBucketID generated by a random function. The location point records of all Buckets contained in an EvenBucket are assigned to that EvenBucket, and the hash values of those Buckets are mapped to the EvenBucket; that is, the hash-value set of each EvenBucket is the set of hash values of the Buckets it contains. This set is the first merged data set.
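The greedy merging described above can be sketched as follows, assuming the Buckets are given as a list of record counts indexed by hash value. The function name and the list-of-lists result are assumptions of this sketch; EvenBucketID generation is omitted.

```python
def greedy_merge(bucket_counts):
    """Group consecutive Buckets into EvenBuckets.

    bucket_counts[i] is the number of records in the Bucket with hash value i.
    Returns the capacity (maxnodesOfBuckets) and, for each EvenBucket, the
    list of Bucket hash values it contains.
    """
    capacity = max(bucket_counts)  # the fullest Bucket sets the limit
    groups, current, total = [], [], 0
    for hash_value, count in enumerate(bucket_counts):
        # Close the current run when the next Bucket would exceed the limit.
        if current and total + count > capacity:
            groups.append(current)
            current, total = [], 0
        current.append(hash_value)
        total += count
    if current:
        groups.append(current)
    return capacity, groups


capacity, groups = greedy_merge([3, 1, 4, 1, 5, 2, 2])
```

For the counts above the limit is 5, and the runs are closed as soon as adding the next Bucket would exceed it.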
Processing the first merged data set by adding fake points to obtain the preprocessed data set.
Specifically, after the set of EvenBuckets is obtained, it is traversed. If the current EvenBucket contains fewer than maxnodesOfBuckets location points, the corresponding number of fake points is added so that the number of location point records plus the number of fake points in the EvenBucket equals maxnodesOfBuckets. A fake point is a generated location point that carries invalid information, and what counts as invalid information can be user-defined. In this embodiment, a location point whose ID is negative is a fake point. This yields the preprocessed data set.
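A minimal sketch of the padding step follows. The negative-ID convention comes from the text; the dictionary layout, the function name, and the placeholder coordinates (0.0, 0.0) for fake points are assumptions of this sketch.

```python
import itertools


def pad_with_fake_points(even_buckets, capacity):
    """Pad every EvenBucket to exactly `capacity` records.

    even_buckets maps an EvenBucket id to a list of (point_id, x, y) records.
    """
    fake_ids = itertools.count(-1, -1)  # -1, -2, ...: a negative ID marks a fake point
    padded = {}
    for bucket_id, points in even_buckets.items():
        missing = capacity - len(points)
        # Placeholder coordinates; the text leaves the invalid content user-defined.
        fakes = [(next(fake_ids), 0.0, 0.0) for _ in range(missing)]
        padded[bucket_id] = list(points) + fakes
    return padded


padded = pad_with_fake_points(
    {"A": [(1, 0.1, 0.2)], "B": [(2, 0.3, 0.4), (3, 0.5, 0.6)]},
    capacity=2,
)
```

After padding, every EvenBucket holds the same number of records, so the bucket sizes no longer reveal how the real points are distributed.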
Further, processing the raw location data set using the peak coordinate to obtain the preprocessed data set may alternatively be performed as follows:
Obtaining a reference vector;
Processing the raw location data set using the peak coordinate and the reference vector to obtain the hash-value data set corresponding to the raw location data set;
Merging the hash-value data set to obtain a second merged data set;
Processing the second merged data set by adding fake points to obtain the preprocessed data set.
Specifically, a reference vector pointing from (0,0) to (vector.x, vector.y) is created, and the number of regions to divide into is determined; the values of vector.x and vector.y are generated by random number. If j range queries are to be performed in total, then for the k-th query (1 ≤ k ≤ j) the angle between the generated vector and the x-axis lies in [π/(2*j)*(k-1), π/(2*j)*k], and vector.x and vector.y are random numbers generated within that sector. The value of w is obtained from the formula h(Node) = (Node.x*vector.x + Node.y*vector.y)/w, where Node.x and Node.y are both set to the peak coordinate maxCoornidate and h(Node) is the maximum hash value. Each hash value is called a Bucket, and, as in the method above, an ID is generated for each Bucket and denoted BucketID. The maximum hash value is the number of regions to divide into minus one (for example, to divide into 1024 regions, h(Node) is 1023). The resulting w is saved as the w value of the current reference vector.
Specifically, once w is obtained, the hash value of each location point record is computed from the formula h(Node) = (Node.x*vector.x + Node.y*vector.y)/w by substituting the coordinates of each location point in the raw location data set for (Node.x, Node.y). The raw location data set is traversed to obtain the hash value of every location point, and location points with the same hash value are grouped into the same region. The range of hash values is determined by the number of regions (for example, with 1024 regions the hash values range over 0 to 1023). This yields the equivalent data set.
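The vector-based hashing can be sketched as follows. Two points are assumptions of this sketch and are not stated in the text: the reference vector is taken to be the unit vector at a random angle inside the sector, and the quotient is rounded down to an integer hash value. The function names are hypothetical.

```python
import math
import random


def make_reference_vector(k, j, rng=random):
    """Reference vector for the k-th of j queries (1 <= k <= j)."""
    theta = rng.uniform(math.pi / (2 * j) * (k - 1), math.pi / (2 * j) * k)
    return math.cos(theta), math.sin(theta)  # unit length is an assumption


def bucket_width(peak, vector, regions):
    """Solve h(peak, peak) = regions - 1 for w."""
    vx, vy = vector
    return (peak * vx + peak * vy) / (regions - 1)


def hash_point(x, y, vector, w):
    """h(Node) = (Node.x*vector.x + Node.y*vector.y) / w, rounded down."""
    vx, vy = vector
    return math.floor((x * vx + y * vy) / w)


vector = (1.0, 0.0)  # fixed here so the example is reproducible
w = bucket_width(100.0, vector, regions=11)
sample_hash = hash_point(55.0, 70.0, vector, w)
```

With a randomly drawn vector, floating-point rounding can place the peak point just below the maximum hash value, so an implementation may want to clamp the result to the range 0 to regions-1.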
Specifically, after the equivalent data set is obtained, the greedy merging algorithm is applied to it to obtain the second merged data set. The steps are the same as in the greedy merging described above and are not repeated here.
Specifically, after the second merged data set is obtained, fake points are added to the EvenBuckets in the merged data set to obtain the preprocessed data set. The steps are the same as in the fake-point addition described above and are not repeated here.
Specifically, this embodiment generates a key triple for encryption for each EvenBucket, denoted KeyGroup, so each EvenBucket has one corresponding KeyGroup. A KeyGroup contains three keys: secretkey, usingkey, and newkey. The key length can be user-defined; in this embodiment all keys are 120 bits long.
The invention provides a hash function H whose input is a character string of arbitrary length and whose output is a 160-bit binary string. The location point records are encrypted first. This embodiment provides a data structure, called SafeNode, for storing an encrypted location point record. A SafeNode contains the encrypted ID and the encrypted coordinate value of the record. The ID is obtained from the formula label(ID) = H(secretkey) ⊕ ID, where label(ID) is the encrypted ID, ⊕ denotes XOR, and H() is the hash function. The coordinate value is obtained from data_i = p_i ⊕ H(EvenBucketID, secretkey), where data_i is the encrypted coordinate value and p_i is the coordinate of the location point. The EvenBucketID is then encrypted using the formula label(B) = H(EvenBucketID, secretkey).
The encrypted EvenBucketID is mapped to the hash values that corresponded to it before encryption, and the encrypted EvenBucketID is mapped to the set of SafeNodes generated by encrypting the location points it contained before encryption.
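The three formulas above can be sketched as follows. The text only requires H to have a 160-bit output; using SHA-1 here is an assumption made because its digest has that length, and the byte encodings of the ID and the coordinates, the length-prefixed concatenation inside H, and all function names are likewise assumptions of this sketch.

```python
import hashlib
import secrets
import struct

DIGEST_LEN = 20  # 160 bits


def H(*parts):
    """160-bit hash over one or more byte strings (SHA-1 is an assumption)."""
    digest = hashlib.sha1()
    for part in parts:
        digest.update(len(part).to_bytes(4, "big"))  # length prefix separates the parts
        digest.update(part)
    return digest.digest()


def xor(a, b):
    if len(a) != len(b):
        raise ValueError("operands must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_point(point_id, x, y, even_bucket_id, secretkey):
    """Build a SafeNode as the pair (label(ID), data_i)."""
    id_plain = point_id.to_bytes(DIGEST_LEN, "big", signed=True)
    coord_plain = struct.pack(">dd", x, y).ljust(DIGEST_LEN, b"\x00")
    label_id = xor(H(secretkey), id_plain)                 # H(secretkey) XOR ID
    data = xor(H(even_bucket_id, secretkey), coord_plain)  # p_i XOR H(EvenBucketID, secretkey)
    return label_id, data


def decrypt_point(safe_node, even_bucket_id, secretkey):
    label_id, data = safe_node
    point_id = int.from_bytes(xor(H(secretkey), label_id), "big", signed=True)
    x, y = struct.unpack(">dd", xor(H(even_bucket_id, secretkey), data)[:16])
    return point_id, x, y


secretkey = secrets.token_bytes(15)     # 120-bit key
bucket_id = b"even-bucket-7"            # hypothetical EvenBucketID
safe_node = encrypt_point(42, 0.25, 0.5, bucket_id, secretkey)
bucket_label = H(bucket_id, secretkey)  # label(B)
```

Because XOR is its own inverse, applying the same mask a second time recovers the plaintext, which is what decrypt_point relies on.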
Further, obtaining and processing the input instruction to obtain the corresponding query instruction comprises:
Obtaining the boundary range of the input instruction;
Obtaining query coordinate points from the boundary range, the number of query coordinate points being two, where:
If the boundary range is a rectangle, the query coordinate points are the coordinate points of the rectangle's lower-left and upper-right corners;
If the boundary range is a circle, the query coordinate points are the coordinate points on the left and right edges of the circle.
Further, obtaining the query data set from the query instruction and the preset data set comprises:
Obtaining the two hash values that correspond to the query coordinate points in the preset data set;
Obtaining multiple region data sets from the two hash values, where a region data set is the data set of the region between the two hash values in the preset data set;
Deduplicating each region data set to obtain the query data set.
Specifically, it is determined whether the range to be queried is a rectangle or a circle. For a rectangle, the input instruction is two coordinate points, which correspond to the lower-left and upper-right corners of the rectangle. For a circle, the input instruction is a location point and a length, which give the position of the center and the query radius.
Preferably, if the query range is a rectangle, the two coordinate points are processed: the Bucket corresponding to each coordinate point is computed, and the hash value of each coordinate point is found from its Bucket. The two coordinate points give two hash values, which are the minimum and maximum hash values of the query range under the current rotation vector;
Preferably, if the query range is a circle, the center and radius are used in the computation. Since the current rotation vector is known, the coordinates of the two points where the line through the center along the current rotation vector intersects the circle are computed; these are the edge points of the circular query range under the current rotation vector. The Buckets corresponding to these two points are then computed and the two corresponding hash values found; these are the minimum and maximum hash values of the circular query range under the current rotation vector.
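The two query coordinate points and their hash values can be sketched as follows for the vector-based variant. The sketch assumes the floor-rounded projection hash h(Node) = (Node.x*vector.x + Node.y*vector.y)/w and uses hypothetical function names; the values of vector and w are example inputs.

```python
import math


def circle_edge_points(cx, cy, r, vector):
    """Points where the line through the center along `vector` meets the circle."""
    vx, vy = vector
    norm = math.hypot(vx, vy)
    ux, uy = vx / norm, vy / norm
    return (cx - r * ux, cy - r * uy), (cx + r * ux, cy + r * uy)


def hash_range(point_a, point_b, vector, w):
    """Minimum and maximum hash values of the two query coordinate points."""
    vx, vy = vector

    def h(point):
        return math.floor((point[0] * vx + point[1] * vy) / w)

    low, high = sorted((h(point_a), h(point_b)))
    return low, high


vector, w = (1.0, 0.0), 10.0
# Rectangle: lower-left and upper-right corners.
rect_range = hash_range((12.0, 5.0), (48.0, 90.0), vector, w)
# Circle: center (50, 50) with radius 20.
edge_a, edge_b = circle_edge_points(50.0, 50.0, 20.0, vector)
circle_range = hash_range(edge_a, edge_b, vector, w)
```

For a vector in the first quadrant, the lower-left corner of a rectangle has the smallest projection and the upper-right corner the largest, which is why two points are enough to bound the range.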
Specifically, after the maximum and minimum hash values of the query range are obtained, the query region between the minimum and maximum hash values is taken in the preset data set and all hash values it contains are collected, giving the region data set. Using the mapping from hash values to encrypted EvenBucketIDs and the mapping from encrypted EvenBucketIDs to sets of encrypted SafeNodes, the set of encrypted EvenBucketIDs for all these hash values is obtained. This set is traversed and, following the mappings, all corresponding SafeNode sets are found. The resulting SafeNode set is then deduplicated by deleting repeated encrypted location points.
The above query process is repeated to run multiple range queries on the preset data set, giving multiple region data sets; the intersection of these region data sets is the query data set.
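A single range query over the two mappings can be sketched as follows. The dictionaries and the function name are assumptions of this sketch; short strings stand in for encrypted EvenBucketIDs and SafeNodes.

```python
def region_query(low, high, hash_to_bucket, bucket_to_nodes):
    """Collect the deduplicated SafeNodes for hash values low..high inclusive."""
    labels = []
    for hash_value in range(low, high + 1):
        label = hash_to_bucket.get(hash_value)
        # Several hash values map to the same EvenBucket, so visit each once.
        if label is not None and label not in labels:
            labels.append(label)
    seen, result = set(), []
    for label in labels:
        for node in bucket_to_nodes[label]:
            if node not in seen:  # drop repeated encrypted location points
                seen.add(node)
                result.append(node)
    return result


hash_to_bucket = {0: "B1", 1: "B1", 2: "B2", 3: "B3"}
bucket_to_nodes = {"B1": ["n1", "n2"], "B2": ["n2", "n3"], "B3": ["n4"]}
region = region_query(1, 2, hash_to_bucket, bucket_to_nodes)
```

The lookup returns whole EvenBuckets, so the region data set generally contains more points than the query range itself; the later intersection step narrows it down.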
Further, after obtaining the query data set from the query instruction and the preset data set, the method further comprises:
Updating the data of the preset data set.
Specifically, the preset data set is updated for data maintenance and management. Whenever a query touches an EvenBucket, the data of the preset data set is updated. The steps are as follows:
Step 11: The data in each EvenBucket is divided into two parts: the preset data set, encrypted with secretkey, and the added data set, encrypted with usingkey. The added data set supports later insertion of new data: new data is not encrypted directly into the original data set but is first placed in the added data set. When the added data set reaches a certain size, or when a query finds this EvenBucket, it is merged into the original data set and then emptied. Updating the data set means combining the preset data set and the added data set, encrypting them with a new key newkey, and storing them in a new data set. newkey is generated by a random function and is 120 bits long.
Step 12: After the query completes, the queried EvenBuckets are updated. The label labelB of each EvenBucket is generated from the formula labelB = H(EvenBucketID, newkey) and serves as the encrypted ID of the new bucket. The current preset data set and the added data set are then updated separately.
Step 13: For the current preset data set, for each SafeNode in the data set, the token generated by the formula label(ID)' = H(secretkey) ⊕ H(newkey) is XORed with the encrypted ID in the SafeNode, and the result becomes the new encrypted ID of the SafeNode. The token generated by the formula label(P)' = H(EvenBucketID, secretkey) ⊕ H(EvenBucketID, newkey) is XORed with the encrypted coordinate in the SafeNode, and the result becomes the new encrypted coordinate of the SafeNode.
Step 14: For the added data set, for each SafeNode in the data set, the token generated by the formula label(ID)' = H(usingkey) ⊕ H(newkey) is XORed with the encrypted ID in the SafeNode, and the result becomes the new encrypted ID of the SafeNode. The token generated by the formula label(P)' = H(EvenBucketID, usingkey) ⊕ H(EvenBucketID, newkey) is XORed with the encrypted coordinate in the SafeNode, and the result becomes the new encrypted coordinate of the SafeNode.
Step 15: The data sets produced by steps 13 and 14 are merged into a new preset data set, which is stored and replaces the current preset data set. The key group of the EvenBucket is then updated: secretkey is replaced by newkey, and a new usingkey and newkey are generated to replace the originals;
Step 16: The earlier mapping relations are updated.
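The re-keying in steps 13 and 14 works because XORing a ciphertext with the old mask XOR the new mask swaps the mask without exposing the plaintext. A minimal sketch follows, again assuming SHA-1 as the 160-bit H and using hypothetical names and example plaintexts.

```python
import hashlib
import secrets


def H(*parts):
    """160-bit hash over byte strings (SHA-1 is an assumption of this sketch)."""
    digest = hashlib.sha1()
    for part in parts:
        digest.update(len(part).to_bytes(4, "big"))
        digest.update(part)
    return digest.digest()


def xor(a, b):
    if len(a) != len(b):
        raise ValueError("operands must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def rekey(ciphertext, old_mask, new_mask):
    """Move a ciphertext from old_mask to new_mask without decrypting it."""
    return xor(ciphertext, xor(old_mask, new_mask))


secretkey, newkey = secrets.token_bytes(15), secrets.token_bytes(15)
bucket_id = b"even-bucket-7"         # hypothetical EvenBucketID
plain_id = (42).to_bytes(20, "big")  # example plaintext ID
plain_coord = bytes(range(20))       # example 20-byte plaintext coordinate

old_id = xor(H(secretkey), plain_id)
old_coord = xor(H(bucket_id, secretkey), plain_coord)
new_id = rekey(old_id, H(secretkey), H(newkey))  # step 13, ID part
new_coord = rekey(old_coord, H(bucket_id, secretkey), H(bucket_id, newkey))  # step 13, coordinate part
```

For the added data set in step 14, the same function applies with usingkey in place of secretkey as the old key.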
Further, processing the query data set to obtain the query result comprises:
Obtaining the intersection of any two region data sets in the query data set;
Replacing the two region data sets with the intersection;
Traversing the query data set and repeating the above operations to obtain the query result.
Specifically, processing the query data set to obtain the query result comprises the following steps:
Step 21: A global key globalkey is generated; its length matches the key length of the key groups above, 120 bits;
Step 22: The region data set of each range query is processed using the EvenBucketID and the encryption key of the corresponding region for the encrypted location points in the data set. labelr(ID) is generated from the formula labelr(ID) = H(secretkey) ⊕ H(globalkey), and labelr(P) from the formula labelr(P) = H(EvenBucketID, secretkey) ⊕ H(globalkey); these are XORed with the label(ID) and data_i, respectively, of each SafeNode in the region data set of each range query.
Step 23: The intersection of all processed region data sets is taken. First, the region data set of the first range query is compared with that of the second, and the data that appears in both is extracted as a new data set. The new data set is then compared with the region data set of the third range query and their intersection taken. The query data set is traversed, repeating this operation, until the final intersection is obtained; the final intersection is the query result.
Step 24: The encrypted query result is decrypted with labelr to obtain the final query result.
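The pairwise intersection in step 23 can be sketched as a left fold over the region data sets. The sketch assumes step 22 has already been applied, so that the same location point has the same encrypted form in every region data set; the function name and the string stand-ins for SafeNodes are hypothetical.

```python
from functools import reduce


def intersect_regions(region_sets):
    """Fold the region data sets into their common elements, keeping order."""
    if not region_sets:
        return []

    def step(accumulated, region):
        members = set(region)
        return [node for node in accumulated if node in members]

    return reduce(step, region_sets[1:], list(region_sets[0]))


result = intersect_regions([["a", "b", "c"], ["b", "c", "d"], ["c", "b"]])
```

Each range query uses a different rotation, so a point outside the true query range is unlikely to fall inside every region; intersecting more queries therefore removes more false candidates.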
Further, it is pre-processed to raw position data collection, after obtaining preset data collection, further includes:
The preset data collection is added or is deleted.
Specifically, adding to the preset data set is performed as follows: first, traverse the new data set to be added. For the current location point reached in the traversal, compute its EvenBucketID under the current rotation vector and find the added data set of that EvenBucket; if the added data set is empty, create a new empty added data set to receive the new data;
Encrypt the current location point to be added with usingkey, following the EvenBucket encryption method of this embodiment, to obtain a new SafeNode, and add it to the corresponding added data set;
Then add the next new location point, until all location points have been added. To keep the same number of location points in every EvenBucket, false points are added to the added data set; they are added in the same way as the false points of the preset data set, which is not repeated here;
Finally, update the mapping relations and the data set corresponding to each EvenBucketID to obtain the new preset data set.
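As a non-limiting illustration, the addition procedure above can be sketched in Python. The bucket function, the encryption routine and the false-point generator are passed in as callables because the embodiment's concrete versions are not reproduced here; the names `add_points`, `bucket_of`, `encrypt` and `make_fake` are hypothetical, and padding every added data set up to the size of the largest one is an assumed reading of "the same number of location points".

```python
from typing import Callable, Dict, Iterable, List, Tuple

Point = Tuple[float, float]


def add_points(
    added_sets: Dict[int, List[bytes]],
    new_points: Iterable[Point],
    bucket_of: Callable[[Point], int],
    encrypt: Callable[[Point], bytes],
    make_fake: Callable[[], bytes],
) -> Dict[int, List[bytes]]:
    """Insert new points into the per-EvenBucket added data sets.

    bucket_of : maps a point to its EvenBucketID under the current rotation vector
    encrypt   : turns a point into a SafeNode (assumed to use usingkey internally)
    make_fake : produces an encrypted false point used for padding
    """
    for point in new_points:
        bucket_id = bucket_of(point)
        # Create a new empty added data set if this bucket has none yet.
        added_sets.setdefault(bucket_id, []).append(encrypt(point))

    # Pad every added data set with false points so that all have the same size.
    target = max((len(nodes) for nodes in added_sets.values()), default=0)
    for nodes in added_sets.values():
        missing = target - len(nodes)
        nodes.extend([make_fake() for _ in range(missing)])
    return added_sets
```

Updating the mapping relations for each EvenBucketID is left out of the sketch, since it depends on how the preset data set is stored.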
Specifically, deleting from the preset data set is performed as follows:
Step 31: Traverse all EvenBuckets. Generate the encrypted IDs of the location point to be deleted by the formulas label1(ID) = H(secretKey) ⊕ ID and label2(ID) = H(usingKey) ⊕ ID, and search for the point in the current preset data set and in the added data set, respectively. If the point is not found in the current EvenBucket, continue the traversal and search the next EvenBucket; if it is found, go to Step 32;
Step 32: Locate the encrypted location point to be deleted, SafeNode, in the EvenBucket, create a false point (fakenode), and generate a deletion label by the formula label(fake) = SafeNode ⊕ fakenode, where label(fake) is the deletion label. Then XOR the location point to be deleted with the deletion label, so that its information becomes the information of the false point. Perform the same operation on all location points to be deleted, until all of them have been deleted.
In effect, the deletion operation of this embodiment is carried out by replacing the location point to be deleted with a false point, which invalidates the originally meaningful location information and thereby achieves deletion.
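As a non-limiting illustration, the replacement of a deleted point by a false point can be sketched in Python. The sketch assumes that a SafeNode is a fixed-length byte string and that the false point is random bytes of the same length; the names `xor_bytes` and `delete_node` are hypothetical.

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("operands must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def delete_node(bucket: list, index: int) -> bytes:
    """Overwrite bucket[index] with a false point and return the deletion label."""
    safe_node = bucket[index]
    fake_node = os.urandom(len(safe_node))        # hypothetical false point
    label_fake = xor_bytes(safe_node, fake_node)  # label(fake) = SafeNode XOR fakenode
    # SafeNode XOR label(fake) == fakenode, so the stored record becomes the false point.
    bucket[index] = xor_bytes(safe_node, label_fake)
    return label_fake
```

The bucket keeps its size after the call, so the number of records per EvenBucket does not reveal that a deletion took place.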
As shown in Fig. 2, an embodiment of the invention also provides an LSH-based secure range query system for multidimensional data, comprising: an input device and a query device;
The input device is used to collect an input instruction and pass the input instruction to the query device;
The query device is used to process the raw position data set and the input instruction to obtain a query result.
Specifically, the input device may be a common peripheral input device, such as a keyboard or a microphone. The query device includes an input instruction processing module, a raw position data set processing module, an encryption module, a query module and a display module; wherein,
The input instruction processing module is used to obtain and process the input instruction to obtain the corresponding query instruction; the processing method is the same as the method of obtaining the query instruction from the input instruction described in the above embodiment, and is not repeated here.
The raw position data set processing module is used to process the raw position data set, obtain the preprocessed data set and store it; the processing method is the same as the method, described in the above embodiment, of preprocessing the raw position data set to obtain the preprocessed data set, and is not repeated here.
The encryption module is used to encrypt the preprocessed data set to obtain the preset data set; the processing method is the same as the above method of encrypting the preprocessed data set, and is not repeated here.
The query module is used to query the preset data set with the query instruction and obtain the query result; the method is the same as the above method of obtaining the query data set from the query instruction and the preset data set and processing the query data set to obtain the query result, and is not repeated here.
The raw position data set processing module is also used to add to or delete from the preset data set; the method is the same as the above method of adding or deleting, and is not repeated here.
The display module is used to output and display the query result.
This embodiment applies equal-area processing or vector processing to the raw data set, encrypts the processed data set with a key triple, and takes the intersection of multiple range queries, so that the query result is more accurate, leakage of user privacy is effectively prevented, and data security is ensured.
The above is a further detailed description of the invention in conjunction with specific preferred embodiments, and the specific implementation of the invention shall not be regarded as limited to these descriptions. For those of ordinary skill in the art to which the invention belongs, a number of simple deductions or substitutions may be made without departing from the concept of the invention, and all of them shall be regarded as falling within the protection scope of the invention.

Claims (10)

1. An LSH-based secure range query method for multidimensional data, characterized by comprising:
Preprocessing a raw position data set to obtain a preset data set;
Obtaining and processing an input instruction to obtain a query instruction corresponding to the input instruction;
Obtaining a query data set from the query instruction and the preset data set;
Processing the query data set to obtain a query result.
2. The secure range query method according to claim 1, characterized in that preprocessing the raw position data set to obtain the preset data set comprises:
Obtaining the peak coordinate of the raw position data set;
Processing the raw position data set using the peak coordinate to obtain a preprocessed data set;
Encrypting the preprocessed data set to obtain the preset data set.
3. The secure range query method according to claim 2, characterized in that processing the raw position data set using the peak coordinate to obtain the preprocessed data set comprises:
Dividing the raw position data set by the peak coordinate to obtain a unit rectangle data set;
Partitioning the unit rectangle data set into regions of equal area and rotating it to obtain an equal-area data set; wherein each equal-area region corresponds to one hash value;
Processing the equal-area data set with a greedy merging algorithm to obtain a first merged data set;
Processing the first merged data set by adding false points to obtain the preprocessed data set.
4. The secure range query method according to claim 2, characterized in that processing the raw position data set using the peak coordinate to obtain the preprocessed data set further comprises:
Obtaining a reference vector;
Processing the raw position data set using the peak coordinate value and the reference vector to obtain a hash value data set corresponding to the raw position data set;
Merging the hash value data set to obtain a second merged data set;
Processing the second merged data set by adding false points to obtain the preprocessed data set.
5. The secure range query method according to claim 1, characterized in that obtaining and processing the input instruction to obtain the query instruction corresponding to the input instruction comprises:
Obtaining the boundary range of the input instruction;
Obtaining query coordinate points from the boundary range, the query coordinate points being two in number; wherein,
If the boundary range is a rectangle, the query coordinate points are the coordinate points of the lower-left corner and the upper-right corner of the rectangle;
If the boundary range is a circle, the query coordinate points are the coordinate points of the left side and the right side of the circle.
6. The secure range query method according to claim 5, characterized in that obtaining the query data set from the query instruction and the preset data set comprises:
Obtaining the two hash values corresponding to the query coordinate points in the preset data set;
Obtaining multiple area data sets from the two hash values; wherein each area data set is the data set of the region between the two hash values in the preset data set;
Deduplicating each area data set to obtain the query data set.
7. The secure range query method according to claim 6, characterized in that, after obtaining the query data set from the query instruction and the preset data set, the method further comprises:
Updating the data of the preset data set.
8. The secure range query method according to claim 6, characterized in that processing the query data set to obtain the query result comprises:
Obtaining the intersection of any two area data sets in the query data set;
Replacing the two area data sets with the intersection;
Traversing the query data set and repeating the above operations until a final intersection is obtained, the final intersection being the query result.
9. The secure range query method according to claim 1, characterized in that, after preprocessing the raw position data set to obtain the preset data set, the method further comprises:
Adding to or deleting from the preset data set.
10. An LSH-based secure range query system for multidimensional data, characterized by comprising: an input device and a query device; wherein,
The input device is used to collect an input instruction and pass the input instruction to the query device;
The query device includes an input instruction processing module, a raw position data set processing module, an encryption module, a query module and a display module; wherein,
The input instruction processing module is used to obtain and process the input instruction to obtain the query instruction corresponding to the input instruction;
The raw position data set processing module is used to process a raw position data set, obtain a preprocessed data set and store it;
The encryption module is used to encrypt the preprocessed data set to obtain a preset data set;
The query module is used to query the preset data set with the query instruction and obtain a query result;
The raw position data set processing module is also used to add to or delete from the preset data set;
The display module is used to output and display the query result.
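
As a non-limiting illustration, the derivation of the two query coordinate points recited in claim 5 can be sketched in Python. Reading the "left side and right side" of a circle as its leftmost and rightmost points is an assumption, as are the function names `rectangle_query_points` and `circle_query_points`.

```python
from typing import Tuple

Point = Tuple[float, float]


def rectangle_query_points(
    x_min: float, y_min: float, x_max: float, y_max: float
) -> Tuple[Point, Point]:
    """Lower-left and upper-right corners of a rectangular query range."""
    if x_min > x_max or y_min > y_max:
        raise ValueError("invalid rectangle")
    return (x_min, y_min), (x_max, y_max)


def circle_query_points(cx: float, cy: float, radius: float) -> Tuple[Point, Point]:
    """Leftmost and rightmost points of a circular query range (assumed reading)."""
    if radius < 0:
        raise ValueError("radius must be non-negative")
    return (cx - radius, cy), (cx + radius, cy)
```

In both cases the range is reduced to exactly two points, whose hash values then delimit the area data sets of claim 6.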
CN201811095417.5A 2018-09-19 2018-09-19 LSH-based multi-dimensional data-oriented safety range query method and system Active CN109446436B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811095417.5A CN109446436B (en) 2018-09-19 2018-09-19 LSH-based multi-dimensional data-oriented safety range query method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811095417.5A CN109446436B (en) 2018-09-19 2018-09-19 LSH-based multi-dimensional data-oriented safety range query method and system

Publications (2)

Publication Number Publication Date
CN109446436A true CN109446436A (en) 2019-03-08
CN109446436B CN109446436B (en) 2020-07-03

Family

ID=65533141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811095417.5A Active CN109446436B (en) 2018-09-19 2018-09-19 LSH-based multi-dimensional data-oriented safety range query method and system

Country Status (1)

Country Link
CN (1) CN109446436B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101963982A (en) * 2010-09-27 2011-02-02 清华大学 Method for managing metadata of redundancy deletion and storage system based on location sensitive Hash
CN104424253A (en) * 2013-08-28 2015-03-18 腾讯科技(深圳)有限公司 Route query method, device and terminal equipment
CN106339413A (en) * 2016-08-12 2017-01-18 宁波大学 Approximate membership query method based on high-dimensional data filter
GB2554777A (en) * 2016-09-22 2018-04-11 Zensar Tech Limited A computer implemented interactive system and method for locating products and services
CN108027816A (en) * 2015-10-28 2018-05-11 株式会社东芝 Data management system, data managing method and program

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YANGUO PENG等: "Towards Secure Approximate k-Nearest Neighbor Query Over Encrypted High-Dimensional Data", 《IEEE》 *

Also Published As

Publication number Publication date
CN109446436B (en) 2020-07-03

Similar Documents

Publication Publication Date Title
Manzoor et al. Fast memory-efficient anomaly detection in streaming heterogeneous graphs
Roche et al. A practical oblivious map data structure with secure deletion and history independence
CN104794162B (en) Real-time data memory and querying method
US9548866B2 (en) Deletion of content in digital storage systems
US9576073B2 (en) Distance queries on massive networks
US9063947B2 (en) Detecting duplicative hierarchical sets of files
CN108961141B (en) Vector map double zero watermarking method, system, storage medium and server
CN109753809B (en) Power grid data block segmentation method based on cloud storage system
KR100960117B1 (en) Signature Pattern Matching Method, the System for the Same and Computer Readable Medium Storing a Signature Pattern
CN105426375A (en) Relationship network calculation method and apparatus
CN103906039A (en) Method and device for preventing leakage of mobile phone numbers
CN109783667A (en) A kind of method, client and the system of image storage and retrieval
TW201810093A (en) User background information collection method and device
CN110263504A (en) The insertion of reciprocal relation database water mark and extracting method based on differential evolution algorithm
CN112752232A (en) Privacy-oriented driver-passenger matching mechanism
CN108629001A (en) A kind of De-weight method of geography information big data
KR20170122048A (en) System and method for searching encrypted data using bloom filter and binary tree
CN113836447B (en) Security track similarity query method and system under cloud platform
CN105933120A (en) Spark platform-based password hash value recovery method and device
CN109446436A (en) The safe range querying method and system towards multidimensional data based on LSH
WO2015192742A1 (en) Lookup device, lookup method and configuration method
WO2014089843A1 (en) Method and device for data encryption and decryption
CN113141369A (en) Artificial intelligence-based firewall policy management method and related equipment
CN106294407B (en) A kind of coincidence section determines method and apparatus
CN111563256A (en) Safe big data collection and storage method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant