CN107291935A - CPIR-V nearest-neighbor privacy-preserving query method based on Spark and Huffman coding - Google Patents
- Publication number
- CN107291935A (application number CN201710536073.6A)
- Authority
- CN
- China
- Prior art keywords
- nearest neighbors
- cpir
- data
- spark
- huffman
- Prior art date
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2272—Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/221—Column-oriented storage; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24532—Query optimisation of parallel queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6227—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
Abstract
The invention discloses a CPIR-V nearest-neighbor privacy-preserving query method based on Spark and Huffman coding. The data of the nearest-neighbor matrix are compressed with Huffman coding to reduce the number of bits of the data in each grid cell. The compressed data, the code lengths of the characters and the maximum element value are then stored in the spatial database HBase. The server reads the data from HBase, caches them in an RDD of the Spark parallel framework, and groups the CPIR nearest-neighbor matrix in the RDD according to a parallel strategy. After grouping, the Spark server performs parallel CPIR computation according to the query information, aggregates the results of all groups, and sends the query result and the character code lengths to the client. The client parses the query result to obtain the values of the queried bits and decompresses them to obtain the query information. By combining Spark parallelization with Huffman coding, the privacy-preserving query algorithm of the invention protects users' query privacy in big-data application scenarios. It also improves query efficiency while preserving the original query effect.
Description
Technical field
The present invention relates to the technical field of communication networks, and in particular to a CPIR-V nearest-neighbor privacy-preserving query method based on Spark and Huffman coding.
Background art
Mobile devices continue to develop, and new positioning techniques and communication modes keep appearing. Mobile terminals are now widespread and communication equipment is widely used. As a result, mobile applications represented by location-based services (LBS) have entered the era of mobile big data.

The growing volume of data cannot be processed by relying only on the computing capacity of existing PCs and server architectures. Raising computing capacity by upgrading hardware wastes considerable financial and material resources, and it still fails to provide effective horizontal scalability and maintainability. Extensive research has therefore aimed at saving cost and improving scalability and maintainability. Google first proposed the concept of cloud computing at a search engine conference (SES San Jose 2006).

Cloud computing is a form of parallel computing. It distributes computation over a large number of distributed computers rather than a local computer or a remote server, so the operation of an enterprise data center becomes more like the Internet. This lets enterprises switch resources to the applications that need them and access computing and storage systems on demand. Cloud computing is another major change, following the 1980s shift from mainframes to client-server computing.
Cloud platforms are well suited to processing mobile big data. Moving traditional LBS applications and LBS privacy-protection techniques onto cloud platforms is the development trend of both fields and has become a current research hot spot.

In the big-data era, analyzing, summarizing and mining big data yields potential information. That information can help enterprises and merchants earn substantial revenue, for example by adjusting marketing policies, reducing and avoiding risks, and making rational decisions as the market changes. However, as big-data mining techniques continue to emerge and mature, mining this information may also leak individual privacy. Such leaks seriously threaten personal information security, enterprise trade secrets and national security secrets. As big-data applications develop and spread, personal privacy protection becomes particularly important and will be a major challenge.
Current privacy-protection research falls into three main classes: generalization-based, encryption-based and perturbation-based techniques. The main representative of encryption-based techniques is computational private information retrieval (CPIR).

CPIR relies on the hardness of the quadratic residuosity problem. In modular arithmetic with a large composite modulus (typically 1024 bits), deciding whether a number is a quadratic residue is computationally hard. CPIR greatly reduces communication complexity at the cost of higher computational complexity, and it provides the strongest degree of privacy protection.

However, LBS privacy protection involves a large number of computations and complex transformations. The CPIR algorithm must also scan the entire data space, so the computation is heavy and slow. The computing capacity of traditional platforms therefore cannot meet current demand.
Summary of the invention
In view of the above problems, an object of the present invention is to provide a CPIR-V nearest-neighbor privacy-preserving query method based on Spark and Huffman coding that reduces the computational cost of CPIR and further improves performance.
To solve the problems described in the background art, the technical solution of the present invention is as follows:
A CPIR-V nearest-neighbor privacy-preserving query method based on Spark and Huffman coding, comprising:
1) processing a file to obtain a grid, and reading the nearest-neighbor matrix data of the grid from the file;
2) compressing the elements of the nearest-neighbor matrix data with Huffman coding to reduce the number of bits of each element;
3) storing the encoded nearest-neighbor matrix data in the spatial database HBase;
4) after receiving the client's data query information, the server:
   - reads the corresponding data from the HBase database according to the query information and stores it in an RDD of the Spark parallel framework;
   - groups the CPIR nearest-neighbor matrix in the RDD according to the parallel strategy;
   - performs parallel CPIR computation in Spark according to the query information;
   - aggregates the results of all groups and sends the query result and the character code lengths to the client;
5) the client parses the query result to obtain the values of the queried bits and decompresses them to obtain the query information.
Step 1), processing the file to obtain the grid and reading the nearest-neighbor matrix data of the grid, comprises:
building a Voronoi diagram from the points of interest in the spatial data of the file; dividing the spatial data into Voronoi cells by means of the Voronoi diagram; partitioning the Voronoi cells with a grid; counting the potential nearest neighbors of each grid cell; and finally obtaining the nearest-neighbor matrix of the grid.
Step 2) specifically comprises:
2.1. creating a one-dimensional integer array, reading the nearest-neighbor matrix character by character, counting the frequency of each character and storing it in the character-frequency array, where the frequency of the termination character equals the total number of matrix elements;
2.2. computing the frequency of each character and constructing a priority queue in ascending order of character frequency;
2.3. constructing a Huffman tree from the priority queue, and storing the code and code length of each character in the Huffman tree in arrays;
2.4. re-encoding each element of the nearest-neighbor matrix, appending the code of the termination character after each element's code, storing the result in an encoding linked list, and recording the number of bits of each encoded element in an array;
2.5. obtaining the maximum number of bits from the bit counts of the encoded elements;
2.6. padding each element in the encoding linked list with the missing bits up to the maximum number of bits.
Step 2.6 comprises first filling the termination character into the element to be padded and then filling the remaining bits with zeros.
Step 3) specifically comprises:
3.1. storing the compressed nearest-neighbor matrix data in a two-dimensional byte array, where the first dimension is the number of elements of the nearest-neighbor matrix and the second dimension is the maximum number of bytes of an element;
3.2. designing the RowKey of the HBase database by using the reversed row number of each row of the nearest-neighbor matrix as the HBase RowKey, so that the encoded nearest-neighbor matrix data are evenly distributed across the HBase HRegionServers;
3.3. storing the columns by column number, the value of each column being the element of the grid cell at that column number in the row, and storing the code lengths of the compressed characters in the database.
The client data query information is produced as follows:
the client computes the grid cell containing the query point from the position of the query point, generates the corresponding quadratic-residue query from that grid cell, and finally sends the query, the grid partition size and the selected parallel strategy to the server.
Step 4) comprises:
receiving the query, the grid partition numbers and the parallel strategy sent by the client; reading the corresponding CPIR nearest-neighbor matrix, character code lengths and maximum value from the HBase database into a Spark RDD according to the grid partition data; and grouping the CPIR nearest-neighbor matrix in the RDD according to the parallel strategy sent by the client. After grouping, Spark performs parallel CPIR computation with the query Q and obtains the computation results. Finally, Spark aggregates the results of all groups and sends the query result and the character code lengths to the client.
Grouping the CPIR nearest-neighbor matrix in the RDD according to the parallel strategy sent by the client comprises Row-level grouping and Bit-level grouping:
Row-level grouping groups the CPIR matrix by rows. Bit-level grouping first obtains the number k of CPUs currently allocated to the cluster and then splits the data of each row of the CPIR matrix into groups according to the number of CPUs.
Step 5) comprises: receiving the result and the character code lengths returned by the server, performing the quadratic-residue computation on the result to obtain the values of the queried bits, and decompressing these values to obtain the correct query result.
Compared with the prior art, the beneficial effects of the present invention are as follows:
The present invention provides a CPIR-V nearest-neighbor privacy-preserving query method based on Spark and Huffman coding. Huffman coding compresses the data and reduces its volume, which reduces the amount of CPIR computation. When the server performs the CPIR computation, the Spark framework computes in parallel to cut computation time, which solves the problem of long computation time. The privacy-preserving query algorithm based on Spark parallelization and Huffman coding protects users' query privacy in big-data application scenarios and improves query efficiency while preserving the original query effect.
Brief description of the drawings
Fig. 1 is a flowchart of the CPIR-V nearest-neighbor privacy-preserving query method based on Spark and Huffman coding according to the present invention;
Fig. 2 is a schematic diagram of the grid partition of the Voronoi diagram according to the present invention;
Fig. 3 shows the average server computation time under different grid partitions according to the present invention, where (a) is for uniformly distributed data, (b) is for Gaussian-distributed data and (c) is for real data.
Embodiment
The present invention is described in detail below in conjunction with the accompanying drawings.
As shown in Fig. 1, the present invention provides a CPIR-V nearest-neighbor privacy-preserving query method based on Spark and Huffman coding, comprising:
1) processing a file to obtain a grid, and reading the nearest-neighbor matrix data of the grid from the file;
A Voronoi diagram is built from the points of interest in the spatial data of the file, and the spatial data are divided into Voronoi cells by means of the Voronoi diagram. The Voronoi cells are then partitioned with a grid, the potential nearest neighbors of each grid cell are counted, and finally the nearest-neighbor matrix of the grid is obtained. A Voronoi diagram captures the proximity relations between spatial objects by partitioning the space. Each polygon in the diagram is called a Voronoi cell, and the edges of a Voronoi cell lie on the perpendicular bisectors between adjacent spatial objects.
The grid size must be chosen with care when the Voronoi cells are partitioned with a grid. If the grid partition is too small, the nearest-neighbor matrix contains too much data, and the client's computation overhead when parsing the query result is large. If the grid partition is too large, the matrix contains too little data and compression is less effective. The grid should therefore be partitioned according to the density of the points of interest.

By way of example, the whole space is divided into G*G grid cells, and each grid cell either intersects one or more Voronoi cells or is contained in one. In Fig. 2, p denotes a point of interest and q a query point, and the whole space is divided into 5*5 grid cells. Grid cell (1,1) intersects the Voronoi cells of points of interest p1 and p2, and grid cell (2,1) is contained in the Voronoi cell of p1. The grid cell containing query point q intersects the Voronoi cells of p1 and p2, so the nearest neighbor of q may be p1 or p2. The set {p1, p2} is therefore called the potential nearest-neighbor set of that grid cell. Hence the number of nearest neighbors of a grid cell is not fixed; it depends on the distribution of the points and the size of the grid.

The CPIR-V algorithm converts the nearest-neighbor relation of each grid cell in Fig. 2 into a nearest-neighbor storage matrix that stores the potential nearest neighbors of each grid cell. Note that all elements of the storage matrix have the same size. The algorithm first finds the maximum number p_max of potential nearest neighbors over all grid cells, then pads the grid cells with fewer neighbors using a default value. Here the maximum number of potential nearest neighbors of a grid cell is 3, so the grid cells with fewer neighbors are padded up to 3. This matrix holds the potential nearest-neighbor relation of every grid cell and is the basis and core of the CPIR-V nearest-neighbor query algorithm.
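The grid-to-matrix step above can be illustrated with a minimal Python sketch. It is an approximation under stated assumptions, not the patent's procedure. Instead of intersecting grid cells with Voronoi cells exactly, it samples a lattice of locations in each grid cell and records which point of interest is nearest to each sample, so very thin intersections may be missed. The function name potential_nn_matrix, the sampling density and the padding value -1 are all hypothetical.

```python
import math

def potential_nn_matrix(points, width, height, g, samples=8, pad=-1):
    """Approximate the potential nearest-neighbour set of every cell of a g*g grid.

    A point of interest is recorded for a cell if it is the nearest neighbour of
    at least one sampled location in that cell. This is a sampling approximation
    of "cell intersects the point's Voronoi cell", so thin intersections can be missed.
    """
    cw, ch = width / g, height / g
    cells = {}
    for i in range(g):
        for j in range(g):
            found = set()
            for s in range(samples + 1):
                for t in range(samples + 1):
                    x, y = (i + s / samples) * cw, (j + t / samples) * ch
                    # index of the nearest point of interest (ties go to the lower index)
                    nearest = min(range(len(points)),
                                  key=lambda k: math.dist((x, y), points[k]))
                    found.add(nearest)
            cells[(i, j)] = sorted(found)
    p_max = max(len(v) for v in cells.values())
    # pad every cell to p_max entries so all matrix elements have the same size
    matrix = {c: v + [pad] * (p_max - len(v)) for c, v in cells.items()}
    return matrix, p_max
```

Take two points of interest at (1, 1) and (9, 9) in a 10 x 10 space with g = 2. The cell around (1, 1) holds only point 0 and is padded to [0, -1]. The other three cells straddle the bisector and hold [0, 1], so p_max = 2.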
2) compressing the elements of the nearest-neighbor matrix data with Huffman coding to reduce the number of bits of each element;
Huffman coding is a lossless variable-length coding (VLC) scheme. It builds a unique code for each character of the file to be encoded according to the character's probability of occurrence, and guarantees the shortest average code length among such codes. The underlying tree is called an optimal binary tree, and the code is sometimes called an optimal code. Because Huffman coding is variable-length, characters with a higher probability of occurrence receive shorter codes and characters with a lower probability receive longer codes. This ensures that the total code length of the processed character set is no greater than with a fixed-length encoding.
Specifically, step 2) comprises:
2.1. creating a one-dimensional integer array, reading the nearest-neighbor matrix character by character, counting the frequency of each character and storing it in the character-frequency array, where the frequency of the termination character equals the total number of matrix elements;
2.2. computing the frequency of each character and constructing a priority queue in ascending order of character frequency;
2.3. constructing a Huffman tree from the priority queue, and storing the code and code length of each character in the Huffman tree in arrays;
2.4. re-encoding each element of the nearest-neighbor matrix, appending the code of the termination character after each element's code, storing the result in an encoding linked list, and recording the number of bits of each encoded element in an array;
2.5. obtaining the maximum number of bits from the bit counts of the encoded elements;
2.6. padding each element in the encoding linked list with the missing bits up to the maximum number of bits. Specifically, the termination character is first filled into the element to be padded, and the remaining bits are then filled with zeros.
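Steps 2.1 to 2.6 can be sketched in Python as follows. This is one possible reading of the steps, resting on three assumptions:

- each matrix element is a string of characters, for example the IDs of its potential nearest neighbors;
- the hypothetical termination character '#' never occurs inside an element;
- padding means appending zeros after the element's terminator code.

A Python list stands in for the encoding linked list, and heapq stands in for the priority queue.

```python
import heapq
from collections import Counter
from itertools import count

END = "#"  # hypothetical termination character appended to every element

def build_codes(elements):
    """Steps 2.1-2.3: count character frequencies and build a Huffman code table."""
    freq = Counter()
    for element in elements:
        freq.update(element)              # read the matrix character by character
    freq[END] = len(elements)             # one terminator per matrix element
    tie = count()                         # tie-breaker so heap entries never compare nodes
    heap = [(f, next(tie), ch) for ch, f in freq.items()]
    heapq.heapify(heap)                   # priority queue, lowest frequency first
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    codes = {}

    def walk(node, prefix):
        if isinstance(node, str):         # leaf: a single character
            codes[node] = prefix or "0"   # a one-symbol alphabet still gets one bit
        else:
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")

    walk(heap[0][2], "")
    return codes

def encode_matrix(elements):
    """Steps 2.4-2.6: encode each element plus terminator, then zero-pad to the maximum."""
    codes = build_codes(elements)
    encoded = ["".join(codes[c] for c in e) + codes[END] for e in elements]
    max_bits = max(len(bits) for bits in encoded)
    padded = [bits.ljust(max_bits, "0") for bits in encoded]
    return padded, codes, max_bits
```

Take the elements "12", "3", "12" and "7". The terminator is the most frequent symbol and receives a 2-bit code. The longest encoded element ("12" plus terminator) needs 6 bits, so every element is zero-padded to 6 bits.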
3) storing the encoded nearest-neighbor matrix data in the spatial database HBase;
HBase is a distributed, column-oriented key-value NoSQL database. It is a highly reliable, high-performance, column-oriented and scalable distributed storage system. It uses Hadoop HDFS as its file storage system and is modeled on Google's Bigtable, which runs on the Google File System, providing comparable functionality. HBase makes it possible to build large-scale structured storage clusters on inexpensive PC servers. Because HBase is a key-value database, it is suitable for storing unstructured data.
The nearest-neighbor matrix is stored in the spatial database HBase. The compressed matrix is written to HBase as follows. The RowKey of the HBase table is designed so that the data are evenly distributed across the HBase HRegionServers. The main idea is to use the reversed row number of each row of the nearest-neighbor matrix as the HBase RowKey. This way no HRegionServer stores an excessive amount of data, and every HRegionServer stores some data. Columns are stored by column number, and each value is the element of the grid cell at that column number in the row. Finally, the code lengths of the compressed characters are stored in the database.
Specifically, step 3) comprises:
3.1. storing the compressed nearest-neighbor matrix data in a two-dimensional byte array, where the first dimension is the number of elements of the nearest-neighbor matrix and the second dimension is the maximum number of bytes of an element;
3.2. designing the RowKey of the HBase database by using the reversed row number of each row of the nearest-neighbor matrix as the HBase RowKey, so that the encoded nearest-neighbor matrix data are evenly distributed across the HBase HRegionServers;
3.3. storing the columns by column number, the value of each column being the element of the grid cell at that column number in the row, and storing the code lengths of the compressed characters in the database.
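Steps 3.1 and 3.2 can be sketched in Python as follows. The sketch assumes a fixed zero-padded width for row numbers, which the text does not state, and big-endian packing of each padded bit string into whole bytes. It only builds the RowKeys and byte values; writing them to HBase through a client library is left out. The function names row_key, pack_bits and to_byte_matrix are hypothetical.

```python
def row_key(row, width=8):
    """Step 3.2: reversed, zero-padded row number used as the HBase RowKey."""
    return str(row).zfill(width)[::-1]

def pack_bits(bits):
    """Step 3.1: pack a padded '0'/'1' string into bytes (last byte zero-filled)."""
    n_bytes = (len(bits) + 7) // 8
    if n_bytes == 0:
        return b""
    return int(bits.ljust(n_bytes * 8, "0"), 2).to_bytes(n_bytes, "big")

def to_byte_matrix(padded_elements):
    """Two-dimensional byte array: one row per element, equal byte width per element."""
    return [pack_bits(bits) for bits in padded_elements]
```

Reversal moves the fastest-changing digit to the front. Consecutive rows 1, 2 and 3 therefore get keys starting with 1, 2 and 3, which tend to fall into different key ranges. This is how the reversal spreads writes across HRegionServers. How well it balances in practice also depends on how the table's regions are pre-split.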
The specific storage format is shown in Table 1.
Table 1 H-PCPIR-V information table structure
4) after receiving the client's data query information, the server:
   - reads the corresponding data from the HBase database according to the query information and stores it in an RDD of the Spark parallel framework;
   - groups the CPIR nearest-neighbor matrix in the RDD according to the parallel strategy;
   - performs parallel CPIR computation in Spark according to the query information;
   - aggregates the results of all groups and sends the query result and the character code lengths to the client;
The client data query information is produced as follows. The client computes the grid cell containing the query point from the position of the query point and generates the corresponding quadratic-residue query from that grid cell. Finally, it sends the query, the grid partition size and the selected parallel strategy to the server. The concrete steps are:
1: compute the grid cell G_{a,b} containing the query point from its position;
2: generate the query Q = (y_1, y_2, ..., y_{g_x}), where y_i = QNR for the subscript i = b and y_i = QR for every other subscript;
3: send the query Q, the grid partition numbers g_x and g_y, and the parallel strategy "strategy" to the server;
4: wait for the server to return the query result.
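Client steps 1 and 2 can be sketched in Python under four assumptions:

- toy primes p and q are used here, whereas real deployments use moduli of 128 to 1024 bits (see Table 2);
- the column index b is derived from the x coordinate with g_x columns, and the row index a from y;
- the QNR is chosen as a non-residue modulo both p and q (Jacobi symbol +1), as in the standard Kushilevitz-Ostrovsky quadratic-residuosity PIR;
- the names locate_cell, is_qr_mod_prime, random_unit and make_query are hypothetical.

```python
import math
import random

def locate_cell(x, y, width, height, g_x, g_y):
    """Step 1: grid cell G_{a,b}; assumes b comes from x (g_x columns), a from y."""
    b = min(int(x / (width / g_x)), g_x - 1)
    a = min(int(y / (height / g_y)), g_y - 1)
    return a, b

def is_qr_mod_prime(y, p):
    """Euler's criterion: y is a quadratic residue mod odd prime p iff y^((p-1)/2) = 1."""
    return pow(y, (p - 1) // 2, p) == 1

def random_unit(n, rng):
    """Random element of [2, n) that is coprime to n."""
    while True:
        r = rng.randrange(2, n)
        if math.gcd(r, n) == 1:
            return r

def make_query(b, g_x, p, q, rng):
    """Step 2: QNR (non-residue mod p and mod q) at index b, random QRs elsewhere."""
    n = p * q
    query = []
    for j in range(g_x):
        if j == b:
            y = random_unit(n, rng)
            while is_qr_mod_prime(y, p) or is_qr_mod_prime(y, q):
                y = random_unit(n, rng)
        else:
            y = pow(random_unit(n, rng), 2, n)  # the square of a unit is always a QR
        query.append(y)
    return n, query
```

The client keeps p and q secret and sends only n, Q, g_x, g_y and the strategy. Without the factorization, the server cannot tell which entry of Q is the non-residue.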
The server receives the query, the grid partition numbers and the parallel strategy sent by the client. According to the grid partition data, it reads the corresponding CPIR nearest-neighbor matrix, character code lengths and maximum value from the HBase database into a Spark RDD. It then groups the CPIR nearest-neighbor matrix in the RDD according to the parallel strategy sent by the client. After grouping, Spark performs parallel CPIR computation with the query Q and obtains the computation results. Finally, Spark aggregates the results of all groups and sends the query result and the character code lengths to the client.
Grouping the CPIR nearest-neighbor matrix in the RDD according to the parallel strategy sent by the client comprises Row-level grouping and Bit-level grouping:
Row-level grouping groups the CPIR matrix by rows. Bit-level grouping first obtains the number k of CPUs currently allocated to the cluster and then splits the data of each row of the CPIR matrix into groups according to the number of CPUs.
Specifically:
1: according to the grid partition g_x, g_y, the server obtains the CPIR matrix data, the character code lengths and the maximum value and caches them in an RDD;
2: if the parallel strategy is Row, the CPIR matrix data in the RDD are grouped by rows;
3: if the parallel strategy is Bit, the number k of CPUs currently allocated to the cluster is obtained and each row of data is split into k groups;
4: if the parallel strategy matches neither, Row is used by default and the CPIR matrix data in the RDD are grouped by rows;
5: Spark performs the CPIR computation on each group with the query Q to obtain the query result Z;
6: Spark aggregates the results Z and then sends the query result and the character code lengths to the client.
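The server steps can be sketched with plain Python standing in for Spark. The per-unit rule follows the standard Kushilevitz-Ostrovsky quadratic-residuosity PIR, which is assumed here to be what the CPIR computation does on one bit plane of the matrix. For row r, multiply y_j when bit M[r][j] is 1 and y_j^2 when it is 0, modulo N. Then z_r is a quadratic residue exactly when M[r][b] is 0.

The Row and Bit strategies differ only in how rows are cut into work units. Modular multiplication is associative, so aggregating partial products gives the same z_r either way. With PySpark the same shape could be written as a map over the work units followed by reduceByKey with modular multiplication; the loop below stands in for that. The function names and the (row, start, stop) unit format are hypothetical.

```python
def split_groups(matrix, strategy, k):
    """Work units (row, start, stop): Row = one unit per row, Bit = each row cut into k parts."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    if strategy == "Bit":
        size = -(-n_cols // k)                      # ceiling division
        return [(r, s, min(s + size, n_cols))
                for r in range(n_rows) for s in range(0, n_cols, size)]
    # "Row", and the default for an unrecognised strategy
    return [(r, 0, n_cols) for r in range(n_rows)]

def partial_product(matrix, query, n, unit):
    """CPIR on one unit: multiply y_j for a 1 bit and y_j^2 for a 0 bit, modulo n."""
    r, start, stop = unit
    acc = 1
    for j in range(start, stop):
        y = query[j]
        acc = acc * (y if matrix[r][j] else y * y) % n
    return r, acc

def cpir_answer(matrix, query, n, strategy="Row", k=4):
    """Aggregate the partial products of every unit into one value z_r per row."""
    z = [1] * len(matrix)
    for unit in split_groups(matrix, strategy, k):  # in Spark each unit would be a task
        r, part = partial_product(matrix, query, n, unit)
        z[r] = z[r] * part % n
    return z
```

With N = 23 * 31 = 713 and Q = [4, 9, 11, 16, 25], every entry except 11 is a square, and 11 is a non-residue modulo both primes. The returned z_r therefore reveal column 2 of the bit matrix only to a client that knows the factorization.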
5) the client parses the query result to obtain the values of the queried bits and decompresses them to obtain the query information.
This comprises: receiving the result and the character code lengths returned by the server, performing the quadratic-residue computation on the result to obtain the values of the queried bits, and decompressing these values to obtain the correct query result.
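The client side can be sketched in Python under three assumptions:

- there is one returned value per bit of the queried element, for example one bit plane per bit position;
- the client knows the secret prime p;
- the client holds the Huffman code table, for instance rebuilt as a canonical Huffman code from the transmitted code lengths. The text does not spell out this reconstruction.

The function names recover_bits and huffman_decode, and the terminator '#', are hypothetical.

```python
def recover_bits(z_values, p):
    """Quadratic-residue test with the secret prime p: QR -> bit 0, non-residue -> bit 1."""
    return "".join("0" if pow(z, (p - 1) // 2, p) == 1 else "1" for z in z_values)

def huffman_decode(bits, codes, end="#"):
    """Decompress one padded element, stopping at the termination character."""
    inverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:            # prefix-free code: the first match is a symbol
            if inverse[current] == end:
                break                     # the remaining bits are zero padding
            out.append(inverse[current])
            current = ""
    return "".join(out)
```

Decoding stops at the terminator, so the zero padding added in step 2.6 never has to be interpreted.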
Comparison of experimental results:
The cloud computing service platform is an IBM xSeries System 3650M4 cluster with 5 nodes. Each node is configured as follows:
CPU: 2*Xeon E5-2620 (6 cores * 2 threads each);
Memory: 32 GB;
Hard disk: 5 TB, 10,000 rpm, RAID 5;
Operating system: CentOS 6.4;
Development tools: GNU toolchain (G++, GDB), Make, Vim, JDK, etc.
The experiments were implemented in standard C++, Java and Scala.
Three data sets are used to analyze the experimental results, two synthetic and one real, and the query performance of the algorithm is further analyzed from the results. The real data set is the Sequoia data set from California. The synthetic data sets follow a uniform distribution and a Gaussian distribution respectively, where the Gaussian data set follows (X, Y) ~ N(1, 1, 0, 0, 1). The data span is 1046435*1929615; note that both the abscissa x and the ordinate y are of type int.

The big-number library used in C++ is GMP; in Java, the BigInteger class provided by the JDK is used. In the quadratic-residue computation, the threshold θ is the smallest value at which the time of θ big-number multiplications exceeds the time of a memory table lookup. Experiments show that at θ = 3 big-number multiplication takes longer than a memory table lookup, so θ is set to 3. The parameters used in the experiments are shown in Table 2.
Table 2 Experimental data parameters
Parameter | Range of values | Default value |
Data type | Real data (62K), Gaussian distribution (100K), uniform distribution (100K) | Uniform distribution (100K) |
Grid partition | 10*10, 20*20, 50*50, 100*100, 200*200, 400*400 | 100*100 |
Modulus size k (bits) | 128, 256, 512, 1024 | 512 |
Range query coverage | 1, 5, 10, 15 | 1 |
Query result computation method | Formula 2.10, Formula 2.11 | Formula 2.10 |
Number of CPU cores | 1, 10, 20, 40, 60 | 60 |
Data compression results:
Table 3 Data compression comparison for the range query algorithm
In the data preprocessing phase, the range query algorithm compresses the data in the grid cells with Huffman coding; Table 3 compares the data before and after compression. It shows that Huffman compression reduces the data volume of the range query by nearly half, with a compression ratio close to 55%. A smaller data volume generally means that the amount of CPIR computation on the server is also roughly halved, which lowers the server's computation time.
Table 4 Data compression comparison for the CPIR-V algorithm
In the data preprocessing phase, the CPIR-V algorithm compresses the data of the nearest-neighbor matrix with Huffman coding. Table 4 compares the maximum element size in the matrix after compression with the matrix maximum before compression. As Table 4 shows, Huffman compression reduces the maximum in the CPIR-V matrix by nearly 1/3.

The CPIR-V algorithm first finds the maximum element of the nearest-neighbor matrix and pads all other elements to that size. If the nearest-neighbor matrix is N*N and its maximum is m, the server's big-number computation is m*N*N. After Huffman compression the maximum in the matrix is (2/3)m, so the server's big-number computation becomes (2/3)m*N*N. Overall, the server saves (1/3)m*N*N of computation.
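The saving stated above can be written out as follows. The last line uses hypothetical values m = 30 and N = 100 (the default 100*100 grid of Table 2) purely for illustration; these are not measured data.

```latex
\begin{aligned}
C_{\text{before}} &= m \cdot N^2, \qquad C_{\text{after}} = \tfrac{2}{3}\, m \cdot N^2,\\
C_{\text{before}} - C_{\text{after}} &= \tfrac{1}{3}\, m \cdot N^2,\\
m = 30,\ N = 100:\quad C_{\text{before}} &= 300\,000,\quad C_{\text{after}} = 200\,000,\quad \text{saving} = 100\,000.
\end{aligned}
```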
Fig. 3 compares the server computation time under different grid partitions for three algorithms:
- PCPIR-V, the Spark-based parallel CPIR-V algorithm;
- H-PCPIR-V-R, the Spark- and Huffman-based parallel CPIR-V algorithm (H-PCPIR-V) with the Row parallel strategy;
- H-PCPIR-V-B, the H-PCPIR-V algorithm with the Bit parallel strategy.

For all three algorithms, the server computation time generally grows with the grid size. A larger grid partition gives a larger CPIR-V computation matrix, which requires more big-integer computations and so increases computation time.

The figure shows the server computation time for the three data sets under different grids. H-PCPIR-V-R and H-PCPIR-V-B generally need less server computation time than PCPIR-V, and the advantage is most pronounced on the Gaussian and real data sets. The reason is that both variants compress the matrix data in the preprocessing phase, which reduces the number of bits involved in big-integer computation and further reduces server computation time.

The gap in server computation time is larger on the Gaussian and real data sets than on the uniform data set. The matrix data of the uniform data set are relatively even, so compression has little effect on them. In the other two data sets, the element sizes vary widely and the gap between the maximum and the minimum is large. Compression therefore produces a larger difference in the number of bits computed, which shows in the results.
Comparative experiments and analysis of the results show that, compared with PCPIR-V, H-PCPIR-V reduces server-side computation cost by about 30%, client computation cost by about 10%, and communication cost by about 40%.
It will be apparent to those skilled in the art that the specific embodiment described above is a preferred embodiment of the present invention. Therefore, improvements or variations that those skilled in the art may make to some parts of the invention, as long as they still embody the principle of the invention and achieve its purpose, fall within the scope of protection of the invention.
Claims (9)
1. A CPIR-V nearest neighbor privacy-preserving query method based on Spark and Huffman coding, characterized by comprising:
1) processing a file to obtain grids, and reading the nearest neighbor matrix data of the grids in the file;
2) compressing the elements of the nearest neighbor matrix data with Huffman coding to reduce the number of bits of each element;
3) storing the encoded nearest neighbor matrix data in the spatial database HBase;
4) after receiving data query information from the client, the server reads the corresponding data from the HBase database according to the query information and stores it in an RDD of the Spark parallel framework, groups the CPIR nearest neighbor matrix in the RDD according to the parallel strategy, performs CPIR parallel computation in Spark according to the query information, aggregates the computation results of all groups, and sends the query result and the character code lengths to the client;
5) the client parses the query result to obtain the values of the queried bits and decompresses those values to obtain the queried information.
2. The CPIR-V nearest neighbor privacy-preserving query method based on Spark and Huffman coding according to claim 1, characterized in that obtaining grids through file processing and reading the nearest neighbor matrix data of the grids in step 1) comprises:
partitioning a Voronoi diagram from the points of interest in the spatial data of the file, dividing the spatial data into Voronoi cells according to the Voronoi diagram, then partitioning the Voronoi cells into a grid, counting the number of potential nearest neighbors of each grid cell, and finally obtaining the nearest neighbor matrix of the grid.
3. The CPIR-V nearest neighbor privacy-preserving query method based on Spark and Huffman coding according to claim 1, characterized in that step 2) specifically comprises:
2.1 creating a one-dimensional integer array, reading the nearest neighbor matrix character by character, counting the frequency of each character and storing it in the character frequency array, where the frequency of the terminator character is the total number of matrix elements;
2.2 computing the frequency of each character and building a priority queue ordered by character frequency from smallest to largest;
2.3 constructing a Huffman tree from the priority queue, and storing the code of each character in the Huffman tree and its code length in an array;
2.4 re-encoding each element of the nearest neighbor matrix, appending an extra terminator-character code after each element's code, storing the result in an encoding linked list, and recording the number of bits of each encoded element in an array;
2.5 obtaining the maximum bit count from the bit counts of the encoded elements;
2.6 padding each element in the encoding linked list with the missing bits up to the maximum bit count.
4. The CPIR-V nearest neighbor privacy-preserving query method based on Spark and Huffman coding according to claim 3, characterized in that the padding in step 2.6 comprises first filling the element to be padded with the terminator character and then filling the remainder with zeros.
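The encoding steps of claims 3 and 4 can be sketched as follows. Assumptions not fixed by the source: each matrix element is read as its decimal digit string, the terminator is a hypothetical symbol `#`, its frequency is the number of elements (one terminator per element), and the padding of claim 4 is read as "element code, then terminator code, then zeros up to the maximum bit count". The decoder is included only to show that this padding is unambiguous: Huffman codes are prefix-free, so decoding stops at the terminator and ignores the trailing zeros.

```python
import heapq
from collections import Counter
from itertools import count
from typing import Dict, List, Tuple

TERM = "#"  # hypothetical terminator symbol (not specified in the source)


def build_codes(freq: Dict[str, int]) -> Dict[str, str]:
    """Steps 2.2-2.3: priority queue ordered by frequency, then Huffman tree."""
    tie = count()  # tie-breaker so the heap never compares tree nodes
    heap = [(f, next(tie), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    codes: Dict[str, str] = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):  # internal node: (left, right)
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:  # leaf: a single character
            codes[node] = prefix
    return codes


def encode_matrix(matrix: List[List[int]]) -> Tuple[Dict[str, str], List[str], int]:
    elements = [str(v) for row in matrix for v in row]
    freq = Counter("".join(elements))  # step 2.1: character frequencies
    freq[TERM] = len(elements)  # one terminator per element
    codes = build_codes(freq)
    # Step 2.4: element code followed by the terminator code.
    encoded = ["".join(codes[ch] for ch in e) + codes[TERM] for e in elements]
    max_bits = max(len(b) for b in encoded)  # step 2.5
    padded = [b.ljust(max_bits, "0") for b in encoded]  # step 2.6 / claim 4
    return codes, padded, max_bits


def decode_element(bits: str, codes: Dict[str, str]) -> int:
    inverse = {c: s for s, c in codes.items()}
    digits, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:
            if inverse[buf] == TERM:
                break  # padding zeros after the terminator are ignored
            digits.append(inverse[buf])
            buf = ""
    return int("".join(digits))
```

In this reading, `max_bits` is the width the server pads every element to, so a shorter longest code directly shrinks the big-integer work discussed around Table 4.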
5. The CPIR-V nearest neighbor privacy-preserving query method based on Spark and Huffman coding according to claim 1, characterized in that step 3) specifically comprises:
3.1 storing the compressed nearest neighbor matrix data in a two-dimensional byte array, where the first dimension is the total number of nearest neighbor matrix elements and the second dimension is the maximum byte length of an element;
3.2 designing the RowKey of the HBase database: the reversed row number of each row of the nearest neighbor matrix is used as the HBase RowKey, so that the encoded nearest neighbor matrix data is evenly distributed across the HBase HRegionServers;
3.3 storing the columns by column number, the value of each column being the element in that row's grid at the corresponding column number, and storing the code lengths from character compression in the database.
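A sketch of the storage layout in claim 5 that builds plain Python structures instead of calling any HBase client API. It packs each padded bit string into a fixed-width byte row (the two-dimensional byte array of step 3.1) and derives a RowKey by reversing the decimal digits of the row number (step 3.2). Reversing digits means consecutive row numbers such as 10, 11, 12 get keys "01", "11", "21" with different leading characters, which spreads them over key ranges instead of piling onto one region. The exact key format (for example, whether row numbers are zero-padded before reversing) is not stated in the source, so this is an assumption, and storing the code lengths (step 3.3) is omitted.

```python
from typing import Dict, List


def row_key(row_number: int) -> bytes:
    """Step 3.2 (assumed format): decimal row number with its digits reversed."""
    return str(row_number)[::-1].encode("ascii")


def to_byte_array(padded_bits: List[str]) -> List[bytes]:
    """Step 3.1: one fixed-width byte row per element (second dimension = max bytes)."""
    width = (max(len(b) for b in padded_bits) + 7) // 8
    # Left-align each bit string in its byte row, zero-filling the tail.
    return [int(b.ljust(width * 8, "0"), 2).to_bytes(width, "big") for b in padded_bits]


def layout(matrix_bits: List[List[str]]) -> Dict[bytes, Dict[int, bytes]]:
    """Step 3.3: {RowKey: {column number: element bytes}} for each matrix row."""
    flat = to_byte_array([b for row in matrix_bits for b in row])
    ncols = len(matrix_bits[0])
    return {
        row_key(r): {c: flat[r * ncols + c] for c in range(ncols)}
        for r in range(len(matrix_bits))
    }
```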
6. The CPIR-V nearest neighbor privacy-preserving query method based on Spark and Huffman coding according to claim 1, characterized in that the client data query information comprises:
computing the grid cell that contains the query point from the query point's location, generating the corresponding quadratic residue query according to that grid cell, and finally sending the query, the grid partition size, and the selected parallel strategy to the server.
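Claim 6 does not spell out the quadratic residue query. The sketch below assumes the classic Kushilevitz-Ostrovsky construction: the client picks a modulus n = p·q and sends one number per matrix column, a quadratic non-residue modulo both primes (Jacobi symbol +1) for the column assigned to its own grid cell and a quadratic residue for every other column. The column-per-grid mapping, the name `make_query`, and the small primes are illustrative assumptions; real moduli must be far larger.

```python
import random
from typing import List, Tuple


def is_qr_mod_prime(a: int, p: int) -> bool:
    """Euler's criterion for an odd prime p and a not divisible by p."""
    return pow(a, (p - 1) // 2, p) == 1


def make_query(column: int, num_columns: int, p: int, q: int,
               rng: random.Random) -> Tuple[int, List[int]]:
    """Return n = p*q and one query value per column (hypothetical helper)."""
    n = p * q
    query = []
    for j in range(num_columns):
        while True:
            y = rng.randrange(2, n)
            if y % p == 0 or y % q == 0:  # keep y coprime to n
                continue
            qr_p, qr_q = is_qr_mod_prime(y, p), is_qr_mod_prime(y, q)
            if j == column and not qr_p and not qr_q:  # non-residue, Jacobi +1
                break
            if j != column and qr_p and qr_q:  # quadratic residue
                break
        query.append(y)
    return n, query
```

Because a non-residue modulo both primes still has Jacobi symbol +1, the server cannot tell the selected column apart from the others without factoring n.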
7. The CPIR-V nearest neighbor privacy-preserving query method based on Spark and Huffman coding according to claim 1, characterized in that step 4) comprises:
receiving the query, the grid partition number, and the parallel strategy sent by the client; reading the corresponding CPIR nearest neighbor matrix, the character code lengths, and the maximum value from the HBase database according to the grid partition number and storing them in a Spark RDD; grouping the CPIR nearest neighbor matrix in the RDD according to the parallel strategy sent by the client; after grouping, performing CPIR parallel computation in Spark according to the query Q to obtain the computation results; and aggregating the results of all groups in Spark and sending the query result and the character code lengths to the client.
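Under the same Kushilevitz-Ostrovsky assumption, the server-side CPIR computation of claim 7 reduces to one modular product per matrix row: for each bit it multiplies in y_j² when the bit is 0 and y_j when the bit is 1, so a row's result is a quadratic residue exactly when the bit in the selected column is 0. This sketch runs sequentially; in the patent, this per-row work is what Spark distributes. The function names are hypothetical, and the query values in the example are hand-picked for the insecure toy modulus n = 7·11.

```python
from typing import List


def cpir_row(bits: List[int], query: List[int], n: int) -> int:
    """Product over columns: y_j^2 for a 0 bit, y_j for a 1 bit (mod n)."""
    z = 1
    for bit, y in zip(bits, query):
        z = z * (y if bit else y * y) % n
    return z


def cpir_matrix(matrix: List[List[int]], query: List[int], n: int) -> List[int]:
    """One result per row; a Spark job would compute these rows in parallel."""
    return [cpir_row(row, query, n) for row in matrix]
```

For example, with query `[4, 24, 9]` (4 and 9 are residues modulo 7 and 11; 24 is a non-residue modulo both, so column 1 is selected), `cpir_matrix([[1, 0, 1], [0, 1, 1]], [4, 24, 9], 77)` returns `[23, 68]`. 23 is a residue modulo both 7 and 11 and 68 is not, matching the column-1 bits 0 and 1.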
8. The CPIR-V nearest neighbor privacy-preserving query method based on Spark and Huffman coding according to claim 7, characterized in that grouping the CPIR nearest neighbor matrix in the RDD according to the parallel strategy sent by the client comprises Row-level grouping and Bit-level grouping:
Row-level grouping groups the CPIR matrix by rows; Bit-level grouping first obtains the number k of CPUs currently allocated to the cluster and then groups the data in each row of the CPIR matrix according to the number of CPUs.
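The two grouping strategies of claim 8 can be sketched as follows. The exact assignment is not given in the source, so this assumes Row-level grouping deals whole rows to k groups round-robin and Bit-level grouping splits each row's columns into k contiguous chunks. Because the per-row CPIR product is a product modulo n, the partial products of one row's chunks can be computed independently and multiplied together afterwards, which is what allows Bit-level grouping to parallelize within a single row. All function names are hypothetical.

```python
from typing import List


def row_groups(num_rows: int, k: int) -> List[List[int]]:
    """Row-level grouping: row indices assigned round-robin to k groups."""
    return [list(range(g, num_rows, k)) for g in range(k)]


def bit_groups(num_cols: int, k: int) -> List[range]:
    """Bit-level grouping: one row's columns split into k contiguous chunks."""
    size, extra = divmod(num_cols, k)
    chunks, start = [], 0
    for g in range(k):
        end = start + size + (1 if g < extra else 0)
        chunks.append(range(start, end))
        start = end
    return chunks


def row_product_by_chunks(bits: List[int], query: List[int], n: int, k: int) -> int:
    """Per-chunk partial products (parallelizable), then one aggregation step."""
    partials = []
    for chunk in bit_groups(len(bits), k):
        z = 1
        for j in chunk:
            z = z * (query[j] if bits[j] else query[j] * query[j]) % n
        partials.append(z)
    total = 1
    for z in partials:  # aggregation of the per-chunk results
        total = total * z % n
    return total
```

Row-level grouping needs no aggregation inside a row but only helps when there are many rows; Bit-level grouping keeps all k CPUs busy even on a few long rows, at the cost of the extra aggregation step.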
9. The CPIR-V nearest neighbor privacy-preserving query method based on Spark and Huffman coding according to claim 1, characterized in that step 5) comprises: receiving the result and the character code lengths returned by the server, performing the quadratic residue computation on the result to obtain the values of the queried bits, and decompressing those values to obtain the correct query result.
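On the client side of claim 9, again assuming the Kushilevitz-Ostrovsky construction, the client knows the factors p and q, so it can test each returned value with Euler's criterion: a quadratic residue means bit 0 and a non-residue means bit 1. The recovered bits form the padded Huffman code of the requested element, which is then decoded up to the terminator. The toy modulus n = 7·11, the two-symbol code table, and the names `recover_bits` and `decode` are hypothetical.

```python
from typing import Dict, List


def is_qr(z: int, p: int, q: int) -> bool:
    """z is a quadratic residue mod p*q iff it is one mod p and mod q."""
    return pow(z % p, (p - 1) // 2, p) == 1 and pow(z % q, (q - 1) // 2, q) == 1


def recover_bits(results: List[int], p: int, q: int) -> str:
    """Bit 0 for a quadratic residue, bit 1 for a non-residue."""
    return "".join("0" if is_qr(z, p, q) else "1" for z in results)


def decode(bits: str, codes: Dict[str, str], term: str = "#") -> int:
    """Prefix-code decoding that stops at the terminator; later padding is ignored."""
    inverse = {c: s for s, c in codes.items()}
    digits, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:
            if inverse[buf] == term:
                break
            digits.append(inverse[buf])
            buf = ""
    return int("".join(digits))
```

For instance, `recover_bits([23, 68], 7, 11)` yields `"01"`, and with the code table `{"7": "1", "#": "0"}` the bit string `"100"` decodes to 7, the trailing zero being padding.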
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710536073.6A CN107291935B (en) | 2017-07-04 | 2017-07-04 | Spark and Huffman coding based CPIR-V nearest neighbor privacy protection query method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107291935A true CN107291935A (en) | 2017-10-24 |
CN107291935B CN107291935B (en) | 2020-09-29 |
Family
ID=60098630
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710536073.6A Active CN107291935B (en) | 2017-07-04 | 2017-07-04 | Spark and Huffman coding based CPIR-V nearest neighbor privacy protection query method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107291935B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102073689A (en) * | 2010-12-27 | 2011-05-25 | 东北大学 | Dynamic nearest neighbour inquiry method on basis of regional coverage |
CN102708191A (en) * | 2012-05-15 | 2012-10-03 | 通唐软件技术(湖南)有限公司 | Word stock coding and decoding method capable of saving memory |
CN104268210A (en) * | 2014-09-12 | 2015-01-07 | 东北大学 | CPIR-V nearest neighbor privacy protection querying method based on local super-set |
CN104392318A (en) * | 2014-11-24 | 2015-03-04 | 蔡志明 | Medical data storing and inquiring method based on cloud platform |
CN104486434A (en) * | 2014-12-23 | 2015-04-01 | 深圳供电局有限公司 | Mobile terminal and file upload and download methods of mobile terminal |
Non-Patent Citations (2)
Title |
---|
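屈志坚: "Intelligent scheduling lossless cluster compression technology for Hadoop cloud architecture" (Hadoop云构架的智能调度无损集群压缩技术), Automation of Electric Power Systems (《电力系统自动化》) * |
邓诗卓: "PCPIR-V: a Spark-based parallel privacy-preserving nearest neighbor query algorithm" (PCPIR-V:基于Spark的并行隐私保护近邻查询算法), Chinese Journal of Network and Information Security (《网络与信息安全学报》) * |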
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108200027A (en) * | 2017-12-27 | 2018-06-22 | 东南大学 | A kind of protective position privacy nearest Neighbor based on feedback angle |
CN108200027B (en) * | 2017-12-27 | 2020-11-03 | 东南大学 | Position privacy protection neighbor query method based on feedback angle |
CN109190809A (en) * | 2018-08-15 | 2019-01-11 | 中国石油化工股份有限公司江汉油田分公司勘探开发研究院 | The coding method of oilfield development program multivariable and device |
CN112527951A (en) * | 2021-02-09 | 2021-03-19 | 北京微步在线科技有限公司 | Byte array-based integer variable-length ordered coding method and device and storage medium |
CN114968404A (en) * | 2022-05-24 | 2022-08-30 | 武汉大学 | Distributed unloading method for computing task with position privacy protection |
CN114968404B (en) * | 2022-05-24 | 2023-11-17 | 武汉大学 | Distributed unloading method for computing tasks of location privacy protection |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |