CN107291935A

CN107291935A - CPIR-V nearest-neighbor privacy-preserving query method based on Spark and Huffman coding

Info

Publication number: CN107291935A
Application number: CN201710536073.6A
Authority: CN
Inventors: 王波涛; 王国仁; 陈月梅; 李昂; 岳春成
Original assignee: Northeastern University China
Current assignee: Northeastern University China
Priority date: 2017-07-04
Filing date: 2017-07-04
Publication date: 2017-10-24
Anticipated expiration: 2037-07-04
Also published as: CN107291935B

Abstract

The invention discloses a CPIR-V nearest-neighbor privacy-preserving query method based on Spark and Huffman coding. The data of the nearest-neighbor matrix is compressed with Huffman coding, which reduces the number of data bits in each grid cell. The compressed data, the character code lengths and the element maximum are then stored in the spatial database HBase. The server reads the data from HBase and caches it in an RDD of the Spark parallel framework, and the CPIR nearest-neighbor matrix in the RDD is grouped according to the parallel strategy. After grouping, the Spark server performs the CPIR computation in parallel according to the query, aggregates the results of all groups, and sends the query result and the character code lengths to the client. The client parses the query result to obtain the values of the query bits and decompresses them to obtain the queried information. The privacy-preserving query algorithm of the invention, based on Spark parallelization and Huffman coding, protects the user's query privacy in big-data application scenarios and improves query efficiency while preserving the original query quality.

Description

CPIR-V nearest-neighbor privacy-preserving query method based on Spark and Huffman coding

Technical field

The present invention relates to the technical field of communication networks, and in particular to a CPIR-V nearest-neighbor privacy-preserving query method based on Spark and Huffman coding.

Background

With the continuous development of mobile devices, the emergence of various positioning techniques and communication modes, the popularization of mobile terminals and the widespread use of communication equipment, mobile applications represented by location-based services (LBS) have entered the era of mobile big data. The computing power of existing PCs and server architectures alone can no longer cope with the ever-growing data volume, while raising computing power by upgrading hardware wastes considerable money and material resources and provides neither effective horizontal scalability nor maintainability. Much research has therefore been devoted to saving cost and improving horizontal scalability and maintainability, and Google first proposed the concept of "cloud computing" at the search engine conference SES San Jose 2006. Cloud computing is a form of parallel computing: by distributing computation over a large number of distributed computers rather than a local computer or a remote server, it makes the operation of enterprise data centers more like that of the Internet. This allows enterprises to switch resources to the applications that need them and to access computers and storage systems on demand. Cloud computing is another major change following the shift from mainframe computers to the client-server model in the 1980s.

Cloud platforms provide a good basis for processing mobile big data. Moving traditional LBS applications and LBS privacy-protection techniques onto cloud platforms is the development trend for both, and has become one of the current research hotspots. In the big-data era, potential information is obtained by analyzing, summarizing and mining big data. This information can help enterprises and merchants obtain large gains, for example by adjusting marketing policy, reducing and avoiding risk, and making rational decisions in the face of market changes. However, as big-data mining techniques keep emerging and maturing, mining potential information also carries the danger of leaking personal privacy, which seriously threatens personal information security, enterprise trade secrets, national security secrets and so on. With the development and popularization of big-data applications, personal privacy protection becomes particularly important and will be a severe challenge.

Current privacy-protection research falls into three main classes: privacy protection based on generalization, privacy protection based on encryption, and privacy protection based on perturbation. The main representative of encryption-based privacy protection is Computational Private Information Retrieval (CPIR). CPIR is based on the hardness of the quadratic residuosity problem: in modular arithmetic with a large composite modulus (typically 1024 bits), distinguishing quadratic residues from non-residues is computationally hard. CPIR algorithms greatly reduce communication complexity and guarantee the strongest degree of privacy protection, but they also raise computational complexity. LBS privacy protection involves a large amount of computation and complex map operations, and a CPIR algorithm must scan the entire data space during computation, so the computation is heavy and slow. As a result, the computing power of traditional computing platforms cannot meet current demand.

Summary of the invention

In view of the above problems, the object of the invention is to provide a CPIR-V nearest-neighbor privacy-preserving query method based on Spark and Huffman coding that reduces the CPIR computation cost and further improves performance.

To solve the problems described in the background, the technical solution of the invention is as follows:

A CPIR-V nearest-neighbor privacy-preserving query method based on Spark and Huffman coding, comprising:

1) processing a file to obtain a grid, and reading the nearest-neighbor matrix data of the grid from the file;

2) compressing the elements of the nearest-neighbor matrix data with Huffman coding to reduce the number of bits of the elements;

3) storing the encoded nearest-neighbor matrix data in the spatial database HBase;

4) after the client's data query information is received, the server reads the corresponding data from the database HBase according to the query information and stores it in an RDD of the Spark parallel framework, groups the CPIR nearest-neighbor matrix in the RDD according to the parallel strategy, performs the CPIR computation in parallel on Spark according to the query information, aggregates the results of all groups, and then sends the query result and the character code lengths to the client;

5) the client parses the query result to obtain the values of the query bits, and decompresses the values of the query bits to obtain the queried information.

Step 1), processing the file to obtain the grid and reading the nearest-neighbor matrix data of the grid from the file, comprises:

A Voronoi diagram is constructed from the points of interest in the spatial data of the file, and the spatial data is partitioned by the Voronoi diagram into Voronoi cells. A grid is then laid over the Voronoi cells, the potential nearest neighbors of each grid cell are counted, and the nearest-neighbor matrix of the grid is finally obtained.

Step 2) specifically comprises:

2.1 Create a one-dimensional integer array; read the nearest-neighbor matrix character by character, count how often each character occurs and store the character frequencies in the array; the frequency of the termination character is the total number of matrix elements;

2.2 Compute the frequency of each character and build a priority queue ordered by character frequency from smallest to largest;

2.3 Construct a Huffman tree from the priority queue, encode the characters in the Huffman tree, and store the codes and code lengths in arrays;

2.4 Re-encode each element of the nearest-neighbor matrix, append the code of the termination character after the element's code, store the result in the encoding linked list, and record the number of bits of each encoded element in an array;

2.5 Obtain the maximum number of bits from the bit counts of the encoded elements;

2.6 Pad each element in the encoding linked list up to the maximum number of bits.

Step 2.6 comprises first appending the termination character to the element to be padded and then filling the remaining bits with zeros.
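Steps 2.1 to 2.6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the patented implementation: matrix elements are taken to be short character strings held in a flat list, the terminator symbol `$` and the function names are hypothetical, and a dictionary stands in for the arrays and the encoding linked list mentioned in the text.

```python
import heapq
from collections import Counter
from itertools import count

END = "$"  # hypothetical termination character; the text does not name one


def build_codes(elements):
    """Steps 2.1-2.3: count frequencies, build the queue and the Huffman tree."""
    freq = Counter(ch for elem in elements for ch in elem)
    freq[END] = len(elements)  # one terminator per matrix element
    tie = count()  # tie-breaker so the heap never compares tree nodes
    heap = [(f, next(tie), sym) for sym, f in freq.items()]
    heapq.heapify(heap)  # priority queue, smallest frequency first
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:  # leaf: a single character
            codes[node] = prefix or "0"

    walk(heap[0][2], "")
    return codes


def encode_matrix(elements):
    """Steps 2.4-2.6: encode, append the terminator, pad with zeros."""
    codes = build_codes(elements)
    encoded = ["".join(codes[ch] for ch in e) + codes[END] for e in elements]
    max_bits = max(len(bits) for bits in encoded)
    padded = [bits.ljust(max_bits, "0") for bits in encoded]
    return codes, max_bits, padded


def decode_element(bits, codes):
    """Read codes until the terminator; the zero padding after it is ignored."""
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for b in bits:
        current += b
        sym = inverse.get(current)
        if sym is None:
            continue
        if sym == END:
            return "".join(out)
        out.append(sym)
        current = ""
    raise ValueError("terminator not found")
```

Because Huffman codes are prefix-free, the terminator marks unambiguously where an element ends, which is why the padding zeros can be appended without changing the decoded value.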

Step 3) specifically comprises:

3.1 The compressed nearest-neighbor matrix data is stored in a two-dimensional byte array, where the first dimension is the total number of nearest-neighbor matrix elements and the second dimension is the maximum byte size of an element;

3.2 Design the RowKey of the HBase database: the reversed row number of each row of the nearest-neighbor matrix is used as the HBase RowKey, so that the encoded nearest-neighbor matrix data is evenly distributed over the HBase HRegionServers;

3.3 Columns are stored by column number, the value being the grid element at the corresponding row and column; the code lengths of the compressed characters are also stored in the database.
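The row-key design of steps 3.2 and 3.3 can be illustrated with a small Python sketch. It builds only the key and the column map and makes no HBase client calls; the zero-padding width, the column family name `cf` and the function names are assumptions made for illustration.

```python
def reversed_row_key(row_number, width=8):
    """Zero-pad the row number and reverse it, so that consecutive rows
    start with different characters and spread across regions."""
    return str(row_number).zfill(width)[::-1]


def build_row(row_number, row_elements, width=8):
    """Return (row key, columns) for one row of the encoded matrix.

    Columns are keyed by column number; "cf" is a hypothetical column family.
    """
    key = reversed_row_key(row_number, width)
    columns = {f"cf:{col}": value for col, value in enumerate(row_elements)}
    return key, columns
```

With sequential row numbers used directly as keys, neighboring rows would sort together and land on the same region; reversing the digits makes the fastest-changing digit the leading one.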

The client data query information is produced as follows:

The grid cell containing the query point is computed from the position of the query point, the corresponding quadratic-residue query is generated for that grid cell, and the query is finally sent to the server together with the grid partition size and the selected parallel strategy.

Step 4) comprises:

The server receives the query, the grid partition numbers and the parallel strategy sent by the client, and reads the corresponding CPIR nearest-neighbor matrix, character code lengths and maximum from the database HBase into a Spark RDD according to the grid partition. The CPIR nearest-neighbor matrix in the RDD is then grouped according to the parallel strategy sent by the client. After grouping, Spark performs the CPIR computation in parallel according to the query Q and obtains the results; Spark aggregates the results of all groups and then sends the query result and the character code lengths to the client.

Grouping the CPIR nearest-neighbor matrix in the RDD according to the parallel strategy sent by the client includes Row-level grouping and Bit-level grouping:

Row-level grouping groups the CPIR matrix by row. Bit-level grouping first obtains the number k of CPUs currently allocated to the cluster and then groups the data of each row of the CPIR matrix according to the number of CPUs.

Step 5) comprises: the client receives the result returned by the server and the character code lengths, performs the quadratic-residue computation on the result to obtain the values of the query bits, and decompresses the values of the query bits to obtain the correct query result.

Compared with the prior art, the beneficial effects of the invention are:

The invention provides a CPIR-V nearest-neighbor privacy-preserving query method based on Spark and Huffman coding. Compressing the data with Huffman coding reduces the data volume and therefore the amount of CPIR computation. When the server performs the CPIR computation, it uses the Spark framework to compute in parallel, which shortens the computation time and solves the problem of long computation times. The privacy-preserving query algorithm of the invention, based on Spark parallelization and Huffman coding, protects the user's query privacy in big-data application scenarios and improves query efficiency while preserving the original query quality.

Brief description of the drawings

Fig. 1 is a flow chart of the CPIR-V nearest-neighbor privacy-preserving query method of the invention based on Spark and Huffman coding;

Fig. 2 is a schematic diagram of the grid partition of the Voronoi diagram of the invention;

Fig. 3 shows the average server computation time of the invention under different grid partitions, where (a) is for the uniformly distributed data set, (b) is for the Gaussian-distributed data set, and (c) is for the real data set.

Detailed description of the embodiments

The present invention is described in detail below in conjunction with the accompanying drawings.

As shown in Fig. 1, the invention provides a CPIR-V nearest-neighbor privacy-preserving query method based on Spark and Huffman coding, comprising:

A Voronoi diagram is constructed from the points of interest in the spatial data of the file, and the spatial data is partitioned by the Voronoi diagram into Voronoi cells. A grid is then laid over the Voronoi cells, the potential nearest neighbors of each grid cell are counted, and the nearest-neighbor matrix of the grid is finally obtained. A Voronoi diagram expresses the neighborhood topology between spatial objects through a partition of space. Each polygon in the diagram is called a Voronoi cell, and the edges of a Voronoi cell are the perpendicular bisectors between adjacent spatial objects.

The size of the grid partition must be chosen with care when gridding the Voronoi cells. If the grid cells are too small, the nearest-neighbor matrix contains too much data and the client's cost of parsing the query result is high. If the grid cells are too large, the nearest-neighbor matrix contains too little data and compression is ineffective. The grid should therefore be partitioned reasonably according to the density of the points of interest. By way of example, the whole space is divided into G*G grid cells, and each grid cell either intersects Voronoi cells or is contained in one of them. As shown in Fig. 2, where p denotes a point of interest and q denotes the query point, the whole space is divided into a 5*5 grid. Grid cell 1,1 intersects the Voronoi cells of the points of interest p₁ and p₂, and grid cell 2,1 is contained in the Voronoi cell of the point of interest p₁. In Fig. 2, the grid cell containing the query point q intersects the Voronoi cells of the points of interest p₁ and p₂, so the nearest neighbor of q may be p₁ or p₂, and the set {p₁, p₂} formed by p₁ and p₂ is called the potential nearest-neighbor set of grid cell 2,1. It follows that the number of nearest neighbors held by a grid cell is not fixed; it depends on the distribution of the points and the size of the grid. The CPIR-V algorithm converts the nearest-neighbor relations of the grid cells in Fig. 2 into a nearest-neighbor storage matrix, which stores the potential nearest neighbors of each grid cell. Note that every element of the storage matrix has the same size. The CPIR-V algorithm first finds the maximum number p_max of potential nearest neighbors over all grid cells, and then pads the grid cells that hold fewer than the maximum with a default value. In this example the maximum number of potential nearest neighbors of a grid cell is 3, so the grid cells with fewer are padded. The potential nearest-neighbor relations of all grid cells are kept in one matrix, and this matrix is the basis and core of the CPIR-V nearest-neighbor query algorithm.
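The default-value completion described above can be sketched as follows. This is only an illustration: the grid is represented as a dictionary from a cell index to the list of its potential nearest neighbors, and the filler value `DEFAULT` is a hypothetical choice, since the text does not state which default value is used.

```python
DEFAULT = 0  # hypothetical filler id; the text does not specify the default value


def pad_neighbor_matrix(cells):
    """Pad every grid cell's potential nearest-neighbor list to p_max entries.

    cells maps a cell index (row, col) to a list of point-of-interest ids.
    """
    p_max = max(len(ids) for ids in cells.values())
    padded = {
        cell: ids + [DEFAULT] * (p_max - len(ids))
        for cell, ids in cells.items()
    }
    return p_max, padded
```

After padding, every matrix element has the same size, which is what the text requires of the storage matrix.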

Huffman coding is a lossless variable-length coding (VLC) scheme. It constructs a unique code for each character of the file to be encoded according to the character's probability of occurrence, and guarantees that the average code length of the variable-length code is the shortest. The Huffman tree is also called the optimal binary tree, and the coding is sometimes called optimal coding. Because Huffman coding is a variable-length code, characters with a higher probability of occurrence receive shorter codes and characters with a lower probability receive longer codes, which keeps the total encoded length of the character set below its original length.

Specifically, this includes:

2.6 Pad each element in the encoding linked list up to the maximum number of bits. Specifically, the termination character is first appended to the element to be padded, and the remaining bits are then filled with zeros.

HBase is a distributed, column-oriented, open-source key-value database. It is a highly reliable, high-performance, column-oriented and scalable distributed storage system. It uses Hadoop HDFS as its file storage system and is modeled on, and provides all the functions of, the Bigtable database built on the Google File System. With HBase, a large-scale structured storage cluster can be built on inexpensive PC servers. Because HBase is a key-value database, it is well suited to storing unstructured data.

The nearest-neighbor matrix is stored in the spatial database HBase. The compressed nearest-neighbor matrix is stored in HBase as follows. The RowKey of the HBase database is designed so that the data is evenly distributed over the HBase HRegionServers. The main design idea is to use the reversed row number of each row of the nearest-neighbor matrix as the HBase RowKey, so that no HRegionServer stores an excessive amount of data and every HRegionServer stores some data. Columns are stored by column number, the value being the grid element at the corresponding row and column. Finally, the code lengths of the compressed characters are stored in the database.

The specific storage format is shown in Table 1.

Table 1: H-PCIR-V information table structure

The client data query information is produced as follows: the grid cell containing the query point is computed from the position of the query point, the corresponding quadratic-residue query is generated for that grid cell, and the query is finally sent to the server together with the grid partition size and the selected parallel strategy. The specific steps are:

1: Compute the grid cell G_{a,b} containing the query point from the position of the query point;

2: Generate the query Q(y₁, y₂, ..., y_{g_x}), where y_i = QNR for the subscript i = b and y_i = QR for all other subscripts;

3: Send the query Q, the grid partition numbers g_x and g_y, and the parallel strategy strategy to the server;

4: Wait for the server to return the query result.
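The client steps above can be sketched in Python. The sketch assumes the usual quadratic-residuosity setting, which the text names but does not spell out: the client holds two secret primes p and q, the modulus is n = p*q, a QR is a square modulo n, and a QNR is a value that is a non-residue modulo both p and q. The function names, the 0-based indices and the mapping of a and b to row and column are illustrative assumptions, and the toy primes in the usage note are far too small for real use.

```python
import math
import random


def legendre(a, p):
    """Euler's criterion: 1 if a is a square modulo the odd prime p, p - 1 if not."""
    return pow(a, (p - 1) // 2, p)


def is_qr(a, p, q):
    """True if a is a quadratic residue modulo n = p * q (needs the factors)."""
    return legendre(a, p) == 1 and legendre(a, q) == 1


def random_qr(n, rng):
    """A random square modulo n, coprime to n."""
    while True:
        r = rng.randrange(2, n)
        if math.gcd(r, n) == 1:
            return r * r % n


def random_qnr(p, q, rng):
    """A random value that is a non-residue modulo both p and q."""
    n = p * q
    while True:
        y = rng.randrange(2, n)
        if math.gcd(y, n) == 1 and legendre(y, p) == p - 1 and legendre(y, q) == q - 1:
            return y


def locate_cell(x, y, width, height, g_x, g_y):
    """Step 1: map a position to the (row a, column b) of its grid cell."""
    a = min(int(y * g_y / height), g_y - 1)
    b = min(int(x * g_x / width), g_x - 1)
    return a, b


def build_query(b, g_x, p, q, rng):
    """Step 2: QNR at column b, QR at every other column."""
    n = p * q
    return [random_qnr(p, q, rng) if j == b else random_qr(n, rng) for j in range(g_x)]
```

For example, `build_query(2, 5, 1019, 1031, random.Random(0))` yields five values of which only the one at index 2 is a non-residue. Without p and q the server cannot tell which entry that is, which is what hides the queried column.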

Specifically, the server proceeds as follows:

1: The server reads the CPIR matrix data, the character code lengths and the maximum according to the grid partition g_x, g_y and caches them in an RDD;

2: If the parallel strategy strategy is Row, the CPIR matrix data in the RDD is grouped by row;

3: If the parallel strategy strategy is Bit, the number k of CPUs currently allocated to the cluster is obtained and each row of data is divided into k groups;

4: If the parallel strategy strategy matches neither, it defaults to Row and the CPIR matrix data in the RDD is grouped by row;

5: Spark performs the CPIR computation on each group according to the query Q and obtains the query result Z;

6: Spark aggregates the results Z and then sends the query result and the character code lengths to the client.
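The grouping and computation steps can be sketched without Spark as plain Python. The per-bit formula is an assumption: the text refers to formulas that are not reproduced here, so the sketch uses the standard quadratic-residue CPIR rule, in which a column contributes y when the stored bit is 1 and y squared when it is 0. Each (row, bit group) pair corresponds to a unit of work that a parallel framework could schedule independently; the sketch runs them sequentially, and all names are hypothetical.

```python
def cpir_bit(column_bits, query, n):
    """Combine one bit position of every column with the query values."""
    z = 1
    for bit, y in zip(column_bits, query):
        z = z * (y if bit == "1" else y * y) % n
    return z


def answer_row(row, query, n, positions):
    """Answer the given bit positions of one row.

    row is a list of equal-length bit strings, one per column.
    """
    return [cpir_bit([elem[t] for elem in row], query, n) for t in positions]


def split_positions(num_bits, k):
    """Bit strategy: cut the bit positions into at most k contiguous groups (k >= 1)."""
    step = -(-num_bits // k)  # ceiling division
    return [range(s, min(s + step, num_bits)) for s in range(0, num_bits, step)]


def answer(matrix, query, n, strategy="Row", k=2):
    """Return one list of results per row, one value per bit position."""
    num_bits = len(matrix[0][0])
    if strategy == "Bit":
        groups = split_positions(num_bits, k)
    else:  # "Row", and the default for any unrecognised strategy
        groups = [range(num_bits)]
    return [
        [z for group in groups for z in answer_row(row, query, n, group)]
        for row in matrix
    ]
```

Both strategies produce the same values; they differ only in how finely the work is divided. Row grouping yields one task per matrix row, while Bit grouping yields up to k tasks per row.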

5) The client parses the query result to obtain the values of the query bits, and decompresses the values of the query bits to obtain the queried information. This comprises: receiving the result returned by the server and the character code lengths, performing the quadratic-residue computation on the result to obtain the values of the query bits, and decompressing the values of the query bits to obtain the correct query result.
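Step 5) can be sketched as follows, assuming the common convention that a returned value which is a quadratic residue encodes bit 0 and a non-residue encodes bit 1. How the client rebuilds the Huffman code table from the transmitted character code lengths is not spelled out in the text, so the sketch simply takes a ready-made table `codes` as input; the terminator symbol `$` and the function names are hypothetical.

```python
def recover_bits(results, p, q):
    """Turn the server's values for one row into a bit string.

    Uses Euler's criterion with the client's secret primes p and q.
    """
    def is_qr(z):
        return pow(z, (p - 1) // 2, p) == 1 and pow(z, (q - 1) // 2, q) == 1

    return "".join("0" if is_qr(z) else "1" for z in results)


def huffman_decode(bits, codes, end="$"):
    """Decode up to the terminator; the zero padding after it is ignored."""
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for b in bits:
        current += b
        sym = inverse.get(current)
        if sym is None:
            continue
        if sym == end:
            return "".join(out)
        out.append(sym)
        current = ""
    raise ValueError("terminator not found")
```

Only the holder of p and q can run `recover_bits` efficiently, so the recovered element stays private to the client.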

Comparison of experimental results:

The cloud computing service platform used is an IBM xSeries System 3650M4 cluster with 5 nodes. The detailed configuration of each node is as follows:

CPU: 2 * Xeon E5-2620 (each with 6 cores * 2 threads);

Memory: 32 GB;

Hard disk: 5 TB, 10000 rpm, RAID 5;

Operating system: CentOS 6.4;

Development tools: GNU toolkits (G++, GDB), Make, Vim, JDK, etc.

The development languages used in the experiments are standard C++, Java and Scala.

The experimental results are analyzed mainly on three data sets, two synthetic data sets and one real data set, and the query performance of the algorithms is further analyzed through these results. The real data set is the Sequoia data set from California. The synthetic data sets follow the uniform distribution and the Gaussian distribution respectively, where the Gaussian data set follows (X, Y)~N(1,1,0,0,1). The data space is 1046435*1929615; note that the abscissa x and the ordinate y are of type int. The big-number library used in C++ is GMP, and in Java the BigInteger big-number class shipped with the JDK is used. Note the rule for choosing the threshold θ involved in the quadratic-residue computation: θ is the smallest value for which the time of θ big-number multiplications exceeds the time of an in-memory table lookup. Experiments show that at θ=3 the time of the big-number multiplications exceeds the time of the table lookup, so θ is set to 3. The parameters used in the experiments are shown in Table 2.

Table 2: Experimental data parameters

Parameter name	Range	Default value
Data type	Real data (62K), Gaussian distribution (100K), uniform distribution (100K)	Uniform distribution (100K)
Grid partition	10*10, 20*20, 50*50, 100*100, 200*200, 400*400	100*100
Modulus k	128, 256, 512, 1024	512
Range query coverage rate	1, 5, 10, 15	1
Query result computation method	Formula 2.10, Formula 2.11	Formula 2.10
Number of CPU cores	1, 10, 20, 40, 60	60

Data compression experimental results:

Table 3: Data compression comparison for the range query algorithm

In the data preprocessing stage, the algorithm compresses the data in the grid with Huffman coding. The comparison of the range query data before and after compression is shown in Table 3. As Table 3 shows, after Huffman compression the data volume of the range query is reduced by nearly half, with a compression ratio close to 55%. The reduction in data volume generally means that the amount of CPIR computation performed by the server is also reduced by about half, which lowers the server's computation time.

Table 4: Data compression comparison for the CPIR-V algorithm

Table 4 shows the result of the CPIR-V algorithm compressing the data in the nearest-neighbor matrix with Huffman coding in the data preprocessing stage: the size of the largest matrix element after compression, compared with the matrix maximum before compression. As Table 4 shows, after Huffman compression the size of the maximum in the matrix is reduced by nearly 1/3. The CPIR-V algorithm first finds the maximum element in the nearest-neighbor matrix and then pads the remaining elements to that size. Assume the nearest-neighbor matrix is N*N and the maximum is m; the server's big-number computation is then m*N*N. After Huffman compression the maximum in the matrix is (2/3)m, so the server's big-number computation is (2/3)m*N*N, and overall the server's computation is reduced by (1/3)m*N*N.

Fig. 3 compares the server computation time, under different grid partitions, of the Spark-based parallel CPIR-V algorithm (PCPIR-V) with that of the parallel CPIR-V algorithm based on Spark and Huffman coding (H-PCPIR-V) under the Row and Bit parallel strategies. In the figure, H-PCPIR-V-R denotes the Row parallel strategy and H-PCPIR-V-B denotes the Bit parallel strategy. For all three algorithms the server computation time generally grows as the grid partition grows, because a larger grid partition yields a larger CPIR-V computation matrix and more big-integer computation, which increases the computation time. The figure shows the server computation time on the three data sets under different grids. The H-PCPIR-V-R and H-PCPIR-V-B algorithms generally take less server computation time than the PCPIR-V algorithm, most clearly on the Gaussian-distributed data set and the real data set. This is because H-PCPIR-V-R and H-PCPIR-V-B compress the data in the matrix during the data preprocessing stage, which reduces the number of bits involved in big-integer computation and thus the server computation time. The gap in server computation time is more pronounced on the Gaussian-distributed and real data sets than on the uniformly distributed data set. In the uniformly distributed data set the matrix data is fairly even, so compression has little effect. In the other two data sets the matrix elements vary in size and the gap between the maximum and the minimum is large, so the number of bits computed after compression changes considerably, which affects the results.

The comparative experiments and the analysis of their results show that, compared with the PCPIR-V algorithm, the H-PCPIR-V algorithm lowers the server computation cost by about 30%, the client computation cost by about 10%, and the communication cost by about 40%.

It will be apparent to those skilled in the art that the specific embodiments above are preferred solutions of the invention. Improvements and variations that those skilled in the art may make to parts of the invention still embody the principle of the invention and still achieve its purpose, and fall within the scope protected by the invention.

Claims

1. A CPIR-V nearest-neighbor privacy-preserving query method based on Spark and Huffman coding, characterized by comprising:

2) compressing the elements of the nearest-neighbor matrix data with Huffman coding to reduce the number of bits of the elements;

4) after the client's data query information is received, the server reads the corresponding data from the database HBase according to the query information and stores it in an RDD of the Spark parallel framework, groups the CPIR nearest-neighbor matrix in the RDD according to the parallel strategy, performs the CPIR computation in parallel on Spark according to the query information, aggregates the results of all groups, and then sends the query result and the character code lengths to the client;

2. The CPIR-V nearest-neighbor privacy-preserving query method based on Spark and Huffman coding according to claim 1, characterized in that step 1), processing the file to obtain the grid and reading the nearest-neighbor matrix data of the grid from the file, comprises:

constructing a Voronoi diagram from the points of interest in the spatial data of the file, partitioning the spatial data by the Voronoi diagram into Voronoi cells, laying a grid over the Voronoi cells, counting the potential nearest neighbors of each grid cell, and finally obtaining the nearest-neighbor matrix of the grid.

3. The CPIR-V nearest-neighbor privacy-preserving query method based on Spark and Huffman coding according to claim 1, characterized in that step 2) specifically comprises:

2.1 creating a one-dimensional integer array, reading the nearest-neighbor matrix character by character, counting how often each character occurs and storing the character frequencies in the array, the frequency of the termination character being the total number of matrix elements;

2.3 constructing a Huffman tree from the priority queue, encoding the characters in the Huffman tree, and storing the codes and code lengths in arrays;

2.4 re-encoding each element of the nearest-neighbor matrix, appending the code of the termination character after the element's code, storing the result in the encoding linked list, and recording the number of bits of each encoded element in an array;

4. The CPIR-V nearest-neighbor privacy-preserving query method based on Spark and Huffman coding according to claim 3, characterized in that step 2.6 comprises first appending the termination character to the element to be padded and then filling the remaining bits with zeros.

5. The CPIR-V nearest-neighbor privacy-preserving query method based on Spark and Huffman coding according to claim 1, characterized in that step 3) specifically comprises:

3.1 storing the compressed nearest-neighbor matrix data in a two-dimensional byte array, where the first dimension is the total number of nearest-neighbor matrix elements and the second dimension is the maximum byte size of an element;

3.2 designing the RowKey of the HBase database, the reversed row number of each row of the nearest-neighbor matrix being used as the HBase RowKey, so that the encoded nearest-neighbor matrix data is evenly distributed over the HBase HRegionServers;

3.3 storing columns by column number, the value being the grid element at the corresponding row and column, and storing the code lengths of the compressed characters in the database.

6. The CPIR-V nearest-neighbor privacy-preserving query method based on Spark and Huffman coding according to claim 1, characterized in that the client data query information is produced by:

computing the grid cell containing the query point from the position of the query point, generating the corresponding quadratic-residue query for that grid cell, and finally sending the query to the server together with the grid partition size and the selected parallel strategy.

7. The CPIR-V nearest-neighbor privacy-preserving query method based on Spark and Huffman coding according to claim 1, characterized in that step 4) comprises:

receiving the query, the grid partition numbers and the parallel strategy sent by the client; reading the corresponding CPIR nearest-neighbor matrix, character code lengths and maximum from the database HBase into a Spark RDD according to the grid partition; grouping the CPIR nearest-neighbor matrix in the RDD according to the parallel strategy sent by the client; after grouping, performing the CPIR computation in parallel on Spark according to the query Q to obtain the results; and aggregating the results of all groups on Spark and then sending the query result and the character code lengths to the client.

8. The CPIR-V nearest-neighbor privacy-preserving query method based on Spark and Huffman coding according to claim 7, characterized in that grouping the CPIR nearest-neighbor matrix in the RDD according to the parallel strategy sent by the client includes Row-level grouping and Bit-level grouping:

the Row-level grouping groups the CPIR matrix by row; the Bit-level grouping first obtains the number k of CPUs currently allocated to the cluster and then groups the data of each row of the CPIR matrix according to the number of CPUs.

9. The CPIR-V nearest-neighbor privacy-preserving query method based on Spark and Huffman coding according to claim 1, characterized in that step 5) comprises: receiving the result returned by the server and the character code lengths, performing the quadratic-residue computation on the result to obtain the values of the query bits, and decompressing the values of the query bits to obtain the correct query result.