A localized differential privacy method for urban sanitation data reporting and privacy computation
Technical Field
The present invention relates to the field of information security and privacy protection, and in particular to a localized differential privacy method for urban sanitation data reporting and privacy computation.
Background Art
With the continuous improvement of sensor accuracy and the spread of mobile smart devices, mobile devices, smartphones in particular, can sense information about people and their surroundings anytime and anywhere. People can therefore learn the state of their environment in a timely manner, society can be offered the services it needs, and the field of mobile crowdsensing keeps advancing. Mobile crowdsensing offers a new way of perceiving the world: anyone can take part in the sensing process by performing sensing tasks on a mobile terminal, and the collected data support a wide range of services.
In mobile crowdsensing applications, whether the privacy of participating users can be guaranteed is a key factor in whether users are willing to take part in sensing tasks. Traditional encryption incurs high computational overhead, and the various anonymization-based privacy protection methods cannot resist background-knowledge attacks. Centralized differential privacy assumes that the third-party data collector is honest and trustworthy and will not steal or leak users' sensitive information, an assumption that is unrealistic in practice. Localized differential privacy inherits the quantitative definition of privacy attacks from centralized differential privacy but lets each user carry out the privacy protection processing independently, which achieves more thorough privacy protection.
Randomized response is the mainstream perturbation mechanism of localized differential privacy. However, randomized response needs a very large sample, while in most mobile crowdsensing scenarios the amount of data contributed by participating users is very limited. How to apply randomized response to small- and medium-sample mobile crowdsensing scenarios therefore needs further study.
Summary of the Invention
The purpose of the present invention is to overcome the shortcomings of the prior art by providing a localized differential privacy method for urban sanitation data reporting and privacy computation based on randomized response. The invention applies randomized response to report collection in mobile crowdsensing, aggregates and analyzes the privatized data to obtain the joint distribution of the private attributes, and thereby monitors the distribution of urban sanitation.
The purpose of the present invention is achieved through the following technical solution:
A localized differential privacy method for urban sanitation data reporting and privacy computation, comprising the following steps:
Set a location privacy parameter and a garbage-volume level privacy parameter, and calculate the privacy processing parameters;
The user locates the sanitation point and judges its garbage-volume level, obtaining the user's original true data;
Apply localized differential privacy processing separately to the location data and the garbage-volume level data, and upload the results to the server;
The server receives the data uploaded by users and performs statistics and denoising on them;
The server aggregates the denoised data to obtain the true garbage-volume level of each location; together these form the urban sanitation distribution map.
Specifically, in the step of setting the privacy parameters and calculating the privacy processing parameters, a location privacy parameter ε1 and a garbage-volume level privacy parameter ε2 are set and published to every user.
The Bloom filter length l and the number of hash functions h are calculated from the Bloom filter false positive rate k and the set M of possible sensing values.
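The sizing formulas for l and h are not reproduced in this text. As a hedged illustration only, the sketch below assumes the textbook optimal Bloom filter sizing, l = ⌈−|M|·ln k / (ln 2)²⌉ and h = round((l/|M|)·ln 2). For |M| = 3 levels and k = 0.1 this gives l = 15 and h = 3, whereas the embodiment works with 13-bit vectors, so the patent's own sizing rule or rounding evidently differs. The function name `bloom_size` is hypothetical.

```python
import math


def bloom_size(num_values: int, false_positive_rate: float) -> tuple[int, int]:
    """Assumed textbook optimal Bloom filter sizing (not necessarily the patent's rule)."""
    # l = -n * ln(k) / (ln 2)^2, rounded up to a whole number of bits
    l = math.ceil(-num_values * math.log(false_positive_rate) / (math.log(2) ** 2))
    # h = (l / n) * ln 2, rounded to the nearest integer, at least 1
    h = max(1, round(l / num_values * math.log(2)))
    return l, h


l, h = bloom_size(3, 0.1)
print(l, h)  # 15 3
```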
The privacy processing parameters are calculated from the privacy parameters ε1 and ε2, the total number of sanitation points F and the number of hash functions h:
r denotes the number of reported locations;
p denotes the probability that the true location is selected;
f denotes the bit-flip probability.
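The formulas for r, p and f are likewise not reproduced here. The sketch below assumes two standard calibrations: the k-subset mechanism for locations, r = F/(e^ε1 + 1) rounded and p = r·e^ε1/(r·e^ε1 + F − r), and the RAPPOR-style bound ε2 = 2h·ln((1 − f/2)/(f/2)) for the Bloom filter bits. With the embodiment's ε1 = 1.6, ε2 = 1.9, F = 50 and h = 3, these assumed formulas give r = 8, p ≈ 0.485 and 1 − f/2 ≈ 0.58, close to the values 8, 0.486 and 0.58 used in the embodiment. `privacy_params` is a hypothetical name.

```python
import math


def privacy_params(eps1: float, eps2: float, F: int, h: int) -> tuple[int, float, float]:
    """Assumed formulas for r, p and f (k-subset mechanism + RAPPOR-style bit flipping)."""
    e1 = math.exp(eps1)
    # r: subset size of the k-subset mechanism, F / (e^eps1 + 1), rounded, at least 1
    r = max(1, round(F / (e1 + 1)))
    # p: probability that the true location is placed in the reported set
    p = r * e1 / (r * e1 + F - r)
    # f: from eps2 = 2h * ln((1 - f/2) / (f/2))  =>  f = 2 / (1 + e^(eps2 / (2h)))
    f = 2 / (1 + math.exp(eps2 / (2 * h)))
    return r, p, f


r, p, f = privacy_params(1.6, 1.9, F=50, h=3)
print(r, round(p, 3), round(1 - f / 2, 2))  # 8 0.485 0.58
```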
Specifically, in the step of obtaining the user's original true data, the user locates the sanitation point with the mobile smart device they carry and obtains the point's number i from a location mapping table, which the server builds in advance by numbering the location of every sanitation point. The user then judges the garbage-volume level, level, of the sanitation point. The number and the level together form the user's original true data <i, level>.
Specifically, in the step of applying localized differential privacy processing separately to the location data and the garbage-volume level data, the user performs the processing locally as follows:
Initialize the location set L = ∅ and the upload result set R = ∅;
Generate a random number rand in (0, 1) and, using the privacy processing parameters, perturb the location data so that it satisfies the location privacy parameter ε1;
Based on the location set L obtained from the location perturbation, apply localized differential privacy processing to the garbage-volume level data and compute the user's upload result set R.
Further, in the step of generating rand and perturbing the location data with the privacy processing parameters, the specific procedure is:
Compare the random number rand with the true-location selection probability p:
If rand < p, the location set is the union of the true location i and r − 1 locations chosen at random from the set of locations other than i:
L = {i} ∪ getRandomElements(F\{i}, r−1)
where the function getRandomElements(A, b) returns b elements chosen at random from the set A;
Otherwise (rand ≥ p), the location set consists of r locations chosen at random from the set of locations other than i:
L = getRandomElements(F\{i}, r)
In summary, L = {i} ∪ getRandomElements(F\{i}, r−1) if rand < p, and L = getRandomElements(F\{i}, r) otherwise.
This procedure ensures that the user's location data satisfy ε1-differential privacy.
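A minimal sketch of the location perturbation step, assuming sanitation points are numbered 1..F and using Python's `random.Random.sample` in the role of getRandomElements. In both branches the decoys are drawn from F\{i}, as the text describes, so the true location appears in L with probability exactly p. The name `perturb_location` is hypothetical.

```python
import random


def perturb_location(i: int, F: int, r: int, p: float,
                     rng: random.Random) -> set[int]:
    """Return the reported location set L for a user whose true sanitation point is i.

    Points are assumed to be numbered 1..F; r and p are the privacy processing parameters.
    """
    others = [x for x in range(1, F + 1) if x != i]  # the set F \ {i}
    if rng.random() < p:
        # rand < p: true location plus r - 1 decoys drawn from F \ {i}
        return {i} | set(rng.sample(others, r - 1))
    # otherwise: r decoys drawn from F \ {i}; the true location is excluded
    return set(rng.sample(others, r))


L = perturb_location(12, F=50, r=8, p=0.486, rng=random.Random(2024))
print(len(L))  # 8
```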
Further, in the step of perturbing the garbage-volume level data based on L and computing the upload result set R: if the user's true location i is in L, the entry for the true location is the true level level_i. For every other location t in L, a level level′_t is chosen uniformly at random from the set of garbage-volume levels. Each level is then processed as follows:
Map level_i or level′_t into a Bloom filter of length l with h hash functions, obtaining a vector S_t of length l;
For each bit S_t[j] of S_t, call a random number generator to obtain a number q in (0, 1). If q < 1 − f/2, bit j is kept unchanged; otherwise bit j is flipped (0 becomes 1 and 1 becomes 0). The result is S′_t.
In summary, each bit is reported as S′_t[j] = S_t[j] with probability 1 − f/2 and as S′_t[j] = 1 − S_t[j] with probability f/2.
This procedure ensures that the user's garbage-volume level data satisfy ε2-differential privacy.
Each location t in L and its processed level vector S′_t are combined into a pair (t, S′_t) and added to R, yielding the final upload result set R.
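The level-perturbation and report-assembly steps can be sketched as follows. The patent does not specify its hash functions, so `bloom_encode` uses a hypothetical family (SHA-256 of "k:level" reduced mod l); the names `LEVELS`, `flip_bits` and `build_report` are likewise illustrative. Because of hash collisions, an encoded level may have fewer than h bits set.

```python
import hashlib
import random

LEVELS = ["High", "Middle", "Low"]  # garbage-volume levels from the embodiment


def bloom_encode(level: str, l: int, h: int) -> list[int]:
    """Encode a level into an l-bit Bloom filter using h hypothetical hash functions."""
    bits = [0] * l
    for k in range(h):
        # Hypothetical k-th hash: SHA-256 of "k:level", reduced mod l.
        digest = hashlib.sha256(f"{k}:{level}".encode()).digest()
        bits[int.from_bytes(digest, "big") % l] = 1
    return bits


def flip_bits(bits: list[int], f: float, rng: random.Random) -> list[int]:
    """Keep bit j if q < 1 - f/2, otherwise flip it (flip probability f/2)."""
    return [b if rng.random() < 1 - f / 2 else 1 - b for b in bits]


def build_report(i: int, true_level: str, L: set[int], l: int, h: int,
                 f: float, rng: random.Random) -> list[tuple[int, list[int]]]:
    """Build the upload set R of (location, perturbed level vector) pairs."""
    R = []
    for t in sorted(L):
        # The true location carries the true level; other locations get a uniform random level.
        level = true_level if t == i else rng.choice(LEVELS)
        R.append((t, flip_bits(bloom_encode(level, l, h), f, rng)))
    return R


rng = random.Random(7)
R = build_report(12, "Middle", {3, 12, 40}, l=13, h=3, f=0.843, rng=rng)
print([t for t, _ in R])  # [3, 12, 40]
```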
Specifically, in the step in which the server receives the uploaded data and performs statistics and denoising, the procedure is:
Mix the upload data Z of all users, extract all location entries from the reports, and estimate the number of true reports E_i for each location as follows:
$q = \frac{r - p}{F - 1}$
$E_i = w_i$
$L_i = \frac{w_i}{p E_i + q (rN - E_i)}$
$V = \sum_i L_i$
$E_i = E_i \left( p L_i + q (V - L_i) \right)$
$E_i = E_i / r$
where Z is the data uploaded by users, F is the total number of sanitation points, p is the true-location selection probability, r is the number of reported locations, N is the number of users, and w_i is the number of occurrences of location i in Z, i = 1, …, F;
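The iterative update above is the patent's estimator. As a simpler cross-check, and as a swapped-in technique rather than the patent's procedure, E_i can also be estimated in one shot by the method of moments: location i appears in a report with probability p if it is the user's true location and with probability q = (r − p)/(F − 1) otherwise, so E[w_i] = p·E_i + q·(N − E_i). The sketch solves this for E_i; it assumes p > q, and in small samples the estimate can come out negative, in which case one may clip it to zero. `estimate_true_counts` is a hypothetical name.

```python
def estimate_true_counts(w: list[float], N: int, F: int, r: int,
                         p: float) -> list[float]:
    """Method-of-moments estimate of E_i (a swapped-in technique, not the patent's update).

    w[i - 1] is the number of uploaded location entries equal to i, for i = 1..F.
    """
    # Probability that location i is reported when it is NOT the user's true location.
    q = (r - p) / (F - 1)
    # E[w_i] = p * E_i + q * (N - E_i)  =>  E_i = (w_i - q * N) / (p - q)
    return [(w_i - q * N) / (p - q) for w_i in w]
```

As a sanity check on the model, the expected counts sum to p·N + q·N·(F − 1) = r·N, the total number of uploaded location entries.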
For each uploaded location i, the server collects all report pairs for location i into a data set D_i. Suppose D_i contains N_i entries, of which E_i are true reports, so that N_i − E_i are noise reports. For the level vectors S′_i in D_i, the server counts the number of 1s at each bit: X_i[j] is the count for bit j at location i (0 ≤ j < l), and these counts form the vector X_i, the number of 1s received at each bit for location i;
X_i has four sources, as shown in Table 1.
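Table 1 Sources of X_i[j]
| Reports | Original bit value | Number of such bits | Probability of reporting "1" |
| True reports | 1 | Z_i[j] | 1 − f/2 |
| True reports | 0 | E_i − Z_i[j] | f/2 |
| Noise reports | 1 | U_i[j] | 1 − f/2 |
| Noise reports | 0 | N_i − E_i − U_i[j] | f/2 |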
Among the true reports, suppose the original levels, after Bloom filter encoding, contain Z_i[j] 1s at bit j; each of them stays "1" with probability 1 − f/2. The remaining E_i − Z_i[j] bits are "0", and each flips to "1" with probability f/2. Likewise, suppose the original levels of the noise reports contain U_i[j] 1s at bit j after Bloom filter encoding; each stays "1" with probability 1 − f/2, and each of the remaining N_i − E_i − U_i[j] 0s flips to "1" with probability f/2.
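Therefore the following equation holds in expectation:
$\mathbb{E}[X_i[j]] = (1 - f/2) Z_i[j] + (f/2)(E_i - Z_i[j]) + (1 - f/2) U_i[j] + (f/2)(N_i - E_i - U_i[j])$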
Since the added noise is uniformly distributed, U_i can be computed, so the equation can be solved for Z_i[j]:
$Z_i[j] = \frac{X_i[j] - (f/2) N_i}{1 - f} - U_i[j]$
This yields the vector Z_i, whose entries are the numbers of 1s at each bit of the original true garbage-volume levels.
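A sketch of the per-location denoising, assuming the uniform-noise model described above: each of the w levels receives (N_i − E_i)/w noise reports, so U_i = Noise·Q. Collecting the four sources of X_i[j] gives E[X_i[j]] = (1 − f)(Z_i[j] + U_i[j]) + (f/2)·N_i, which the code inverts. The name `denoise_bits` is hypothetical.

```python
def denoise_bits(X: list[float], N_i: float, E_i: float, f: float,
                 Q: list[list[int]]) -> list[float]:
    """Recover Z_i (original '1' counts of true reports) from the observed bit counts X_i."""
    w, l = len(Q), len(Q[0])
    # Noise reports pick one of the w levels uniformly: (N_i - E_i) / w per level.
    noise_per_level = (N_i - E_i) / w
    # U[j] = Noise . Q: expected number of noise reports whose original Bloom bit j is 1.
    U = [noise_per_level * sum(Q[v][j] for v in range(w)) for j in range(l)]
    # Invert E[X[j]] = (1 - f) * (Z[j] + U[j]) + (f / 2) * N_i for Z[j].
    return [(X[j] - (f / 2) * N_i) / (1 - f) - U[j] for j in range(l)]
```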
Because the frequency estimate in step 1 contains some error and the data generation and processing are random, the computed Z_i also contains some error. A further analysis step is therefore needed to obtain the true garbage-volume level at location i.
Specifically, in the step in which the server aggregates the denoised data to obtain the true garbage-volume level of each location and form the urban sanitation distribution map, all w candidate garbage-volume levels are encoded with the h hash functions of the Bloom filter to form a w × l feature matrix Q. The denoised values Z_i[j] (0 ≤ j < l) form a vector Z_i of length l, and the linear regression problem Z_i = A·Q is solved. The level corresponding to the largest entry of the 1 × w vector A is the true garbage-volume level at location i.
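The text does not name a regression solver. The sketch below solves Z_i = A·Q by ordinary least squares through the normal equations (Q·Qᵀ)·Aᵀ = Q·Z_iᵀ, using a small Gauss-Jordan elimination so that no third-party packages are needed. It assumes Q has full row rank, which holds for the three 13-bit level vectors of the embodiment. `decode_level` is a hypothetical name.

```python
def decode_level(Z: list[float], Q: list[list[int]],
                 levels: list[str]) -> tuple[str, list[float]]:
    """Least-squares fit of Z = A . Q; return the level with the largest coefficient."""
    w, l = len(Q), len(Z)
    # Augmented normal equations of min ||A.Q - Z||^2:  (Q Q^T) A^T = Q Z^T
    M = [[float(sum(Q[a][j] * Q[b][j] for j in range(l))) for b in range(w)]
         + [float(sum(Q[a][j] * Z[j] for j in range(l)))] for a in range(w)]
    # Gauss-Jordan elimination with partial pivoting (assumes Q has full row rank)
    for c in range(w):
        piv = max(range(c, w), key=lambda row: abs(M[row][c]))
        M[c], M[piv] = M[piv], M[c]
        for row in range(w):
            if row != c:
                factor = M[row][c] / M[c][c]
                M[row] = [x - factor * y for x, y in zip(M[row], M[c])]
    A = [M[a][w] / M[a][a] for a in range(w)]
    best = max(range(w), key=lambda a: A[a])
    return levels[best], A


# Embodiment's Bloom vectors; noise-free Z from 5 High, 40 Middle and 5 Low true reports.
Q = [[int(c) for c in s] for s in ("1000001001000", "0100000000011", "1000001000001")]
Z = [5 * Q[0][j] + 40 * Q[1][j] + 5 * Q[2][j] for j in range(13)]
print(decode_level(Z, Q, ["High", "Middle", "Low"])[0])  # Middle
```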
Through the above steps the true garbage-volume level of every location can be calculated, yielding the urban garbage-volume distribution map.
Compared with the prior art, the present invention has the following beneficial effects:
1. Compared with IoT-based smart trash cans, the invention greatly reduces the cost of deploying hardware, the communication overhead of network connections and the manpower needed for equipment operation and maintenance, saving social resources;
2. The invention does not rely on a trusted third-party server; the data are processed entirely by the user on the mobile device, which more thoroughly protects users' data privacy;
3. While preserving aggregation accuracy, the invention greatly reduces the minimum number of participating users required, making it a practical and feasible solution;
4. The invention improves speed while maintaining the security of the scheme.
Description of the Drawings
Fig. 1 is a flowchart of the localized differential privacy method for urban sanitation data reporting and privacy computation.
Fig. 2 is a structural diagram of the localized differential privacy method for urban sanitation data reporting and privacy computation.
Fig. 3 is an example of the random perturbation of garbage-volume levels in an embodiment of the present invention.
Detailed Description
The present invention is described in further detail below with reference to an embodiment and the drawings, but implementations of the invention are not limited thereto.
Embodiment
Figs. 1 and 2 show, respectively, the flowchart and the structural diagram of the localized differential privacy method for urban sanitation data reporting and privacy computation. The specific steps of the method are:
Set a location privacy parameter and a garbage-volume level privacy parameter, and calculate the privacy processing parameters;
The user locates the sanitation point and judges its garbage-volume level, obtaining the user's original true data;
Apply localized differential privacy processing separately to the location data and the garbage-volume level data, and upload the results to the server;
The server receives the data uploaded by users and performs statistics and denoising on them;
The server aggregates the denoised data to obtain the true garbage-volume level of each location; together these form the urban sanitation distribution map.
In this embodiment, assume that a certain urban area has 50 sanitation points in total and three garbage-volume levels: High, Middle and Low. The location privacy parameter is set to ε1 = 1.6, the garbage-volume level privacy parameter to ε2 = 1.9, and the Bloom filter false positive rate to 0.1. From these, the optimal Bloom filter length, the number of hash functions and the privacy processing parameters are calculated; the example below works with 13-bit Bloom filter vectors, r = 8 reported locations, p = 0.486 and a bit-keeping threshold 1 − f/2 = 0.58.
Suppose a participating user passing a sanitation point reports its garbage volume. The user first locates the current sanitation point with the GPS sensor of the mobile device and then looks up its number in the location-number table set up in advance; say the number is 12. The user then observes the garbage and assigns a level; say the true level is judged to be Middle. The data <12, Middle> are then this user's true sensing data.
Using the parameters calculated above, the user runs a random number generator locally and obtains rand = 0.67. Since 0.67 > 0.486, the user's location set consists of 8 locations chosen at random from {1, 2, …, 11, 13, …, 50} as upload locations. Because the true location is not in the set, the levels of all 8 locations are drawn at random from the set of garbage-volume levels. Suppose the data set obtained after the 8 random draws is: {<1,Middle>, <5,High>, <9,Low>, <14,Middle>, <21,Low>, <23,High>, <36,Low>, <41,Middle>}.
Suppose the Bloom filter encodings of the levels are {High: 1000001001000, Middle: 0100000000011, Low: 1000001000001}. For every bit of each level's Bloom vector in the set, a random number is generated and compared with the parameter 1 − f/2 = 0.58: if it is less than 0.58 the bit is kept, and if it is greater the bit is flipped. The final uploaded set is:
{<1,0101010011011>, <5,1011100100110>, <9,0111000101011>, <14,1101000101011>, <21,1010111101010>, <23,0110101101010>, <36,1101010100111>, <41,0011011011010>}
An example of the random perturbation of garbage-volume levels is shown in Fig. 3.
After receiving the uploads of all participating users, the server performs probability statistics on all location entries and can roughly estimate the number of true reports for each location. For example, suppose the server receives data from 2000 users in total, giving 2000 × 8 = 16000 location entries. If the estimated number of users whose true location is 1 is 50, while 300 of the received reports contain location 1, then about 250 of the entries for location 1 are noise. Because the levels of noise entries are chosen uniformly at random, each level accounts for about 250/3 noise entries; this is written Noise = (noise, noise, noise) with noise ≈ 250/3.
The original Bloom filter vectors of the garbage-volume levels are combined into a matrix Q (one row per level):
Q =
High:   1 0 0 0 0 0 1 0 0 1 0 0 0
Middle: 0 1 0 0 0 0 0 0 0 0 0 1 1
Low:    1 0 0 0 0 0 1 0 0 0 0 0 1
The level data of each location are extracted and analyzed separately. For example, all level entries for location 1 take the form:
| Bit 0 | Bit 1 | … | Bit l−1 |
| 1 | 0 | 1 | 0 |
| … | … | … | … |
| 0 | 1 | 1 | 0 |
Summing column by column gives a vector of length l, (sum_1, …, sum_l), denoted X. From the noise estimate computed above, the vector U of the numbers of original 1s contributed by the noise at each bit is calculated as U = Noise·Q. From the statistically estimated number of true reports, the noise value and the calculation formula, the number of original 1s at each bit is then calculated, giving a vector Z of length l.
Solve the linear regression problem Z = A·Q; the garbage-volume level corresponding to the largest entry of A is the estimated true level at this location.
The garbage-volume levels of all locations can be obtained in turn by the above process, giving the garbage-volume level distribution map of this urban area.
The above embodiment is a preferred implementation of the present invention, but implementations of the invention are not limited to it. Any other change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be an equivalent replacement and is included within the protection scope of the present invention.