CN113177166B - A method and system for personalized location semantic publishing based on differential privacy - Google Patents

Info

Publication number
CN113177166B
Authority
CN
China
Prior art keywords
semantic
noise
sensitivity
semantics
sem
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110449465.5A
Other languages
Chinese (zh)
Other versions
CN113177166A (en
Inventor
王豪
李雷
肖弋杭
夏英
张旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiangzhi Haobin Technology Co.,Ltd.
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202110449465.5A priority Critical patent/CN113177166B/en
Publication of CN113177166A publication Critical patent/CN113177166A/en
Application granted granted Critical
Publication of CN113177166B publication Critical patent/CN113177166B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2111 Location-sensitive, e.g. geographical location, GPS

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Remote Sensing (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method and system for publishing personalized location semantics based on differential privacy, belonging to the field of data mining and privacy protection. First, a semantic privacy protection level with parameter l is set according to the user's privacy protection requirements, and the l-1 location semantics nearest to the semantic to be protected are computed. Next, the semantic sensitivity of all locations is computed from the user's semantic visit counts, and the publishing probabilities of the l semantics are obtained from these sensitivities. Finally, a Gaussian variable and an exponentially distributed variable with specific parameters are used to generate a Laplace variable with a specific probability, which is the published user location conforming to the specific semantic sensitivity. This solves the problem that existing differentially private location publishing methods do not protect location semantics.

Figure 202110449465

Description

A method and system for personalized location semantic publishing based on differential privacy

Technical Field

The invention belongs to the field of data mining and privacy protection, and relates to a method and system for publishing personalized location semantics based on differential privacy.

Background

With the widespread use of mobile terminals such as mobile phones and the rapid development of wireless communication technology, location-based services (LBS) are used more and more frequently. Using positioning technology, LBS can provide users with services such as location check-in, nearby store search, and information push. Location services generate large amounts of spatial location data. To push relevant content according to user preferences, LBS providers upload, publish and share the collected user location data. However, the shared location data may contain sensitive user information, and data owners may not want to share their location data directly.

Existing location privacy protection methods fall into three main categories: spatial anonymity, encryption, and perturbation. Spatial anonymity hides the user's location: an anonymity parameter level is set, and the user's original value is mixed with anonymized values to protect location privacy. However, the anonymity parameter level is difficult to set, and the utility of the anonymized data is low. Encryption-based methods usually use symmetric and asymmetric encryption algorithms to encrypt location data and thereby hide its true value, but they are often complex and consume a large amount of communication resources. Among perturbation-based methods, differential privacy is the representative approach. Because of its rigorous mathematical model, and because it places no restriction on the attacker's background knowledge, it has become the most important method for location privacy protection.

Current differentially private location protection usually applies the Laplace noise mechanism to perturb the longitude and latitude of the original location within a small range, which protects the precise coordinates while preserving high data utility. However, location semantics are an important part of location information and often contain sensitive user information (for example, a home address or check-in place). Existing differentially private location methods protect only the longitude and latitude of a location, not its semantics, so an attacker can still obtain the user's location semantics through semantic inference. How to protect both the user's spatial location data and the user's location semantics when publishing the user's location is a problem that urgently needs to be solved.

Summary of the Invention

In view of this, the purpose of the present invention is to provide a method and system for publishing personalized location semantics based on differential privacy. First, a semantic privacy protection level with parameter l is set according to the user's privacy protection requirements, and the l-1 location semantics nearest to the semantic to be protected are computed. Next, the semantic sensitivity of all locations is computed from the user's semantic visit counts, and the publishing probabilities of the l semantics are obtained from these sensitivities. Finally, a Gaussian variable and an exponentially distributed variable with specific parameters are used to generate a Laplace variable with a specific probability, which is the published user location conforming to the specific semantic sensitivity.

To achieve the above purpose, the present invention provides the following technical solutions:

In one aspect, the present invention provides a method for publishing personalized location semantics based on differential privacy, comprising the following steps:

S1: Data preprocessing. Clean and reduce the originally collected location data to obtain the location-sensitive data to be protected, $X = \{x_1, \ldots, x_i, \ldots, x_n\}$, with n locations in total, where $x_i = (x_i^{lng}, x_i^{lat}, x_i^{sem})$ denotes the i-th location and $x_i^{lng}$, $x_i^{lat}$, $x_i^{sem}$ denote its longitude, latitude and semantics, respectively;

S2: Set the semantic privacy protection level l according to the semantic privacy protection requirements;

S3: Compute the semantic sensitivity;

S4: Noise generation. Following the principle by which Laplace noise is generated, generate Laplace noise that conforms to the specific semantic sensitivity;

S5: Noise addition. Add the obtained Laplace noise to the location data to obtain the new location data $X'_{lng} = X_{lng} + Y_{lng}$, $X'_{lat} = X_{lat} + Y_{lat}$;

S6: Iteration. Process each location in turn, repeating steps S2-S5 until all location data have been processed;

S7: Data publishing. Each processed location has a corresponding new, perturbed location, and these locations lie in at least l different semantics, from which one location semantic is selected and published as the user's location. The newly published locations are $X' = \{x'_1, x'_2, \ldots, x'_i, \ldots, x'_n\}$, where $x'_i = (x'^{lng}_i, x'^{lat}_i, x'^{sem}_i)$ denotes the i-th location after noise is added, and $x'^{lng}_i$, $x'^{lat}_i$, $x'^{sem}_i$ denote its longitude, latitude and semantics, respectively.
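Steps S5 and S6 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Location` record, the `perturb_all` function and the `sample_noise` callback are hypothetical names, and the callback stands in for steps S2-S4. The semantic selection of step S7 is not modeled, so the original semantic label is carried over unchanged.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Location:
    """One record x_i = (lng, lat, sem); the field names are assumptions."""
    lng: float
    lat: float
    sem: str


def perturb_all(locations: List[Location],
                sample_noise: Callable[[Location], Tuple[float, float]]) -> List[Location]:
    """Steps S5 and S6: add (Y_lng, Y_lat) to every location in turn."""
    published = []
    for loc in locations:
        y_lng, y_lat = sample_noise(loc)  # stands in for steps S2-S4
        published.append(Location(loc.lng + y_lng, loc.lat + y_lat, loc.sem))
    return published


# Coordinates of the first point in the embodiment; the noise value is a
# fixed, hypothetical constant so the arithmetic is easy to follow.
original = [Location(106.61704, 29.541919, "Xinhua Bookstore")]
noisy = perturb_all(original, lambda loc: (0.0002, -0.0001))
```

Passing the noise source as a callback keeps the addition step independent of how the noise is drawn, so a seeded or constant source can be used when testing.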

Further, step S3 specifically includes the following steps:

S31: Using the Euclidean distance, find the l-1 semantics nearest to the i-th location $x_i$, and take the semantic to which $x_i$ itself belongs together with these l-1 semantics as a semantic set $Sem = (sem_1, sem_2, \ldots, sem_i, \ldots, sem_l)$, where $sem_i$ [equation image BDA0003038194390000024] denotes the longitude and latitude range of the i-th semantic;

S32: Location semantic sensitivity. Compute the sensitivities [equation image BDA0003038194390000025] of the l semantics obtained in step S31 using the following formula:

[Equation image BDA0003038194390000026]

where $H(sem_i)$ denotes the total number of visits to semantic $sem_i$, and $L$ denotes the sum of the visit counts of all semantics.
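The sensitivity formula itself is an equation image that is not reproduced in this text. The sketch below therefore assumes the simplest form consistent with the definitions of $H(sem_i)$ and $L$, namely the visit share $H(sem_i) / L$. That ratio, the function name and the visit counts are all assumptions for illustration.

```python
from typing import Dict, Optional


def semantic_sensitivity(visits: Dict[str, int],
                         total_visits: Optional[int] = None) -> Dict[str, float]:
    """Assumed form: p(sem_i) = H(sem_i) / L.

    visits       -- H(sem_i): visit count per semantic
    total_visits -- L: visits summed over all semantics; defaults to the
                    sum of the counts passed in
    """
    total = sum(visits.values()) if total_visits is None else total_visits
    if total <= 0:
        raise ValueError("total visit count L must be positive")
    return {sem: count / total for sem, count in visits.items()}


# Hypothetical visit counts, not taken from the patent
example = semantic_sensitivity({"cafe": 5, "gym": 15, "clinic": 30})
```

The optional `total_visits` argument lets $L$ cover every semantic in the data set rather than only the l semantics being compared.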

Further, step S4 specifically includes the following steps:

S41: From the semantic sensitivity and the location semantic range, obtain the standard deviations $\sigma_1$, $\sigma_2$ for the inverse cumulative distribution function of the Gaussian distribution:

[Equation images BDA0003038194390000031 and BDA0003038194390000032]

where $\mu = 0$, and $\sigma_1$, $\sigma_2$ are the required Gaussian standard deviation parameters;

S42: From the semantic sensitivity and the location semantic range, obtain the parameters $\lambda_1$, $\lambda_2$ that the inverse cumulative exponential distribution function needs to generate the exponential distribution:

[Equation images BDA0003038194390000033 and BDA0003038194390000034]

S43: Using the Gaussian distribution parameters obtained in step S41, generate Gaussian noise $Z_{lng}$, $Z_{lat}$;

S44: Using the exponential distribution parameters obtained in step S42, generate exponentially distributed noise $W_{lng}$, $W_{lat}$;

S45: Compute the generalized Laplace variables [equation images BDA0003038194390000035 and BDA0003038194390000036], where $Y_{lng}$, $Y_{lat}$ are the generated longitude and latitude noise conforming to the specific semantic sensitivity.
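Steps S43 to S45 combine a Gaussian variable and an exponential variable into a Laplace variable. The patent's exact formulas are equation images that are not reproduced in this text, so the sketch below relies on a standard identity instead: if $W$ is exponential with rate $\lambda$ and $Z \sim N(0, \sigma^2)$ is independent of $W$, then $\sqrt{W}\,Z$ follows a Laplace distribution with mean 0 and scale $b = \sigma / \sqrt{2\lambda}$. The function name, the rate parameterization of $\lambda$ and the product form are assumptions, not the patent's stated method.

```python
import math
import random


def laplace_from_gauss_exp(sigma: float, lam: float, rng: random.Random) -> float:
    """Draw one Laplace(0, b) sample with b = sigma / sqrt(2 * lam).

    Assumed construction: Y = sqrt(W) * Z with
      Z ~ N(0, sigma^2)    (Gaussian noise, mu = 0)
      W ~ Exp(rate = lam)  (exponential noise)
    """
    z = rng.gauss(0.0, sigma)
    w = rng.expovariate(lam)
    return math.sqrt(w) * z


rng = random.Random(0)      # fixed seed so the run is repeatable
sigma, lam = 2.0, 0.5       # hypothetical parameters, giving b = 2.0
samples = [laplace_from_gauss_exp(sigma, lam, rng) for _ in range(100_000)]

# For Laplace(0, b), E|Y| = b, so the sample mean of |Y| should be close to b.
b = sigma / math.sqrt(2 * lam)
mean_abs = sum(abs(y) for y in samples) / len(samples)
```

Comparing `mean_abs` with `b` is a quick sanity check on the scale; a fuller check would compare the empirical distribution with the Laplace density.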

In another aspect, the present invention provides a personalized location semantic publishing system based on differential privacy, comprising:

Data preprocessing module: cleans and reduces the originally collected location data to obtain the location data to be protected, $X = \{x_1, x_2, \ldots, x_i, \ldots, x_n\}$, where $x_i = (x_i^{lng}, x_i^{lat}, x_i^{sem})$ denotes the i-th location and $x_i^{lng}$, $x_i^{lat}$, $x_i^{sem}$ denote its longitude, latitude and semantics, respectively.

Parameter setting module: sets the semantic location privacy protection level parameter l.

Semantic sensitivity calculation module: computes the sensitivities $P_{sem} = (p_{sem_1}, p_{sem_2}, \ldots, p_{sem_l})$ of the l semantics;

Noise generation module: generates Laplace noise conforming to the specific semantic sensitivity;

Noise addition module: adds the generalized Laplace noise obtained by the fifth unit of the noise generation module to the location data to obtain the new location data $X'_{lng} = X_{lng} + Y_{lng}$, $X'_{lat} = X_{lat} + Y_{lat}$;

Iterative processing module: processes each location in turn until all location data have been updated;

Data publishing module: each processed location has a corresponding new, perturbed location, and these locations lie in at least l different semantics, from which one location semantic is selected and published as the user's location. The newly published locations are $X' = \{x'_1, x'_2, \ldots, x'_i, \ldots, x'_n\}$, where $x'_i = (x'^{lng}_i, x'^{lat}_i, x'^{sem}_i)$ denotes the i-th location after noise is added, and $x'^{lng}_i$, $x'^{lat}_i$, $x'^{sem}_i$ denote its longitude, latitude and semantics, respectively.

Further, the semantic sensitivity calculation module includes the following subunits:

First semantic sensitivity unit: using the Euclidean distance, finds the l-1 semantics nearest to the i-th location, and takes the semantic to which the location itself belongs together with these l-1 semantics as a semantic set $Sem = (sem_1, sem_2, \ldots, sem_l)$, where $sem_i$ [equation image BDA0003038194390000043] denotes the longitude and latitude range of the i-th semantic;

Second semantic sensitivity unit: location semantic sensitivity. Computes the sensitivities [equation image BDA0003038194390000044] of the l semantics using the following formula:

[Equation image BDA0003038194390000045]

where $H(sem_i)$ denotes the total number of visits to semantic $sem_i$, and $L$ denotes the sum of the visit counts of all semantics.

Further, the noise generation module includes the following subunits:

First noise generation unit: from the semantic sensitivity and the location semantic range, obtains the standard deviations $\sigma_1$, $\sigma_2$ for the inverse cumulative distribution function of the Gaussian distribution:

[Equation images BDA0003038194390000046 and BDA0003038194390000047]

where $\mu = 0$, and $\sigma_1$, $\sigma_2$ are the required parameters;

Second noise generation unit: from the semantic sensitivity and the location semantic range, obtains the parameters $\lambda_1$, $\lambda_2$ that the inverse cumulative exponential distribution function needs to generate the exponential distribution:

[Equation images BDA0003038194390000048 and BDA0003038194390000049]

Third noise generation unit: using the Gaussian distribution parameters obtained in step S4-1, generates Gaussian noise $Z_{lng}$, $Z_{lat}$;

Fourth noise generation unit: using the exponential distribution parameters obtained in step S4-2, generates exponentially distributed noise $W_{lng}$, $W_{lat}$;

Fifth noise generation unit: computes the generalized Laplace variables [equation images BDA00030381943900000410 and BDA00030381943900000411], where $Y_{lng}$, $Y_{lat}$ are the generated longitude and latitude noise conforming to the specific semantic sensitivity.

The beneficial effects of the present invention are as follows. The invention can generate Laplace noise with a specific probability, protecting both the longitude and latitude data of a location and the user's location semantic privacy. It can generate noise conforming to a specific semantic sensitivity according to the user's privacy protection requirements for semantics of different sensitivity, achieving personalized protection of the user's location semantics. The implementation process and steps are simple and easy to carry out, improve the utility of the published data, and reduce the consumption of communication resources, giving the invention significant market value.

Other advantages, objectives and features of the present invention will be set forth in part in the description that follows, and in part will be apparent to those skilled in the art upon study of the following, or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by means of the following description.

Brief Description of the Drawings

To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in detail below with reference to the accompanying drawings, in which:

FIG. 1 is a flowchart of the steps of the method for publishing personalized location semantics based on differential privacy according to an embodiment of the present invention;

FIG. 2 is an overall flowchart provided by an embodiment of the present invention;

FIG. 3 is a schematic diagram of the system for publishing personalized location semantics based on differential privacy according to an embodiment of the present invention.

Detailed Description

Embodiments of the present invention are described below through specific examples, and those skilled in the art can readily understand other advantages and effects of the invention from the content disclosed in this specification. The invention can also be implemented or applied through other, different embodiments, and the details in this specification can be modified or changed from different viewpoints and for different applications without departing from the spirit of the invention. It should be noted that the drawings provided in the following embodiments only illustrate the basic concept of the invention schematically, and the following embodiments and the features in them can be combined with each other where they do not conflict.

The drawings are for illustration only; they are schematic diagrams rather than physical drawings and should not be construed as limiting the invention. To better illustrate the embodiments, some parts in the drawings may be omitted, enlarged or reduced, and they do not represent the dimensions of an actual product. Those skilled in the art will understand that certain well-known structures and their descriptions may be omitted from the drawings.

The same or similar reference numerals in the drawings of the embodiments correspond to the same or similar parts. In the description of the invention, it should be understood that terms such as "upper", "lower", "left", "right", "front" and "rear" indicate orientations or positional relationships based on those shown in the drawings. They are used only to facilitate and simplify the description and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation. The terms describing positional relationships in the drawings are therefore for illustration only and should not be construed as limiting the invention; those of ordinary skill in the art can understand the specific meaning of these terms according to the specific situation.

Please refer to FIG. 1 to FIG. 3. FIG. 1 and FIG. 2 are flowcharts of the overall method implementing the invention. The specific steps of the embodiment of the method for generating Laplace noise with a specific semantic sensitivity include:

Step S1, data preprocessing: clean and reduce the originally collected location data to obtain the location-sensitive data to be protected, $X = \{x_1, \ldots, x_i, \ldots, x_n\}$, with n locations in total, where $x_i = (x_i^{lng}, x_i^{lat}, x_i^{sem})$ denotes the i-th location and $x_i^{lng}$, $x_i^{lat}$, $x_i^{sem}$ denote its longitude, latitude and semantics, respectively.

In the embodiment, the check-in data are cleaned and reduced to obtain the check-in data to be protected, $X = \{x_1, x_2, \ldots, x_{1303}\}$.

Step S2, set the semantic privacy protection level parameter: the user sets the privacy protection level l according to their own semantic privacy protection requirements.

In the embodiment, the semantic privacy protection level parameter is set to l = 4; in a specific implementation, those skilled in the art may set this parameter themselves.

Step S3, semantic sensitivity calculation, includes the following steps:

Step S3-1: using the Euclidean distance, find the l-1 semantics nearest to the i-th location $x_i$, and take the semantic to which $x_i$ itself belongs together with these l-1 semantics as a semantic set $Sem = (sem_1, sem_2, \ldots, sem_l)$, where $sem_i$ [equation image BDA0003038194390000063] denotes the longitude and latitude range of the i-th semantic.

In the embodiment, take the first location point $x_1$ = {106.61704, 29.541919, <Xinhua Bookstore>}. By Euclidean distance, the three nearest semantics around the first location are {<New Century Supermarket>, <China Mobile>, <KFC>}. The ranges of these four semantics are: Xinhua Bookstore {[106.61647, 106.61760], [29.541427, 29.542410]}; New Century Supermarket {[106.61495, 106.616128], [29.541529, 29.54255]}; China Mobile {[106.613461, 106.61486], [29.54121, 29.5424362]}; KFC {[106.617366, 106.61809], [29.54063, 29.54126]};
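The selection in step S3-1 can be sketched with the ranges quoted above. The text does not say how the distance from a point to a semantic range is measured, so this sketch assumes the Euclidean distance from the point to the nearest edge of the rectangle, which is 0 when the point lies inside it. The function names and the fifth, far-away semantic are hypothetical and serve only to show one candidate being excluded.

```python
import math
from typing import Dict, List, Tuple

# (lng_min, lng_max, lat_min, lat_max)
Box = Tuple[float, float, float, float]

SEMANTICS: Dict[str, Box] = {
    # The first four ranges are quoted from the embodiment
    "Xinhua Bookstore": (106.61647, 106.61760, 29.541427, 29.542410),
    "New Century Supermarket": (106.61495, 106.616128, 29.541529, 29.54255),
    "China Mobile": (106.613461, 106.61486, 29.54121, 29.5424362),
    "KFC": (106.617366, 106.61809, 29.54063, 29.54126),
    # Hypothetical far-away semantic, added only to show that it is excluded
    "Hypothetical Park": (106.63000, 106.63100, 29.55000, 29.55100),
}


def distance_to_box(lng: float, lat: float, box: Box) -> float:
    """Euclidean distance in degrees from a point to a rectangle (0 if inside)."""
    lng_min, lng_max, lat_min, lat_max = box
    dx = max(lng_min - lng, 0.0, lng - lng_max)
    dy = max(lat_min - lat, 0.0, lat - lat_max)
    return math.hypot(dx, dy)


def semantic_set(lng: float, lat: float, l: int) -> List[str]:
    """Own semantic (distance 0) plus the l-1 nearest others, closest first."""
    ranked = sorted(SEMANTICS,
                    key=lambda name: distance_to_box(lng, lat, SEMANTICS[name]))
    return ranked[:l]


sem = semantic_set(106.61704, 29.541919, l=4)
```

With l = 4 this returns the point's own semantic followed by the same three neighbours the embodiment lists, and leaves out the hypothetical fifth one. Distances are taken directly in degrees, which ignores the difference in ground length between a degree of longitude and a degree of latitude.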

Step S3-2, location semantic sensitivity: compute the sensitivities [equation image BDA0003038194390000064] of the l semantics obtained in step S3-1 using the following formula:

[Equation image BDA0003038194390000065]

where $H(sem_i)$ denotes the total number of visits to semantic $sem_i$, and $L$ denotes the sum of the visit counts of all semantics.

In the embodiment, the Xinhua Bookstore semantic was visited 30 times in total, New Century Supermarket 200 times, China Mobile 15 times, and KFC 10 times, giving location semantic sensitivities of 0.023, 0.0075, 0.21 and 0.007, respectively;

步骤S4,噪声生成,根据拉普拉斯噪声的生成原理生成符合特定语义敏感度的拉普拉斯噪声,包括以下步骤,Step S4, noise generation, generating Laplacian noise conforming to a specific semantic sensitivity according to the Laplacian noise generation principle, including the following steps:

步骤S4-1,根据语义敏感度和位置语义范围求出高斯逆累计分布函数标准差σ1,σ2Step S4-1, according to the semantic sensitivity and the position semantic range, the standard deviations σ 1 and σ 2 of the Gaussian inverse cumulative distribution function are obtained:

Figure BDA0003038194390000071
Figure BDA0003038194390000071

Figure BDA0003038194390000072
Figure BDA0003038194390000072

其中,μ为0,σ1,σ2即为所求的高斯标准差参数;Among them, μ is 0, σ 1 , σ 2 are the required Gaussian standard deviation parameters;

实施例中,根据步骤S3中的语义范围和计算的语义敏感度分别求得4个语义的参数σ1={0.00346,0.0000213,0.000317,0.0000278},σ2={0.00023001,0.000321,0.0000378,0.00001265}In the embodiment, four semantic parameters σ 1 ={0.00346, 0.0000213, 0.000317, 0.0000278}, σ 2 ={0.00023001, 0.000321, 0.0000378, 0.00001265} are respectively obtained according to the semantic range and the calculated semantic sensitivity in step S3

步骤S4-2,根据语义敏感度和位置语义范围求出逆累计指数分布函数生成指数分布所需要的参数λ12Step S4-2, according to the semantic sensitivity and the position semantic range, obtain the parameters λ 1 , λ 2 required by the inverse cumulative exponential distribution function to generate the exponential distribution:

In the embodiment, the parameters of the 4 semantics are obtained from the semantic ranges and the semantic sensitivities calculated in step S3: λ_1 = {0.000075, 0.00067, 0.0000379, 0.0000543}, λ_2 = {0.0003954, 0.0001534, 0.000069023, 0.00001357};

Step S4-3: using the Gaussian distribution parameters obtained in step S4-1, generate the Gaussian-distributed noise Z_lng, Z_lat;

In the embodiment, one value is chosen at random from each of the sets σ_1 and σ_2 obtained in step S4-1 as the Gaussian distribution parameter. Here 0.000317 is chosen from σ_1 to generate Z_lng = 0.0000131, and 0.000321 is chosen from σ_2 to generate Z_lat = 0.000678;

Step S4-4: using the exponential distribution parameters obtained in step S4-2, generate the exponentially distributed noise W_lng, W_lat;

In the embodiment, one value is chosen at random from each of the sets λ_1 and λ_2 obtained in step S4-2 as the exponential distribution parameter. Here 0.00067 is chosen from λ_1 to generate W_lng = 0.000362, and 0.000354 is chosen from λ_2 to generate W_lat = 0.000714;
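The inverse-CDF sampling named in steps S4-3 and S4-4 can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function names are hypothetical, and treating λ as the rate of the exponential distribution is an assumption, because the text does not state the parameterization.

```python
import math
import random
from statistics import NormalDist


def sample_gaussian(sigma, u=None):
    """Draw Z ~ N(0, sigma^2) by inverse-CDF sampling (mu = 0 as in step S4-1)."""
    if u is None:
        u = random.random()
    # NormalDist.inv_cdf requires 0 < u < 1, so clamp away from the endpoints.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return NormalDist(mu=0.0, sigma=sigma).inv_cdf(u)


def sample_exponential(lam, u=None):
    """Draw W ~ Exp(rate=lam) by inverse-CDF sampling (rate form is an assumption)."""
    if u is None:
        u = random.random()
    u = min(max(u, 0.0), 1.0 - 1e-12)
    # Inverse of the CDF F(w) = 1 - exp(-lam * w).
    return -math.log(1.0 - u) / lam


# Parameters picked in the embodiment for the longitude direction.
z_lng = sample_gaussian(0.000317)
w_lng = sample_exponential(0.00067)
```

Passing an explicit `u` makes the draw reproducible, which is convenient when checking a single step by hand.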

Step S4-5: compute the generalized Laplace variables

Figure BDA0003038194390000073
Figure BDA0003038194390000074
Y_lng and Y_lat are the generated longitude and latitude noise conforming to the specific semantic sensitivity.

In the embodiment, from the Z_lng, Z_lat, W_lng and W_lat obtained in steps S4-3 and S4-4, Y_lng = 0.0002492444 and Y_lat = 0.000181166 are obtained;
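The formula images for step S4-5 are not reproduced in this text. The sketch below therefore assumes the common Gaussian scale-mixture form Y = sqrt(W) · Z, in which exponential noise W scales Gaussian noise Z. Applied to the embodiment's Z and W values, this form yields the same leading digits as the embodiment's Y values (2.4924... and 1.8116...), which is why it is used here. The function name is hypothetical.

```python
import math


def generalized_laplace(z, w):
    """Combine Gaussian noise z and exponential noise w as sqrt(w) * z (assumed form)."""
    if w < 0:
        raise ValueError("exponential noise w must be non-negative")
    return math.sqrt(w) * z


# Inputs taken from the embodiment (steps S4-3 and S4-4).
y_lng = generalized_laplace(0.0000131, 0.000362)
y_lat = generalized_laplace(0.000678, 0.000714)
```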

Step S5, noise addition: add the generalized Laplacian noise obtained in step S4-5 to the location data to obtain the new location data X'_lng = X_lng + Y_lng, X'_lat = X_lat + Y_lat.

In the embodiment, the generalized Laplacian noise generated in step S4-5 is added to the first location x_1 = {106.61704, 29.541919, <Xinhua Bookstore>}, giving the new location x'_1 = {106.6172892444, 29.54210016}.
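The addition in step S5 is a component-wise sum, which a short Python sketch makes explicit. The function name is hypothetical; the numbers are the ones given in the embodiment.

```python
def add_noise(lng, lat, y_lng, y_lat):
    """Return the perturbed (longitude, latitude) pair X' = X + Y."""
    return lng + y_lng, lat + y_lat


# Location x_1 and the noise values Y_lng, Y_lat from the embodiment.
new_lng, new_lat = add_noise(106.61704, 29.541919, 0.0002492444, 0.000181166)
```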

Step S6, iterative processing: process each location in turn, repeating steps S2-S5 until all location data have been processed.

In the embodiment, every location is traversed and all 1303 locations are processed through steps S2-S5 until all location data have been processed;

Step S7, data publishing: each processed location has a corresponding new, perturbed location, and these locations lie within at least l different semantics. One of these location semantics is selected and published as the user's location. The published new locations are X' = {x'_1, x'_2, ..., x'_i, ..., x'_n}, where x'_i denotes the i-th location after noise is added,

Figure BDA0003038194390000081
Figure BDA0003038194390000082
denote the longitude, latitude and semantic of the i-th location after noise is added, respectively.

In the embodiment, the location data published for x_1 are the noised longitude and latitude 106.6172892444, 29.54210016; the published semantic is one chosen from Xinhua Bookstore, New Century Supermarket, China Mobile and KFC, using the magnitude of each semantic sensitivity as the probability measure.
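One way to read "using the magnitude of each semantic sensitivity as the probability measure" is a weighted random choice in which each semantic's weight is its sensitivity. The Python sketch below follows that reading, which is an assumption; the function name is hypothetical, and the names and sensitivities are those of the embodiment.

```python
import random


def choose_semantic(semantics, sensitivities, rng=random):
    """Pick one semantic with probability proportional to its sensitivity (assumed reading)."""
    if len(semantics) != len(sensitivities):
        raise ValueError("semantics and sensitivities must have the same length")
    # random.choices normalizes the weights internally.
    return rng.choices(semantics, weights=sensitivities, k=1)[0]


semantics = ["Xinhua Bookstore", "New Century Supermarket", "China Mobile", "KFC"]
sensitivities = [0.023, 0.0075, 0.21, 0.007]
published = choose_semantic(semantics, sensitivities, rng=random.Random(0))
```

Under this reading, the semantic with the largest sensitivity (here China Mobile) is the most likely to be published.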

In a specific implementation, the method provided by the invention can be run as an automated process using software, or the corresponding system can be implemented in a modular manner.

The data preprocessing module is used to clean and reduce the originally collected location data to obtain the location data to be protected X = {x_1, x_2, ..., x_i, ..., x_n}, where x_i denotes the i-th location,

Figure BDA0003038194390000083
Figure BDA0003038194390000084
denote the longitude, latitude and semantic of the i-th location, respectively.

The parameter setting module is used to set the semantic location privacy protection level parameter l.

The semantic sensitivity calculation module is used to calculate the sensitivities of the l semantics, P_sem = (p_sem1, p_sem2, ..., p_seml), and comprises the following subunits:

The first unit calculates, according to Euclidean distance, the l-1 semantics nearest to the i-th location, and takes the semantic to which the location itself belongs together with these l-1 semantics as a semantic set Sem = (sem_1, sem_2, ..., sem_i, ..., sem_l), where

Figure BDA0003038194390000085
denotes the longitude and latitude range of the i-th semantic.
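The nearest-semantic selection performed by the first unit can be sketched in Python as follows. The sketch assumes each semantic is represented by a single (longitude, latitude) point, whereas the text defines a semantic by a longitude and latitude range; the function name and the coordinates of the example centers are hypothetical.

```python
import math


def nearest_semantics(location, own_semantic, centers, l):
    """Return own_semantic plus the l-1 other semantics nearest to location.

    centers maps a semantic name to a representative (lng, lat) point;
    using one point per semantic is an assumption of this sketch.
    """
    others = [name for name in centers if name != own_semantic]
    # Sort the remaining semantics by Euclidean distance to the location.
    others.sort(key=lambda name: math.dist(location, centers[name]))
    return [own_semantic] + others[: l - 1]


# Hypothetical representative points for the four semantics of the embodiment.
centers = {
    "Xinhua Bookstore": (106.6170, 29.5419),
    "New Century Supermarket": (106.6180, 29.5425),
    "China Mobile": (106.6160, 29.5410),
    "KFC": (106.6200, 29.5450),
}
sem_set = nearest_semantics((106.61704, 29.541919), "Xinhua Bookstore", centers, l=4)
```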

The second unit performs the location semantic sensitivity calculation: it calculates the sensitivity of the l semantics,

Figure BDA0003038194390000086
using the following formula:

Figure BDA0003038194390000087

where H(sem_i) is the total number of times semantic sem_i has been visited, and L is the sum of the visit counts of all semantics.

The noise generation module is used to generate Laplacian noise conforming to the specific semantic sensitivity, and comprises the following subunits:

The first unit obtains, from the semantic sensitivity and the location semantic range, the standard deviations σ_1, σ_2 of the Gaussian inverse cumulative distribution function:

Figure BDA0003038194390000091

Figure BDA0003038194390000092

where μ is 0, and σ_1, σ_2 are the required parameters;

The second unit obtains, from the semantic sensitivity and the location semantic range, the parameters λ_1, λ_2 that the inverse cumulative exponential distribution function needs to generate the exponential distribution:

Figure BDA0003038194390000093

Figure BDA0003038194390000094

The third unit generates the Gaussian-distributed noise Z_lng, Z_lat from the Gaussian distribution parameters obtained in step S4-1;

The fourth unit generates the exponentially distributed noise W_lng, W_lat from the exponential distribution parameters obtained in step S4-2;

The fifth unit computes the generalized Laplace variables

Figure BDA0003038194390000095
Figure BDA0003038194390000096
Y_lng and Y_lat are the generated longitude and latitude noise conforming to the specific semantic sensitivity.

The noise adding module is used to add the generalized Laplacian noise obtained by the fifth unit of the noise generation module to the location data to obtain the new location data X'_lng = X_lng + Y_lng, X'_lat = X_lat + Y_lat.

The iterative processing module is used to process each location in turn, repeating steps S2-S5 until all location data have been updated.

The data publishing module: each processed location has a corresponding new, perturbed location, and these locations lie within at least l different semantics. One of these location semantics is selected and published as the user's location. The published new locations are X' = {x'_1, x'_2, ..., x'_i, ..., x'_n}, where x'_i denotes the i-th location after noise is added,

Figure BDA0003038194390000097
Figure BDA0003038194390000098
denote the longitude, latitude and semantic of the i-th location after noise is added, respectively.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the invention, not to limit it. Although the invention has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solution of the invention may be modified or replaced with equivalents without departing from its spirit and scope, and all such modifications and replacements fall within the scope of the claims of the invention.

Claims (2)

1. A personalized location semantic publishing method based on differential privacy, characterized in that the method comprises the following steps:
S1: data preprocessing, namely cleaning and reducing the originally acquired location data to obtain the location-sensitive data to be protected X = {x_1, ..., x_i, ..., x_n}, n locations in total, where x_i denotes the i-th location,
Figure FDA0003829654390000011
wherein
Figure FDA0003829654390000012
denote the longitude, latitude and semantic of the i-th location, respectively;
S2: setting a corresponding semantic privacy protection level l according to the semantic privacy protection requirement;
S3: calculating semantic sensitivity; step S3 specifically comprises the following steps:
S31: according to Euclidean distance, calculating the l-1 semantics nearest to the i-th location x_i, and taking the semantic to which the location itself belongs together with these l-1 semantics as a semantic set Sem = (sem_1, sem_2, ..., sem_i, ..., sem_l), wherein
Figure FDA0003829654390000013
denotes the longitude and latitude range of the i-th semantic;
S32: location semantic sensitivity calculation, namely calculating the sensitivity of the l semantics obtained in step S31,
Figure FDA0003829654390000014
using the following formula:
Figure FDA0003829654390000015
wherein H(sem_i) denotes the total number of times semantic sem_i has been visited, and L denotes the sum of the visit counts of all semantics;
S4: noise generation, namely generating Laplacian noise conforming to the specific semantic sensitivity according to the Laplacian noise generation principle; step S4 specifically comprises the following steps:
S41: calculating the standard deviations σ_1, σ_2 of the Gaussian inverse cumulative distribution function from the semantic sensitivity and the location semantic range:
Figure FDA0003829654390000016
Figure FDA0003829654390000017
wherein μ is 0, and σ_1, σ_2 are the required Gaussian standard deviation parameters;
S42: calculating, from the semantic sensitivity and the location semantic range, the parameters λ_1, λ_2 that the inverse cumulative exponential distribution function needs to generate the exponential distribution:
Figure FDA0003829654390000018
Figure FDA0003829654390000019
S43: generating the Gaussian-distributed noise Z_lng, Z_lat from the Gaussian distribution parameters obtained in step S41;
S44: generating the exponentially distributed noise W_lng, W_lat from the exponential distribution parameters obtained in step S42;
S45: calculating the generalized Laplace variables
Figure FDA0003829654390000021
wherein Y_lng, Y_lat are the generated longitude and latitude noise conforming to the specific semantic sensitivity;
S5: adding the obtained Laplacian noise to the location data to obtain the new location data X'_lng = X_lng + Y_lng, X'_lat = X_lat + Y_lat;
S6: iterative processing, namely processing each location in turn and repeating steps S2-S5 until all location data have been processed;
S7: data publishing, wherein each processed location has a corresponding new, perturbed location, these locations lie within at least l different semantics, and one of these location semantics is selected and published as the user's location; the published new locations are X' = {x'_1, x'_2, ..., x'_i, ..., x'_n}, where x'_i denotes the i-th location after noise is added,
Figure FDA0003829654390000022
Figure FDA0003829654390000023
denote the longitude, latitude and semantic of the i-th location after noise is added, respectively.
2. A personalized location semantic publishing system based on differential privacy, characterized in that it comprises:
a data preprocessing module, used to clean and reduce the originally acquired location data to obtain the location data to be protected X = {x_1, x_2, ..., x_i, ..., x_n}, where x_i denotes the i-th location,
Figure FDA0003829654390000024
Figure FDA0003829654390000025
denote the longitude, latitude and semantic of the i-th location, respectively;
a parameter setting module, used to set the semantic location privacy protection level parameter l;
a semantic sensitivity calculation module, used to calculate the sensitivity of the l semantics
Figure FDA0003829654390000029
a noise generation module, used to generate Laplacian noise conforming to the specific semantic sensitivity;
a noise adding module, used to add the generalized Laplacian noise obtained by the fifth unit of the noise generation module to the location data to obtain the new location data X'_lng = X_lng + Y_lng, X'_lat = X_lat + Y_lat;
an iterative processing module, used to process each location in turn until all location data have been updated;
a data publishing module, wherein each processed location has a corresponding new, perturbed location, these locations lie within at least l different semantics, and one of these location semantics is selected and published as the user's location; the published new locations are X' = {x'_1, x'_2, ..., x'_i, ..., x'_n}, where x'_i denotes the i-th location after noise is added,
Figure FDA0003829654390000026
Figure FDA0003829654390000027
denote the longitude, latitude and semantic of the i-th location after noise is added, respectively;
the semantic sensitivity calculation module comprises the following subunits:
a first semantic sensitivity calculation unit, which calculates, according to Euclidean distance, the l-1 semantics nearest to the i-th location, and takes the semantic to which the location itself belongs together with these l-1 semantics as a semantic set Sem = (sem_1, sem_2, ..., sem_i, ..., sem_l), wherein
Figure FDA0003829654390000028
denotes the longitude and latitude range of the i-th semantic;
a second semantic sensitivity calculation unit, which performs the location semantic sensitivity calculation, calculating the sensitivity of the l semantics
Figure FDA0003829654390000031
using the following formula:
Figure FDA0003829654390000032
wherein H(sem_i) denotes the total number of times semantic sem_i has been visited, and L denotes the sum of the visit counts of all semantics;
the noise generation module comprises the following subunits:
a first noise generation unit, which calculates the standard deviations σ_1, σ_2 of the Gaussian inverse cumulative distribution function from the semantic sensitivity and the location semantic range:
Figure FDA0003829654390000033
Figure FDA0003829654390000034
wherein μ is 0, and σ_1, σ_2 are the required parameters;
a second noise generation unit, which calculates, from the semantic sensitivity and the location semantic range, the parameters λ_1, λ_2 that the inverse cumulative exponential distribution function needs to generate the exponential distribution:
Figure FDA0003829654390000035
Figure FDA0003829654390000036
a third noise generation unit, which generates the Gaussian-distributed noise Z_lng, Z_lat from the Gaussian distribution parameters obtained in step S41;
a fourth noise generation unit, which generates the exponentially distributed noise W_lng, W_lat from the exponential distribution parameters obtained in step S42;
a fifth noise generation unit, which calculates the generalized Laplace variables
Figure FDA0003829654390000037
Y_lng, Y_lat being the generated longitude and latitude noise conforming to the specific semantic sensitivity.
CN202110449465.5A 2021-04-25 2021-04-25 A method and system for personalized location semantic publishing based on differential privacy Active CN113177166B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110449465.5A CN113177166B (en) 2021-04-25 2021-04-25 A method and system for personalized location semantic publishing based on differential privacy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110449465.5A CN113177166B (en) 2021-04-25 2021-04-25 A method and system for personalized location semantic publishing based on differential privacy

Publications (2)

Publication Number Publication Date
CN113177166A CN113177166A (en) 2021-07-27
CN113177166B true CN113177166B (en) 2022-10-21

Family

ID=76926190

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110449465.5A Active CN113177166B (en) 2021-04-25 2021-04-25 A method and system for personalized location semantic publishing based on differential privacy

Country Status (1)

Country Link
CN (1) CN113177166B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108595976A (en) * 2018-03-27 2018-09-28 西安电子科技大学 Android terminal sensor information guard method based on difference privacy
CN109726594A (en) * 2019-01-09 2019-05-07 南京航空航天大学 A Novel Trajectory Data Publishing Method Based on Differential Privacy
CN110300029A (en) * 2019-07-06 2019-10-01 桂林电子科技大学 A kind of location privacy protection method of anti-side right attack and position semantic attacks
CN111447181A (en) * 2020-03-04 2020-07-24 重庆邮电大学 Location privacy protection method based on differential privacy
CN111931235A (en) * 2020-08-18 2020-11-13 重庆邮电大学 Differential privacy protection method and system under error constraint condition
CN111950028A (en) * 2020-08-24 2020-11-17 重庆邮电大学 A differential privacy protection method and system for trajectory time mode
CN112035880A (en) * 2020-09-10 2020-12-04 辽宁工业大学 A Preference Awareness-Based Trajectory Privacy Protection Service Recommendation Method
CN112364379A (en) * 2020-11-18 2021-02-12 浙江工业大学 Location privacy protection method for guaranteeing service quality based on differential privacy

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10638305B1 (en) * 2018-10-11 2020-04-28 Citrix Systems, Inc. Policy based location protection service


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CLM:面向轨迹发布的差分隐私保护方法;王豪 等;《通信学报》;20170625;第38卷(第6期);85-96 *
Differential Privacy-Based Location Protection in Spatial Crowdsourcing;Jianhao Wei et al.;《IEEE TRANSACTIONS ON SERVICES COMPUTING》;20190606;第15卷(第1期);45-58 *
PrivSem: Protecting location privacy using semantic and differential privacy;Yanhui Li et al.;《World Wide Web》;20190427;第22卷;2407-2436 *
面向移动位置服务的轨迹隐私保护研究;鞠晓康;《中国优秀硕士学位论文全文数据库 信息科技辑》;20210215(第2期);I138-186 *

Also Published As

Publication number Publication date
CN113177166A (en) 2021-07-27


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20250521

Address after: 100000 No. 1, Building 4, No. 1 Courtyard, KeChuang 10th Street, Beijing Economic and Technological Development Zone, Daxing District, Beijing

Patentee after: Beijing Jiangzhi Haobin Technology Co.,Ltd.

Country or region after: China

Address before: 400065 Chongqing Nan'an District huangjuezhen pass Chongwen Road No. 2

Patentee before: CHONGQING University OF POSTS AND TELECOMMUNICATIONS

Country or region before: China