CN115982752A - K domination privacy protection method based on approximate semantic query - Google Patents

Info

Publication number
CN115982752A
Authority
CN
China
Prior art keywords
semantic
privacy protection
positions
data
approximate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211496552.7A
Other languages
Chinese (zh)
Other versions
CN115982752B (en)
Inventor
李松
吴楠
曹文琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin University of Science and Technology filed Critical Harbin University of Science and Technology
Priority to CN202211496552.7A priority Critical patent/CN115982752B/en
Publication of CN115982752A publication Critical patent/CN115982752A/en
Application granted granted Critical
Publication of CN115982752B publication Critical patent/CN115982752B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a K domination privacy protection method based on approximate semantic query. First, given the data, a position data set within a rectangular area containing the real position is obtained. Cluster center points of the data set are obtained through the MCA algorithm; a multi-center data processing algorithm based on the max-min distance is then applied, together with the data point set produced by MCA clustering, to select positions that are as far as possible from one another, generating a processed candidate set. Second, the semantic similarity between any two positions in the candidate set is obtained by calculating the distance between the position information of different names, and, in combination with a dummy method, the k-1 positions with the minimum semantic similarity are selected as the virtual result set. Experimental results show that the method ensures the physical dispersion and semantic diversity of the positions and improves the efficiency of dummy generation, while balancing privacy protection security against query service quality.

Description

K domination privacy protection method based on approximate semantic query
Technical Field
The invention relates to the field of privacy protection processing in data query, in particular to a K domination privacy protection method based on approximate semantic query.
Background
Background and significance of the main innovation
With the development of mobile positioning and wireless communication technology, a large number of mobile devices on the market are capable of accurate GPS positioning, and Location Based Services (LBS) have developed rapidly as a result. However, while LBS brings convenience and great benefit to society, the leakage of sensitive information through LBS is receiving increasing attention. Because a user's location is shared among different location service providers, untrusted third parties can easily compromise the user's privacy by analyzing and comparing the location information. For example, by capturing a user's recent trail, an adversary can infer information such as the user's home address, workplace and health status.
It is therefore necessary to protect the privacy of the user's location. Many different methods have been proposed to prevent the disclosure of private information, mainly fuzzy methods, encryption methods and policy-based methods. Spatial anonymity methods typically require the assistance of a fully Trusted Third Party (TTP). When a location query service is needed, the mobile user first sends a query request to the TTP; the TTP generates a K domination area containing the user's location and sends that area to the LBS server for querying. In this method, if the K domination area is too large, the query takes more time and the accuracy of the query result is reduced. The TTP is also likely to become a bottleneck of the system. By contrast, in privacy protection based on virtual locations, the virtual locations are generated by the mobile client, so neither a TTP nor an anonymous area is required. This approach can therefore compensate well for the above disadvantages of the spatial anonymity method.
Disclosure of Invention
Aiming at the defects of the prior art, the invention aims to provide a K domination privacy protection method based on approximate semantic query, which combines the K domination technology and the semantic similarity correlation technology in the traditional calculation to improve the privacy protection degree of the query, and the algorithm can further improve the query efficiency.
The technical scheme adopted by the invention for solving the technical problems is as follows: a K domination privacy protection method based on approximate semantic query mainly comprises the following steps:
1. First, given the data, a position data set within a rectangular area containing the real position is obtained. A plurality of cluster centers is computed by the MCA center clustering method to form a candidate data set. Some positions are then selected with a multi-center data processing method based on the max-min distance, so that the selected positions are as far apart as possible, generating a group of processed dummy data points;
2. Second, the semantic similarity between any two positions in the candidate set is obtained by calculating the distance between the position information of different names, and the k-1 positions with the minimum semantic similarity are selected as virtual data points.
Furthermore, by adopting the MCA algorithm, a plurality of cluster centers can be generated at the mobile client. Because these positions are as far apart from each other as possible, the dummy data set can be generated from them.
Further, semantic similarity is calculated over the position information of the candidate set, and the k-1 positions with the minimum semantic similarity are selected as virtual positions. The k-1 virtual positions and the real position are sent to the LBS server for querying. The dummy set is generated with the proposed dummy generation method, in which the dummy candidate data set is produced by the clustering calculation of Algorithm 1.
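As a rough illustration of the final step described above, the following minimal Python sketch assembles the set of k positions that the client sends to the LBS server. The function name `build_query` and the shuffling of the result are assumptions made for illustration; the patent text only states that the k-1 virtual positions and the real position are sent together.

```python
import random


def build_query(real_location, dummy_locations, rng=None):
    """Combine the real position with k-1 dummies into one query set.

    Shuffling is an assumption of this sketch (so that list order does
    not reveal which entry is real); it is not stated in the patent.
    """
    rng = rng or random.Random()
    query = [real_location, *dummy_locations]
    rng.shuffle(query)
    return query


# Hypothetical usage with k = 3 (one real position, two dummies).
query = build_query("real", ["dummy_a", "dummy_b"], random.Random(0))
```

The server then answers all k positions, and the client keeps only the answer for its real position.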
The beneficial effects of the invention are as follows: by adopting an algorithm that combines K domination with semantic similarity, the invention further protects the user's position information during queries, reduces the time overhead of a query, and further ensures the user's query privacy.
Drawings
FIG. 1 is an abstract drawing of the K domination privacy protection method based on approximate semantic query according to the invention.
FIG. 2 is a graph comparing the time overhead of the three methods as the value of K increases.
FIG. 3 is an exemplary diagram of the MCA algorithm presented in the invention.
FIG. 4 is a graph comparing the operating efficiency of the invention and MaxMinDistDS.
FIG. 5 is a graph comparing the operating efficiency of the invention and SimPMaxMinDistDS.
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail below with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of implementation examples of the present invention, and not all embodiments. Further, it should be understood that various modifications and changes may occur to those skilled in the art after reading the present disclosure, and that such equivalents fall within the scope of the appended claims.
The invention discloses a K domination privacy protection method based on approximate semantic query, which comprises the following specific operation processes:
Step (1): The MCA algorithm is used to compute cluster centers over the geographic coordinates of the positions in the square area, yielding a plurality of cluster centers, which are selected as the virtual candidate set. The MCA algorithm is a heuristic clustering algorithm that chooses objects as far apart as possible, measured by Euclidean distance, as cluster centers. One sample object is first taken as the first cluster center, and the sample farthest from it is then selected as the second cluster center. Further cluster centers are determined in the same way until no new cluster center can be found. After all cluster centers are determined, the m cluster sample sets containing samples are used as the virtual position candidate set. An example result is shown in Fig. 3: according to Algorithm 1, $l_1$ is selected as the first cluster center, $l_5$ as the second cluster center, and $l_9$ is determined as the third cluster center. The clustering calculation thus yields three cluster centers, from which the virtual position candidate set is generated.
When determining the cluster centers, the actual position is used as the initial cluster center $Z_1$. For a position $l_i$ to be selected as a further cluster center, the following conditions must be satisfied:
(1) $D_i > \gamma \cdot D_{12}$, where $i \in \{1, \dots, n\}$;
(2) $D_i = \max\{\min(D_{i1}, D_{i2})\}$, where $i \in \{1, \dots, n\}$ and $D_{12} = |Z_2 - Z_1|$;
(3) $\gamma$ is a test parameter of the algorithm, with value range $0.5 < \gamma < 1$.
The steps of the MCA algorithm are as follows:
Algorithm 1.
Input: position data set $S_n$ and demand parameter $m$.
Output: virtual position data set $S_1$.
1. Set the value of $\gamma$, ensuring that $0 < \gamma < 1$.
2. Take the true position $l_{re}$ as the first cluster center $Z_1$.
3. Find the position farthest from $Z_1$ and take it as the second cluster center $Z_2$.
4. For each remaining object $l_i$ in $S_n$, compute its distances $D_{i1}$ and $D_{i2}$ to $Z_1$ and $Z_2$. Let $D_{12}$ be the distance between $Z_1$ and $Z_2$. If $D_i = \max\{\min(D_{i1}, D_{i2})\}$, where $i \in \{1, \dots, n\}$, and $D_i > \gamma \cdot D_{12}$, then $l_i$ is taken as the third cluster center $Z_3$.
5. Continue by analogy to obtain all $v$ cluster centers that satisfy the conditions. When the max-min distance is no longer greater than $\gamma \cdot D_{12}$, the search for cluster centers ends.
6. With $v$ denoting the number of cluster centers found, check which of the following conditions holds:
(1) if $v \ge m$, the algorithm ends;
(2) if $v < m$, a new value of $\gamma$ is selected and the procedure is re-executed from step 1.
7. Generate the candidate set $S_1$.
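The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: positions are assumed to be 2-D coordinate tuples, the names `mca_centers` and `candidate_centers` are hypothetical, and the decreasing schedule of $\gamma$ values used when $v < m$ is an assumption (the text only says that the value is reselected).

```python
import math


def mca_centers(points, real_index, gamma):
    """Max-min distance selection of cluster centers (steps 2 to 5).

    points: list of (x, y) tuples; real_index: index of the real position.
    Returns the indices of the selected cluster centers.
    """
    if len(points) < 2:
        return [real_index]

    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    centers = [real_index]  # Z1: the real position
    # Z2: the position farthest from Z1
    second = max(range(len(points)),
                 key=lambda i: dist(points[i], points[real_index]))
    centers.append(second)
    d12 = dist(points[real_index], points[second])

    while True:
        best_i, best_d = None, -1.0
        for i in range(len(points)):
            if i in centers:
                continue
            # distance from point i to its nearest existing center
            d_min = min(dist(points[i], points[c]) for c in centers)
            if d_min > best_d:
                best_i, best_d = i, d_min
        # stop when the max-min distance is no longer above gamma * D12
        if best_i is None or best_d <= gamma * d12:
            break
        centers.append(best_i)
    return centers


def candidate_centers(points, real_index, m, gammas=(0.9, 0.8, 0.7, 0.6)):
    """Step 6: retry with another gamma until at least m centers exist.

    The gamma schedule is an assumption of this sketch.
    """
    centers = [real_index]
    for gamma in gammas:
        centers = mca_centers(points, real_index, gamma)
        if len(centers) >= m:
            break
    return centers
```

A smaller $\gamma$ lowers the acceptance threshold $\gamma \cdot D_{12}$, so more positions qualify as cluster centers; that is why the sketch lowers $\gamma$ when too few centers are found.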
Step (2): Calculate the semantic similarity of the position information in the dummy candidate set. First, the common prefix of the place names is removed, based on the characteristics of position information. The semantic similarity of the remaining strings is then computed from their distance, which improves both the efficiency and the accuracy of the calculation. For example, "Harbin Second Middle School" and "Harbin First Middle School" are two Chinese place-name strings. The prefix "Harbin" contributes nothing to the semantic similarity of the two strings and would distort the result, so it is removed before the calculation.
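The prefix removal described above can be sketched as a short Python function. This is a minimal sketch that compares the two names character by character; the function name `strip_common_prefix` is hypothetical, and a real implementation might instead strip at word or administrative-unit boundaries.

```python
def strip_common_prefix(a, b):
    """Drop the leading characters that two place names share."""
    n = 0
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            break
        n += 1
    return a[n:], b[n:]


# Illustrative usage with English renderings of the example names.
rest_a, rest_b = strip_common_prefix("Harbin Second Middle School",
                                     "Harbin First Middle School")
```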
$D[i, j]$ is the dynamic programming distance matrix, in which each edit operation has a cost between 0 and 1; different values may be set as required. Here the cost is set to either 0 or 1: if $a_i = b_j$ the replacement cost is 0, otherwise every operation costs 1. In the following matrix, $D$ is the dynamic programming matrix that represents the distance between string A = "second middle school" and string B = "first middle school".
[Figure BDA0003965113120000041: dynamic programming matrix D for strings A and B; image not reproduced]
The distance between the two strings is obtained by calculation, i.e. $D[i, j] = D[4, 4] = 2$. The following formula
[Figure BDA0003965113120000042: formula for the similarity matching index S; image not reproduced]
gives the similarity matching index between the strings, i.e. the semantic similarity, which here is 0.5.
Here $|A|$ and $|B|$ denote the lengths of the two strings, and the larger of the two lengths is used to calculate the semantic similarity $S$. Finally, according to the formula $\arg\min S(l_i, l_j)$, the $k$ positions with the smallest semantic similarity, including the true position, are obtained.
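Because the formula image is not available, the exact definition of the similarity index cannot be confirmed from the text. One form that is consistent with the description (it uses the edit distance and the larger of the two string lengths, and it decreases as the distance grows) is the normalized edit similarity below. It is given here only as an assumption, and the single worked value alone does not settle the form.

```latex
% Assumed form of the similarity matching index (not confirmed by the source):
S(A, B) = 1 - \frac{D[i, j]}{\max(|A|, |B|)}

% Check against the values stated in the text: D[4, 4] = 2 and |A| = |B| = 4
S(A, B) = 1 - \frac{2}{4} = 0.5
```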
The semantic similarity algorithm is as follows:
Algorithm 2. Calculating semantic similarity to obtain the virtual position result set.
Description of inputs and steps:
Input: position candidate set $S_1$ and semantic diversity threshold parameter $l$.
Output: position result set $S_2$.
1. Match the characters of the place names in order and discard the matching prefix characters, yielding two new strings A and B.
2. Suppose string A contains $i$ characters, written $A = a_1 a_2 a_3 \cdots a_i$, and string B contains $j$ characters, written $B = b_1 b_2 b_3 \cdots b_j$.
3. Construct a dynamic programming matrix with $i+1$ columns and $j+1$ rows. Its last element, $D[i, j]$, is the edit distance $ed(A, B)$.
4. If $j = 0$, return $i$ and exit; if $i = 0$, return $j$ and exit.
5. Initialize the first row to $(0, 1, \dots, i)$ and the first column to $(0, 1, \dots, j)$.
6. Assign a value to each element of the matrix:
if $a_i = b_j$, then $D[i, j] = D[i-1, j-1]$;
if $a_i \ne b_j$, then $D[i, j] = 1 + \min\{D[i-1, j-1], D[i-1, j], D[i, j-1]\}$.
7. Repeat step 6 until all values in the matrix are obtained, finally yielding the distance $D[i, j]$.
8. Calculate the similarity matching index $S(A, B)$, i.e. the semantic similarity, from $D[i, j]$.
9. Select the $k-1$ positions with the minimum semantic similarity to generate the virtual result set $S_2$.
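Steps 3 to 9 of Algorithm 2 can be sketched in Python as follows. The recurrence in `edit_distance` follows steps 5 and 6 directly. Two parts are assumptions of this sketch and are labeled as such: the similarity index is taken to be 1 minus the distance divided by the longer string length (the formula image is not reproduced in the text), and the dummies are ranked by their similarity to the real position's name (the text does not spell out how the pairwise values are combined). All function names are hypothetical.

```python
def edit_distance(a, b):
    """Unit-cost edit distance filled in by dynamic programming."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # deleting i characters
    for j in range(cols):
        d[0][j] = j  # inserting j characters
    for i in range(1, rows):
        for j in range(1, cols):
            if a[i - 1] == b[j - 1]:
                d[i][j] = d[i - 1][j - 1]
            else:
                d[i][j] = 1 + min(d[i - 1][j - 1], d[i - 1][j], d[i][j - 1])
    return d[-1][-1]


def similarity(a, b):
    """ASSUMED index: 1 - distance / max length (not confirmed by the source)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def select_dummies(real_name, candidate_names, k):
    """Pick the k-1 candidates least similar to the real name (assumed rule)."""
    ranked = sorted(candidate_names,
                    key=lambda name: similarity(real_name, name))
    return ranked[:k - 1]
```

In this sketch the names are compared as given; the common-prefix removal of step 1 would be applied to each pair before calling `similarity`.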
Finally, the effectiveness of the method is verified through experiments. Among dummy position selection methods that consider semantic similarity, the average execution times for generating dummy positions with MaxMinDistDS, SimPMaxMinDistDS and the proposed method are compared. The average execution time of the three methods is shown in Fig. 2. Figs. 4 and 5 compare the efficiency of generating dummies with MaxMinDistDS, SimPMaxMinDistDS and the proposed method. As shown in Fig. 2, as k increases, the MaxMinDistDS algorithm takes more time than the proposed method. As shown in Fig. 5, when k is less than 5 the average execution time of the SimPMaxMinDistDS algorithm is slightly larger than that of the proposed method, and when k is greater than or equal to 5 it is much larger. As can be seen from Fig. 5, the efficiency advantage of the proposed method over the other two algorithms grows as k increases.
Theoretical and experimental results show that the algorithm can ensure the physical dispersity and semantic diversity of the position, effectively protect the position privacy of the user, reduce the time for generating the dummy and effectively improve the query efficiency.
While embodiments of the invention have been described above, it is not limited to the applications set forth in the description and the embodiments, which are fully applicable in various fields of endeavor to which the invention pertains, and further modifications may readily be made by those skilled in the art, it being understood that the invention is not limited to the details shown and described herein without departing from the general concept defined by the appended claims and their equivalents.

Claims (5)

1. A K domination privacy protection method based on approximate semantic query is characterized by comprising the following steps:
step one, obtaining a position data set within a rectangular area containing the real position, selecting a plurality of positions through the MCA algorithm and a multi-center clustering method based on the max-min distance, and then generating a candidate data set after processing with a dummy method;
step two, after calculating the distance between the geographical position information, obtaining the semantic similarity between any two positions in the candidate set, and selecting the k-1 geographical positions with the minimum semantic similarity as virtual positions.
2. The method of claim 1, wherein the K-dominant privacy protection method based on approximate semantic query is characterized in that: an MCA method for processing data is provided to achieve the purpose of acquiring a cluster center point set.
3. The method of claim 1, wherein the K-dominant privacy protection method based on approximate semantic query is characterized in that: and a candidate virtual model set is generated by adopting a multi-center clustering algorithm based on a maximum and minimum distance method, so that the physical dispersity of the virtual model is ensured.
4. The method of claim 1, wherein the K-dominant privacy protection method based on approximate semantic query is characterized in that: in the aspect of processing the semantic similarity, the geographical position with the minimum semantic similarity is selected as the virtual place name, so that the semantic diversity of the virtual place name is ensured.
5. The method of claim 4, wherein the K-dominant privacy protection method based on approximate semantic query is characterized in that: a dummy processing method is provided for processing and selecting the candidate set in the approximate semantic query process, and the average execution time for selecting the candidate set can be well reduced.
CN202211496552.7A 2022-11-25 2022-11-25 K-dominant privacy protection method based on approximate semantic query Active CN115982752B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211496552.7A CN115982752B (en) 2022-11-25 2022-11-25 K-dominant privacy protection method based on approximate semantic query

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211496552.7A CN115982752B (en) 2022-11-25 2022-11-25 K-dominant privacy protection method based on approximate semantic query

Publications (2)

Publication Number Publication Date
CN115982752A true CN115982752A (en) 2023-04-18
CN115982752B CN115982752B (en) 2023-08-04

Family

ID=85961850

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211496552.7A Active CN115982752B (en) 2022-11-25 2022-11-25 K-dominant privacy protection method based on approximate semantic query

Country Status (1)

Country Link
CN (1) CN115982752B (en)

Also Published As

Publication number Publication date
CN115982752B (en) 2023-08-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant