CN115982752A - K domination privacy protection method based on approximate semantic query - Google Patents
- Publication number
- CN115982752A (application CN202211496552.7A)
- Authority
- CN
- China
- Prior art keywords
- semantic
- privacy protection
- positions
- data
- approximate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a K domination privacy protection method based on approximate semantic query. First, a position data set within a rectangular area containing the real position is obtained from the given data, and the cluster center points of the data set are obtained with the MCA algorithm. A multi-center data processing algorithm based on the maximum-minimum distance is then applied to the point set produced by the MCA clustering, selecting position points that are as far as possible from one another, and a processed candidate set is generated. Second, the semantic similarity between any two positions in the candidate set is obtained by calculating the distance between the place-name information of the positions, and, combined with a dummy method, the k-1 positions with the minimum semantic similarity are selected as the virtual result set. Experimental results show that the method ensures the physical dispersion and semantic diversity of the positions and improves the efficiency of dummy generation, while balancing privacy protection against query service quality.
Description
Technical Field
The invention relates to the field of privacy protection processing in data query, in particular to a K domination privacy protection method based on approximate semantic query.
Background
Background and significance of the research
With the development of mobile positioning and wireless communication technology, a large number of mobile devices on the market are capable of accurate GPS positioning, and Location Based Services (LBS) have developed rapidly as a result. However, while LBS bring convenience and great benefit to society, the leakage of sensitive information is receiving increasing attention. Because a user's location is shared among different location service providers, untrusted third parties can easily compromise the user's privacy by analyzing and comparing the location information. For example, by capturing a user's recent trajectory, an adversary can infer information such as home address, workplace and health status.
It is therefore necessary to protect the privacy of the user's location. Many different methods have been proposed to prevent the disclosure of private information, mainly fuzzy methods, encryption methods and policy-based methods. Spatial anonymity methods typically require the assistance of a fully Trusted Third Party (TTP). When a location query service is needed, the mobile user first sends a query request to the TTP; the TTP generates a K domination area containing the user's location and sends that area to the LBS server for querying. In this method, if the K domination area is too large, more time is consumed and the accuracy of the query result is reduced. The TTP is also likely to become a bottleneck of the system. In privacy protection based on virtual locations, by contrast, the virtual locations are generated by the mobile client, so neither a TTP nor an anonymous area is required. This approach can therefore compensate for the above disadvantages of the spatial anonymity method.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a K domination privacy protection method based on approximate semantic query. It combines the K domination technique with semantic similarity calculation to raise the degree of privacy protection of the query, and the algorithm can further improve query efficiency.
The technical scheme adopted by the invention for solving the technical problems is as follows: a K domination privacy protection method based on approximate semantic query mainly comprises the following steps:
1. First, a position data set within a rectangular area containing the real position is obtained from the given data. A plurality of cluster centers are calculated by the MCA center clustering method to form a candidate data set. Positions are then selected with a multi-center data processing method based on the maximum-minimum distance, which keeps the selected positions as far apart as possible, and a group of processed dummy data points is generated;
2. Second, the semantic similarity between any two positions in the candidate set is obtained by calculating the distance between the place-name information of the positions, and the k-1 positions with the minimum semantic similarity are selected as virtual data points.
Furthermore, the MCA algorithm allows a plurality of cluster centers to be generated on the mobile client. Because these positions are as far apart from one another as possible, a set of dummy data points can be generated from them.
Further, semantic similarity is calculated for the position information of the candidate set, and the k-1 positions with the minimum semantic similarity are selected as virtual positions. The k-1 virtual points and the real position are sent to the LBS server for querying. The dummy set is generated with the proposed dummy generation method, in which the dummy data set is produced by the clustering calculation of Algorithm 1.
The beneficial effects of the invention are as follows: by combining K domination with semantic similarity, the invention provides further protection for the user's position information queries, reduces the time overhead of queries, and further safeguards the user's query privacy.
Drawings
Fig. 1 is an abstract drawing of the K domination privacy protection method based on approximate semantic query according to the invention.
Fig. 2 is a graph comparing the time overhead of the three methods as the value of K increases.
Fig. 3 is an example diagram of the MCA algorithm presented in the present invention.
Fig. 4 is a graph comparing the operating efficiency of the present invention and MaxMinDistDS.
Fig. 5 is a graph comparing the operating efficiency of the present invention and SimpMaxMinDistDS.
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail below with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of implementation examples of the present invention, and not all embodiments. Further, it should be understood that various modifications and changes may occur to those skilled in the art after reading the present disclosure, and that such equivalents fall within the scope of the appended claims.
The invention discloses a K domination privacy protection method based on approximate semantic query, which comprises the following specific operation processes:
Step (1): the MCA algorithm is used to calculate the cluster centers of the geographic coordinates of the positions in the square area, yielding a plurality of cluster centers that are selected as the virtual candidate set. The MCA algorithm is a heuristic clustering algorithm that, according to Euclidean distance, takes objects that are as far apart as possible as cluster centers. A sample object is first taken as the first cluster center, and the sample farthest from the first cluster center is then selected as the second cluster center. Additional cluster centers are determined until no new cluster center can be found. After all cluster centers are determined, the m cluster sample sets are used as the virtual position candidate set. Fig. 3 shows an example: according to Algorithm 1, $l_1$ is selected as the first cluster center, $l_5$ as the second cluster center, and $l_9$ is then determined as the third cluster center. The clustering calculation thus yields three cluster centers, from which the virtual position candidate set is generated.
When determining the cluster centers, the real position is used as the initial cluster center $Z_1$. For a position to be selected as a subsequent cluster center, the following conditions must be satisfied:
(1) $D_i > \gamma \cdot D_{12}$, where $i \in \{1, \ldots, n\}$;
(2) $D_i = \max\{\min(D_{i1}, D_{i2})\}$ with $i \in \{1, \ldots, n\}$, and $D_{12} = |Z_2 - Z_1|$;
(3) $\gamma$ is a test parameter of the algorithm, with value range $0.5 < \gamma < 1$.
The steps of the MCA algorithm are as follows:
Algorithm 1.
Input: a position data set $S_n$ and a demand parameter m.
Output: the generated virtual position data set $S_1$.
1. Set the value of $\gamma$, ensuring that $0 < \gamma < 1$.
2. Take the real position $l_{re}$ as the first cluster center $Z_1$.
3. Find the position farthest from $Z_1$ and take it as the second cluster center $Z_2$.
4. For each remaining object $l_i$ of $S_n$, compute its distances $D_{i1}$ and $D_{i2}$ to $Z_1$ and $Z_2$. Let $D_{12}$ be the distance between $Z_1$ and $Z_2$. If $D_i = \max\{\min(D_{i1}, D_{i2})\}$, where $i \in \{1, \ldots, n\}$, and $D_i > \gamma \cdot D_{12}$, then $l_i$ is taken as the third cluster center $Z_3$.
5. By analogy, obtain all v cluster centers that satisfy the conditions. When the maximum-minimum distance is less than $\gamma \cdot D_{12}$, the search for cluster centers ends.
6. Let v denote the number of cluster centers obtained, and check which of the following conditions is met:
(1) If v ≥ m, the algorithm ends;
(2) If v < m, the value of $\gamma$ is reselected and the algorithm is re-executed from step 1.
7. Generate the candidate set $S_1$.
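The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: positions are plain (x, y) tuples, distances are Euclidean, and the names `mca_centers` and `candidate_centers`, as well as the decreasing `gammas` schedule tried when too few centers are found, are hypothetical.

```python
import math


def mca_centers(points, real, gamma=0.6):
    """Max-min distance selection of cluster centers, starting from `real`."""
    centers = [real]
    remaining = [p for p in points if p != real]
    if not remaining:
        return centers
    # Second center: the point farthest from the real position.
    z2 = max(remaining, key=lambda p: math.dist(p, real))
    centers.append(z2)
    remaining.remove(z2)
    d12 = math.dist(real, z2)
    while remaining:
        # Each point is scored by the distance to its nearest existing center;
        # the candidate is the point whose score is largest (the max-min point).
        cand = max(remaining, key=lambda p: min(math.dist(p, c) for c in centers))
        d = min(math.dist(cand, c) for c in centers)
        if d <= gamma * d12:
            break  # no remaining point is far enough from all centers
        centers.append(cand)
        remaining.remove(cand)
    return centers


def candidate_centers(points, real, m, gammas=(0.9, 0.7, 0.5, 0.3, 0.1)):
    """Retry with another gamma until at least m centers are found (assumed schedule)."""
    centers = [real]
    for g in gammas:
        centers = mca_centers(points, real, g)
        if len(centers) >= m:
            break
    return centers


points = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5), (1, 1)]
print(mca_centers(points, real=(0, 0), gamma=0.6))
```

A smaller `gamma` lowers the acceptance threshold and therefore tends to yield more centers, which is why the assumed retry schedule is decreasing.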
Step (2): the semantic similarity of the position information in the dummy candidate set is calculated. First, identical prefixes are removed from the names according to the characteristics of position information. The semantic similarity of the remaining strings is then computed from their distance, which improves both the efficiency and the accuracy of the calculation. For example, "Harbin Second Middle School" and "Harbin First Middle School" are two Chinese place-name strings. "Harbin" contributes nothing to the semantic similarity calculation of the two strings and would also distort the result, so the prefix "Harbin" is excluded from the calculation.
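The prefix removal described above can be illustrated with a short Python sketch. The helper name `strip_shared_prefix` is hypothetical, and the English renderings of the place names are used only for readability; the comparison is character by character, so it applies equally to the original Chinese strings.

```python
import os


def strip_shared_prefix(a, b):
    """Remove the longest common leading run of characters from both names."""
    # os.path.commonprefix compares strings character by character,
    # so it works on arbitrary strings, not only on file paths.
    prefix = os.path.commonprefix([a, b])
    return a[len(prefix):], b[len(prefix):]


print(strip_shared_prefix("Harbin Second Middle School",
                          "Harbin First Middle School"))
```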
$D[i,j]$ is the dynamic programming distance matrix, in which each edit operation has a cost between 0 and 1; different values may be set as required. Here the cost is set to 0 or 1: if $a_i = b_j$, the substitution cost is 0; otherwise every operation costs 1. D is the dynamic programming matrix that gives the distance between the string A = "second middle school" and the string B = "first middle school".
The distance between the two strings is obtained by calculation, i.e. $D[i,j] = D[4,4] = 2$. The similarity matching index between the strings, i.e. the semantic similarity, is then calculated from this distance, giving a semantic similarity of 0.5.
Here $|A|$ and $|B|$ denote the lengths of the two strings, and the maximum string length is used in calculating the semantic similarity S. Finally, according to $\arg\min S(l_i, l_j)$, the k positions with the smallest semantic similarity, including the true position, are obtained.
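The similarity formula itself does not survive in this text. One common normalized form that agrees with the stated quantities (distance 2, string length 4, similarity 0.5) and with the use of the maximum string length is given below. It is an assumption rather than a quotation of the source, and the worked values alone do not rule out other normalizations.

$$
S(A,B) = 1 - \frac{D[|A|,|B|]}{\max(|A|,|B|)}, \qquad S = 1 - \frac{2}{4} = 0.5
$$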
The semantic similarity algorithm is as follows:
Algorithm 2. Calculate semantic similarity to obtain the virtual position result set.
Description of the inputs and steps:
Input: a position candidate set $S_1$ and a semantic diversity threshold parameter $l$.
Output: a position result set $S_2$.
1. Match the characters of the place-name information in sequence and ignore the prefix characters that are identical. Two new strings A and B are obtained.
2. Suppose string A contains i characters, denoted $A = a_1 a_2 a_3 \ldots a_i$, and string B contains j characters, denoted $B = b_1 b_2 b_3 \ldots b_j$.
3. Construct a dynamic programming matrix D with i+1 columns and j+1 rows. Its last element $D[i,j]$ is the edit distance ed(A, B).
4. If j = 0, return i and exit; if i = 0, return j and exit.
5. Initialize the first row to (0, 1, ..., i) and the first column to (0, 1, ..., j).
6. Assign a value to each element of the matrix:
if $a_i = b_j$, then $D[i,j] = D[i-1,j-1]$;
if $a_i \neq b_j$, then $D[i,j] = 1 + \min\{D[i-1,j-1], D[i-1,j], D[i,j-1]\}$.
7. Repeat step 6 until all values in the matrix are obtained, finally yielding the distance $D[i,j]$.
8. Calculate the similarity matching index S(A, B), i.e. the semantic similarity, from $D[i,j]$.
9. Select the k-1 positions with the minimum semantic similarity to generate the virtual result set $S_2$.
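Steps 3 to 9 can be sketched in Python as follows. This is a hedged illustration, not the patented implementation. The names `edit_distance`, `similarity` and `select_dummies` are hypothetical, the names passed in are assumed to have had shared prefixes removed already (step 1), the normalization `1 - distance / max length` is an assumed form of the similarity index, and ranking candidates by their similarity to the real position's name is only one possible reading of step 9.

```python
def edit_distance(a, b):
    """Edit distance with unit cost for insertion, deletion and substitution."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # i deletions turn a[:i] into the empty string
    for j in range(cols):
        d[0][j] = j  # j insertions turn the empty string into b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            if a[i - 1] == b[j - 1]:
                d[i][j] = d[i - 1][j - 1]
            else:
                d[i][j] = 1 + min(d[i - 1][j - 1], d[i - 1][j], d[i][j - 1])
    return d[-1][-1]


def similarity(a, b):
    """Assumed normalization: 1 - ed(a, b) / max(|a|, |b|)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - edit_distance(a, b) / longest


def select_dummies(real_name, candidate_names, k):
    """Pick the k-1 candidate names least similar to the real name."""
    ranked = sorted(candidate_names, key=lambda name: similarity(real_name, name))
    return ranked[:k - 1]


print(edit_distance("kitten", "sitting"))
print(select_dummies("abcd", ["abcd", "abcf", "wxyz", "abyz"], k=3))
```

Sorting by ascending similarity places the most dissimilar names first, which matches the stated goal of semantic diversity among the dummies.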
Finally, the effectiveness of the method is verified through experiments. Among dummy position selection methods that consider semantic similarity, the average execution times for generating dummy positions with MaxMinDistDS, SimpMaxMinDistDS and the proposed method are compared. The average execution time of the three methods is shown in Fig. 2, and Figs. 4 and 5 compare the efficiency of generating dummies with MaxMinDistDS, SimpMaxMinDistDS and the proposed method. As shown in Fig. 2, as k increases, the MaxMinDistDS algorithm takes more time than the proposed method. As shown in Fig. 5, when k is less than 5, the average execution time of the SimpMaxMinDistDS algorithm is slightly larger than that of the proposed method, and when k is greater than or equal to 5, it is much larger. As can be seen from Fig. 5, the efficiency advantage of the proposed method over the other two algorithms grows as k increases.
Theoretical and experimental results show that the algorithm ensures the physical dispersion and semantic diversity of the positions, effectively protects the user's location privacy, reduces the time needed to generate dummies, and effectively improves query efficiency.
While embodiments of the invention have been described above, it is not limited to the applications set forth in the description and the embodiments, which are fully applicable in various fields of endeavor to which the invention pertains, and further modifications may readily be made by those skilled in the art, it being understood that the invention is not limited to the details shown and described herein without departing from the general concept defined by the appended claims and their equivalents.
Claims (5)
1. A K domination privacy protection method based on approximate semantic query is characterized by comprising the following steps:
step one: obtaining a position data set within a rectangular area containing the real position, selecting a plurality of positions by means of the MCA algorithm and a multi-center clustering method based on the maximum-minimum distance, and then generating a candidate data set after processing by a dummy method;
step two: after calculating the distance between the geographical position information, obtaining the semantic similarity between any two positions in the candidate set, and selecting the k-1 geographical positions with the minimum semantic similarity as virtual positions.
2. The method of claim 1, wherein the K-dominant privacy protection method based on approximate semantic query is characterized in that: an MCA method for processing data is provided to achieve the purpose of acquiring a cluster center point set.
3. The method of claim 1, wherein the K-dominant privacy protection method based on approximate semantic query is characterized in that: and a candidate virtual model set is generated by adopting a multi-center clustering algorithm based on a maximum and minimum distance method, so that the physical dispersity of the virtual model is ensured.
4. The method of claim 1, wherein the K-dominant privacy protection method based on approximate semantic query is characterized in that: in the aspect of processing the semantic similarity, the geographical position with the minimum semantic similarity is selected as the virtual place name, so that the semantic diversity of the virtual place name is ensured.
5. The method of claim 4, wherein the K-dominant privacy protection method based on approximate semantic query is characterized in that: a dummy processing method is provided for processing and selecting the candidate set in the approximate semantic query process, and the average execution time for selecting the candidate set can be well reduced.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211496552.7A CN115982752B (en) | 2022-11-25 | 2022-11-25 | K-dominant privacy protection method based on approximate semantic query |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115982752A true CN115982752A (en) | 2023-04-18 |
CN115982752B CN115982752B (en) | 2023-08-04 |
Family
ID=85961850
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211496552.7A Active CN115982752B (en) | 2022-11-25 | 2022-11-25 | K-dominant privacy protection method based on approximate semantic query |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115982752B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110139214A (en) * | 2019-06-26 | 2019-08-16 | 湖南大学 | Vehicle position privacy protection method based on virtual location in a kind of VANET |
US20200104648A1 (en) * | 2018-09-28 | 2020-04-02 | Wipro Limited | Apparatus and method for detecting and removing outliers using sensitivity score |
CN111259434A (en) * | 2020-01-08 | 2020-06-09 | 广西师范大学 | Privacy protection method for individual preference position in track data release |
CN113946867A (en) * | 2021-10-21 | 2022-01-18 | 福建工程学院 | Position privacy protection method based on space influence |
EP3961422A1 (en) * | 2020-08-26 | 2022-03-02 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and apparatus for extracting geographic location point spatial relationship |
Non-Patent Citations (4)
Title |
---|
S. LIU AND S. WANG: "Trajectory Community Discovery and Recommendation by Multi-Source Diffusion Modeling", IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, vol. 29, no. 4, pages 898 - 911 * |
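Shi Lei; Pan Julong; Zuo Zhengwei: "A location privacy protection method based on proxy service", Journal of China Jiliang University, no. 03, pages 89 - 96 * |
Niu Hongwei: "Research on query privacy protection methods in location-based services", Information Science and Technology, pages 15 - 30 * |
Ma Mingjie; Du Yuejin; Li Fenghua; Liu Jiawen: "A survey of semantics-based privacy protection for location-based services", Chinese Journal of Network and Information Security, no. 12, pages 5 - 15 * |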
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116956349A (en) * | 2023-07-29 | 2023-10-27 | 哈尔滨理工大学 | K neighbor privacy protection query method based on time-dependent road network |
CN116956349B (en) * | 2023-07-29 | 2024-03-19 | 哈尔滨理工大学 | K neighbor privacy protection query method based on time-dependent road network |
Also Published As
Publication number | Publication date |
---|---|
CN115982752B (en) | 2023-08-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||