CN110059148A - Exact search method for spatial keyword queries applied to electronic maps - Google Patents


Info

Publication number
CN110059148A
Authority
CN
China
Prior art keywords
node
keyword
bloom filter
subitem
inquired
Prior art date
Legal status
Pending
Application number
CN201910333876.0A
Other languages
Chinese (zh)
Inventor
姚斌
阮珂
徐阳
过敏意
陈�全
李超
沈耀
冷静文
郑文立
林昊
Current Assignee
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date
2019-04-24
Filing date
2019-04-24
Publication date
2019-07-26
Application filed by Shanghai Jiaotong University
Priority to CN201910333876.0A
Publication of CN110059148A
Legal status: Pending (current)


Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
              • G06F16/22 Indexing; Data structures therefor; Storage structures
                • G06F16/2228 Indexing structures
                  • G06F16/2246 Trees, e.g. B+trees
              • G06F16/24 Querying
                • G06F16/245 Query processing
              • G06F16/29 Geographical information databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Remote Sensing (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an exact search method for spatial keyword queries applied to electronic maps, comprising the following steps: S1, build each leaf node u from the data set: let u_p be the set of points contained in u; map each keyword t to the list of objects containing t to build the inverted list of u, and collect the vocabulary of u to build the Bloom filter of its parent node; S2, build each non-leaf node p: let the child entries of p be {c1, ..., cf}, where f is the maximum number of entries a node can hold; the vocabularies of the child nodes pointed to by the entries of p form the vocabulary of node p, and each keyword is inserted into an initialized Bloom filter; S3, build the root node, completing the construction of the Bloom-filter-based IR-tree; S4, build the search index of the IR-tree obtained in S3. The invention improves the efficiency of keyword indexing and saves system resources.

Description

Exact search method for spatial keyword queries applied to electronic maps
Technical field
The invention belongs to the field of positioning technology, and in particular relates to an exact search method for spatial keyword queries on electronic maps, applied on the Spark platform.
Background art
With the recent development of communication technology and the widespread use of mobile terminals, location-based services keep emerging. A spatial keyword query takes a user's geographic location and several query keywords as parameters and returns the spatial objects that are both spatially close and textually relevant to them. Building an effective index structure can greatly improve query efficiency. A spatial index is a data structure that arranges objects in a particular structure according to their location, size, shape and similar attributes. State-of-the-art solutions for approximate spatial keyword queries are all based on space-first index structures. The problem with this approach is that a typical spatio-textual object carries at least dozens of keywords, and space-first structures are very inefficient at indexing objects that have dozens of keywords on average. Developing a new exact search method for spatial keyword queries that improves keyword-indexing efficiency during query processing and saves system resources is therefore a direction that those skilled in the art need to study.
The abbreviations used in this application are defined as follows:
R-tree: an extension of the B-tree to multidimensional space. It partitions spatial objects by extent; each node corresponds to a region and a disk page. The disk page of a non-leaf node stores the regional extents of all its child nodes, and the regions of all child nodes of a non-leaf node fall within that node's region.
IR-tree: an index that combines an inverted index with an R-tree index; the inverted index supports the text-similarity computation.
BFIR-tree: a Bloom-filter-based IR-tree for large-scale data processing.
CBFIR-tree: a dynamic BFIR-tree.
S2I-V: a structure that handles keywords of different frequencies differently.
EBRQ: a range query based on keyword containment.
ABRQ: a k-nearest-neighbor query based on approximate keyword containment.
False positive: the false detection rate.
KNN algorithm: the k-nearest-neighbor algorithm, one of the simplest methods among data-mining classification techniques.
I-Node: a leaf R-tree node that stores an inverted list mapping each keyword to the spatio-textual objects containing it.
Summary of the invention
The technical problem solved by the invention is to provide an exact search method for spatial keyword queries applied to electronic maps that improves keyword-indexing efficiency and saves system resources.
The technical solution adopted is as follows:
An exact search method for spatial keyword queries applied to electronic maps, comprising the following steps: S1, build leaf node u: let u_p be the set of points contained in u; map each keyword t to the list of objects containing t to build the inverted list of u, and collect the vocabulary of u to build the Bloom filter of its parent node; S2, build non-leaf node p: let the child entries of p be {c1, ..., cf}, where f is the maximum number of entries a node can hold; the vocabularies of the child nodes pointed to by the entries of p form the vocabulary of node p, and each keyword is inserted into an initialized Bloom filter; S3, build the root node, completing the construction of the Bloom-filter-based IR-tree; S4, construct the search index of the Bloom-filter-based IR-tree structure.
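Steps S1 to S3 can be sketched as a bottom-up build. This is a minimal illustration under stated assumptions, not the patented implementation: objects are 2-D points with keyword sets, nodes are packed by simply sorting on x (a real R-tree would use its own split or bulk-loading strategy), and each node's Bloom filter is stored on the node itself rather than in its parent's entry. The Bloom parameters (256 bits, 4 hash functions) and all names (`SpatialObject`, `build_leaf`, `build_internal`, `build_tree`) are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field

N_BITS, K = 256, 4  # hypothetical Bloom filter size and number of hash functions


def bloom_positions(word: str):
    """K bit positions for a keyword, derived from salted SHA-256 (illustrative choice)."""
    for i in range(K):
        digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % N_BITS


def make_bloom(words) -> int:
    """Insert every keyword into an initialised (all-zero) filter; bits are kept in an int."""
    bits = 0
    for w in words:
        for p in bloom_positions(w):
            bits |= 1 << p
    return bits


@dataclass
class SpatialObject:
    oid: int
    x: float
    y: float
    keywords: frozenset


@dataclass
class Node:
    mbr: tuple                                    # (min_x, min_y, max_x, max_y)
    vocab: set                                    # keywords in the subtree (needed while building)
    bloom: int                                    # Bloom filter over vocab
    children: list = field(default_factory=list)  # non-leaf: child entries c1..cf
    objects: list = field(default_factory=list)   # leaf: the point set u_p
    inverted: dict = field(default_factory=dict)  # leaf: keyword t -> ids of objects containing t


def build_leaf(objects) -> Node:
    """S1: inverted list of u plus a Bloom filter over u's vocabulary."""
    inverted = {}
    for o in objects:
        for t in o.keywords:
            inverted.setdefault(t, []).append(o.oid)
    vocab = set(inverted)
    mbr = (min(o.x for o in objects), min(o.y for o in objects),
           max(o.x for o in objects), max(o.y for o in objects))
    return Node(mbr, vocab, make_bloom(vocab), objects=list(objects), inverted=inverted)


def build_internal(children) -> Node:
    """S2: vocabulary of p = union of the children's vocabularies, inserted into a new filter."""
    vocab = set().union(*(c.vocab for c in children))
    mbr = (min(c.mbr[0] for c in children), min(c.mbr[1] for c in children),
           max(c.mbr[2] for c in children), max(c.mbr[3] for c in children))
    return Node(mbr, vocab, make_bloom(vocab), children=list(children))


def build_tree(objects, f: int = 4) -> Node:
    """S1-S3: pack leaves, then parents of at most f entries, until one root remains."""
    objs = sorted(objects, key=lambda o: (o.x, o.y))  # naive packing; assumes objects is non-empty
    level = [build_leaf(objs[i:i + f]) for i in range(0, len(objs), f)]
    while len(level) > 1:
        level = [build_internal(level[i:i + f]) for i in range(0, len(level), f)]
    return level[0]


data = [
    SpatialObject(1, 0.0, 0.0, frozenset({"cafe", "wifi"})),
    SpatialObject(2, 1.0, 2.0, frozenset({"cafe"})),
    SpatialObject(3, 5.0, 1.0, frozenset({"bar"})),
    SpatialObject(4, 6.0, 3.0, frozenset({"bar", "wifi"})),
    SpatialObject(5, 9.0, 9.0, frozenset({"museum"})),
]
root = build_tree(data, f=2)
print(sorted(root.vocab))  # ['bar', 'cafe', 'museum', 'wifi']
print(root.mbr)            # (0.0, 0.0, 9.0, 9.0)
```

Keeping `vocab` on internal nodes is only a convenience for the build; a space-conscious version would discard it once the filter is built, since the point of the Bloom filter is to summarize the vocabulary compactly.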
Preferably, in the above exact search method for spatial keyword queries applied to electronic maps, step S4 comprises the following steps: S41: given an eBKQ query of the form eBKQ = {Qs = (τ, ε), Qt}, where Qs is the spatial condition and Qt is a set of keywords, check whether the current node lies in the query region given by Qs; if it does, go to S42; if it does not, recursively check the child nodes of the node; S42: check whether every keyword in Qt is present in the Bloom filter of the node; if not, prune the node; if so, go to S43; S43: map each keyword to its corresponding record list and intersect these lists to obtain the final result set.
More preferably, in the above exact search method for spatial keyword queries applied to electronic maps, the eBKQ query in step S41 is implemented using the KNN algorithm.
In the above solution, the R-tree is a data structure for handling multidimensional data. It partitions objects by spatial extent, and each node corresponds to a region and a disk page. The disk page of a non-leaf node stores the regional extents of all its child nodes, and the regions of all child nodes of a non-leaf node fall within that node's region; the disk page of a leaf node stores the bounding rectangles of all spatial objects within its region.
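The R-tree description above rests on two rectangle tests: whether a child's region falls inside its parent's region, and whether a node's region overlaps a query region. A minimal Python sketch, assuming axis-aligned 2-D minimum bounding rectangles; the `Rect` class and `union` helper are illustrative names, not taken from the patent:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned minimum bounding rectangle (MBR) of an R-tree entry."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def intersects(self, other: "Rect") -> bool:
        # Overlap test used to decide whether a subtree can hold query results.
        return (self.min_x <= other.max_x and other.min_x <= self.max_x
                and self.min_y <= other.max_y and other.min_y <= self.max_y)

    def contains(self, other: "Rect") -> bool:
        # Every child region must lie within its parent's region.
        return (self.min_x <= other.min_x and other.max_x <= self.max_x
                and self.min_y <= other.min_y and other.max_y <= self.max_y)


def union(rects) -> Rect:
    """Region of a non-leaf node: the smallest rectangle enclosing all child regions."""
    rects = list(rects)
    return Rect(min(r.min_x for r in rects), min(r.min_y for r in rects),
                max(r.max_x for r in rects), max(r.max_y for r in rects))


children = [Rect(0, 0, 2, 2), Rect(3, 1, 5, 4)]
parent = union(children)
print(parent)                                     # Rect(min_x=0, min_y=0, max_x=5, max_y=4)
print(all(parent.contains(c) for c in children))  # True
```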
By adopting the above technical solution, the invention uses Bloom filters during spatial keyword search to filter out most child nodes of the R-tree, and performs exact-match verification only on the child nodes that pass the filter. This avoids traversing every node of the R-tree on each access, thereby improving keyword-indexing efficiency and saving system resources.
Description of the drawings
The invention is described in further detail below with reference to the accompanying drawings and specific embodiments:
Fig. 1 is a schematic diagram showing the effect of the query region on the scheme as the query percentage gradually increases;
Fig. 2 is a schematic diagram showing the effect on the scheme as the number of keywords gradually increases.
Detailed embodiments
To illustrate the technical solution of the invention more clearly, it is further described below with reference to embodiments.
Embodiment 1:
The exact search method for spatial keyword queries applied to electronic maps of the invention comprises the following steps:
S1, build leaf node u: let u_p be the set of points contained in u; map each keyword t to the list of objects containing t to build the inverted list of u, and collect the vocabulary of u to build the Bloom filter of its parent node;
S2, build non-leaf node p: let the child entries of p be {c1, ..., cf}, where f is the maximum number of entries a node can hold; the vocabularies of the child nodes pointed to by the entries of p form the vocabulary of node p, and each keyword is inserted into an initialized Bloom filter;
S3, build the root node, completing the construction of the Bloom-filter-based IR-tree;
S4, construct the search index of the Bloom-filter-based IR-tree structure, comprising:
S41: given an eBKQ query of the form eBKQ = {Qs = (τ, ε), Qt}, where Qs is the spatial condition and Qt is a set of keywords, check whether the current node lies in the query region given by Qs; if it does, go to S42; if it does not, recursively check the child nodes of the node;
S42: check whether every keyword in Qt is present in the Bloom filter of the node; if not, prune the node; if so, go to S43;
S43: map each keyword to its corresponding record list and intersect these lists to obtain the final result set.
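Steps S41 to S43 can be illustrated as a recursive range search over such a tree. This self-contained sketch rests on assumptions the patent does not state: Qs = (τ, ε) is read as a query point τ and a radius ε; a node passes S41 if its bounding rectangle comes within ε of τ; and, because the wording of the recursion in S41 is ambiguous, a node that is out of range is simply pruned while an in-range non-leaf node is descended after passing S42. A non-leaf node's filter is formed by OR-ing its children's filters, which, for filters with the same size and hash functions, sets exactly the same bits as inserting the union vocabulary keyword by keyword. The Bloom parameters and all names (`leaf`, `internal`, `ebkq`, `min_dist`) are hypothetical.

```python
import hashlib
import math
from dataclasses import dataclass, field

N_BITS, K = 256, 4  # hypothetical Bloom filter parameters


def positions(word):
    for i in range(K):
        digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % N_BITS


def bloom_of(words):
    bits = 0
    for w in words:
        for p in positions(w):
            bits |= 1 << p
    return bits


def may_contain(bits, word):
    return all((bits >> p) & 1 for p in positions(word))


@dataclass
class Node:
    mbr: tuple                                    # (min_x, min_y, max_x, max_y)
    bloom: int
    children: list = field(default_factory=list)  # non-leaf entries
    points: dict = field(default_factory=dict)    # leaf: oid -> (x, y)
    inverted: dict = field(default_factory=dict)  # leaf: keyword -> set of oids


def leaf(objs):
    """objs: {oid: ((x, y), keywords)}"""
    inv = {}
    for oid, (_, kws) in objs.items():
        for t in kws:
            inv.setdefault(t, set()).add(oid)
    pts = {oid: p for oid, (p, _) in objs.items()}
    xs = [p[0] for p in pts.values()]
    ys = [p[1] for p in pts.values()]
    return Node((min(xs), min(ys), max(xs), max(ys)), bloom_of(inv), points=pts, inverted=inv)


def internal(children):
    bloom = 0
    for c in children:
        bloom |= c.bloom  # same bits as inserting the union of the children's vocabularies
    mbr = (min(c.mbr[0] for c in children), min(c.mbr[1] for c in children),
           max(c.mbr[2] for c in children), max(c.mbr[3] for c in children))
    return Node(mbr, bloom, children=list(children))


def min_dist(point, mbr):
    """Smallest distance from a point to a rectangle (0 if the point is inside)."""
    dx = max(mbr[0] - point[0], 0.0, point[0] - mbr[2])
    dy = max(mbr[1] - point[1], 0.0, point[1] - mbr[3])
    return math.hypot(dx, dy)


def ebkq(node, tau, eps, qt, out=None):
    out = set() if out is None else out
    if min_dist(tau, node.mbr) > eps:                    # S41: outside the query region, prune
        return out
    if not all(may_contain(node.bloom, t) for t in qt):  # S42: a keyword is surely absent, prune
        return out
    if node.children:
        for child in node.children:
            ebkq(child, tau, eps, qt, out)
        return out
    candidates = set(node.points)                        # S43: intersect the record lists
    for t in qt:
        candidates &= node.inverted.get(t, set())
    out.update(o for o in candidates if math.dist(tau, node.points[o]) <= eps)
    return out


a = leaf({1: ((1.0, 1.0), {"cafe", "wifi"}), 2: ((2.0, 1.5), {"cafe"})})
b = leaf({3: ((8.0, 8.0), {"cafe", "wifi"}), 4: ((9.0, 7.0), {"bar"})})
root = internal([a, b])
print(sorted(ebkq(root, (1.5, 1.0), 2.0, {"cafe", "wifi"})))  # [1]
```

Because a Bloom filter never gives false negatives, S42 can only prune subtrees that truly lack a query keyword; a false positive merely costs an unnecessary descent, and the inverted-list intersection in S43 removes it, so the result stays exact.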
In step S41, the eBKQ query is implemented with the KNN algorithm: a priority queue is maintained in which records are ordered by their distance to the given query location. Records that pass text matching are added to the queue; records are then dequeued into the result, and the search stops once k results have been obtained or the queue is empty.
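The priority-queue procedure just described can be sketched with Python's `heapq`. Assumptions, not from the patent: records arrive as hypothetical `(oid, (x, y), keywords)` tuples (for example, those gathered from leaves that survived Bloom-filter pruning), text matching means a record contains every query keyword, and distance is Euclidean. `knn_text_match` is an illustrative name.

```python
import heapq
import math


def knn_text_match(tau, records, qt, k):
    """Return up to k (oid, distance) pairs for records containing all keywords in qt, nearest first."""
    heap = []
    for oid, loc, kws in records:
        if qt <= kws:                               # text matching: every query keyword present
            heapq.heappush(heap, (math.dist(tau, loc), oid))
    result = []
    while heap and len(result) < k:                 # stop at k results or when the queue is empty
        dist, oid = heapq.heappop(heap)
        result.append((oid, dist))
    return result


records = [
    (1, (0.0, 0.0), {"cafe", "wifi"}),
    (2, (3.0, 4.0), {"cafe"}),
    (3, (1.0, 1.0), {"cafe", "wifi"}),
    (4, (6.0, 8.0), {"cafe", "wifi"}),
]
nearest = knn_text_match((0.0, 0.0), records, {"cafe", "wifi"}, 2)
print([oid for oid, _ in nearest])  # [1, 3]
```

A full k-nearest-neighbor search over the tree would more likely push tree nodes keyed by their minimum distance to the query point and expand them lazily (best-first search); the description does not go into that level of detail, so the sketch stays at the record level.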
In the above process:
A Bloom filter maps a set S of m elements to an n-bit array (denoted {B[1], ..., B[n]}) whose bits are all initialized to 0. It relies on a family H of k independent hash functions, each of which maps every element of the universe U to a random number v ∈ [1, n]. Each element of S is mapped by the k hash functions to k positions, and the corresponding bits of the array are set to 1. To check whether an element t is in S, check whether the bits B[h_i(t)] (i ∈ [1, k]) corresponding to t in the filter's array are all 1. If they are all 1, t is very likely in S; otherwise t is definitely not in S.
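The construction and membership test described above fit in a few lines. This sketch uses 0-based positions (the text indexes B[1] to B[n]) and derives the k hash functions h_1, ..., h_k by salting SHA-256 with the index i, which is an assumption made for illustration rather than the hash family the patent uses; `BloomFilter` and its methods are hypothetical names.

```python
import hashlib


class BloomFilter:
    """n-bit array with k hash functions (illustrative sketch)."""

    def __init__(self, n_bits: int, k: int) -> None:
        self.n_bits = n_bits           # size n of the bit array B
        self.k = k                     # number of hash functions
        self.bits = bytearray(n_bits)  # one byte per bit, all initialised to 0

    def _positions(self, item: str):
        # h_i(item) for i = 0..k-1, each mapped into [0, n).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # All k bits set: probably present. Any bit 0: definitely absent.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(n_bits=1000, k=7)
for word in ["coffee", "wifi", "parking"]:
    bf.add(word)
print(bf.might_contain("coffee"))  # True (a Bloom filter has no false negatives)
print(bf.might_contain("sushi"))   # normally False; True here would be a false positive
```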
By choosing a suitable number k of hash functions and a suitable size of the bit array B, the Bloom filter can be kept at a low false-positive rate, which gives the B-Node efficient pruning ability. Moreover, a query with several keywords further reduces the false-positive probability of a B-Node, since the node survives pruning only if every keyword passes its filter. Assume a Bloom filter indexes m elements with k hash functions and an n-bit array, and that the current query has s keywords; the false-positive probability is then:
$$P_{fp} = \left(1 - e^{-km/n}\right)^{ks}$$
When n/m = 10 (ten bits per indexed element) and k = 7, the false-positive probability is only (0.008)^s. In addition, the I-Node performs the final check, which guarantees the correctness of the BFIR-tree: its recall is 100%.
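The (0.008)^s figure can be checked numerically. Using the notation of the Bloom filter paragraph (an n-bit array, m indexed elements, k hash functions), the sketch below evaluates (1 - e^(-km/n))^(ks), assuming that false hits for the s query keywords are independent; `false_positive_rate` is a hypothetical helper name.

```python
import math


def false_positive_rate(bits_per_element: float, k: int, s: int = 1) -> float:
    """Probability that all s query keywords are false hits in one Bloom filter.

    bits_per_element is n/m (array size over number of indexed elements); the
    per-keyword rate uses the standard approximation (1 - e^(-k*m/n))^k.
    """
    per_keyword = (1.0 - math.exp(-k / bits_per_element)) ** k
    return per_keyword ** s


for s in (1, 2, 3):
    print(f"s={s}: {false_positive_rate(10, 7, s):.2e}")
```

For s = 1 this evaluates to about 0.0082, matching the (0.008)^s figure in the text, and every additional query keyword multiplies the rate by roughly that factor again.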
Comparative experiments:
The experiments were run on a cluster of 17 nodes in three configurations: (1) 8 machines with a 6-core Intel Xeon E5-2603 v3 1.60 GHz processor and 20 GB RAM; (2) 2 machines with a 6-core Intel Xeon E5-2620 2.00 GHz processor and 56 GB RAM; (3) 7 machines with a 6-core Intel Xeon E5-2609 1.90 GHz processor and 16 GB RAM. One machine of type (2) serves as the master node and the others as worker nodes. Each worker node uses 15 GB of memory and all 6 available cores for computation. All nodes run Ubuntu 14.04.2 LTS with Hadoop 2.4.1 and Spark 1.3.0 installed. The experiments were carried out on two real large-scale data sets.
As shown in Fig. 1:
The effect of the query region is shown by gradually increasing the query percentage from 1% to 20%. All four structures show a relatively slow performance decline (in both system throughput and average latency). A larger query region usually incurs a higher search cost, but thanks to the additional pruning power of text matching, the cost increases only slightly.
As shown in Fig. 2:
The BFIR-tree performs as well as the IR-tree, while having lower space overhead than the IR-tree. The CBFIR-tree is essentially similar to the BFIR-tree, with very small performance differences between them. As the number of keywords increases, the performance of S2I-V improves accordingly. When users issue spatial keyword queries with only a single keyword, S2I-V performs similarly to the other three structures; as the number of query keywords grows, S2I-V achieves a significant performance gain by exploiting the pruning power of infrequent keywords.
The technical solution of the invention is therefore suitable for location-based service applications such as Dianping.
The above are only specific embodiments of the invention, and the scope of protection of the invention is not limited to them. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall be subject to the scope of protection of the claims.

Claims (3)

1. An exact search method for spatial keyword queries applied to electronic maps, characterized by comprising the following steps:
S1, build leaf node u: let u_p be the set of points contained in u; map each keyword t to the list of objects containing t to build the inverted list of u, and collect the vocabulary of u to build the Bloom filter of its parent node;
S2, build non-leaf node p: let the child entries of p be {c1, ..., cf}, where f is the maximum number of entries a node can hold; the vocabularies of the child nodes pointed to by the entries of p form the vocabulary of node p, and each keyword is inserted into an initialized Bloom filter;
S3, build the root node, completing the construction of the Bloom-filter-based IR-tree;
S4, construct the search index of the Bloom-filter-based IR-tree structure.
2. The exact search method for spatial keyword queries applied to electronic maps according to claim 1, characterized in that step S4 comprises the following steps:
S41: given an eBKQ query of the form eBKQ = {Qs = (τ, ε), Qt}, where Qs is the spatial condition and Qt is a set of keywords, check whether the current node lies in the query region given by Qs; if it does, go to S42; if it does not, recursively check the child nodes of the node;
S42: check whether every keyword in Qt is present in the Bloom filter of the node; if not, prune the node; if so, go to S43;
S43: map each keyword to its corresponding record list and intersect these lists to obtain the final result set.
3. The exact search method for spatial keyword queries applied to electronic maps according to claim 2, characterized in that the eBKQ query in step S41 is implemented using the KNN algorithm.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910333876.0A (CN110059148A) | 2019-04-24 | 2019-04-24 | Exact search method for spatial keyword queries applied to electronic maps


Publications (1)

Publication Number | Publication Date
CN110059148A | 2019-07-26

Family

ID=67320454

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201910333876.0A (CN110059148A, pending) | Exact search method for spatial keyword queries applied to electronic maps | 2019-04-24 | 2019-04-24

Country Status (1)

Country Link
CN (1) CN110059148A (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20160019248A1 * | 2014-07-21 | 2016-01-21 | Oracle International Corporation | Methods for processing within-distance queries
CN107491497A * | 2017-07-25 | 2017-12-19 | Fuzhou University | Multi-user, multi-keyword ranked searchable encryption system supporting queries in any language
CN108337085A * | 2018-01-03 | 2018-07-27 | Xidian University | Construction method for approximate nearest-neighbor retrieval supporting dynamic updates


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
徐阳, 王志杰, 钱诗友: "Spatial-textual query analysis based on the distributed platform Spark", Journal of East China Normal University (Natural Science Edition) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN111858613A * | 2020-07-31 | 2020-10-30 | Hubei ECARX Technology Co., Ltd. | Service data retrieval method
CN116821279A * | 2023-06-06 | 2023-09-29 | Harbin University of Science and Technology | Space keyword query method and system with exclusion keywords
CN116821279B * | 2023-06-06 | 2024-06-07 | Harbin University of Science and Technology | Space keyword query method and system with exclusion keywords

Similar Documents

Publication Publication Date Title
Wei et al. AnalyticDB-V: a hybrid analytical engine towards query fusion for structured and unstructured data
US10949467B2 (en) Random draw forest index structure for searching large scale unstructured data
KR101266358B1 (en) A distributed index system based on multi-length signature files and method thereof
US11106708B2 (en) Layered locality sensitive hashing (LSH) partition indexing for big data applications
US7512282B2 (en) Methods and apparatus for incremental approximate nearest neighbor searching
CN104834693A (en) Depth-search-based visual image searching method and system thereof
US11281645B2 (en) Data management system, data management method, and computer program product
CN105843841A (en) Small file storage method and system
JP2013519152A (en) Text classification method and system
CN102521386A (en) Method for grouping space metadata based on cluster storage
CN106095920B (en) Distributed index method towards extensive High dimensional space data
CN104391908B (en) Multiple key indexing means based on local sensitivity Hash on a kind of figure
CN109033340A (en) A kind of searching method and device of the point cloud K neighborhood based on Spark platform
CN108549696B (en) Time series data similarity query method based on memory calculation
CN104615734B (en) A kind of community management service big data processing system and its processing method
Budgaga et al. A framework for scalable real‐time anomaly detection over voluminous, geospatial data streams
CN109635069B (en) Geographic space data self-organizing method based on information entropy
Azri et al. Dendrogram clustering for 3D data analytics in smart city
CN110059149A (en) Electronic map spatial key Querying Distributed directory system and method
CN104933143A (en) Method and device for acquiring recommended object
CN110059148A (en) The accurate searching method that spatial key applied to electronic map is inquired
CN110046216A (en) The proximity search method that spatial key applied to electronic map is inquired
Abbasifard et al. Efficient indexing for past and current position of moving objects on road networks
CN109218366A (en) Monitor video temperature cloud storage method based on k mean value
Souza et al. Unsupervised active learning techniques for labeling training sets: an experimental evaluation on sequential data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190726