CN115329221B - Query method and query system for multi-source geographic entity - Google Patents

Query method and query system for multi-source geographic entity

Info

Publication number
CN115329221B
Authority
CN
China
Prior art keywords
query
entity
geographic
geographic information
spatial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211223877.8A
Other languages
Chinese (zh)
Other versions
CN115329221A (en
Inventor
赵帅
程渤
秦唯人
陈俊亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202211223877.8A
Publication of CN115329221A
Application granted
Publication of CN115329221B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Remote Sensing (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a query method and a query system for multi-source geographic entities. The method comprises: acquiring aligned entity pairs; performing entity pair fusion on the aligned entity pairs; calculating the spatial relationships among all geographic entities in the aligned and fused entity data set by using a spatial calculation technique to obtain a geographic information map; and implementing compound condition queries of geographic information based on the geographic information map and a spatial database. This scheme improves the alignment of geographic entities and optimizes geographic information query performance.

Description

Query method and query system for multi-source geographic entity
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a query method and a query system for a multisource geographic entity.
Background
With the popularization of the mobile internet, map applications have become an indispensable tool for daily travel. While providing geographic information retrieval services, they face user demands for more diverse and more complex services. In reality, geographic entities generally have rich attributes and complex spatial relationships, so searching geographic information with a single condition no longer meets users' needs. Users need a geographic information system (GIS) that supports compound condition queries in order to satisfy their diverse and complex geographic information search requirements.
At present, although users expect stronger search capabilities from geographic information systems, the most widely used map applications on the market, such as Gaode Map (Amap), Baidu Map and Google Maps, do not support querying geographic information with compound conditions. They cannot, for example, answer a request to find all parks in Beijing that have a subway station within 300 meters and at least two entrances and exits. Research on geographic information systems supporting compound condition queries is therefore very important. Developing such a function, however, requires a solid geographic information data foundation: the data must contain a large number of geographic entities with rich attributes, and must also contain the complex spatial relationships among those entities, which no single map data vendor can currently provide. Constructing a geographic information map that combines geographic entities and spatial relationships is therefore the key to implementing the compound condition query function of a geographic information system.
Constructing the geographic information map requires a large number of attribute-rich geographic entities and the various spatial relationships among them. Obtaining attribute-rich geographic entities requires aligning multi-source geographic data, but commonly used entity alignment algorithms do not exploit the spatial characteristics of geographic entities: they lack a partition index mechanism for geographic data, so the set of candidate entity pairs is too large to compute efficiently; their similarity measures are not designed specifically for geographic entities; and the classifiers they use perform poorly. As a result, geographic entities are aligned poorly. Obtaining the spatial relationships among entities requires designing a complete geographic information map model and determining the spatial relationships among all geographic entities with a spatial calculation method.
Disclosure of Invention
The invention aims to provide a query method and a query system for multi-source geographic entities, so as to improve the alignment effect of the geographic entities and optimize the query performance of geographic information.
In order to achieve the above object, in one aspect, the present invention provides a query method for a multi-source geographic entity, including:
acquiring aligned entity pairs;
performing entity pair fusion on the aligned entity pairs;
calculating the spatial relationship among all geographic entities by using a spatial calculation technology for the aligned and fused entity data sets to obtain a geographic information map;
and implementing compound condition queries of geographic information based on the geographic information map and the spatial database.
Optionally, obtaining the aligned entity pair includes:
collecting multisource geographic entity data;
partitioning all the geographic entity data by a spatial index method to obtain candidate entity pairs;
screening the candidate entity pairs to obtain aligned entity pairs.
Optionally, collecting the multi-source geographic entity data includes:
preprocessing the collected original geographic entity data, wherein the preprocessing comprises data cleaning and unifying a coordinate format.
Optionally, implementing the compound condition query of geographic information based on the geographic information map includes:
querying in a hybrid query mode according to the query rules to obtain a query result.
Optionally, querying in the hybrid query mode includes:
judging whether the current query request is in a historical query database; if so, retrieving the result from the historical query database; otherwise, parsing the query request to obtain a query target and compound relationships;
querying the geographic information map in the map database based on the query target and the compound relationships to obtain all atomic relationship sets, and finding a target entity set through the entities in the atomic relationship sets;
judging whether the target entity set is empty; if it is empty, meaning the geographic information map cannot satisfy the query conditions, continuing the search in the spatial database; otherwise, returning the result to the user.
Optionally, searching in the spatial database includes:
querying according to the parsed query target and compound relationships to obtain an atomic entity set and an atomic relationship set;
and performing combined spatial calculation on the entity set and the relationship set by using the spatial functions provided by the spatial database, thereby obtaining a target entity set.
Optionally, after obtaining the query result, the method further includes: updating the current query result into the historical query database.
In order to achieve the above object, the present invention further discloses a query system for a multi-source geographic entity, including:
the acquisition module is used for acquiring the aligned entity pairs;
the fusion module is used for carrying out entity pair fusion on the aligned entity pairs;
the calculation module is used for calculating the spatial relationship among all geographic entities by using a spatial calculation technology for the aligned and fused entity data sets to obtain a geographic information map;
and the query module is used for realizing compound condition query of the geographic information based on the geographic information map and the spatial database.
Optionally, the acquisition module includes:
the collection module is used for collecting multi-source geographic entity data;
the partition module is used for partitioning all the geographic entity data by a spatial index method to obtain candidate entity pairs;
and the screening module is used for screening the candidate entity pairs to obtain aligned entity pairs.
Optionally, the system further comprises a data storage module, wherein the data storage module comprises: the map database is used for storing geographic information maps, wherein the geographic information maps comprise geographic entities and spatial relations among the geographic entities;
the spatial database stores all geographic entity data.
The invention has the technical effects that: the invention discloses a query method for a multisource geographic entity, which comprises the following steps: acquiring aligned entity pairs; performing entity pair fusion on the aligned entity pairs; calculating the spatial relationship among all geographic entities by using a spatial calculation technology for the aligned and fused entity data sets to obtain a geographic information map; and realizing the compound condition query of the geographic information based on the geographic information map and the spatial database. The invention fully utilizes the spatial characteristics of the geographic entities, uses a spatial calculation method to determine the spatial relationship among all the geographic entities, designs a complete geographic information map model through various spatial relationships among the entities to improve the alignment effect of the geographic entities, and further optimizes the geographic information inquiry performance by utilizing the geographic information map and a spatial database.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application, illustrate and explain the application and are not to be construed as limiting the application. In the drawings:
FIG. 1 is a flow chart of a method for multi-source geographic entity alignment provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of data collection by recursive cutting provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of the XGBoost classifier according to an embodiment of the present invention;
FIG. 4 is a diagram of a geographic information map model provided by an embodiment of the present invention;
FIG. 5 is a flowchart for constructing a geographic information map according to an embodiment of the present invention;
FIG. 6 is a diagram of an overall architecture of a geographic information system provided by an embodiment of the present invention;
FIG. 7 is a flow chart of a compound condition query function provided by an embodiment of the present invention.
Detailed Description
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
As shown in FIG. 1, this embodiment provides a query method for multi-source geographic entities, including:
S100, acquiring aligned entity pairs;
S200, performing entity pair fusion on the aligned entity pairs, specifically:
Entity pair fusion means matching and merging the attributes of a pair of geographic entities that have already been aligned. For example, when matching attributes s and t of an entity pair: if s and t share no character fragments, they are simply concatenated; if they share some character fragments and each also has unique fragments, they are merged. Merging requires that the two attributes are consistent and do not conflict, that is, the shared character fragments appear in the same order in both attributes and are not reversed, so that the two attributes can be merged in sequence; otherwise they can only be concatenated. Once all attributes of a pair of aligned entities have been merged, the entity fusion task is complete;
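The fusion rule for a single attribute pair can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not the patented implementation: the function name `fuse_attribute` is hypothetical, and "character fragments" are approximated here by whitespace-separated tokens, which would need a different segmentation for Chinese text.

```python
def fuse_attribute(s: str, t: str, sep: str = " ") -> str:
    """Fuse two attribute strings of an aligned entity pair.

    Assumption: fragments are whitespace-separated tokens.
    """
    a, b = s.split(), t.split()
    common = set(a) & set(b)
    if not common:
        # Nothing shared: simply concatenate the two attributes.
        return sep.join(a + b)

    # Shared fragments must appear in the same order in both attributes.
    order_a = [w for w in a if w in common]
    order_b = [w for w in b if w in common]
    if order_a != order_b:
        # Conflicting order: fall back to concatenation.
        return sep.join(a + b)

    # Consistent order: merge in sequence, emitting each shared fragment once.
    merged, i, j = [], 0, 0
    while i < len(a) or j < len(b):
        while i < len(a) and a[i] not in common:
            merged.append(a[i])
            i += 1
        while j < len(b) and b[j] not in common:
            merged.append(b[j])
            j += 1
        if i < len(a) and j < len(b):
            merged.append(a[i])  # the same shared fragment in both inputs
            i += 1
            j += 1
    return sep.join(merged)
```

For instance, fusing "Garden Community Building 20" with "Garden Community East Gate" keeps the shared prefix once and appends the unique fragments of each side.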
S300, calculating the spatial relationships among all geographic entities in the aligned and fused entity data set by using a spatial calculation technique, and importing them in batches into the Neo4j graph database (the map database) with the neo4j-admin import tool to obtain a geographic information map;
Specifically, as shown in FIG. 5, the construction can be divided into two main parts: geographic entity alignment, and parallelized calculation of the spatial relationships between geographic entities.
The geographic information map is shown in FIG. 4. On the one hand, it must contain the full set of attribute-rich geographic entities in a given region; on the other hand, it must establish the spatial relationships among the entities, so that the geographic retrieval system can support queries constrained by multiple spatial relationships. The map consists mainly of geographic entities and the spatial relationships between them. Spatial relationships reflect the interaction between entities and fall into two main types: topological relationships and distance relationships. Topological relationships mainly include disjoint, intersecting and containing relationships. Distance relationships quantify the spatial separation of geographic entities and are generally expressed as the straight-line distance between the center positions of two entities.
The spatial relationship between geographic entities is determined as follows:
In the embodiment of the invention, the distance between two geographic entities is calculated with a spatial calculation method in order to determine their spatial relationship. In a spherical coordinate system, longitude and latitude are numeric coordinates. For two geographic entities A and B on the Earth, let $A_i$ and $A_j$ denote the longitude and latitude of entity A, and $B_i$ and $B_j$ the longitude and latitude of entity B. The calculation proceeds as follows:
First, the longitude and latitude of entities A and B are converted from degrees to radians according to formulas (7) and (8), where $RA_i$ and $RA_j$ denote the longitude and latitude of entity A in radians, and $RB_i$ and $RB_j$ denote the longitude and latitude of entity B in radians;
Second, the differences between the longitudes and between the latitudes of entities A and B are calculated according to formulas (9) and (10), where M denotes the longitude difference and N denotes the latitude difference;
Third, the metric distance between entities A and B is calculated according to formula (11), where r is the Earth's radius (the distance from the equator to the Earth's center) and d is the metric spatial distance between entities A and B;
The spatial distance between geographic entities A and B is thus obtained. This distance quantitatively represents the distance relationship between the two entities, and the topological relationship between them can be determined by comparing the distance with the square root of the entity area, which establishes the spatial relationship between the two geographic entities.
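The three steps above can be sketched in Python as follows. This is a sketch under stated assumptions: it assumes that formula (11) has the standard haversine form, that M and N are simple differences in radians, and that r is the equatorial radius of 6378137 m; the function name `spatial_distance` is hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # assumed equatorial radius in meters


def spatial_distance(a_lon, a_lat, b_lon, b_lat, r=EARTH_RADIUS_M):
    """Great-circle distance in meters between entities A and B."""
    # Step 1: degrees to radians (RA_i, RA_j, RB_i, RB_j).
    ra_i, ra_j = math.radians(a_lon), math.radians(a_lat)
    rb_i, rb_j = math.radians(b_lon), math.radians(b_lat)

    # Step 2: longitude difference M and latitude difference N.
    m = ra_i - rb_i
    n = ra_j - rb_j

    # Step 3: haversine form (assumed for formula (11)).
    h = math.sin(n / 2) ** 2 + math.cos(ra_j) * math.cos(rb_j) * math.sin(m / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))
```

With these assumptions, one degree of longitude along the equator comes out at roughly 111.3 km, which is a convenient sanity check.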
S400, implementing compound condition queries of geographic information based on the geographic information map and the spatial database.
Further optimizing the scheme, S100, acquiring the aligned entity pairs includes:
collecting multi-source geographic entity data; specifically:
the data are obtained from multiple map vendors through interface access and file downloads, and a recursive cutting method is used during collection, as shown in FIG. 2:
before the geographic entities of a region are requested, it is first checked whether the number of entities in the region exceeds a threshold. If it does, the rectangular region is cut into nine equal parts, the resulting partitions are pushed onto a stack, and the partitions are then requested recursively; if the data volume of a partition still exceeds the threshold, cutting continues and the process repeats. If the threshold is not exceeded, the entity data of the partition is requested directly. When no partition remains in the stack, data collection for the region is complete.
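The recursive cutting procedure can be sketched with an explicit stack. In this sketch, `count_entities` and `fetch_entities` are hypothetical callables standing in for a vendor's interface, and a region is an assumed `(min_x, min_y, max_x, max_y)` tuple; none of these names come from the patent.

```python
def collect_region(bbox, count_entities, fetch_entities, threshold):
    """Collect all entities in bbox, splitting 3x3 while a region is too dense.

    count_entities(box) -> int and fetch_entities(box) -> list are
    hypothetical stand-ins for the map vendor's interface.
    """
    stack = [bbox]
    collected = []
    while stack:
        min_x, min_y, max_x, max_y = stack.pop()
        box = (min_x, min_y, max_x, max_y)
        if count_entities(box) > threshold:
            # Cut the rectangle into nine equal parts and push them.
            xs = [min_x + (max_x - min_x) * k / 3 for k in range(3)] + [max_x]
            ys = [min_y + (max_y - min_y) * k / 3 for k in range(3)] + [max_y]
            for i in range(3):
                for j in range(3):
                    stack.append((xs[i], ys[j], xs[i + 1], ys[j + 1]))
        else:
            # Small enough: request the entity data of this partition.
            collected.extend(fetch_entities(box))
    return collected
```

A production version would likely also need a minimum partition size, since many entities at identical coordinates would otherwise never fall below the threshold.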
partitioning all the geographic entity data by a spatial index method to obtain candidate entity pairs; specifically:
a hash technique is used to encode the two-dimensional longitude and latitude coordinates of each entity into a one-dimensional character string; the longer the code, the finer the region division. Encoding the longitude and latitude of the geographic entities with 6 characters divides the data set into partitions with a spatial precision of about 600 meters. Each partition contains few geographic entities, so the set of candidate entity pairs constructed within partitions is greatly reduced, which improves the efficiency of the entity alignment process.
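As an illustration of this partitioning step, the sketch below assumes that the hash technique is the widely used GeoHash encoding, which the text does not name explicitly. The helper names `geohash_encode` and `candidate_pairs` and the `(entity_id, lat, lon)` record layout are hypothetical.

```python
from collections import defaultdict
from itertools import product

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=6):
    """Encode a latitude/longitude pair as a GeoHash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    code, bits, bit_count, use_lon = [], 0, 0, True
    while len(code) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = bits * 2 + 1, mid
            else:
                bits, lon_hi = bits * 2, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = bits * 2 + 1, mid
            else:
                bits, lat_hi = bits * 2, mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            code.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(code)


def candidate_pairs(source_a, source_b, precision=6):
    """Pair entities from two sources that fall into the same cell.

    source_a and source_b are iterables of (entity_id, lat, lon).
    """
    cells = defaultdict(lambda: ([], []))
    for eid, lat, lon in source_a:
        cells[geohash_encode(lat, lon, precision)][0].append(eid)
    for eid, lat, lon in source_b:
        cells[geohash_encode(lat, lon, precision)][1].append(eid)
    return [pair for left, right in cells.values() for pair in product(left, right)]
```

One design consequence of cell-based partitioning is that two records of the same entity lying on opposite sides of a cell border are never paired; checking neighbouring cells as well would be one way to address that, though the source does not describe it.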
screening the candidate entity pairs to obtain aligned entity pairs; specifically:
a similarity calculation method is designed for each kind of geographic entity attribute according to its characteristics, feature vectors are constructed from the resulting similarities, and aligned geographic entities are screened out through ensemble learning.
The attributes include non-spatial attributes and spatial attributes. For non-spatial attributes, a hybrid edit-distance similarity calculation method is used; for spatial attributes, a geometric distance threshold method is used to calculate the similarity between the positions of two geographic entities. Specifically:
For non-spatial attributes such as the name and address of a geographic entity, the digits and letters they contain often carry important numbering information, as in "Building 20, Garden Community", and they sometimes reference surrounding entities for positioning, as in "KFC store". These non-spatial attributes are therefore sensitive to both word frequency and word order, so the invention measures them with a similarity calculation method that combines TF-IDF (Term Frequency-Inverse Document Frequency) with the J-W (Jaro-Winkler) edit distance. The term frequency in TF-IDF is usually 1, while the inverse document frequency depends on the whole document collection and assigns different weights to different words. The J-W edit-distance similarity function is a variant of the Jaro edit-distance similarity function, so combining TF-IDF with the J-W similarity measure yields a weighted edit distance between non-spatial attributes.
For two attributes A and B to be matched, formula (1) gives m, the TF-IDF-weighted number of matching characters of the attribute strings; formula (2) is the Jaro edit-distance similarity function, where t is the number of transpositions among the matched characters; formula (3) is the J-W edit-distance similarity function, where l is the length of the matching prefix, p is a scaling factor constant that adjusts the weight of prefix matching and is usually set to 0.1, and bt is the boost threshold, usually set to 0.7. When the Jaro similarity does not exceed bt, the J-W similarity equals the Jaro similarity.
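For orientation, the standard, unweighted Jaro and Jaro-Winkler similarities can be sketched as below. This is not the patent's formula (1) to (3): the TF-IDF weighting of the match count m is omitted because its exact form is not given here, and the transposition count follows the textbook definition (half the number of out-of-order matches). The prefix length cap of 4 is the conventional choice and is an assumption.

```python
def jaro(a: str, b: str) -> float:
    """Standard Jaro similarity in [0, 1] (no TF-IDF weighting)."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit, b_hit = [False] * len(a), [False] * len(b)
    m = 0
    for i, ch in enumerate(a):
        lo, hi = max(0, i - window), min(len(b), i + window + 1)
        for j in range(lo, hi):
            if not b_hit[j] and b[j] == ch:
                a_hit[i] = b_hit[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    a_matched = [c for c, hit in zip(a, a_hit) if hit]
    b_matched = [c for c, hit in zip(b, b_hit) if hit]
    # Textbook convention: t is half the number of out-of-order matches.
    t = sum(x != y for x, y in zip(a_matched, b_matched)) / 2
    return (m / len(a) + m / len(b) + (m - t) / m) / 3


def jaro_winkler(a: str, b: str, p: float = 0.1, bt: float = 0.7) -> float:
    """Jaro-Winkler similarity; the prefix boost applies only above bt."""
    sim = jaro(a, b)
    if sim <= bt:
        return sim
    prefix = 0
    for x, y in zip(a, b):
        if x != y or prefix == 4:  # conventional cap of 4 prefix characters
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)
```

On the classic pair "MARTHA" and "MARHTA", this gives a Jaro similarity of about 0.944 and a Jaro-Winkler similarity of about 0.961.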
For the spatial attribute, namely the longitude and latitude coordinates of a geographic entity, the coordinates are defined in a spherical coordinate system in three-dimensional space and uniquely determine any position on the Earth. The distance between two geographic entities can generally be calculated with the Haversine formula, but comparing two entity positions does not require such an exact calculation, so the embodiment of the invention uses a geometric distance threshold method to calculate the similarity between two geographic entity positions. For two entity coordinates $A(x_i, y_i)$ and $B(x_j, y_j)$, the Euclidean distance $d(i, j)$ between the two coordinates is first calculated as shown in formula (4). A suitable distance threshold $d$ is then set according to the distribution of geographic entity positions in the data set and the data granularity, and the Euclidean distance is compared with the threshold $d$ as shown in formula (5) to obtain the spatial attribute similarity $Sim_{Dis}$ of the two entities.
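A minimal sketch of the geometric distance threshold idea follows. The Euclidean distance matches the description of formula (4); the way the distance is turned into a similarity is an assumption, since the exact form of formula (5) is not reproduced here. This sketch uses a linear fall-off clamped at zero, and the name `spatial_similarity` is hypothetical.

```python
import math


def spatial_similarity(a, b, threshold):
    """Similarity of two positions from their Euclidean distance.

    a and b are (x, y) coordinate pairs; threshold is in the same units.
    Assumed form: 1 at zero distance, falling linearly to 0 at the threshold.
    """
    dist = math.hypot(a[0] - b[0], a[1] - b[1])
    return max(0.0, 1.0 - dist / threshold)
```

A binary variant (1 if the distance is within the threshold, otherwise 0) would be an equally plausible reading of the comparison step.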
The basic structure of the feature vector in the embodiment of the invention is shown in formula (6). For an attribute pair a and b of an entity pair, the feature dimensions are: each attribute value; the intersection, union and difference of the attributes; the proportion of the intersection in each attribute; and the J-W distance or the Dis geometric distance.
An XGBoost classifier is used to screen the candidate geographic entity pairs. XGBoost is a boosted tree model that integrates the decision tree models of several weak classifiers into a strong classifier model. The overall working principle is shown in FIG. 3: a decision tree is trained by repeated feature splitting and serves as weak classifier C1; on this basis, another tree is trained to fit the residual of the previous prediction, giving a better weak classifier C2; repeating this process yields n progressively better weak classifiers, which are integrated to form the strong classifier C. The training data X and training labels Y are thus used for iterative training until an optimal XGBoost ensemble classification model is formed, which performs the prediction and classification for geographic entity alignment.
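The residual-fitting principle described above can be illustrated without the XGBoost library. The sketch below is plain gradient boosting on squared error with one-split trees (stumps); it is deliberately not XGBoost itself, which adds regularization and second-order gradient information. All function names and the toy feature layout are hypothetical.

```python
def fit_stump(X, residuals):
    """Find the single split that best fits the residuals (squared error)."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[f] <= thr]
            right = [r for row, r in zip(X, residuals) if row[f] > thr]
            if not left or not right:
                continue
            l_mean, r_mean = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - l_mean) ** 2 for r in left)
            err += sum((r - r_mean) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, f, thr, l_mean, r_mean)
    return None if best is None else best[1:]


def stump_value(stump, row):
    f, thr, l_mean, r_mean = stump
    return l_mean if row[f] <= thr else r_mean


def fit_boosted(X, y, n_rounds=20, lr=0.5):
    """Each round trains a stump on the residual of the current prediction."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        stumps.append(stump)
        preds = [p + lr * stump_value(stump, row) for p, row in zip(preds, X)]
    return base, lr, stumps


def predict(model, row):
    """Return 1 (aligned) when the boosted score reaches 0.5, else 0."""
    base, lr, stumps = model
    score = base + lr * sum(stump_value(s, row) for s in stumps)
    return 1 if score >= 0.5 else 0
```

Here each row of X would hold similarity features of one candidate pair (for example a name similarity and a position similarity), and y marks whether the pair is truly aligned.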
Further, before candidate entity alignment is performed, the method includes: first determining whether a trained classifier model already exists in the algorithm; if not, constructing positive and negative samples with a seed-set tool using several reasonable geographic rules, and training the model.
Further optimizing the scheme, collecting the multi-source geographic entity data includes:
preprocessing the collected original geographic entity data, wherein the preprocessing includes data cleaning and unifying the coordinate format.
Specifically, data attributes are cleaned with different processing for different attributes: entity records missing the name or coordinate attribute are filtered out and do not take part in the subsequent entity alignment process, and meaningless words or special characters in other attributes are removed by pattern matching. The quality of data cleaning directly affects the subsequent entity alignment;
because the entity data provided by different map vendors use different coordinate systems, corresponding open-source coordinate conversion code is used to convert all entities into the same coordinate system;
simplified and traditional Chinese are unified: some of the collected map data mix languages, and some entity attributes mix simplified Chinese, traditional Chinese, English and so on, so traditional Chinese in the attributes is converted into simplified Chinese with the Python openccpy toolkit, while English is kept unchanged for now.
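The cleaning part of the preprocessing can be sketched as below. The record layout (dictionaries with `name`, `lon` and `lat` keys) and the noise pattern are assumptions made for illustration; coordinate system conversion and traditional-to-simplified conversion are left out because they depend on the specific vendors and toolkits.

```python
import re

# Assumed noise pattern: anything that is neither a word character nor whitespace.
_NOISE = re.compile(r"[^\w\s]")


def clean_records(records):
    """Drop records without a name or coordinates and strip special characters."""
    cleaned = []
    for rec in records:
        name = (rec.get("name") or "").strip()
        if not name or rec.get("lon") is None or rec.get("lat") is None:
            continue  # excluded from the later alignment process
        out = dict(rec)
        for key, value in rec.items():
            if isinstance(value, str):
                # Replace special characters and collapse repeated whitespace.
                out[key] = " ".join(_NOISE.sub(" ", value).split())
        cleaned.append(out)
    return cleaned
```

Because `\w` is Unicode-aware for Python string patterns, Chinese characters are preserved while punctuation is removed.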
Further optimizing the scheme, S400, implementing the compound condition query of geographic information based on the geographic information map includes:
querying in a hybrid query mode according to the query rules to obtain a query result. Specifically:
A query rule consists of two parts. The first part is the query target object, which is mandatory. The second part is the set of compound condition constraints, which is optional. A compound condition query applies multiple rounds of condition screening before returning a result to the user. Screening in the first dimension uses the static or dynamic attributes of the query target object, such as area, number of entrances and exits, and pedestrian flow. Screening in the second dimension uses the spatial relationships between other sets of geographic entities and the query target object; there may be zero or more such conditions.
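One possible in-memory representation of such a query rule is sketched below with Python dataclasses. All class and field names, and the relation label `within_distance`, are hypothetical; the source only fixes the two-part structure (a mandatory target plus zero or more spatial-relationship constraints).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EntityFilter:
    """First-dimension screening: type, keyword and attribute conditions."""
    entity_type: str
    keyword: str = ""
    attribute_conditions: List[str] = field(default_factory=list)


@dataclass
class CompositeRelation:
    """Second-dimension screening: a spatial relation to another entity set."""
    target: EntityFilter
    relation: str
    value: float


@dataclass
class CompoundQuery:
    target: EntityFilter  # mandatory query target object
    relations: List[CompositeRelation] = field(default_factory=list)  # optional

    def condition_count(self) -> int:
        """Total number of screening conditions in the query."""
        return len(self.target.attribute_conditions) + len(self.relations)


# Example: parks larger than 500 square meters with a subway station within 500 m.
query = CompoundQuery(
    target=EntityFilter("park", attribute_conditions=["area > 500"]),
    relations=[CompositeRelation(EntityFilter("subway station"), "within_distance", 500)],
)
```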
Further optimizing the scheme, querying in the hybrid query mode includes:
judging whether the current query request is in a historical query database; if so, retrieving the result from the historical query database; otherwise, parsing the query request to obtain a query target and compound relationships;
querying the geographic information map in the map database based on the query target and the compound relationships to obtain all atomic relationship sets, and finding a target entity set through the entities in the atomic relationship sets;
judging whether the target entity set is empty; if it is empty, meaning the geographic information map cannot satisfy the query conditions, continuing the search in the spatial database; otherwise, returning the result to the user.
Further optimizing the scheme, searching in the spatial database includes:
querying according to the parsed query target and compound relationships to obtain an atomic entity set and an atomic relationship set;
performing combined spatial calculation on the entity set and the relationship set by using the spatial functions provided by the spatial database, thereby obtaining a target entity set. Since results satisfying all condition constraints cannot always be obtained, the query results are arranged in descending order of the number of satisfied conditions, following the principle of satisfying as many conditions as possible.
Further optimizing the scheme, after obtaining the query result, the method further includes: updating the current query result into the historical query database, and returning the current query result to the user for display.
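The control flow of the hybrid query can be summarized in a short sketch. The callables `parse`, `query_graph` and `query_spatial` are hypothetical placeholders for the request parser, the lookup in the geographic information map and the spatial database search; a plain dictionary stands in for the historical query database.

```python
def hybrid_query(request, history, parse, query_graph, query_spatial):
    """Answer a compound condition query, preferring cheaper sources first."""
    # 1. Reuse a stored result when the same request was seen before.
    if request in history:
        return history[request]

    # 2. Parse the request into a query target and compound relationships.
    target, relations = parse(request)

    # 3. Try the geographic information map first.
    result = query_graph(target, relations)

    # 4. Fall back to the spatial database when the map yields nothing.
    if not result:
        result = query_spatial(target, relations)

    # 5. Record the result for future requests and return it.
    history[request] = result
    return result


def rank_by_satisfied(candidates):
    """Order (entity, satisfied_count) pairs by descending satisfied count."""
    ordered = sorted(candidates, key=lambda item: item[1], reverse=True)
    return [entity for entity, _ in ordered]
```

Keeping the three back ends behind plain callables makes the ordering of the steps easy to test in isolation from any particular database.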
The invention also discloses a query system for multi-source geographic entities, which comprises:
the acquisition module is used for acquiring the aligned entity pairs;
the fusion module is used for carrying out entity pair fusion on the aligned entity pairs;
the calculation module is used for calculating the spatial relationship among all geographic entities by using a spatial calculation technology for the aligned and fused entity data sets to obtain a geographic information map;
and the query module is used for realizing compound condition query of the geographic information based on the geographic information map and the spatial database.
The working process, shown in FIG. 7, includes:
(1) After a user initiates a compound condition query, the query request is forwarded to the query module for processing;
(2) The query module first checks whether the same query already has a result in the historical query database; if so, it returns that result directly to the user; otherwise, the query process continues;
(3) The request is parsed according to the query rule into constraints on the query target object and constraints on the corresponding composite conditions. For example, the query "find a park whose name contains 'beautiful', with an area greater than 500 square meters and a score greater than 4, with at least two subway stations within 500 meters and a restaurant with a pedestrian flow of less than 30 within 1000 meters" can be parsed into the following form:
Query target: {type: park square, keyword: beautiful, ["area > 500", "score > 4"]}
Composite relationship 1: {relationship target entity: {type: subway station, keyword: ["number > 2"]}, relationship: "ST_DWithin", relationship value: 500}
Composite relationship 2: {relationship target entity: {type: restaurant, keyword: ["people flow < 30"]}, relationship: "ST_DWithin", relationship value: 1000}
Based on the parsed content, it is determined whether the query contains a distance relationship of more than 500 meters; if so, PostGIS is searched and the spatial relationships are computed in real time; otherwise, only the geographic information map in Neo4j is searched. Here, a single query target corresponds to an atomic entity set, and a single composite relationship corresponds to an atomic relationship set;
(4) For the query against the graph database Neo4j, all composite relationships are queried first to obtain all atomic relationship sets, and the target entity set is then searched for through the entities in the atomic relationship sets. Because the geographic information map does not necessarily contain a result satisfying the query conditions, it is also necessary to determine whether the target entity set is empty; if so, the search continues in PostGIS; otherwise, the result can be returned to the user;
(5) For the query against the spatial database PostGIS, the parsed query target and composite relationships are queried first to obtain an atomic entity set and an atomic relationship set; the spatial functions provided by PostGIS are then used to perform combined spatial computation on the entity set and the relationship set, thereby obtaining the target entity set. Because a result satisfying all condition constraints is not necessarily available, the query results are sorted in descending order of the number of conditions satisfied, following the principle of satisfying as many conditions as possible;
(6) The query result is written to the historical query database and returned to the user for display.
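The routing decision in step (3) can be sketched with the example query from that step. The dictionary layout, the key names ("target", "relations", "filters", "value") and the function choose_backend are assumptions made for illustration; only the 500-meter threshold and the two back ends come from the description.

```python
# Threshold from the description: distance relations beyond 500 meters are
# computed in PostGIS rather than read from the map stored in Neo4j.
GRAPH_MAX_DISTANCE_M = 500

# The example query from step (3), written as a Python dictionary.
parsed = {
    "target": {
        "type": "park square",
        "keyword": "beautiful",
        "filters": ["area > 500", "score > 4"],
    },
    "relations": [
        {
            "entity": {"type": "subway station", "filters": ["number > 2"]},
            "relation": "ST_DWithin",
            "value": 500,
        },
        {
            "entity": {"type": "restaurant", "filters": ["people flow < 30"]},
            "relation": "ST_DWithin",
            "value": 1000,
        },
    ],
}


def choose_backend(parsed_query: dict) -> str:
    """Return "postgis" if any distance relation exceeds the threshold."""
    needs_spatial = any(
        rel["value"] > GRAPH_MAX_DISTANCE_M for rel in parsed_query["relations"]
    )
    return "postgis" if needs_spatial else "neo4j"


backend = choose_backend(parsed)

# Dropping the 1000-meter restaurant relation keeps the query inside the map.
graph_only = {"target": parsed["target"], "relations": parsed["relations"][:1]}
```

In this example the restaurant relation asks for 1000 meters, so the whole query is routed to PostGIS; with the subway relation alone it would stay in Neo4j.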
As a further optimization of the scheme, the acquisition module includes:
the collection module is used for collecting multi-source geographic entity data;
the partition module is used for partitioning all the geographic entity data by a spatial index method to obtain candidate entity pairs;
and the screening module is used for screening the candidate entity pairs to obtain aligned entity pairs.
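The partition step can be illustrated with a uniform grid, which is one simple kind of spatial index. The text does not name the index actually used, so the grid, the cell size of 0.01 degrees and the helper names grid_cell and candidate_pairs are all assumptions. Entities from two sources become a candidate pair when they fall in the same or an adjacent cell.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple

Entity = Tuple[str, float, float]  # (id, longitude, latitude)
Cell = Tuple[int, int]


def grid_cell(lon: float, lat: float, cell_deg: float) -> Cell:
    """Map a coordinate to the integer index of its grid cell."""
    return (math.floor(lon / cell_deg), math.floor(lat / cell_deg))


def candidate_pairs(
    source_a: List[Entity], source_b: List[Entity], cell_deg: float = 0.01
) -> List[Tuple[str, str]]:
    # Bucket the second source by grid cell.
    buckets: Dict[Cell, List[str]] = defaultdict(list)
    for ent_id, lon, lat in source_b:
        buckets[grid_cell(lon, lat, cell_deg)].append(ent_id)
    # Compare each entity of the first source with its 3x3 cell neighborhood.
    pairs: List[Tuple[str, str]] = []
    for ent_id, lon, lat in source_a:
        cx, cy = grid_cell(lon, lat, cell_deg)
        for dx, dy in product((-1, 0, 1), repeat=2):
            for other_id in buckets.get((cx + dx, cy + dy), []):
                pairs.append((ent_id, other_id))
    return pairs


# Invented sample coordinates for illustration.
source_a = [("a1", 116.3501, 39.9601)]
source_b = [("b1", 116.3503, 39.9602), ("b2", 116.9000, 39.2000)]
pairs = candidate_pairs(source_a, source_b)
```

Only the nearby entity b1 is paired with a1; the distant b2 is never compared, which is the point of partitioning before screening.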
As a further optimization of the scheme, the system further includes a data storage module, which includes a map database and a spatial database. The map database is used for storing the geographic information map, which comprises the geographic entities and the spatial relationships among them. Although the map database stores the complete geographic information map, the map does not contain every spatial relationship; to let users query additional spatial relationships, the spatial database is used as a data supplement. The spatial database stores the full set of geographic entity data, so that when a query request initiated by a user cannot be answered from the map, the requested spatial relationships can be computed in real time through the spatial database.
As shown in FIG. 6, the system adopts a browser/server (B/S) architecture in which the browser side and the server side are separated, and requests to the server are stateless, so that the system can flexibly be extended to a client/server (C/S) mode in the future. The browser side provides a visual page through which the user interacts with the geographic information system; the server side handles the requests from the browser side, processes the business logic and data, and returns the response to the browser for display.
The data storage layer stores and manages all system data using hybrid storage that combines the graph database Neo4j with the spatial database PostGIS. The data access layer mainly implements the object-relational mapping (ORM) between the entity objects in the system and the tables in the database, and receives the various complex queries from the business logic layer. The business logic layer is mainly responsible for processing all requests from the control layer and implements the business logic of all functions of the geographic information system. The control layer is mainly responsible for controlling all interaction between the Web front end and back end by managing the various interfaces. The visualization layer is mainly responsible for implementing the graphical interface through which the user interacts directly with the system.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that the above examples are only specific embodiments of the present invention, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing examples, those skilled in the art should understand that, within the technical scope disclosed herein, any person skilled in the art may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of the technical features; such modifications, changes or substitutions do not depart from the spirit and scope of the corresponding technical solutions and are intended to be encompassed within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A method for querying a multi-source geographic entity, comprising:
acquiring aligned entity pairs;
performing entity pair fusion on the aligned entity pairs;
calculating the spatial relationship among all geographic entities by using a spatial calculation technology for an entity pair data set formed by all the aligned and fused entity pairs to obtain a geographic information map;
based on the geographic information map and the spatial database, realizing compound condition query of geographic information;
the method for realizing the compound condition query of the geographic information based on the geographic information map comprises the following steps:
based on the query rule, performing the query in a hybrid query mode to obtain a query result;
querying in the hybrid query mode comprises:
determining whether the current query request exists in a historical query database; if so, retrieving the result from the historical query database; otherwise, parsing the query request to obtain a query target and composite relationships;
querying the geographic information map in a map database based on the query target and the composite relationships to obtain all atomic relationship sets, and searching for a target entity set through the entities in the atomic relationship sets;
because the geographic information map does not necessarily contain a result satisfying the query conditions, determining whether the target entity set is empty; if so, continuing the search in the spatial database; otherwise, returning the result to the user.
2. The method of claim 1, wherein obtaining the aligned entity pair comprises:
collecting multi-source geographic entity data;
partitioning all the geographic entity data by a spatial index method to obtain candidate entity pairs;
screening the candidate entity pairs to obtain aligned entity pairs.
3. The method of claim 2, wherein collecting multi-source geographic entity data comprises:
preprocessing the collected original geographic entity data, wherein the preprocessing comprises data cleaning and unifying a coordinate format.
4. The method of claim 1, wherein searching in the spatial database comprises:
querying with the parsed query target and composite relationships to obtain an atomic entity set and an atomic relationship set;
and performing combined spatial computation on the entity set and the relationship set using the spatial functions provided by the spatial database, thereby obtaining a target entity set.
5. The method of claim 1, wherein obtaining the query result further comprises: and updating the current query result into the historical query database.
6. A query system for a multi-source geographic entity, comprising:
the acquisition module is used for acquiring the aligned entity pairs;
the fusion module is used for carrying out entity pair fusion on the aligned entity pairs;
the calculation module is used for calculating the spatial relationship among all geographic entities by using a spatial calculation technology for an entity pair data set formed by all the aligned and fused entity pairs to obtain a geographic information map;
the query module is used for realizing compound condition query of the geographic information based on the geographic information map and the spatial database; the method for realizing the compound condition query of the geographic information based on the geographic information map comprises the following steps:
based on the query rule, performing the query in a hybrid query mode to obtain a query result;
querying in the hybrid query mode comprises:
determining whether the current query request exists in a historical query database; if so, retrieving the result from the historical query database; otherwise, parsing the query request to obtain a query target and composite relationships;
querying the geographic information map in a map database based on the query target and the composite relationships to obtain all atomic relationship sets, and searching for a target entity set through the entities in the atomic relationship sets;
because the geographic information map does not necessarily contain a result satisfying the query conditions, determining whether the target entity set is empty; if so, continuing the search in the spatial database; otherwise, returning the result to the user.
7. The system of claim 6, wherein the acquisition module comprises:
the collection module is used for collecting multi-source geographic entity data;
the partition module is used for partitioning all the geographic entity data by a spatial index method to obtain candidate entity pairs;
and the screening module is used for screening the candidate entity pairs to obtain aligned entity pairs.
8. The system of claim 7, further comprising a data storage module, the data storage module comprising: the map database is used for storing geographic information maps, wherein the geographic information maps comprise geographic entities and spatial relations among the geographic entities;
the spatial database stores all geographic entity data.
CN202211223877.8A 2022-10-09 2022-10-09 Query method and query system for multi-source geographic entity Active CN115329221B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211223877.8A CN115329221B (en) 2022-10-09 2022-10-09 Query method and query system for multi-source geographic entity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211223877.8A CN115329221B (en) 2022-10-09 2022-10-09 Query method and query system for multi-source geographic entity

Publications (2)

Publication Number Publication Date
CN115329221A CN115329221A (en) 2022-11-11
CN115329221B true CN115329221B (en) 2023-08-01

Family

ID=83914003

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211223877.8A Active CN115329221B (en) 2022-10-09 2022-10-09 Query method and query system for multi-source geographic entity

Country Status (1)

Country Link
CN (1) CN115329221B (en)

Also Published As

Publication number Publication date
CN115329221A (en) 2022-11-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant