CN102156726A - Geographic element querying and extending method based on semantic similarity - Google Patents

Geographic element querying and extending method based on semantic similarity

Info

Publication number
CN102156726A
Authority
CN
China
Prior art keywords
similarity
depth
node
notion
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011100815051A
Other languages
Chinese (zh)
Other versions
CN102156726B (en
Inventor
张福浩
张东
刘纪平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chinese Academy of Surveying and Mapping
Original Assignee
Chinese Academy of Surveying and Mapping
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chinese Academy of Surveying and Mapping filed Critical Chinese Academy of Surveying and Mapping
Priority to CN 201110081505 priority Critical patent/CN102156726B/en
Publication of CN102156726A publication Critical patent/CN102156726A/en
Application granted granted Critical
Publication of CN102156726B publication Critical patent/CN102156726B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a geographic element query expansion method based on semantic similarity. Concept similarity is calculated from the shortest distance between two concept nodes of a geographic ontology, while also taking into account influence factors such as node depth, node density, the proportion of common attributes, and the density of common ancestor nodes. In addition, the method fully considers the feature that distinguishes a geographic ontology from an ordinary domain ontology, namely the spatial relationships among geographic concepts, such as direction relations, topological relations, and metric relations. These relationships are treated as important influence factors and are given different weights according to the spatial relationship between the concepts, so that the influence of spatial relationships on semantic similarity is quantified. By fully considering and selecting the various influence factors, the method achieves geographic element query expansion based on the semantic similarity of a geographic ontology, obtains the information the user is searching for quickly and accurately, replaces querying based on simple lexical matching, and improves query efficiency.

Description

Geographic element query expansion method based on semantic similarity
Technical field
The invention belongs to the field of geospatial information systems, and in particular relates to a geographic element query expansion method based on the semantic similarity of a geographic ontology.
Background art
"Mankind is drowning in data but starved of knowledge." With the rapid development of computer and Internet technology, the amount of information on the network has increased sharply. Faced with such vast information resources, people feel at a loss and get lost in the huge information network of the Internet; "information overload" and "information disorientation" have become some of the most discussed topics. Faced with this ocean of information, a central problem in information technology is how to help users retrieve, with high quality, the information that is actually useful. At present, however, practical information retrieval systems rely mainly on manually classified directories and keyword matching. The former requires manual work, so both the efficiency and the depth with which it can expose massive information resources are very limited. The latter is confined to matching character strings and reveals very little of the semantics and pragmatics of the information; moreover, heterogeneous information such as polysemous words and synonyms makes the shortcomings of lexical matching increasingly apparent, so many retrieval systems frequently return incomplete or irrelevant results.
Spatial information, which accounts for about 80% of human information, is distributed, diverse, and heterogeneous, and this causes considerable inconvenience for sharing and interoperability. At the same time, limited by the state of related technology, providers of spatial information resources currently describe their data only at a superficial level, and the necessary associations between data are missing. As a result, computers cannot understand the meaning of the data, and it is difficult to find exactly the data that meets the user's needs, which directly affects the speed and efficiency with which users obtain information.
In traditional geographic information retrieval, the following problems are becoming increasingly apparent:
(1) Diversity in the forms of geospatial information. Traditionally, geospatial information appears as place names or spatial data, forms that most people recognize and know. Even these two common forms, however, differ with country, culture, and carrier medium. As human understanding of the geographic world deepens and geospatial information solutions continue to improve, the forms of geospatial information are gradually diversifying, which poses many challenges for retrieval.
(2) Ambiguity in the description of geospatial information. This shows in two ways. On the one hand, the names used to describe geospatial information may not allow precise positioning. On the other hand, the description itself may be ambiguous: for example, different regions may share the same place name, or a place name may coincide with a direction name. Such names are easily confused under traditional lexical matching and hinder the retrieval of geospatial information.
(3) Conventional information retrieval is implemented mainly by lexical keyword matching, and heterogeneous information such as polysemous words and synonyms makes the shortcomings of this approach increasingly apparent. Retrieval of geospatial attribute information, which developed from conventional information retrieval, has inherited the same shortcomings.
In addition, when querying for geographic information, users often do not know the names and storage structures of the data in the database. Moreover, people perceive the world in different ways, so observations and descriptions of the same geographic phenomenon emphasize different aspects of the object; this produces differences in viewpoint and gives rise to semantic heterogeneity. That is, because semantic differences exist between the concepts in the user's cognition and the concepts in the system model, the data returned by a query is often not what the user actually needs. The system must therefore perform similarity matching on the query expressed by the user and return the information that satisfies the conditions. A geographic ontology can be used to build such intelligent geographic information discovery, query, and service systems.
A geographic ontology abstracts the knowledge, information, and data of the geographic sciences into individual objects (or entities) on which there is consensus, organizes them into a system according to certain relations, conceptualizes them and defines them explicitly, and finally expresses them in a formal way.
There are currently three approaches to calculating similarity. The first uses corpus statistics: the semantic similarity between two concepts is calculated from the frequency with which they occur in context. The second uses the tree-like hierarchy of synonyms or sememes in a semantic dictionary such as WordNet or HowNet, and calculates the semantic similarity between two concepts from their information entropy or semantic distance. The third uses the semantic distance between two concept nodes in an ontology to calculate the similarity of the concepts.
Corpus-based methods are relatively objective and reflect, in a combined way, the similarities and differences between words in syntax, semantics, and pragmatics. However, they depend heavily on the training corpus, require a large amount of computation, and are complex; they are also strongly affected by data sparseness and data noise, and sometimes produce obvious errors. Dictionary-based methods are simple, effective, intuitive, and easy to understand, but their results are strongly influenced by human subjectivity and sometimes fail to reflect objective facts accurately. They also reflect semantic similarities and differences between words fairly accurately while giving little consideration to syntactic and pragmatic features. Methods of the third kind take into account factors that influence concept similarity, such as node depth, node density, and attribute information, but they consider only basic attribute information and ignore the spatial relationships between geographic concepts. Spatial relationships are precisely the factor with the greatest influence on semantic similarity in a geographic ontology, so these methods are difficult to apply to calculating the similarity of geographic elements.
The query expansion algorithm for geographic concepts proposed by the invention builds on methods of the third kind. It fully considers and selects the factors that influence the semantic similarity calculation, with emphasis on the influence of spatial relationships on the semantic similarity of geographic elements, and obtains satisfactory results, providing a good solution to the problem of semantic query expansion.
Summary of the invention
Based on an analysis and evaluation of existing similarity algorithm models, the invention fully considers and selects the various influence factors and realizes geographic element query expansion based on a geographic ontology semantic similarity model. It obtains the information the user is searching for quickly and accurately, replaces querying by simple lexical matching, and improves query efficiency.
To this end, the invention provides a method for obtaining similarity based on geographic ontology semantics, comprising the following steps:
(1) extracting basic geographic information from a geographic information system, and obtaining from it the concepts required to describe the geographic ontology and the relations between the concepts;
(2) describing the concepts and the relations between them as a network topology, in which different nodes represent different concepts and the edges between nodes represent the relations between concepts;
(3) assigning values to the edges between concept nodes according to the spatial relationship between the nodes, each kind of relation being given a different weight coefficient;
(4) finding the shortest path between the two concept nodes under analysis, and calculating the semantic distance between the two concept nodes from the weights of the relation edges on that shortest path;
(5) finding the common ancestor node of the two concept nodes under analysis, and calculating the depth of each of the two concept nodes relative to the common ancestor node;
(6) calculating the common ancestor node density of the two concept nodes under analysis;
(7) calculating the proportion of attributes (mainly spatial relationships) shared by the two concept nodes under analysis;
(8) calculating the similarity between the two concepts under analysis from the quantities obtained in steps (4) to (7): the semantic distance between the two concept nodes, the depths of the two concept nodes relative to the common ancestor node, the common ancestor node density of the two concept nodes, and the shared attributes of the two concept nodes.
In the above technical solution, step (8) further comprises setting an adjustment parameter for each of the semantic distance between the two concept nodes, the depths of the two concept nodes relative to the common ancestor node, the common ancestor node density of the two concept nodes, and the shared attributes of the two concept nodes.
In the above technical solution, the similarity is calculated as follows:

$$
\mathrm{Sim}(X,Y)=\left(\frac{a}{\mathrm{distant}(X,Y)+a}\right)^{A}\cdot\left(\frac{\mathrm{depth}(X)+\mathrm{depth}(Y)}{\left|\mathrm{depth}(X)-\mathrm{depth}(Y)\right|}\right)^{B}\cdot\left(\frac{\mathrm{parents}(X,Y)}{\mathrm{count}}\right)^{C}\cdot\left(\frac{\mathrm{count}(\mathrm{Attri}(X)\cap\mathrm{Attri}(Y))}{\mathrm{count}(\mathrm{Attri}(X)\cup\mathrm{Attri}(Y))}\right)^{D}
$$

where X and Y denote the two concepts whose similarity is calculated;
$\frac{a}{\mathrm{distant}(X,Y)+a}$ represents the influence of semantic distance on similarity, a being an adjustable parameter;
$\frac{\mathrm{depth}(X)+\mathrm{depth}(Y)}{\left|\mathrm{depth}(X)-\mathrm{depth}(Y)\right|}$ represents the influence of directed edge depth on similarity;
$\frac{\mathrm{parents}(X,Y)}{\mathrm{count}}$ represents the influence of the number of common ancestors on similarity;
$\frac{\mathrm{count}(\mathrm{Attri}(X)\cap\mathrm{Attri}(Y))}{\mathrm{count}(\mathrm{Attri}(X)\cup\mathrm{Attri}(Y))}$ represents the influence of concept node attributes on similarity.
Here distant(X, Y) is the semantic distance between X and Y;
depth(X) and depth(Y) are the directed edge depths of X and Y respectively;
parents(X, Y) is the number of common ancestors of X and Y, and count is the total number of nodes in the branches containing nodes X and Y;
count(Attri(X) ∩ Attri(Y)) and count(Attri(X) ∪ Attri(Y)) are the number of attributes in the intersection and in the union, respectively, of the attribute sets of X and Y.
A, B, C, and D are adjustment parameters for the four parts, namely semantic distance, directed edge depth, number of common ancestors, and concept node attributes, and they satisfy the following relation:
A+B+C+D=1
Through A, B, C, and D, the weight of each of the four parts in its influence on similarity can be adjusted.
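The formula above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used by the inventors: the function name `similarity`, its argument names, and the default values of a, A, B, C, and D are all hypothetical. The formula as written does not define the depth term when depth(X) equals depth(Y), so the sketch raises an error in that case rather than guessing how it should be handled.

```python
import math


def similarity(distance, depth_x, depth_y, common_ancestors, branch_nodes,
               attrs_x, attrs_y, a=1.0, A=0.25, B=0.25, C=0.25, D=0.25):
    """Evaluate Sim(X, Y) from already computed inputs (hypothetical signature)."""
    if not math.isclose(A + B + C + D, 1.0):
        raise ValueError("adjustment parameters must satisfy A + B + C + D = 1")
    if depth_x == depth_y:
        # The formula divides by |depth(X) - depth(Y)|; the source text does
        # not say how equal depths are treated, so this sketch refuses them.
        raise ValueError("depth term is undefined when depth(X) == depth(Y)")
    union = attrs_x | attrs_y
    if not union or branch_nodes <= 0:
        raise ValueError("attribute union and branch node count must be non-empty")

    distance_term = a / (distance + a)
    depth_term = (depth_x + depth_y) / abs(depth_x - depth_y)
    ancestor_term = common_ancestors / branch_nodes
    attribute_term = len(attrs_x & attrs_y) / len(union)

    return (distance_term ** A) * (depth_term ** B) \
        * (ancestor_term ** C) * (attribute_term ** D)


# Illustrative values only; none of these numbers come from the patent.
score = similarity(distance=1.0, depth_x=3, depth_y=1,
                   common_ancestors=1, branch_nodes=4,
                   attrs_x={"name", "area"}, attrs_y={"name"})
print(round(score, 4))
```

Because the four parts are combined as a product with exponents that sum to 1, raising one exponent necessarily lowers the share of the others, which is how A, B, C, and D shift emphasis between the four factors.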
The invention also provides a geographic element query expansion method that uses the above similarity acquisition method based on geographic ontology semantics, characterized in that, in addition to the concept entered directly by the user, related concepts are also queried, a related concept being one whose similarity to the concept entered by the user, as calculated by the similarity acquisition method, is greater than a set threshold.
In the above technical solution, the query results for the related concepts are sorted by the calculated similarity, with results for related concepts of higher similarity ranked first.
The invention achieves the following technical effects:
(1) The information the user wants is obtained quickly and efficiently. Query expansion at the semantic level improves the semantic relevance between query keywords and makes query results more accurate and relevant.
(2) The method is simple and practical. It does not rely on a large-scale corpus, which reduces external dependencies.
(3) It promotes the further development of search engine technology, in particular geographic element search. By analyzing the deficiencies of existing algorithms and models, making corresponding improvements, and applying them in practice, existing technology is brought closer to the requirements of real applications.
Description of drawings
Fig. 1 is a structure diagram of the extracted geographic ontology;
Fig. 2 is an example diagram of semantic distance.
Embodiment
To help those of ordinary skill in the art understand and implement the invention, it is described in further detail below with reference to the drawings and specific embodiments.
Having summarized the shortcomings of the existing query expansion methods described above, the invention uses the hierarchical structure of a geographic ontology and the explicit semantic associations between ontology concepts to propose a geographic element query expansion method based on a geographic ontology semantic similarity model. Building on a similarity model based on semantic distance, the method fully considers influence factors such as the depth of nodes in the ontology hierarchy, the density of nodes, the different semantic relations in the geographic ontology (mainly the topological and direction relations of geographic elements, such as In, Disjoint, Touch, Overlap, and Equal), the attributes of nodes, and the influence of common ancestor node density on child nodes. It realizes geographic element query expansion based on geographic ontology semantic similarity and solves the following technical problems:
(1) it solves query expansion under semantic heterogeneity;
(2) it reduces the complexity of query expansion;
(3) it removes the dependence of query expansion on a large-scale corpus;
(4) it makes query expansion results more accurate and effective.
The technical solution adopted to solve the above technical problems is as follows:
Step 1: obtain the concepts in the geographic ontology and the semantic relations between them
Basic geographic information is extracted from an existing geospatial database, and from it the concepts required by the geographic ontology and the semantic relations (mainly spatial relationships) between those concepts are obtained. These mainly include place names, administrative divisions, transport, landforms, and water systems, together with information such as containment, adjacency, crossing, direction relations, and metric relations between them.
A geospatial database based on a geographic information system (GIS) stores basic geographic information at multiple scales, such as administrative divisions, landforms, water systems, settlements, transport, and place names. It has high domain relevance and rich data, which largely guarantees the accuracy of the relations expressed in the geographic ontology. There is also a correspondence between the geospatial database model and the geographic ontology model. A geospatial database contains multiple data tables; tables are related one-to-one, one-to-many, or many-to-many; each table consists of multiple fields; and the values of the fields form records. A geographic ontology contains multiple classes; classes are related to one another; each class has multiple attributes; and a set of attribute values forms an instance. Using this formal correspondence between the geospatial database data model and the geographic ontology model, the basic geographic ontology is obtained by defining mapping rules.
The structure of the geographic ontology extracted by the invention is shown in Fig. 1. The extraction method is as follows:
1) Extract the geographic elements and the spatial relationships between them from the data in the geospatial database.
The spatial relationships are extracted as follows:
(I) Extraction of topological relations
1. Topological relations between point elements and other elements:
The topological relation is obtained by calculating the distance between the point element and the other element.
2. Spatial topological relations between line elements:
If two line elements share a node, they have an adjacency (Touch) relation; otherwise their topological relation is determined by calculating the distance between the two line elements.
3. Spatial topological relations between line elements and face elements:
The relation is determined from the two endpoints of the line element and the face element. When both endpoints lie inside the face element, the line element and the face element have a containment (In) relation; if only one endpoint lies inside the face element, they have an overlap (Overlap) relation; otherwise they have a disjoint (Disjoint) relation.
4. Spatial topological relations between face elements:
The relation between face elements can be determined from the relation between the boundary lines that make up the faces. When the boundary line of one face element has a containment (In) relation with the other face element, the face elements have a containment (In) relation; when the boundary lines of the face elements overlap, the face elements have an overlap (Overlap) relation; when the boundary lines of the face elements are identical, the face elements have an equality (Equal) relation; when the face elements share a common boundary line, they have an adjacency (Touch) relation; otherwise the face elements are disjoint (Disjoint).
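The endpoint rule for line and face elements in item 3 can be sketched as below. This is only an illustration: the function names and the use of a ray-casting point-in-polygon test are assumptions, since the text does not say how "inside the face element" is decided. The classification itself follows the rule exactly as stated, so a line that crosses the face with both endpoints outside is reported as Disjoint.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices (assumed helper)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this polygon edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def line_face_relation(line, polygon):
    """Classify a line element against a face element from its two endpoints."""
    start_inside = point_in_polygon(line[0], polygon)
    end_inside = point_in_polygon(line[-1], polygon)
    if start_inside and end_inside:
        return "In"
    if start_inside or end_inside:
        return "Overlap"
    return "Disjoint"


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(line_face_relation([(1, 1), (3, 3)], square))
print(line_face_relation([(1, 1), (6, 1)], square))
print(line_face_relation([(5, 5), (6, 6)], square))
```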
(II) Extraction of direction relations
Following the eight directions defined in the eight-direction cone model, space is divided into directions, each corresponding to an interval of angle values. In a geospatial database, polygons are stored with the arc segment as the basic unit, and each arc segment is shared by exactly two polygons. From the coordinates of the start and end nodes of an arc segment on a polygon, the inclination angle α of the chord corresponding to that arc segment can be obtained. Comparing α with the defined spatial azimuth intervals determines the spatial direction of the arc segment, and from this the direction relation between the polygons formed by the arc segments is obtained.
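The step from an arc segment's end nodes to a direction can be sketched as follows. The sector layout is an assumption, since the text does not list the angle intervals: the sketch uses eight 45° sectors centered on the compass directions, with α measured counterclockwise from east. The function name `chord_direction` and the direction labels are hypothetical.

```python
import math

# Sector k covers angles within 22.5 degrees of k * 45 degrees (assumed layout).
DIRECTIONS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]


def chord_direction(start, end):
    """Return the direction of the chord from the start node to the end node."""
    dx = end[0] - start[0]
    dy = end[1] - start[1]
    if dx == 0 and dy == 0:
        raise ValueError("start and end nodes coincide; the chord has no direction")
    # Inclination angle alpha of the chord, normalized to [0, 360).
    alpha = math.degrees(math.atan2(dy, dx)) % 360.0
    # Shift by half a sector so that each sector is centered on its direction.
    sector = int(((alpha + 22.5) % 360.0) // 45.0)
    return DIRECTIONS[sector]


print(chord_direction((0, 0), (1, 1)))
```

Deriving the direction relation between two polygons from the directions of their shared arc segments is a further step that this sketch does not attempt.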
Through the above steps, geographic entity data that is stored in the geospatial database under a topology-based data model, and is therefore complex and hard to describe, is converted into object-based geospatial conceptual model data that is easy to understand in terms of how people think.
2) Based on the geographic elements and relations extracted above, the GIS spatial conceptual model is converted into concepts and relations between concepts, and mapping rules from the geospatial database to the geographic ontology are established.
(I) The data tables in the geospatial database are analyzed and the table names are taken as candidate concepts of the application domain; the key concepts among them are selected so as to express the geospatial information of the application domain as concisely as possible. The hierarchical relations between concepts can be determined with reference to the basic geographic information classification. Finally, the selected concepts are mapped to classes or subclasses of the geographic ontology.
To facilitate data exchange between different systems, the concepts are described in the general-purpose Web Ontology Language (OWL).
1. A data table T is described as follows:
<owl:Class rdf:ID="TName"/>
This declares a class, described using the Resource Description Framework (RDF), that expresses the concept; its ID is TName.
2. Relations between multiple data tables T1, T2, and so on, for example an "equivalence" relation, are described as follows:
Figure BSA00000464627000091
Figure BSA00000464627000101
This states that T1Name and T2Name are equivalent.
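As an illustration of the table-to-class mapping, the sketch below generates such OWL class declarations with Python's standard `xml.etree.ElementTree` module. The patent's own listing for the equivalence relation is not reproduced in this text, so the use of `owl:equivalentClass` with `rdf:resource` is an assumption based on standard OWL, and the function name `table_to_owl_class` is hypothetical.

```python
import xml.etree.ElementTree as ET

OWL_NS = "http://www.w3.org/2002/07/owl#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Make the serializer emit the conventional owl: and rdf: prefixes.
ET.register_namespace("owl", OWL_NS)
ET.register_namespace("rdf", RDF_NS)


def table_to_owl_class(table_name, equivalent_to=None):
    """Serialize a data table name as an owl:Class declaration."""
    cls = ET.Element(f"{{{OWL_NS}}}Class", {f"{{{RDF_NS}}}ID": table_name})
    if equivalent_to is not None:
        # Assumed encoding of the "equivalence" relation between two tables.
        ET.SubElement(cls, f"{{{OWL_NS}}}equivalentClass",
                      {f"{{{RDF_NS}}}resource": "#" + equivalent_to})
    return ET.tostring(cls, encoding="unicode")


print(table_to_owl_class("TName"))
print(table_to_owl_class("T1Name", equivalent_to="T2Name"))
```

The serializer also writes the namespace declarations on the element, so each returned fragment can be parsed on its own.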
(II) The fields that describe geographic attribute information and spatial information in the basic geographic information data tables are mapped to attributes of the geographic ontology concepts:
1. A field F of data table T that contains attribute information is mapped to a datatype property of the geographic ontology:
Figure BSA00000464627000102
2. A field F of data table T that contains spatial information is mapped to an object property of the geographic ontology:
3. For two data tables T1 and T2 associated through a foreign key field F, the field F is mapped to an object property of the geographic ontology:
After the schema of the geospatial database has been converted to the geographic ontology, the geographic data in the database is converted into the corresponding geographic ontology instances, which completes the data conversion.
The mapping rules are:
1. Each tuple in a geospatial database table is mapped to an instance of the geographic ontology, and the primary key of the tuple is used as the instance name.
2. The attribute data of a tuple in a geospatial database table is mapped to datatype property values of the ontology instance.
3. The spatial data of a tuple in a geospatial database table is mapped to the corresponding object property values of the geographic ontology instance.
By the above method, the concepts in the basic geographic ontology and the semantic hierarchical relations between them are obtained.
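The three tuple-mapping rules can be pictured with a small sketch. Everything here is hypothetical: the function name `row_to_instance`, the dictionary layout of the resulting instance, and the sample row and field names are illustrative choices, not structures defined in the patent.

```python
def row_to_instance(row, primary_key, attribute_fields, spatial_fields):
    """Map one database tuple to a geographic ontology instance (as a dict)."""
    return {
        # Rule 1: the primary key of the tuple becomes the instance name.
        "name": str(row[primary_key]),
        # Rule 2: attribute data becomes datatype property values.
        "datatype_properties": {f: row[f] for f in attribute_fields},
        # Rule 3: spatial data becomes object property values.
        "object_properties": {f: row[f] for f in spatial_fields},
    }


# Hypothetical row from a hypothetical "county" table.
row = {"id": "C001", "name": "Miyun", "area_km2": 2229, "in": "Beijing"}
instance = row_to_instance(row, "id", ["name", "area_km2"], ["in"])
print(instance["name"], instance["object_properties"])
```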
Step 2: assign weights to the relations in the geographic ontology and calculate the semantic distance between two concepts in the ontology hierarchy
In the geographic ontology hierarchy, the connections between concepts are first given different weights according to the relation they represent. The semantic distance between two concepts is then calculated: the shortest connecting path between the two concepts is found, and the semantic distance is computed from the weights of the edges on that path.
Concepts in a geographic ontology are linked by semantic relations, as in the example of Fig. 2: "China" and "china" are different ways of writing the same name, so the relation between them is Equal; Beijing, Shanghai, and Shandong belong to China, so each has an In relation with China, while the three are mutually disjoint and have Disjoint relations with one another; the Miyun Reservoir passes through Miyun County and has an Overlap relation with it. Each kind of relation between concepts is given a weight, for example 1 for Equal, 4/5 for In, 3/4 for Overlap, and 1/6 for Disjoint. The semantic distance between two concepts can then be calculated by finding the shortest connecting path between them and summing the weights of its edges.
The semantic distance is calculated as follows. First, the shortest connecting path between the two concepts in the geographic ontology is found; for example, the shortest path between "China" and "Miyun Reservoir" is "China-Beijing-Miyun-Miyun Reservoir". If edge weights are ignored, the semantic distance is 1+1+1=3; with the weight of each edge, it is 4/5+4/5+3/4=47/20. When edge weights are ignored, the semantic distance between "Shanghai" and "Miyun Reservoir" is also 1+1+1=3, that is, the same as the distance from "China" to "Miyun Reservoir". When edge weights are considered, the semantic distance between "Shanghai" and "Miyun Reservoir" is 1/6+4/5+3/4=103/60, which differs from the distance from "China" to "Miyun Reservoir". The algorithm that takes edge weights into account is clearly closer to how ordinary users understand these concepts.
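The worked example can be reproduced with a short sketch. The relation weights are the ones given in the text; the edge list is a reading of the description of Fig. 2, and the In edge between Miyun and Beijing is inferred from the 4/5 term in the worked sums. The text takes the path with the fewest edges and then sums its weights; breaking ties between equally short paths by the smaller weight sum is an assumption of this sketch, as are all function and variable names.

```python
import heapq
from fractions import Fraction

WEIGHTS = {"Equal": Fraction(1), "In": Fraction(4, 5),
           "Overlap": Fraction(3, 4), "Disjoint": Fraction(1, 6)}

# Edges as described for Fig. 2 (Miyun In Beijing is inferred, see above).
EDGES = [
    ("China", "china", "Equal"),
    ("Beijing", "China", "In"),
    ("Shanghai", "China", "In"),
    ("Shandong", "China", "In"),
    ("Beijing", "Shanghai", "Disjoint"),
    ("Beijing", "Shandong", "Disjoint"),
    ("Shanghai", "Shandong", "Disjoint"),
    ("Miyun", "Beijing", "In"),
    ("Miyun Reservoir", "Miyun", "Overlap"),
]


def build_graph(edges):
    """Build an undirected adjacency list with one weight per relation edge."""
    graph = {}
    for u, v, relation in edges:
        w = WEIGHTS[relation]
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    return graph


def semantic_distance(graph, source, target):
    """Return (edge count, weight sum) of the path with the fewest edges."""
    best = {source: (0, Fraction(0))}
    heap = [(0, Fraction(0), source)]
    while heap:
        hops, dist, node = heapq.heappop(heap)
        if node == target:
            return hops, dist
        if (hops, dist) > best[node]:
            continue  # stale queue entry
        for neighbor, w in graph.get(node, []):
            candidate = (hops + 1, dist + w)
            if neighbor not in best or candidate < best[neighbor]:
                best[neighbor] = candidate
                heapq.heappush(heap, (candidate[0], candidate[1], neighbor))
    return None  # no connecting path


graph = build_graph(EDGES)
print(semantic_distance(graph, "China", "Miyun Reservoir")[1])     # 47/20
print(semantic_distance(graph, "Shanghai", "Miyun Reservoir")[1])  # 103/60
```

Using `Fraction` keeps the sums exact, so the results can be compared directly with the fractions in the text.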
Step 3: adjust the weight according to the common ancestor node density and the attributes of the two nodes in the ontology
The common ancestor node density of the two nodes in the ontology is used as an adjustment factor. When calculating the semantic similarity between concepts, the common ancestor density and the shared attributes of the nodes are, besides semantic distance, important factors: the more common ancestors two concepts in the ontology hierarchy have, the more similar they are; and the more attributes two concepts share, the closer the relation between the two nodes and the more similar the concepts. Common ancestor nodes and shared node attributes are therefore included in the calculation as components that influence similarity.
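The two quantities discussed in this step can be sketched as follows, assuming a hierarchy in which each concept has at most one parent. The `PARENT_OF` table is a hypothetical fragment built from the In relations of the Fig. 2 example, the attribute names are invented for illustration, and the function names are not from the patent.

```python
# Hypothetical single-parent hierarchy taken from the In relations of Fig. 2.
PARENT_OF = {"Beijing": "China", "Shanghai": "China",
             "Shandong": "China", "Miyun": "Beijing"}


def ancestors(parent_of, node):
    """Collect every ancestor of a node by walking up the parent links."""
    found = set()
    current = parent_of.get(node)
    while current is not None and current not in found:
        found.add(current)
        current = parent_of.get(current)
    return found


def common_ancestor_count(parent_of, x, y):
    """Number of ancestors that concepts x and y have in common."""
    return len(ancestors(parent_of, x) & ancestors(parent_of, y))


def shared_attribute_ratio(attrs_x, attrs_y):
    """Attributes in the intersection divided by attributes in the union."""
    union = attrs_x | attrs_y
    return len(attrs_x & attrs_y) / len(union) if union else 0.0


print(common_ancestor_count(PARENT_OF, "Miyun", "Shanghai"))
print(shared_attribute_ratio({"name", "area", "boundary"},
                             {"name", "area", "population"}))
```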
Step 4: set adjustment parameters to tune the calculation result
In the similarity calculation, the weights of the various semantic relations are assigned empirically, so the result obtained may not match the actual situation. Adjustment variables are therefore needed to tune the result toward a satisfactory value. The attributes of the concept nodes at the two ends of a directed edge in Fig. 2 are used as one adjustment factor in the semantic calculation: in the hierarchical network of a domain ontology, the more attributes the concepts at the two ends of a directed edge have in common, the closer the relation between the two nodes, and the larger the weight of the directed edge between them should be. Based on this combined analysis of the factors that influence semantic similarity, an ontology-based similarity formula is derived from existing similarity formulas.
The similarity is calculated as follows:
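$$
\mathrm{Sim}(X,Y)=\left(\frac{a}{\mathrm{distant}(X,Y)+a}\right)^{A}\cdot\left(\frac{\mathrm{depth}(X)+\mathrm{depth}(Y)}{\left|\mathrm{depth}(X)-\mathrm{depth}(Y)\right|}\right)^{B}\cdot\left(\frac{\mathrm{parents}(X,Y)}{\mathrm{count}}\right)^{C}\cdot\left(\frac{\mathrm{count}(\mathrm{Attri}(X)\cap\mathrm{Attri}(Y))}{\mathrm{count}(\mathrm{Attri}(X)\cup\mathrm{Attri}(Y))}\right)^{D}
$$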
where X and Y denote the two concepts whose similarity is computed;
$\frac{a}{\mathrm{distant}(X,Y)+a}$ represents the influence of semantic distance on similarity, where a is an adjustable parameter;
$\frac{\mathrm{depth}(X)+\mathrm{depth}(Y)}{\left|\mathrm{depth}(X)-\mathrm{depth}(Y)\right|}$ represents the influence of directed-edge depth on similarity;
$\frac{\mathrm{parents}(X,Y)}{\mathrm{count}}$ represents the influence of the number of common ancestors on similarity;
$\frac{\mathrm{count}(\mathrm{Attri}(X)\cap\mathrm{Attri}(Y))}{\mathrm{count}(\mathrm{Attri}(X)\cup\mathrm{Attri}(Y))}$ represents the influence of concept node attributes on similarity.
Here distant(X, Y) is the semantic distance between X and Y,
Figure BSA00000464627000137
in which weight_i is the weight of the i-th edge between X and Y, and n is the number of edges on the shortest path between X and Y.
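The shortest-path search behind distant(X, Y) can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the semantic distance is the plain sum of the edge weights along the cheapest path (the exact expression is in the equation image and is not recoverable here), and the graph layout and the function name `semantic_distance` are hypothetical.

```python
import heapq


def semantic_distance(graph, source, target):
    """Return (distance, edge_count) for the cheapest path from source to target.

    `graph` maps each concept to a dict {neighbor: edge_weight}.
    ASSUMPTION: the distance is the sum of the weights on the path.
    """
    best = {source: 0.0}
    heap = [(0.0, 0, source)]  # (distance so far, edges so far, node)
    while heap:
        dist, edges, node = heapq.heappop(heap)
        if node == target:
            return dist, edges
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry, a cheaper route was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(heap, (candidate, edges + 1, neighbor))
    return float("inf"), 0  # target is unreachable


# Hypothetical weights: a smaller weight stands for a closer spatial relation.
example_graph = {
    "X": {"P": 0.5, "Q": 1.0},
    "P": {"X": 0.5, "Y": 1.5},
    "Q": {"X": 1.0, "Y": 2.0},
    "Y": {"P": 1.5, "Q": 2.0},
}
print(semantic_distance(example_graph, "X", "Y"))
```

The second return value corresponds to n, the number of edges on the shortest path.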
Here depth(X) and depth(Y) are the directed-edge depths of X and Y, respectively;
Figure BSA00000464627000141
in which n is the number of edges on the shortest path from node X to the nearest common parent node of X and Y.
Similarly,
Figure BSA00000464627000142
in which n is the number of edges on the shortest path from node Y to the nearest common parent node of X and Y.
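The edge count n to the nearest common parent node can be obtained as sketched below. The sketch assumes a tree-shaped hierarchy in which every concept has at most one parent, stored in a hypothetical `parent` dictionary; it computes only the edge counts, not depth(X) itself, since the depth expression is in the equation images.

```python
def ancestors_with_steps(parent, node):
    """Map node and each of its ancestors to the edge count from node."""
    steps = {}
    count = 0
    while node is not None:
        steps[node] = count
        node = parent.get(node)  # None once the root has been passed
        count += 1
    return steps


def edges_to_nearest_common_ancestor(parent, x, y):
    """Return (ancestor, edges_from_x, edges_from_y) for the nearest common ancestor."""
    from_x = ancestors_with_steps(parent, x)
    from_y = ancestors_with_steps(parent, y)
    common = set(from_x) & set(from_y)
    if not common:
        raise ValueError("the two concepts share no ancestor")
    nearest = min(common, key=lambda n: from_x[n] + from_y[n])
    return nearest, from_x[nearest], from_y[nearest]


# Hypothetical hierarchy: root -> A -> B -> X, and A -> Y.
example_parent = {"root": None, "A": "root", "B": "A", "X": "B", "Y": "A"}
print(edges_to_nearest_common_ancestor(example_parent, "X", "Y"))
```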
Here parents(X, Y) denotes the number of common ancestors of X and Y, and count denotes the total number of nodes in the branches containing nodes X and Y.
Attri(X) and Attri(Y) are the attribute sets of X and Y, respectively;
count(Attri(X) ∩ Attri(Y)) and count(Attri(X) ∪ Attri(Y)) are the number of attributes in the intersection and in the union of the attribute sets of X and Y, respectively.
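The two ratios just defined are simple to compute with sets. The following sketch is illustrative only: the function names and attribute names are hypothetical, and returning 0.0 for two empty attribute sets is an assumption, since the text does not cover that case.

```python
def attribute_ratio(attrs_x, attrs_y):
    """Shared-attribute ratio: |intersection| / |union| of the two attribute sets."""
    union = attrs_x | attrs_y
    if not union:
        return 0.0  # ASSUMPTION: no attributes at all contributes nothing
    return len(attrs_x & attrs_y) / len(union)


def ancestor_ratio(common_ancestors, branch_node_total):
    """Common-ancestor density: parents(X, Y) / count."""
    return common_ancestors / branch_node_total


# Hypothetical attribute sets for two geographic concepts.
reservoir = {"name", "location", "area", "capacity"}
river = {"name", "location", "area", "length"}
print(attribute_ratio(reservoir, river))  # 3 shared out of 5 distinct attributes
print(ancestor_ratio(2, 8))
```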
A, B, C and D are adjustment parameters that act as the influence factors of the four parts (semantic distance, directed-edge depth, number of common ancestors, and concept node attributes) on similarity; adjusting A, B, C and D changes how strongly each of the four parts weighs on the similarity. The adjustment parameters A, B, C and D satisfy the following relation:
A+B+C+D=1
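Putting the four factors together, the formula above can be evaluated as in the sketch below. It reads A, B, C and D as exponents on the four factors, which is how the formula appears to be laid out; that reading, the function name `similarity`, and the default parameter values are assumptions. As printed, the depth factor is undefined when depth(X) = depth(Y), so the sketch raises an error in that case rather than guessing a correction. The attribute sets are assumed to be non-empty.

```python
def similarity(distance, depth_x, depth_y, common_ancestors, branch_nodes,
               attrs_x, attrs_y, a=1.0, A=0.25, B=0.25, C=0.25, D=0.25):
    """Evaluate Sim(X, Y) from precomputed inputs (hypothetical signature)."""
    if abs(A + B + C + D - 1.0) > 1e-9:
        raise ValueError("A + B + C + D must equal 1")
    if depth_x == depth_y:
        # The printed depth term divides by |depth(X) - depth(Y)|.
        raise ValueError("depth term is undefined for equal depths")
    distance_term = a / (distance + a)
    depth_term = (depth_x + depth_y) / abs(depth_x - depth_y)
    ancestor_term = common_ancestors / branch_nodes
    attribute_term = len(attrs_x & attrs_y) / len(attrs_x | attrs_y)
    return (distance_term ** A) * (depth_term ** B) \
        * (ancestor_term ** C) * (attribute_term ** D)


# Hypothetical inputs: distance 1, depths 3 and 1, 1 common ancestor among 4 nodes.
print(similarity(1.0, 3, 1, 1, 4, {"name"}, {"name"}))
```

Because the four exponents sum to 1, the result behaves like a weighted geometric mean of the four factors; raising one exponent at the expense of the others shifts the emphasis toward that factor.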
Using the above formula, the similarity between different geographic ontology concepts can be obtained. When a user queries a concept X, the system returns the information of X and also expands the query: it uses the formula to compute the similarity between X and each concept Y that is adjacent or close to X (for example, by traversing the concepts whose shortest path to X contains fewer than n edges), and if the similarity exceeds a preset threshold, the information of Y is returned as well. The returned results can also be sorted by the computed similarity, with the most similar results placed first, to make selection easier for the user.
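The expansion procedure described above can be sketched in two steps: collect nearby concepts, then filter and rank them by similarity. This is an illustrative outline only. The names `concepts_within` and `expand_query` are hypothetical, closeness is approximated here by unweighted hop count, and the similarity function is passed in as a parameter so that any scoring function can be plugged in.

```python
from collections import deque


def concepts_within(graph, start, max_edges):
    """Concepts reachable from start in fewer than max_edges edges (breadth-first)."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] + 1 >= max_edges:
            continue  # neighbors of this node would be too far away
        for neighbor in graph.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return [node for node in hops if node != start]


def expand_query(concept, candidates, similarity_fn, threshold):
    """Return (concept, score) pairs scoring above threshold, most similar first."""
    scored = [(other, similarity_fn(concept, other)) for other in candidates]
    kept = [(other, score) for other, score in scored if score > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical adjacency lists and precomputed similarity scores.
example_graph = {"X": ["P", "Q"], "P": ["X", "R"], "Q": ["X"], "R": ["P", "S"], "S": ["R"]}
example_scores = {"P": 0.8, "Q": 0.4, "R": 0.65}
nearby = concepts_within(example_graph, "X", 3)
print(expand_query("X", nearby, lambda x, y: example_scores[y], 0.5))
```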
In an actual geographic query and retrieval system, the invention works as follows. When the user enters the query keyword "Miyun Reservoir" in the search interface, the query is expanded over geographic concepts using the semantic similarity algorithm above. The search results list information associated with Miyun Reservoir, such as the five major water systems, the Chaobai River basin, the Ji Canal basin and the Yongding River basin, together with the semantic similarity of each to Miyun Reservoir. The similarity formula also yields the geographic elements (that is, geographic ontology concepts) related to Miyun Reservoir: Luanping, Changping District, Pinggu District and Huairou District. This greatly broadens the scope of the user's search and improves recall.
The above embodiment is only one embodiment of the present invention. Although it is described specifically and in detail, it should not be construed as limiting the scope of the claims. A person of ordinary skill in the art may make variations and improvements without departing from the inventive concept, and all of these fall within the protection scope of the present invention.

Claims (5)

1. A similarity acquisition method based on geographic ontology semantics, characterized in that it comprises the following steps:
(1) extracting basic geographic information from a geographic information database, and obtaining from it the concepts needed to describe the geographic ontology and the spatial relationships between the concepts, the spatial relationships comprising position relations, topological relations and metric relations;
(2) describing the concepts and the spatial relationships between them with a network topology, in which different nodes describe different concepts and the edges between nodes describe the relations between concepts;
(3) assigning values to the edges between concept nodes according to the different spatial relationships between the nodes, giving a different weight coefficient to the edge of each kind of relation;
(4) finding the shortest path between the two concept nodes to be analyzed, and computing the semantic distance between the two concept nodes from the weights of the relation edges on the shortest path;
(5) finding the common ancestor nodes shared by the two concept nodes to be analyzed, and computing the depth from each of the two concept nodes to the common ancestor node;
(6) computing the common ancestor node density of the two concept nodes to be analyzed;
(7) computing the proportion of attributes, in particular spatial attributes, shared by the two concept nodes to be analyzed;
(8) computing the similarity between the two concepts to be analyzed from the semantic distance between the two concept nodes, the depth of the two concept nodes to the common ancestor node, the common ancestor node density of the two concept nodes, and the shared attributes of the two concept nodes, as obtained in steps (4) to (7).
2. The similarity acquisition method based on geographic ontology semantics according to claim 1, characterized in that step (8) further comprises setting an adjustment parameter for each of the semantic distance between the two concept nodes, the depth of the two concept nodes to the common ancestor node, the common ancestor node density of the two concept nodes, and the shared attributes of the two concept nodes.
3. The similarity acquisition method based on geographic ontology semantics according to claim 2, characterized in that the similarity in step (8) is computed with the following formula:
$$\mathrm{Sim}(X,Y) = \left(\frac{a}{\mathrm{distant}(X,Y)+a}\right)^{A} \cdot \left(\frac{\mathrm{depth}(X)+\mathrm{depth}(Y)}{\left|\mathrm{depth}(X)-\mathrm{depth}(Y)\right|}\right)^{B} \cdot \left(\frac{\mathrm{parents}(X,Y)}{\mathrm{count}}\right)^{C} \cdot \left(\frac{\mathrm{count}(\mathrm{Attri}(X)\cap\mathrm{Attri}(Y))}{\mathrm{count}(\mathrm{Attri}(X)\cup\mathrm{Attri}(Y))}\right)^{D}$$
where X and Y denote the two concepts whose similarity is computed;
$\frac{a}{\mathrm{distant}(X,Y)+a}$ represents the influence of semantic distance on similarity, where a is an adjustable parameter;
$\frac{\mathrm{depth}(X)+\mathrm{depth}(Y)}{\left|\mathrm{depth}(X)-\mathrm{depth}(Y)\right|}$ represents the influence of directed-edge depth on similarity;
$\frac{\mathrm{parents}(X,Y)}{\mathrm{count}}$ represents the influence of the number of common ancestors on similarity;
$\frac{\mathrm{count}(\mathrm{Attri}(X)\cap\mathrm{Attri}(Y))}{\mathrm{count}(\mathrm{Attri}(X)\cup\mathrm{Attri}(Y))}$ represents the influence of concept node attributes on similarity;
distant(X, Y) is the semantic distance between X and Y;
depth(X) and depth(Y) are the directed-edge depths of X and Y, respectively;
parents(X, Y) denotes the number of common ancestors of X and Y, and count denotes the total number of nodes in the branches containing nodes X and Y;
count(Attri(X) ∩ Attri(Y)) and count(Attri(X) ∪ Attri(Y)) are the number of attributes in the intersection and in the union of the attribute sets of X and Y, respectively;
A, B, C and D are adjustment parameters that act as the influence factors of the four parts (semantic distance, directed-edge depth, number of common ancestors, and concept node attributes) on similarity, and the adjustment parameters A, B, C and D satisfy the following relation:
A+B+C+D=1
Adjusting A, B, C and D changes how strongly each of the four parts (semantic distance, directed-edge depth, number of common ancestors, and concept node attributes) weighs on the similarity.
4. A geographic element query expansion method using the similarity acquisition method based on geographic ontology semantics according to any one of claims 1-4, characterized in that, in addition to querying the concept entered directly by the user, related concepts are also queried, a related concept being one whose similarity to the concept entered directly by the user, as computed by the similarity acquisition method based on geographic ontology semantics, is greater than a set threshold.
5. The geographic element query expansion method according to claim 4, characterized in that the query results of the related concepts are sorted by the computed similarity, with the query results of related concepts of higher similarity ranked first.
CN 201110081505 2011-04-01 2011-04-01 Geographic element querying and extending method based on semantic similarity Expired - Fee Related CN102156726B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110081505 CN102156726B (en) 2011-04-01 2011-04-01 Geographic element querying and extending method based on semantic similarity

Publications (2)

Publication Number Publication Date
CN102156726A true CN102156726A (en) 2011-08-17
CN102156726B CN102156726B (en) 2013-12-25

Family

ID=44438225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110081505 Expired - Fee Related CN102156726B (en) 2011-04-01 2011-04-01 Geographic element querying and extending method based on semantic similarity

Country Status (1)

Country Link
CN (1) CN102156726B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1633092A (en) * 2004-11-25 2005-06-29 武汉大学 Distributed GIS space information integration apparatus and method based on mobile Agent and GML
CN101286161A (en) * 2008-05-28 2008-10-15 华中科技大学 Intelligent Chinese request-answering system based on concept

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103164498A (en) * 2011-12-15 2013-06-19 财团法人工业技术研究院 Geographical position presenting system, method, recording medium, and computer program product
US9507866B2 (en) 2011-12-15 2016-11-29 Industrial Technology Research Institute Geographical location rendering system and method and computer readable recording medium
CN103164498B (en) * 2011-12-15 2016-02-03 财团法人工业技术研究院 Geographical position presenting system and method
CN103106264A (en) * 2013-01-29 2013-05-15 河南理工大学 Matching method and matching device of place names
CN103106264B (en) * 2013-01-29 2016-04-27 河南理工大学 A kind of place name matching process and device
CN103309979A (en) * 2013-06-15 2013-09-18 昆明能讯科技有限责任公司 Knowledge cube model algorithm based on graph theory
CN103761283B (en) * 2014-01-10 2017-01-25 中国测绘科学研究院 Extension method of geographic information processing service
CN104008169A (en) * 2014-05-30 2014-08-27 中国测绘科学研究院 Semanteme based geographical label content safe checking method and device
CN104008169B (en) * 2014-05-30 2017-02-22 中国测绘科学研究院 Semanteme based geographical label content safe checking method and device
CN104239546B (en) * 2014-09-23 2021-11-23 北京理工大学 Ontology concept similarity calculation method based on field theory
CN104239546A (en) * 2014-09-23 2014-12-24 北京理工大学 Method for calculating similarity between concepts in ontology based on field theory
CN104536957A (en) * 2014-09-26 2015-04-22 北京师范大学 Retrieval method and system for rural land circulation information
CN104536957B (en) * 2014-09-26 2017-11-24 北京师范大学 Agricultural land circulation information retrieval method and system
CN104636908A (en) * 2015-01-30 2015-05-20 北京完美和声信息技术有限公司 Information interaction platform and information interaction method
CN104765828A (en) * 2015-04-13 2015-07-08 天脉聚源(北京)传媒科技有限公司 Dictionary data sheet generating method and device and dictionary data sheet application method and device
CN104765828B (en) * 2015-04-13 2018-06-19 天脉聚源(北京)传媒科技有限公司 A kind of generation of dictionary data table and application process and device
CN104915396A (en) * 2015-05-28 2015-09-16 杭州电子科技大学 Knowledge retrieving method
CN105808689A (en) * 2016-03-03 2016-07-27 中国地质大学(武汉) Drainage system entity semantic similarity measurement method based on artificial neural network
CN106294583A (en) * 2016-07-28 2017-01-04 南京邮电大学 A kind of OGC geographic information services semantic similarity calculation method
CN106610947A (en) * 2016-08-25 2017-05-03 四川用联信息技术有限公司 New Chinese automatic word segmentation algorithm
CN106610936A (en) * 2016-09-12 2017-05-03 四川用联信息技术有限公司 Improved automatic Chinese word segmentation algorithm
CN106897378A (en) * 2017-01-20 2017-06-27 浙江大学 A kind of semantic parameter search dynamic regulating method towards three-dimensional CAD model
CN107515902A (en) * 2017-07-26 2017-12-26 苏州希格玛科技有限公司 A kind of isomeric data distributed storage method based on semantic tagger
WO2020259054A1 (en) * 2019-06-28 2020-12-30 京东数字科技控股有限公司 Associated account analysis method and apparatus, and computer-readable storage medium
CN110659350A (en) * 2019-09-24 2020-01-07 吉林大学 Semantic search system and search method based on domain ontology
CN112487260A (en) * 2020-12-07 2021-03-12 上海市研发公共服务平台管理中心 Instrument project declaration and review expert matching method, device, equipment and medium
CN114485684A (en) * 2021-12-31 2022-05-13 武汉中海庭数据技术有限公司 Map road matching method and system based on ground element topological relation
CN114485684B (en) * 2021-12-31 2023-11-28 武汉中海庭数据技术有限公司 Map road matching method and system based on ground element topological relation
CN116663662A (en) * 2023-07-31 2023-08-29 航天宏图信息技术股份有限公司 Geographic entity multi-level relation construction method and device based on multi-source semantic network
CN116663662B (en) * 2023-07-31 2023-10-20 航天宏图信息技术股份有限公司 Geographic entity multi-level relation construction method and device based on multi-source semantic network

Also Published As

Publication number Publication date
CN102156726B (en) 2013-12-25

Similar Documents

Publication Publication Date Title
CN102156726B (en) Geographic element querying and extending method based on semantic similarity
CN110413732A (en) The knowledge searching method of software-oriented defect knowledge
CN104239513B (en) A kind of semantic retrieving method of domain-oriented data
CN102087669B (en) Intelligent search engine system based on semantic association
CN104182454B (en) The integrated model of multi-source heterogeneous data semantic based on domain body structure and method
CN103473283B (en) Method for matching textual cases
CN102419778B (en) Information searching method for discovering and clustering sub-topics of query statement
CN107180045A (en) A kind of internet text contains the abstracting method of geographical entity relation
CN102737042B (en) Method and device for establishing question generation model, and question generation method and device
CN101777082A (en) Correlation method of text information and geological information and system
CN101350013A (en) Method and system for searching geographical information
CN107992608B (en) SPARQL query statement automatic generation method based on keyword context
CN102693316B (en) Linear generalization regression model based cross-media retrieval method
CN104268230A (en) Method for detecting objective points of Chinese micro-blogs based on heterogeneous graph random walk
CN105678590A (en) topN recommendation method for social network based on cloud model
CN105843799A (en) Academic paper label recommendation method based on multi-source heterogeneous information graph model
CN109992784A (en) A kind of heterogeneous network building and distance metric method for merging multi-modal information
CN110659357A (en) Geographic knowledge question-answering system based on ontology semantic similarity
CN103064907A (en) System and method for topic meta search based on unsupervised entity relation extraction
Huang et al. Design and implementation of oil and gas information on intelligent search engine based on knowledge graph
CN104156431A (en) RDF keyword research method based on stereogram community structure
Li Research and analysis of semantic search technology based on knowledge graph
CN103294791A (en) Extensible markup language pattern matching method
CN104765763B (en) A kind of semantic matching method of the Heterogeneous Spatial Information classification of service based on concept lattice
CN104317853B (en) A kind of service cluster construction method based on Semantic Web

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20131225

Termination date: 20140401