CN112347222B - Method and system for converting non-standard address into standard address based on knowledge base reasoning - Google Patents

Publication number
CN112347222B
Authority
CN
China
Prior art keywords
address
entity
standard
knowledge base
entities
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011141247.7A
Other languages
Chinese (zh)
Other versions
CN112347222A (en)
Inventor
吕晓宝
叶恺翔
王元兵
王海荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sugon Nanjing Research Institute Co ltd
Original Assignee
Sugon Nanjing Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sugon Nanjing Research Institute Co ltd
Priority to CN202011141247.7A
Publication of CN112347222A
Application granted
Publication of CN112347222B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The invention discloses a method and a system for converting a non-standard address into a standard address based on knowledge base reasoning. The method comprises the following steps: first, an ontology of an address knowledge base is set; second, a standard address knowledge base is constructed, in which entities are built from a traditional standard address library and word vectors of the standard addresses are built; the word vectors are then compared by a cosine similarity algorithm and address elements are mapped to entities in the knowledge base; finally, address elements and orientation-relationship descriptions are extracted from the original text by named entity recognition, and entities matching the address elements are searched in the standard address knowledge base using a semantic similarity algorithm based on address names. Through natural language processing and knowledge graph processing, non-standardized address text data are automatically mapped to standard addresses, which completes the cleaning and governance of the address data.

Description

Method and system for converting non-standard address into standard address based on knowledge base reasoning
Technical Field
The invention relates to an address conversion technology, in particular to a method and a system for converting a non-standard address into a standard address based on knowledge base reasoning.
Background
With the advance of digital-city and smart-city informatization in various regions, the business information of different departments is gradually being brought into the scope of informatization. However, most of the addresses that express spatial positions in this information are semantic place-name addresses described in natural language, whereas in the information world the relative position of a spatial subject is determined by spatial geographic coordinates, which are the main index for spatializing all kinds of information. Address spatialization is one of the core technologies of location-based application service systems. How to associate and match addresses with spatial geographic coordinates is the key to spatializing address information and the basis for spatialized management of large volumes of business data.
At present, non-standard address mapping algorithms essentially compute the similarity between the non-standard address and each address text in the standard address library and then select the most similar address as the output. The similarity algorithms generally adopted are: 1. keyword-based matching; 2. cosine similarity based on short-text vectors; 3. edit distance based on character strings; 4. big-data recommendation based on user click behavior; 5. treating the mapping process as a text classification task and learning it automatically with naive Bayes or neural network models. These similarity algorithms basically meet the requirements of non-standard address mapping but lack reasoning capability.
Disclosure of Invention
The purpose of the invention is as follows: to provide a method and a system for converting a non-standard address into a standard address based on knowledge base reasoning, so as to solve the above problem.
The technical scheme is as follows: a method for converting a non-standard address into a standard address based on knowledge base reasoning comprises:
step 1: setting an ontology of an address knowledge base;
step 2: constructing a standard address knowledge base;
step 3: comparing by a cosine similarity algorithm;
step 4: extracting the address information of the original text.
According to one aspect of the invention, the ontology of the address knowledge base in step 1 comprises a knowledge graph ontology, the uuids of entities, entity attributes and the relationships among entities. The knowledge graph ontology comprises six levels: province, city, county, street/town, road section and address unit. The entities are the standard addresses at the corresponding levels and are distinguished by globally unique identifiers. The uuid of an entity consists of three parts: the knowledge graph ontology, the name and the number in the knowledge base, where the number is an administrative division number or an address number. The entity attributes comprise name, type, label, longitude and latitude of the center point, longitude and latitude sequence of the boundary, and remarks; the label is the social attribute of the address entity.
According to one aspect of the invention, step 2 further comprises:
step 21, constructing the standard address knowledge base, which includes building word vectors of the standard addresses, building relationships among entities, calculating relationships among entities and acquiring hidden relationships, wherein the knowledge base is constructed from a traditional standard address library and unstructured text data;
step 22, building entities from the traditional standard address library, wherein the traditional standard address library comprises place names, longitudes and latitudes, address types and address labels; when a standard address is brought into the knowledge graph, it forms an entity according to the entity uuid of step 1, and its field values are normalized into the corresponding attribute values according to the mapping between fields and entity attributes;
step 23, building word vectors of the standard addresses from the standard knowledge base: the address character string is cut with a sliding window of length 2 and step 1, the resulting set of length-2 character strings serves as the vector base, and each vector component is the number of times the corresponding base occurs in the address character string;
step 24, building relationships between entities from structured administrative division information: the "belongs to" relationship between a lower-level address and its upper-level address, and the "equal" relationship between different names of the same address, are built directly from existing administrative division information;
step 25, calculating relationships between entities from longitude and latitude: the distance and orientation between every two entities are calculated, with 1 kilometer as the cutoff radius of the adjacency relationship and with each of the four orientations east, west, south and north covering the interval of 45 degrees to either side of its standard angle; for address unit entities on the same road section, the actual travel distance along the road section is used as the distance attribute value of the orientation relationship;
step 26, extracting hidden relationships between existing entities in the knowledge base from the unstructured text data, namely the hidden relationship between an orally described address and the corresponding manually verified standard address;
for each piece of unstructured text data, the address elements in the text are first extracted by named entity recognition; the word vectors of the extracted address elements are then compared, by the cosine similarity algorithm, with the address word vectors of all entities in the knowledge base, and each address element is mapped to an entity A in the knowledge base.
According to one aspect of the invention, step 3 further comprises comparing the word vectors by a cosine similarity algorithm. The word vector obtained by segmenting the non-standard address character string and the word vector of a standard address are denoted as vectors $\vec{a}$ and $\vec{b}$. Because their bases differ, the two vectors lie in different vector spaces and must be converted into the same vector space: the union of the bases of $\vec{a}$ and $\vec{b}$ is taken to form a merged base, and $\vec{a}$ and $\vec{b}$ are converted into the new merged vector space spanned by the merged base. The similarity between the non-standard address word vector a and the standard address word vector b is then calculated with the cosine similarity formula as follows:
step 31, merging the bases of the two word vectors into a base union and obtaining the new word vector values on it, the generated new vectors being (1,1,0,0) and (0,1,1,1);
step 32, according to the cosine similarity algorithm:
$$\cos\theta = \frac{\vec{a}\cdot\vec{b}}{\lVert\vec{a}\rVert\,\lVert\vec{b}\rVert}$$
where $\vec{a}$ and $\vec{b}$ both represent vectors;
let vector $\vec{a} = (x_1, x_2, \ldots, x_n)$ and vector $\vec{b} = (y_1, y_2, \ldots, y_n)$; substituting into the cosine similarity formula gives:
$$\cos\theta = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$
by the above method, for each non-standard address the standard addresses with the highest cosine similarity are extracted to form the standard candidate set for querying that non-standard address, and entity B is further obtained according to the recorded manually verified standard address.
According to one aspect of the invention, entity B is obtained from the recorded manually verified standard address, and whether a relationship exists between entity A and entity B, both mapped to the knowledge base, is judged. The additional orientation-relationship description in the text address is extracted by combining regular expressions with a part-of-speech tagging algorithm and is mapped to the corresponding relationship type and attributes in the knowledge base ontology; the corresponding relationship from entity A to entity B is then established. The additional orientation-relationship description is extracted as follows:
step 1, performing part-of-speech tagging on the text with an open-source word segmentation tool, filtering out place names, proper nouns, verbs, adjectives and time words, and thereby segmenting the text into several semantic segments;
step 2, judging by regular expression matching whether each segment is an orientation-relationship description;
step 3, matching the semantic segments that describe orientation with the regular expressions;
a relationship exists between entity A and entity B, and their probabilities of occurrence influence each other, that is:
[formula image not reproduced in the source text]
searching the standard address knowledge base for the entity that matches the address elements gives:
[formula image not reproduced in the source text]
and the relationship between the two further gives:
[formula image not reproduced in the source text]
where the two symbols of the formula (also given only as images) represent independent entity vector events.
According to one aspect of the invention, step 4 further comprises extracting the address information of the original text through the following steps:
step 41, identifying the address and orientation description information;
step 42, matching the address entity, and ending the process if no orientation description exists;
step 43, matching the orientation description to a standard relationship, and screening, from all relationships of the matched address entity, the relationships that conform to the orientation description;
step 44, inferring the tail entity from the relationship, and further screening the tail entities if an attribute description of the tail entity exists;
step 45, if several relationships are inferred in succession, confirming the existence of the intermediate entity corresponding to each relationship;
step 46, jointly screening by the head entity, the tail entity and the relationship attributes so that the address description resolves to a unique result;
further, in step 4, on the basis of the entity A mapped into the knowledge base and the entity B obtained in step 3 from the recorded manually verified standard address, the address elements and orientation-relationship description information in the original text are extracted by named entity recognition, and the entity matching the address elements is then searched in the standard address knowledge base using a semantic similarity algorithm based on address names. If the matched entity is unique and there is no orientation-relationship description, the standard address conversion is complete and no subsequent steps are needed. If the matched entity is unique and there is an orientation-relationship description, the relationship information is matched to a standard relationship in the knowledge base ontology by the semantic similarity algorithm to determine the relationship type and corresponding attributes; among all relationships connected to entity A, the relationship whose attributes are most similar is then searched to obtain entity B, where for distance accuracy the allowable error is set to a 30% float relative to the distance attribute value. Where the text gives an exact description of address attributes, that description is used as well. Where multi-hop relationship reasoning is involved, the orientation relationships are inferred step by step and the existence of each intermediate entity is confirmed in turn until the tail entity is finally confirmed. Where the matched entity is not unique, the orientation relationships and attribute information are screened jointly, and the standard address of the tail entity is extracted.
According to one aspect of the invention, screening the relationships that conform to the orientation description from all relationships of the matched address entity comprises the following steps:
43.1, establishing a non-standard address library and a standard address library that are independent of each other;
43.2, preprocessing the non-standard address library and performing first-level address matching against the standard address library;
43.3, splitting the addresses of the preprocessed non-standard address library and of the standard address library into independent address libraries, completing the allocation of the non-standard address library and the standard address library;
43.4, performing second-level address matching between the non-standard address library and the standard address library;
43.5, after the second-level address matching, judging whether the traversal of the level combinations is finished; if so, the matching of the address libraries is complete; if not, step 43.4 is repeated until the addresses of the non-standard address library and the standard address library match.
Beneficial effects: the invention designs a method and a system for converting a non-standard address into a standard address based on knowledge base reasoning. Standard addresses and their mutual relationship attributes are built into a knowledge base in the form of (head entity, directed relationship, tail entity) triples and stored as a knowledge graph. By extracting the standard address contained in an unstructured address together with the related orientation and attribute elements, the head entity and the directed relationship of a triple are determined, which fixes the knowledge graph query condition and yields the tail entity address inferred from the standard address. The method applies effectively to scenarios in which a user describes an address orally and helps the system quickly and accurately locate the real address the user refers to. Compared with traditional standard address mapping algorithms, the method can automatically build and update the knowledge base from existing structured and unstructured data and perform logical reasoning, which fits actual business scenarios.
Drawings
FIG. 1 is a flow chart of the standard address repository construction of the present invention.
FIG. 2 is a flow chart of converting a non-standard address using the knowledge base of the present invention.
FIG. 3 is an address matching flow diagram of the present invention.
Detailed Description
In this embodiment, as shown in fig. 1, a method for converting a non-standard address into a standard address based on knowledge base reasoning includes:
step 1: setting an ontology of an address knowledge base;
step 2: constructing a standard address knowledge base;
step 3: comparing by a cosine similarity algorithm;
step 4: extracting the address information of the original text.
In a further embodiment, the ontology of the address knowledge base in step 1 comprises a knowledge graph ontology, the uuids of entities, entity attributes and the relationships among entities. The knowledge graph ontology comprises six levels: province, city, county, street/town, road section and address unit. The entities are the standard addresses at the corresponding levels and are distinguished by globally unique identifiers. The uuid of an entity consists of three parts: the knowledge graph ontology, the name and the number in the knowledge base, where the number is an administrative division number or an address number. The entity attributes comprise name, type, label, longitude and latitude of the center point, longitude and latitude sequence of the boundary, and remarks; the label is the social attribute of the address entity.
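The entity structure described above can be sketched as a small data class. This is only an illustration under stated assumptions: the class name `AddressEntity`, the field names, the English level names in `LEVELS`, and the `:`-joined encoding of the three uuid parts are all hypothetical, since the text specifies only which parts make up the uuid and which attributes an entity carries.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# The six ontology levels named in the text (English names are assumed).
LEVELS = ("province", "city", "county", "street_town", "road_section", "address_unit")


@dataclass
class AddressEntity:
    level: str                      # ontology level, one of LEVELS
    name: str                       # place name
    number: str                     # administrative division number or address number
    entity_type: str = ""           # the "type" attribute
    labels: List[str] = field(default_factory=list)        # social attributes, e.g. "school"
    center: Optional[Tuple[float, float]] = None            # (longitude, latitude) of the center point
    boundary: List[Tuple[float, float]] = field(default_factory=list)  # boundary coordinate sequence
    remarks: str = ""

    def __post_init__(self) -> None:
        if self.level not in LEVELS:
            raise ValueError(f"unknown ontology level: {self.level}")

    @property
    def uuid(self) -> str:
        # Assumed encoding: ontology level, name and number joined with ":".
        return f"{self.level}:{self.name}:{self.number}"


entity = AddressEntity(level="city", name="Nanjing", number="320100")
print(entity.uuid)
```

Deriving the uuid from the three parts, rather than storing it separately, keeps the identifier consistent with the attributes it is built from.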
In a further embodiment, step 2 further comprises:
step 21, constructing the standard address knowledge base, which includes building word vectors of the standard addresses, building relationships among entities, calculating relationships among entities and acquiring hidden relationships, wherein the knowledge base is constructed from a traditional standard address library and unstructured text data;
step 22, building entities from the traditional standard address library, wherein the traditional standard address library comprises place names, longitudes and latitudes, address types and address labels; when a standard address is brought into the knowledge graph, it forms an entity according to the entity uuid of step 1, and its field values are normalized into the corresponding attribute values according to the mapping between fields and entity attributes;
step 23, building word vectors of the standard addresses from the standard knowledge base: the address character string is cut with a sliding window of length 2 and step 1, the resulting set of length-2 character strings serves as the vector base, and each vector component is the number of times the corresponding base occurs in the address character string;
step 24, building relationships between entities from structured administrative division information: the "belongs to" relationship between a lower-level address and its upper-level address, and the "equal" relationship between different names of the same address, are built directly from existing administrative division information;
step 25, calculating relationships between entities from longitude and latitude: the distance and orientation between every two entities are calculated, with 1 kilometer as the cutoff radius of the adjacency relationship and with each of the four orientations east, west, south and north covering the interval of 45 degrees to either side of its standard angle; for address unit entities on the same road section, the actual travel distance along the road section is used as the distance attribute value of the orientation relationship;
step 26, extracting hidden relationships between existing entities in the knowledge base from the unstructured text data, namely the hidden relationship between an orally described address and the corresponding manually verified standard address;
for each piece of unstructured text data, the address elements in the text are first extracted by named entity recognition; the word vectors of the extracted address elements are then compared, by the cosine similarity algorithm, with the address word vectors of all entities in the knowledge base, and each address element is mapped to an entity A in the knowledge base.
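The word-vector construction of step 23 can be illustrated with a short sketch: a window of length 2 slides over the address string one character at a time, and the count of each two-character substring becomes the vector component for that base. The function name `bigram_vector` is hypothetical, and a `Counter` is used here as a sparse representation of the vector.

```python
from collections import Counter


def bigram_vector(address: str) -> Counter:
    """Count every length-2 substring (window 2, step 1) of the address string.

    The keys are the vector bases; the values are the vector components.
    """
    return Counter(address[i:i + 2] for i in range(len(address) - 1))


# "abcab" yields the windows "ab", "bc", "ca", "ab".
print(bigram_vector("abcab"))
```

A string shorter than two characters produces an empty vector, which a caller would need to handle before computing a similarity.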
In a further embodiment, step 3 further comprises:
the word vectors are compared by a cosine similarity algorithm. The word vector obtained by segmenting the non-standard address character string and the word vector of a standard address are denoted as vectors $\vec{a}$ and $\vec{b}$. Because their bases differ, the two vectors lie in different vector spaces and must be converted into the same vector space: the union of the bases of $\vec{a}$ and $\vec{b}$ is taken to form a merged base, and $\vec{a}$ and $\vec{b}$ are converted into the new merged vector space spanned by the merged base. The similarity between the non-standard address word vector a and the standard address word vector b is then calculated with the cosine similarity formula as follows:
step 31, merging the bases of the two word vectors into a base union and obtaining the new word vector values on it, the generated new vectors being (1,1,0,0) and (0,1,1,1);
step 32, according to the cosine similarity algorithm:
$$\cos\theta = \frac{\vec{a}\cdot\vec{b}}{\lVert\vec{a}\rVert\,\lVert\vec{b}\rVert}$$
where $\vec{a}$ and $\vec{b}$ both represent vectors;
let vector $\vec{a} = (x_1, x_2, \ldots, x_n)$ and vector $\vec{b} = (y_1, y_2, \ldots, y_n)$; substituting into the cosine similarity formula gives:
$$\cos\theta = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$
by the above method, for each non-standard address the standard addresses with the highest cosine similarity are extracted to form the standard candidate set for querying that non-standard address, and entity B is further obtained according to the recorded manually verified standard address.
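Steps 31 and 32 can be sketched as follows. The two bigram vectors are first projected onto the union of their bases and then compared with the standard cosine formula. The helper names `bigrams`, `cosine_similarity` and `top_candidates`, the candidate-set size `k`, and the placeholder strings are assumptions made for illustration. The strings "abc" and "bcde" were chosen because, on the merged base (ab, bc, cd, de), they give exactly the vectors (1,1,0,0) and (0,1,1,1) mentioned in step 31.

```python
import math
from collections import Counter
from typing import List, Tuple


def bigrams(text: str) -> Counter:
    # Window length 2, step 1, as in step 23.
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    merged_base = sorted(set(a) | set(b))          # step 31: union of the two bases
    va = [a.get(k, 0) for k in merged_base]        # a in the merged space
    vb = [b.get(k, 0) for k in merged_base]        # b in the merged space
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0             # step 32; empty vectors score 0


def top_candidates(query: str, standard: List[str], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k standard addresses most similar to the query (assumed candidate-set size)."""
    q = bigrams(query)
    scored = [(s, cosine_similarity(q, bigrams(s))) for s in standard]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# (1,1,0,0) against (0,1,1,1): dot product 1, norms sqrt(2) and sqrt(3).
print(cosine_similarity(bigrams("abc"), bigrams("bcde")))   # 1/sqrt(6), about 0.408
```

Sorting the merged base is not required for correctness; it only makes the component order deterministic when the vectors are inspected.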
In a further embodiment, entity B is obtained from the recorded manually verified standard address, and whether a relationship exists between entity A and entity B, both mapped to the knowledge base, is judged. The additional orientation-relationship description in the text address is extracted by combining regular expressions with a part-of-speech tagging algorithm and is mapped to the corresponding relationship type and attributes in the knowledge base ontology; the corresponding relationship from entity A to entity B is then established. The additional orientation-relationship description is extracted as follows:
step 1, performing part-of-speech tagging on the text with an open-source word segmentation tool, filtering out place names, proper nouns, verbs, adjectives and time words, and thereby segmenting the text into several semantic segments;
step 2, judging by regular expression matching whether each segment is an orientation-relationship description;
step 3, matching the semantic segments that describe orientation with the regular expressions;
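The regular-expression part of these steps (steps 2 and 3) might look like the sketch below. It is a simplified illustration, not the patented patterns: the part-of-speech filtering of step 1 is assumed to have already produced the segment, the patterns are written for English text whereas the original system works on Chinese addresses, and the names `DIRECTION_RE`, `DISTANCE_RE` and `parse_orientation` are hypothetical.

```python
import re
from typing import Dict, Optional

# Assumed English stand-ins for the orientation words listed in the ontology.
DIRECTION_RE = re.compile(r"\b(east|west|south|north|opposite|near)\b", re.IGNORECASE)
# A number followed by a metric unit, e.g. "100 meters" or "0.5 km".
DISTANCE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(km|kilometers?|m|meters?)\b", re.IGNORECASE)


def parse_orientation(segment: str) -> Optional[Dict[str, object]]:
    """Return the orientation and distance (in meters) described by a segment, or None."""
    direction = DIRECTION_RE.search(segment)
    if direction is None:
        return None                      # step 2: not an orientation description
    result: Dict[str, object] = {"orientation": direction.group(1).lower(), "distance_m": None}
    distance = DISTANCE_RE.search(segment)
    if distance:
        value = float(distance.group(1))
        unit = distance.group(2).lower()
        result["distance_m"] = value * 1000 if unit.startswith("k") else value
    return result


print(parse_orientation("100 meters east of"))
print(parse_orientation("Huawei Building"))
```

Normalizing the distance to meters at extraction time matches the unit used for the distance attribute of the "adjacent" relationship.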
a relationship exists between entity A and entity B, and their probabilities of occurrence influence each other, that is:
[formula image not reproduced in the source text]
searching the standard address knowledge base for the entity that matches the address elements gives:
[formula image not reproduced in the source text]
and the relationship between the two further gives:
[formula image not reproduced in the source text]
where the two symbols of the formula (also given only as images) represent independent entity vector events.
In a further embodiment, step 4 further comprises extracting the address information of the original text through the following steps:
step 41, identifying the address and orientation description information;
step 42, matching the address entity, and ending the process if no orientation description exists;
step 43, matching the orientation description to a standard relationship, and screening, from all relationships of the matched address entity, the relationships that conform to the orientation description;
step 44, inferring the tail entity from the relationship, and further screening the tail entities if an attribute description of the tail entity exists;
step 45, if several relationships are inferred in succession, confirming the existence of the intermediate entity corresponding to each relationship;
step 46, jointly screening by the head entity, the tail entity and the relationship attributes so that the address description resolves to a unique result;
further, in step 4, on the basis of the entity A mapped into the knowledge base and the entity B obtained in step 3 from the recorded manually verified standard address, the address elements and orientation-relationship description information in the original text are extracted by named entity recognition, and the entity matching the address elements is then searched in the standard address knowledge base using a semantic similarity algorithm based on address names. If the matched entity is unique and there is no orientation-relationship description, the standard address conversion is complete and no subsequent steps are needed. If the matched entity is unique and there is an orientation-relationship description, the relationship information is matched to a standard relationship in the knowledge base ontology by the semantic similarity algorithm to determine the relationship type and corresponding attributes; among all relationships connected to entity A, the relationship whose attributes are most similar is then searched to obtain entity B, where for distance accuracy the allowable error is set to a 30% float relative to the distance attribute value. Where the text gives an exact description of address attributes, that description is used as well. Where multi-hop relationship reasoning is involved, the orientation relationships are inferred step by step and the existence of each intermediate entity is confirmed in turn until the tail entity is finally confirmed. Where the matched entity is not unique, the orientation relationships and attribute information are screened jointly, and the standard address of the tail entity is extracted.
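The relationship screening and multi-hop inference of steps 43 to 45 can be sketched over a list of triples. This is a minimal illustration under assumptions: the `Relation` tuple, the function names, and the reading of the 30% tolerance as "the described distance may deviate by at most 30% of the stored distance attribute" are not confirmed by the source, and a real system would query a knowledge graph store rather than a Python list.

```python
from typing import List, NamedTuple, Optional, Sequence, Tuple


class Relation(NamedTuple):
    head: str          # uuid or name of the head entity
    orientation: str   # e.g. "east"
    distance_m: float  # stored distance attribute, in meters
    tail: str          # uuid or name of the tail entity


def infer_tails(relations: Sequence[Relation], head: str, orientation: str,
                distance_m: Optional[float] = None, tolerance: float = 0.30) -> List[str]:
    """Screen the relations of `head` by orientation and, if given, by distance."""
    matches = [r for r in relations if r.head == head and r.orientation == orientation]
    if distance_m is not None:
        # Assumed tolerance rule: within 30% of the stored distance attribute.
        matches = [r for r in matches if abs(r.distance_m - distance_m) <= tolerance * r.distance_m]
        matches.sort(key=lambda r: abs(r.distance_m - distance_m))  # closest first
    return [r.tail for r in matches]


def infer_multi_hop(relations: Sequence[Relation], head: str,
                    hops: Sequence[Tuple[str, Optional[float]]]) -> Optional[str]:
    """Follow several orientation descriptions in turn, confirming each intermediate entity."""
    current = head
    for orientation, distance_m in hops:
        tails = infer_tails(relations, current, orientation, distance_m)
        if not tails:
            return None        # an intermediate (or final) entity does not exist
        current = tails[0]
    return current


kb = [
    Relation("A", "east", 100, "B"),
    Relation("A", "east", 500, "C"),
    Relation("B", "north", 50, "D"),
]
print(infer_tails(kb, "A", "east", 120))                        # ['B']
print(infer_multi_hop(kb, "A", [("east", 120), ("north", 50)]))  # D
```

Taking the closest match at each hop is one possible tie-breaking rule; the joint screening of step 46 could instead keep all candidates and filter them by tail-entity attributes.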
In a further embodiment, screening the relationships that conform to the orientation description from all relationships of the matched address entity comprises the following steps:
43.1, establishing a non-standard address library and a standard address library that are independent of each other;
43.2, preprocessing the non-standard address library and performing first-level address matching against the standard address library;
43.3, splitting the addresses of the preprocessed non-standard address library and of the standard address library into independent address libraries, completing the allocation of the non-standard address library and the standard address library;
43.4, performing second-level address matching between the non-standard address library and the standard address library;
43.5, after the second-level address matching, judging whether the traversal of the level combinations is finished; if so, the matching of the address libraries is complete; if not, step 43.4 is repeated until the addresses of the non-standard address library and the standard address library match.
In a further embodiment, the label is a social attribute of the address entity, such as "store, supermarket, school, hospital, institution, enterprise, residential district". Different types of entities carry different attributes: "face" type entities, such as province, city, county, town, community and residential district, should include all attributes; "point" type entities, such as address units, need not include the "longitude and latitude sequence of the boundary"; "line" type entities, such as road sections, need not include the "longitude and latitude of the center point";
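The attribute rules for the three geometry types can be expressed as a small check. The attribute keys, the table `OPTIONAL_BY_GEOMETRY` and the function `missing_attributes` are hypothetical names; the sketch only encodes the rule stated above, namely that "face" entities carry all attributes, "point" entities may omit the boundary sequence and "line" entities may omit the center point.

```python
from typing import Dict, List

# The six entity attributes named in the ontology (key names are assumed).
ALL_ATTRIBUTES = ("name", "type", "labels", "center", "boundary", "remarks")

# Attributes that each geometry type is allowed to omit.
OPTIONAL_BY_GEOMETRY = {
    "face": set(),
    "point": {"boundary"},
    "line": {"center"},
}


def missing_attributes(geometry: str, attrs: Dict[str, object]) -> List[str]:
    """List the required attributes that are absent (or None) for this geometry type."""
    optional = OPTIONAL_BY_GEOMETRY[geometry]
    return [a for a in ALL_ATTRIBUTES
            if a not in optional and attrs.get(a) is None]


unit = {"name": "Gate 1", "type": "address_unit", "labels": ["school"],
        "center": (118.78, 32.04), "remarks": ""}
print(missing_attributes("point", unit))   # []
print(missing_attributes("face", unit))    # ['boundary']
```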
in further embodiments, the relationships between entities are classified into four types of relationships, i.e., "belong to", "equal", "adjacent", "cross", and so on:
the belonging relationship refers to a spatial contained relationship in which lower-level entities belong to upper-level entities among six levels of entities. Typically, a subordinate entity can only have a relationship with a nearest superior entity. However, there are exceptions, such as one "address unit" class entity corresponds to "intersection", and may belong to a plurality of "road section" type entities, or one "road section" class entity spans different areas, and may belong to different street towns;
the equal relation means that different place entities actually correspond to the same place due to different name calling methods or space superposition and the like;
the neighbor relation contains two attributes: "orientation" and "distance". Wherein the orientations include discrete values such as "south", "north", "east", "west", "opposite", "near", and the like; the distance is a specific numerical value and the unit is meter;
the intersection relation refers to a line-type address entity, such as an intersection generated by intersection between road sections (street, road, lane), wherein the intersection also corresponds to an address unit-type entity. The head entity and the tail entity of the cross relationship are respectively a road section type entity, the attribute of the cross relationship comprises two attributes of an intersection type and an intersection entity, the attribute value of the intersection type is equal to the intersection type and the intersection entity, and the attribute value of the intersection entity is uuid of the intersection type entity generated by the intersection.
In a further embodiment, relationships between entities are calculated from longitude and latitude. For example, if the coordinates show that "New Street Crossing" lies 100 meters from the "Huawei building" in a direction 10° off due east, an adjacent relationship with the attributes "orientation: east; distance: 100 meters" can be added from the "Huawei building" entity to the "New Street Crossing" entity.
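A minimal Python sketch of how such an adjacent relationship might be derived from coordinates is given here. The 1 kilometer truncation radius and the 45-degree interval on either side of each standard angle follow step 25 of this description; the haversine distance, the initial-bearing formula, the mean Earth radius and all function names are assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (assumption of this sketch)
CUTOFF_M = 1_000.0            # 1 km truncation radius of the adjacent relationship
ORIENTATIONS = ["north", "east", "south", "west"]


def distance_and_bearing(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters and initial bearing in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    dist = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))
    y = math.sin(dlam) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlam)
    bearing = math.degrees(math.atan2(y, x)) % 360.0
    return dist, bearing


def adjacent_relation(head, tail):
    """Return orientation and distance from head to tail, or None beyond the cutoff.

    head and tail are (latitude, longitude) pairs in degrees.
    """
    dist, bearing = distance_and_bearing(*head, *tail)
    if dist > CUTOFF_M:
        return None
    # Each orientation covers 45 degrees to either side of its standard angle.
    index = int(((bearing + 45.0) % 360.0) // 90.0)
    return {"orientation": ORIENTATIONS[index], "distance_m": round(dist)}


# Made-up coordinates: the second point lies slightly east of the first.
rel = adjacent_relation((32.0, 118.0), (32.0, 118.001))
```

A bearing 10° off due east, as in the example above, falls inside the 45° to 135° interval and is therefore classified as "east".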
In a further embodiment, the regular expression is (near the opposite|east|south|west|north|side|adjacent|side|next|partition|)[\u4E00-\u9FA5|0-9]{0,3}$, which requires that a semantic segment conforming to an orientation description contain one of the words "opposite", "east", etc., followed by no more than three characters before the end of the string. For example, given a text reporting an event at eight in the morning across the street from the Huawei building, entity A "Huawei building" is identified and extracted by named entity recognition; the orientation description "across the street" is then extracted with the regular expression template and the part-of-speech tagging algorithm; the semantic similarity algorithm maps "across the street" to the standard relation type "opposite"; and, combined with the manually verified address of the text, an "opposite" relationship can be added between entity A "Huawei building" and entity B "Guangxi building".
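The regular expression check can be sketched in Python as follows. The Chinese orientation words in `ORIENTATION_WORDS` are hypothetical stand-ins, since the original word list cannot be recovered from the translated text; the character class and the limit of three trailing characters are taken from the expression above. The function name is likewise an assumption.

```python
import re

# Hypothetical orientation words (assumption): stand-ins for the original Chinese list.
ORIENTATION_WORDS = ["对面", "附近", "隔壁", "东", "南", "西", "北", "旁", "邻", "侧"]

# One orientation word, then at most three characters from the class
# (CJK characters, a literal '|' or digits), then the end of the string.
PATTERN = re.compile("(" + "|".join(ORIENTATION_WORDS) + r")[\u4E00-\u9FA5|0-9]{0,3}$")


def is_orientation_segment(segment: str) -> bool:
    """Return True when a semantic segment reads as an orientation description."""
    return PATTERN.search(segment) is not None


# Sample segments: only those ending in an orientation word (plus up to three
# trailing characters) are kept as orientation descriptions.
segments = ["街对面", "对面的超市", "华为大厦", "东边一百米"]
flags = [is_orientation_segment(s) for s in segments]
```

The last sample is rejected because four characters follow the orientation word, which exceeds the limit of three.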
In a further embodiment, named entity recognition extracts the address elements and the orientation-relation description from the original text; for example, "store 120 meters east of the building" yields the description "store 120 meters east".
In a further embodiment, where the matched entity is unique, the orientation description is mapped to a standard relation; for example, "120 meters east" is mapped to an adjacent relationship with the attributes "orientation: east; distance: 120 meters".
In a further embodiment, where an exact description of an address attribute appears, as in "store 120 meters east of the building", the address attribute "store" is used as a precondition for querying tail entity B.
In a further embodiment, where multi-hop relationship reasoning is involved, the orientation relationships are inferred step by step, and the existence of each intermediate entity is confirmed in turn until the tail entity is confirmed to exist. For example, for "opposite the Suguo supermarket 50 meters east of the intersection of Yellow Mountain Road and Level Road": all cross relationships of the "Yellow Mountain Road" entity are searched and the one connecting it to the "Level Road" entity is found; the uuid of the intersection entity is read from the attributes of that relationship and the intersection entity is located; from the intersection entity, the relationship with "orientation: east; distance: 50 meters" is followed to its tail entity, and the label attribute of that tail entity is confirmed to be "supermarket"; finally, the address entity that has an "opposite" relationship with that tail entity is searched for. The process must ensure that every intermediate entity exists; if the tail entity does not exist, the conversion fails.
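The multi-hop example can be traced with a small Python sketch over a toy knowledge graph. All identifiers, entity records, the made-up standard address and the function names are hypothetical and serve only to show the hop-by-hop confirmation; an actual knowledge graph store would be queried instead of Python lists.

```python
# Toy knowledge graph; every identifier, name and address here is made up.
ENTITIES = {
    "road:1": {"name": "Yellow Mountain Road", "label": "road section"},
    "road:2": {"name": "Level Road", "label": "road section"},
    "unit:1": {"name": "Yellow Mountain Road / Level Road intersection", "label": "intersection"},
    "unit:2": {"name": "Suguo supermarket", "label": "supermarket"},
    "unit:3": {"name": "Example pharmacy", "label": "store",
               "standard_address": "No. 8 Level Road (made-up)"},
}
RELATIONS = [
    ("road:1", "cross", "road:2", {"intersection_entity": "unit:1"}),
    ("unit:1", "adjacent", "unit:2", {"orientation": "east", "distance": 50}),
    ("unit:2", "adjacent", "unit:3", {"orientation": "opposite", "distance": 20}),
]


def find_intersection(road_a, road_b):
    """Locate the intersection entity recorded on the cross relationship of two roads."""
    for head, rel, tail, attrs in RELATIONS:
        if rel == "cross" and {head, tail} == {road_a, road_b}:
            uid = attrs.get("intersection_entity")
            return uid if uid in ENTITIES else None
    return None


def hop(head, orientation, label=None):
    """Follow one adjacent relationship; optionally require a label on the tail entity."""
    for h, rel, tail, attrs in RELATIONS:
        if h == head and rel == "adjacent" and attrs.get("orientation") == orientation:
            if tail in ENTITIES and (label is None or ENTITIES[tail]["label"] == label):
                return tail
    return None


def resolve(road_a, road_b, hops):
    """Confirm each intermediate entity in turn; return None as soon as one is missing."""
    current = find_intersection(road_a, road_b)
    for orientation, label in hops:
        if current is None:
            return None
        current = hop(current, orientation, label)
    if current is None:
        return None
    return ENTITIES[current].get("standard_address")


address = resolve("road:1", "road:2", [("east", "supermarket"), ("opposite", None)])
```

The distance check is omitted here for brevity; only orientation and label are matched at each hop.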
In a further embodiment, where the matched entity is not unique, information such as orientation relationships and attributes must be screened jointly. For example, for "the Suguo supermarket opposite the building", all address entities within the city whose names contain "building" are searched, and those whose "opposite" relationship has a tail entity named "Suguo supermarket" are retained; the head and tail entities satisfying the conditions can then be uniquely determined, and the standard address of the tail entity is extracted.
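A short Python sketch of this joint screening follows. The entity records, identifiers, the made-up address and the function name `joint_screen` are hypothetical; the sketch only shows how a non-unique head match becomes unique once the relationship and the tail name are applied together.

```python
# Made-up entities and relationships for illustration only.
ENTITIES = {
    "u1": {"name": "Huawei building"},
    "u2": {"name": "Guangxi building"},
    "u3": {"name": "Suguo supermarket", "standard_address": "No. 1 Example Road (made-up)"},
    "u4": {"name": "Example school"},
}
RELATIONS = [
    ("u1", "opposite", "u4"),
    ("u2", "opposite", "u3"),
]


def joint_screen(head_keyword, orientation, tail_keyword):
    """Return the (head, tail) pair only if exactly one pair meets all conditions."""
    # Several head entities may match the keyword on their own.
    heads = {uid for uid, e in ENTITIES.items() if head_keyword in e["name"]}
    matches = [
        (h, t) for h, o, t in RELATIONS
        if h in heads and o == orientation and tail_keyword in ENTITIES[t]["name"]
    ]
    return matches[0] if len(matches) == 1 else None


pair = joint_screen("building", "opposite", "Suguo supermarket")
tail_address = ENTITIES[pair[1]]["standard_address"] if pair else None
```

Two entities contain "building", but only one of them has an "opposite" relationship whose tail is a Suguo supermarket, so the pair is determined uniquely.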
In a further embodiment, a system implementing the method for converting non-standard addresses into standard addresses based on knowledge base reasoning comprises the following modules:
the hierarchical distribution module, used for setting the ontology of the address knowledge base; the module comprises the knowledge graph ontology, the uuids of entities, the entity attributes and the relationships among entities, wherein the knowledge graph ontology comprises six levels, namely province, city, district/county, street/town, road section and address unit; the entities are the standard addresses corresponding to the different levels and are distinguished by globally unique identifiers; the uuid of an entity consists of three parts, namely the knowledge graph ontology, the name, and the number in the knowledge base; the number is an administrative division number or an address number; the entity attributes comprise the name, type, label, center point longitude and latitude, boundary longitude and latitude sequence, and remarks, the label being a social attribute of the address entity;
the standard address construction module, used for constructing the standard address knowledge base; the module further performs:
step 21, constructing the standard address knowledge base, which involves building word vectors of standard addresses, constructing relationships between entities, calculating relationships between entities, and acquiring hidden relationships, wherein the standard address knowledge base is constructed from a traditional standard address library and unstructured text data;
step 22, building entities from the traditional standard address library, which contains place names, longitude and latitude, address types and address labels; when imported into the knowledge graph, each standard address forms an entity according to the entity uuid defined in step 1, and field values are normalized into the corresponding attribute values according to the mapping between fields and entity attributes;
step 23, building word vectors of standard addresses from the standard knowledge base: the address string is cut with a step length of 1 and a window length of 2, producing a set of substrings of length 2 that serve as the vector bases, and the value of each vector component is the number of times the corresponding base appears in the address string;
step 24, constructing relationships between entities from structured administrative division information: the relationship between lower-level and upper-level addresses, and the equal relationship between different names of the same address, are built directly from existing administrative division information;
step 25, calculating relationships between entities from longitude and latitude: the distance and orientation between every two entities are calculated, with 1 kilometer as the truncation radius of the adjacent relationship; each of the four orientations east, west, south and north covers the interval extending 45 degrees to the left and right of its standard angle; and for address-unit entities on the same road section, the actual travel distance along the road section is used as the distance attribute value of the orientation relationship;
step 26, extracting and constructing hidden relationships between existing entities in the knowledge base from the unstructured text data, namely the hidden relationship between a manually (orally) described address and the corresponding manually calibrated standard address;
for each piece of unstructured text data, the address elements in the text are first extracted by named entity recognition; the word vectors of the extracted address elements are then compared, by the cosine similarity algorithm, with the address word vectors of each entity in the knowledge base, and the address elements are mapped to an entity A in the knowledge base;
the vector comparison module, used for comparison by the cosine similarity algorithm; the module compares word vectors by the cosine similarity algorithm, denoting the word vectors obtained by segmenting the non-standard address string and the standard address string as $\vec{a}$ and $\vec{b}$. Because their bases differ, the two vectors lie in different vector spaces and must be converted into the same vector space: the union of the bases of $\vec{a}$ and $\vec{b}$ is taken to form a merged base, and both vectors are converted into the new merged vector space spanned by the merged base. The similarity between the non-standard address word vector $\vec{a}$ and the standard address word vector $\vec{b}$ is then calculated with the cosine similarity formula as follows:
step 31, merging the bases of the two word vectors into a base union and obtaining the new word vector values, wherein the generated new vectors are (1,1,0,0) and (0,1,1,1);
step 32, according to the cosine similarity algorithm:
$$\cos\theta = \frac{\vec{a}\cdot\vec{b}}{\lVert\vec{a}\rVert\,\lVert\vec{b}\rVert}$$
where $\vec{a}$ and $\vec{b}$ both denote vectors;
denoting $\vec{a} = (a_1, a_2, \ldots, a_n)$ and $\vec{b} = (b_1, b_2, \ldots, b_n)$ and substituting into the cosine similarity algorithm gives:
$$\cos\theta = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$$
in this way, for each non-standard address the standard addresses with the highest cosine similarity are extracted to form the candidate set of standard addresses for the non-standard address query, and entity B is then obtained from the recorded manually verified standard address;
the address information screening module, used for extracting the address information of the original text; the module processes the extracted address information of the original text in the following steps:
step 41, identifying the address and orientation description information;
step 42, matching the address entity; if there is no orientation description, the process ends;
step 43, matching the orientation description to a standard relationship, and screening, from all relationships of the matched address entity, those that conform to the orientation description;
step 44, inferring the tail entity from the relationship; if an attribute description of the tail entity exists, further screening the tail entity;
step 45, if several relationships are inferred in succession, confirming the existence of the intermediate entity corresponding to each relationship;
step 46, jointly screening on the head and tail entities and the relationship attributes so that the address description resolves uniquely;
further, step 4 uses entity A, which was mapped into the knowledge base, and entity B, which was obtained in step 3 from the recorded manually verified standard address. Address elements and orientation-relation descriptions are extracted from the original text by named entity recognition, and an entity matching the address elements is then searched for in the standard address knowledge base using a semantic similarity algorithm based on the address name. If the matched entity is unique and there is no orientation-relation description, the standard address conversion is complete and no subsequent steps are needed. If the matched entity is unique and an orientation-relation description exists, the description is matched to a standard relation in the knowledge base ontology by the semantic similarity algorithm to determine the relation type and corresponding attributes; the relation whose attributes are most similar is then found among all relations connected to entity A, and entity B is obtained. For distance precision, the allowable error range is set to float by 30% of the distance attribute value. Where an exact description of an address attribute appears, the tail entity is further screened by that attribute. Where multi-hop relationship reasoning is involved, the orientation relationships are inferred step by step and the existence of each intermediate entity is confirmed in turn until the tail entity is finally confirmed to exist. Where the matched entity is not unique, the orientation relationship and attribute information are screened jointly, and the standard address of the tail entity is extracted.
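The word vector construction of step 23 and the comparison of steps 31 and 32 can be sketched together in Python. The function names and the sample strings are assumptions for illustration; the strings "abc" and "bcde" were chosen so that, on the merged base, they produce exactly the vectors (1,1,0,0) and (0,1,1,1) mentioned in step 31.

```python
import math
from collections import Counter


def bigram_vector(address: str) -> Counter:
    """Cut the string with step length 1 and window length 2; count each length-2 base."""
    return Counter(address[i:i + 2] for i in range(len(address) - 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Project both vectors onto the union of their bases, then apply the cosine formula."""
    basis = sorted(set(a) | set(b))   # merged base
    va = [a[k] for k in basis]        # a missing base counts as 0
    vb = [b[k] for k in basis]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


def best_match(query: str, standard_addresses: list) -> str:
    """Return the standard address whose bigram vector is most similar to the query."""
    q = bigram_vector(query)
    return max(standard_addresses, key=lambda s: cosine_similarity(q, bigram_vector(s)))


# "abc" has bases ab, bc; "bcde" has bases bc, cd, de.
# On the merged base (ab, bc, cd, de) the vectors are (1,1,0,0) and (0,1,1,1).
similarity = cosine_similarity(bigram_vector("abc"), bigram_vector("bcde"))
```

For these two vectors the dot product is 1 and the norms are $\sqrt{2}$ and $\sqrt{3}$, so the similarity equals $1/\sqrt{6}$.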
In summary, the present invention has the following advantages: a knowledge base is constructed from standard addresses and the attributes of their mutual relationships in the form of "head entity, directed relationship, tail entity" triples and is stored as a knowledge graph; by extracting the standard address contained in an unstructured address together with the associated orientation and attribute elements, the head entity and the directed relationship of a triple are determined, which fixes the knowledge graph query conditions and yields the tail entity address inferred from the standard address. Non-standardized geographic position information described orally by a user can thus be processed and converted into standard address information that a machine can process.
It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. To avoid unnecessary repetition, the possible combinations are not described separately.

Claims (8)

1. A method for converting a non-standard address into a standard address based on knowledge base reasoning is characterized by comprising the following steps:
step 1: setting an ontology of an address knowledge base;
step 2: constructing a standard address knowledge base;
step 21, constructing the standard address knowledge base, which involves building word vectors of standard addresses, constructing relationships between entities, calculating relationships between entities, and acquiring hidden relationships, wherein the standard address knowledge base is constructed from a traditional standard address library and unstructured text data;
step 22, building entities from the traditional standard address library, which contains place names, longitude and latitude, address types and address labels; when imported into the knowledge graph, each standard address forms an entity according to the entity uuid defined in step 1, and field values are normalized into the corresponding attribute values according to the mapping between fields and entity attributes;
step 23, building word vectors of standard addresses from the standard knowledge base: the address string is cut with a step length of 1 and a window length of 2, producing a set of substrings of length 2 that serve as the vector bases, and the value of each vector component is the number of times the corresponding base appears in the address string;
step 24, constructing relationships between entities from structured administrative division information: the relationship between lower-level and upper-level addresses, and the equal relationship between different names of the same address, are built directly from existing administrative division information;
step 25, calculating relationships between entities from longitude and latitude: the distance and orientation between every two entities are calculated, with 1 kilometer as the truncation radius of the adjacent relationship; each of the four orientations east, west, south and north covers the interval extending 45 degrees to the left and right of its standard angle; and for address-unit entities on the same road section, the actual travel distance along the road section is used as the distance attribute value of the orientation relationship;
step 26, extracting and constructing hidden relationships between existing entities in the knowledge base from the unstructured text data, namely the hidden relationship between a manually (orally) described address and the corresponding manually calibrated standard address;
for each piece of unstructured text data, the address elements in the text are first extracted by named entity recognition; the word vectors of the extracted address elements are then compared, by the cosine similarity algorithm, with the address word vectors of each entity in the knowledge base, and the address elements are mapped to an entity A in the knowledge base;
step 3: comparing by a cosine similarity algorithm;
step 4: extracting the address information of the original text.
2. The method for converting a non-standard address into a standard address based on knowledge base reasoning according to claim 1, wherein the ontology of the address knowledge base in step 1 comprises the knowledge graph ontology, the uuids of entities, the entity attributes and the relationships among entities, wherein the knowledge graph ontology comprises six levels, namely province, city, district/county, street/town, road section and address unit, and the entities are the standard addresses corresponding to the different levels and are distinguished by globally unique identifiers; the uuid of an entity consists of three parts, namely the knowledge graph ontology, the name, and the number in the knowledge base; the number is an administrative division number or an address number; the entity attributes comprise the name, type, label, center point longitude and latitude, boundary longitude and latitude sequence, and remarks, the label being a social attribute of the address entity.
3. The method for converting a non-standard address into a standard address based on knowledge base reasoning according to claim 1, wherein step 3 further comprises:
comparing the word vectors by the cosine similarity algorithm, the word vectors obtained by segmenting the non-standard address string and the standard address string being denoted $\vec{a}$ and $\vec{b}$; because their bases differ, the two vectors lie in different vector spaces and must be converted into the same vector space; the union of the bases of $\vec{a}$ and $\vec{b}$ is taken to form a merged base, and both vectors are converted into the new merged vector space spanned by the merged base; the similarity between the non-standard address word vector $\vec{a}$ and the standard address word vector $\vec{b}$ is then calculated with the cosine similarity formula as follows:
step 31, merging the bases of the two word vectors into a base union and obtaining the new word vector values, wherein the generated new vectors are (1,1,0,0) and (0,1,1,1);
step 32, according to the cosine similarity algorithm:
$$\cos\theta = \frac{\vec{a}\cdot\vec{b}}{\lVert\vec{a}\rVert\,\lVert\vec{b}\rVert}$$
where $\vec{a}$ and $\vec{b}$ both denote vectors;
denoting $\vec{a} = (a_1, a_2, \ldots, a_n)$ and $\vec{b} = (b_1, b_2, \ldots, b_n)$ and substituting into the cosine similarity algorithm gives:
$$\cos\theta = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$$
in this way, for each non-standard address the standard addresses with the highest cosine similarity are extracted to form the candidate set of standard addresses for the non-standard address query, and entity B is then obtained from the recorded manually verified standard address.
4. The method for converting a non-standard address into a standard address based on knowledge base reasoning according to claim 3, wherein entity B is obtained from the recorded manually verified standard address, and entity A and entity B, both mapped into the knowledge base, are judged to have a relationship between them; the additional orientation-relation description in the text address is extracted by combining a regular expression with a part-of-speech tagging algorithm and is mapped to the corresponding relation type and attributes in the knowledge base ontology, after which the corresponding relationship from entity A to entity B is established, the additional orientation-relation description being extracted by the following steps:
step 1, performing part-of-speech tagging on the text with an open-source word segmentation tool, filtering out place names, proper nouns, verbs, adjectives and time words, and thereby segmenting the text into several semantic segments;
step 2, judging by regular expression matching whether each segment is an orientation-relation description;
step 3, using a regular expression to describe the semantic segments of orientation;
a relationship exists between entity A and entity B, and the probabilities of occurrence of entity A and entity B influence each other, that is:
[formula image]
searching the standard address knowledge base for an entity matching the address elements gives:
[formula image]
from the relationship between the two it further follows that:
[formula image]
where the two symbols shown as images in the formulas represent independent entity vector events.
5. The method for converting a non-standard address into a standard address based on knowledge base reasoning according to claim 1, wherein step 4 further comprises:
processing the extracted address information of the original text in the following steps:
step 41, identifying the address and orientation description information;
step 42, matching the address entity; if there is no orientation description, the process ends;
step 43, matching the orientation description to a standard relationship, and screening, from all relationships of the matched address entity, those that conform to the orientation description;
step 44, inferring the tail entity from the relationship; if an attribute description of the tail entity exists, further screening the tail entity;
step 45, if several relationships are inferred in succession, confirming the existence of the intermediate entity corresponding to each relationship;
step 46, jointly screening on the head and tail entities and the relationship attributes so that the address description resolves uniquely;
further, in step 4, using entity A mapped into the knowledge base and entity B obtained in step 3 from the recorded manually verified standard address, address elements and orientation-relation descriptions are extracted from the original text by named entity recognition, and an entity matching the address elements is searched for in the standard address knowledge base using a semantic similarity algorithm based on the address name; if the matched entity is unique and there is no orientation-relation description, the standard address conversion is complete and no subsequent steps are needed; if the matched entity is unique and an orientation-relation description exists, the description is matched to a standard relation in the knowledge base ontology by the semantic similarity algorithm to determine the relation type and corresponding attributes, the relation whose attributes are most similar is found among all relations connected to entity A, and entity B is obtained, the allowable error range for distance precision being set to float by 30% of the distance attribute value; where an exact description of an address attribute appears, the tail entity is further screened by that attribute; where multi-hop relationship reasoning is involved, the orientation relationships are inferred step by step and the existence of each intermediate entity is confirmed in turn until the tail entity is finally confirmed to exist; and where the matched entity is not unique, the orientation relationship and attribute information are screened jointly, and the standard address of the tail entity is extracted.
6. The method for converting a non-standard address into a standard address based on knowledge base reasoning according to claim 5, wherein screening the relationships that conform to the orientation description from all relationships of the matched address entity comprises the following steps:
43.1, establishing a non-standard address library and a standard address library that are independent of each other;
43.2, preprocessing the non-standard address library and performing first-level address matching against the standard address library;
43.3, splitting the addresses of the preprocessed non-standard address library and of the standard address library to form independent address libraries, thereby completing the allocation of the non-standard address library and the standard address library;
43.4, performing second-level address matching between the non-standard address library and the standard address library;
43.5, after the second-level address matching, judging whether traversal of the level combinations is finished; if so, the address library matching is complete; if not, step 43.4 is performed again so that the addresses of the non-standard address library and the standard address library are matched.
7. A system for converting a non-standard address into a standard address based on knowledge base reasoning is characterized by comprising the following modules:
the hierarchical distribution module, used for setting the ontology of the address knowledge base;
the standard address construction module, used for constructing the standard address knowledge base;
the vector comparison module, used for comparison by the cosine similarity algorithm;
the address information screening module, used for extracting the address information of the original text;
the hierarchical distribution module comprises the knowledge graph ontology, the uuids of entities, the entity attributes and the relationships among entities, wherein the knowledge graph ontology comprises six levels, namely province, city, district/county, street/town, road section and address unit; the entities are the standard addresses corresponding to the different levels and are distinguished by globally unique identifiers; the uuid of an entity consists of three parts, namely the knowledge graph ontology, the name, and the number in the knowledge base; the number is an administrative division number or an address number; the entity attributes comprise the name, type, label, center point longitude and latitude, boundary longitude and latitude sequence, and remarks, the label being a social attribute of the address entity;
the standard address construction module further performs:
step 21, constructing a standard address knowledge base, constructing word vectors of standard addresses, constructing relationships among entities, calculating relationships among the entities, and acquiring hidden relationships, wherein the constructed standard address knowledge base comprises a traditional standard address base and unstructured text data;
step 22, building an entity from a traditional standard address library, wherein the traditional standard address library comprises a place name, longitude and latitude, an address type and an address label; when the knowledge graph is brought into, forming an entity by each standard address according to the uuid of the entity in the step 1, and standardizing the field value into a corresponding attribute value according to the mapping relation between the field and the entity attribute;
step 23, building word vectors of standard addresses according to a standard knowledge base, wherein the word vectors of the standard addresses are built by cutting address character strings in a segmentation mode with the step length of 1 and the window length of 2, a group of character strings with the length of 2 are generated and used as vector bases, and the value of each vector is the number of times that each base appears in the address character strings;
step 24, constructing the relationship between entities from the structured administrative division information, and directly constructing the relationship between the lower address and the upper address and the equal relationship of the same address generated by different names and laws through the existing administrative division information;
step 25, calculating the relationship between the entities according to the longitude and latitude, calculating the distance and the orientation between every two entities, taking 1 kilometer as a truncation radius of the adjacent relationship, taking the left deviation and the right deviation of 45 degrees of the four orientations of east, west, south and north as respective direction intervals according to respective standard angles, and taking the actual travel distance of each address unit entity on the same road section along the road section as a distance attribute value of the orientation relationship;
step 26, constructing and extracting hidden relationships between existing entities in the knowledge base from unstructured text data, thereby acquiring the hidden relationship between an orally described address and the corresponding manually calibrated standard address;
for each piece of unstructured text data, address elements in the text are first extracted by named entity recognition; the extracted address elements are then compared, through a cosine similarity algorithm, with the constructed word vectors of the standard addresses and the address word vectors of all entities in the knowledge base, and are mapped to an entity A in the knowledge base.
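Steps 23 and 25 above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the haversine and initial-bearing formulas, and the choice to assign a boundary bearing (for example exactly 45 degrees) to the next interval clockwise are all assumptions made for illustration. The road-section travel distance of step 25 is not modeled, because it would require road network data.

```python
import math
from collections import Counter

def bigram_vector(address, window=2, step=1):
    """Step 23 (sketch): count every length-`window` substring, sliding by `step`."""
    last_start = len(address) - window
    return Counter(address[i:i + window] for i in range(0, last_start + 1, step))

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (assumed distance measure, Earth radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlon = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, in degrees clockwise from north."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlon)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0

def direction(bearing):
    """Map a bearing to one of four orientations, each covering +/-45 degrees."""
    names = ["north", "east", "south", "west"]
    return names[int(((bearing + 45.0) % 360.0) // 90)]

def adjacency(a, b, radius_km=1.0):
    """Step 25 (sketch): (orientation, distance) of b relative to a, or None beyond the radius."""
    dist = haversine_km(a[0], a[1], b[0], b[1])
    if dist > radius_km:
        return None
    return direction(bearing_deg(a[0], a[1], b[0], b[1])), dist

vec = bigram_vector("南京市南京路")                 # "南京" occurs twice
rel = adjacency((32.0, 118.0), (32.0, 118.005))   # a point slightly to the east
```

The bigram counts are kept in a `Counter` so that a base element absent from an address reads as 0, which is convenient when two vectors are later expressed over a shared base.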
8. The system of claim 7, wherein the vector matching module further performs matching by a cosine similarity algorithm; the word vector obtained by segmenting the non-standard address character string and the word vector of the standard address are denoted as vectors a and b; because their bases are different, a and b lie in different vector spaces and need to be converted into the same vector space; the module extracts the union of the bases of the two vectors a and b to form a union base, and converts a and b into a new merged vector space composed of the union base; the steps of calculating the similarity between the non-standard address word vector a and the standard address word vector b by using the cosine similarity formula are as follows:
step 31, merging the bases of the two word vectors into a union of vector bases and obtaining the new word vector values over that union, wherein the generated new vectors are (1,1,0,0) and (0,1,1,1);
step 32, obtaining the following according to the cosine similarity algorithm:
$$\cos\theta = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$$
in the formula, a and b both represent vectors;
denoting vector $a = (a_1, a_2, \ldots, a_n)$ and vector $b = (b_1, b_2, \ldots, b_n)$, and substituting them into the cosine similarity algorithm, the following is further obtained:
$$\cos\theta = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^{2}} \, \sqrt{\sum_{i=1}^{n} b_i^{2}}}$$
in the above manner, the standard addresses with the highest cosine similarity are extracted for each non-standard address to form a standard candidate set for querying the non-standard address, and an entity B is further obtained according to the recorded manually verified standard address;
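The union-base conversion and the cosine similarity computation of steps 31 and 32 can be sketched as follows. This is an illustrative sketch only: the helper names, the sorted ordering of the union base, and the example strings "abc" and "bcde" are assumptions, chosen so that the converted vectors come out as the (1,1,0,0) and (0,1,1,1) named in step 31.

```python
import math
from collections import Counter

def bigrams(text):
    """Bigram count vector (window length 2, step length 1)."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def to_union_space(a, b):
    """Express two bigram vectors over the union of their bases."""
    base = sorted(set(a) | set(b))          # ordering is an arbitrary choice
    return base, [a[g] for g in base], [b[g] for g in base]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def best_standard(query, standards):
    """Return the standard address whose bigram vector is most similar to the query."""
    q = bigrams(query)
    def score(candidate):
        _, u, v = to_union_space(q, bigrams(candidate))
        return cosine(u, v)
    return max(standards, key=score)

base, u, v = to_union_space(bigrams("abc"), bigrams("bcde"))
similarity = cosine(u, v)                   # 1 / (sqrt(2) * sqrt(3))
best = best_standard("abc", ["bcde", "abcd", "xyz"])
```

For the two example vectors the dot product is 1 and the norms are sqrt(2) and sqrt(3), so the similarity is 1/sqrt(6), roughly 0.408. A real candidate set would typically keep several top-scoring addresses rather than only the single best one.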
the address information screening module performs the following steps on the address information extracted from the original file:
step 41, identifying address and direction description information;
step 42, matching address entities, and if no azimuth description exists, ending the process;
step 43, matching the orientation description into a standard relationship, and screening the relationship conforming to the orientation description from all the relationships of the matched address entities;
step 44, deducing a tail entity according to the relationship, and if the attribute description of the tail entity exists, further screening the tail entity;
step 45, if a plurality of relationships are continuously inferred, the existence of the intermediate entity corresponding to each relationship needs to be confirmed;
step 46, screening the uniqueness of the address description information jointly according to the head and tail entities and the relationship attributes;
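The relationship-driven inference of steps 42 to 45 can be sketched as a walk over a small relation graph. The data layout (a dictionary mapping each head entity to a list of `(direction, tail)` pairs), the entity names and the function name are hypothetical and only serve the illustration; attribute screening (steps 44 and 46) is omitted.

```python
from typing import Dict, List, Set, Tuple

Relation = Tuple[str, str]  # (direction, tail entity); hypothetical layout

def infer_tails(graph: Dict[str, List[Relation]],
                head: str,
                directions: List[str]) -> Set[str]:
    """Follow one orientation relationship per hop, starting from `head`.

    With no orientation description the matched entity itself is returned
    (step 42). At every hop the intermediate entities must exist; if none
    does, the inference fails and an empty set is returned (step 45).
    """
    current = {head}
    for wanted in directions:
        current = {
            tail
            for entity in current
            for direction, tail in graph.get(entity, [])
            if direction == wanted
        }
        if not current:
            return set()
    return current

graph = {
    "A": [("east", "B"), ("north", "C")],
    "B": [("north", "D")],
}
two_hop = infer_tails(graph, "A", ["east", "north"])   # via intermediate entity B
no_hop = infer_tails(graph, "A", [])
failed = infer_tails(graph, "A", ["west"])
```

Returning a set rather than a single entity leaves room for the final uniqueness screening: more than one surviving tail means further attribute filtering is still needed.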
further, with respect to the entity A mapped into the knowledge base in step 4 and the entity B obtained in step 3 from the recorded manually verified standard address: address elements and azimuth relationship description information in the original text are extracted through named entity recognition; an entity matching the address elements is then searched for in the standard address knowledge base by using a semantic similarity algorithm based on the address name; if the matched entity is unique and there is no azimuth relationship description information, the standard address conversion process is complete and no subsequent step is needed; if the matched entity is unique and azimuth relationship description information exists, the relationship information is matched with a standard relationship in the knowledge base ontology through the semantic similarity algorithm to determine the relationship type and the corresponding attributes, the relationship whose attributes are most similar and which uniquely matches is searched for among all relationships connected with the entity A, and the entity B is thereby acquired, wherein, for the distance precision range, the allowable error range is set to float by 30% on the distance attribute value; an exact description of the address attributes is generated; where multi-hop relationship reasoning exists, the plurality of azimuth relationships are reasoned step by step and the existence of each intermediate entity is confirmed in turn until the tail entity is finally confirmed to exist; and where the matched entity is not unique, the azimuth relationships and the attribute information are screened jointly, and the standard address of the tail entity is extracted.
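The distance screening described above can be sketched as follows. The claim text does not make clear whether the 30% float applies in one direction or both, so this sketch assumes a symmetric tolerance of 30% of the stored distance attribute value; that reading, the record layout and the function names are all assumptions.

```python
def within_tolerance(described_m, stored_m, tolerance=0.30):
    """True if the described distance is within +/-30% of the stored value (assumed reading)."""
    return abs(described_m - stored_m) <= tolerance * stored_m

def match_relation(relations, direction, described_m):
    """Pick the tail entity whose relation best fits the described direction and distance.

    `relations` is a list of dicts with keys "direction", "distance_m" and "tail"
    (hypothetical layout). Returns None when no relation is within tolerance.
    """
    hits = [
        r for r in relations
        if r["direction"] == direction
        and within_tolerance(described_m, r["distance_m"])
    ]
    if not hits:
        return None
    closest = min(hits, key=lambda r: abs(r["distance_m"] - described_m))
    return closest["tail"]

relations_of_a = [
    {"direction": "east", "distance_m": 200, "tail": "B"},
    {"direction": "east", "distance_m": 600, "tail": "C"},
]
entity_b = match_relation(relations_of_a, "east", 250)   # 250 m is within 30% of 200 m
no_match = match_relation(relations_of_a, "east", 400)   # outside both tolerance bands
```

Choosing the closest distance among the relations that pass the tolerance check is one simple way to approximate "most similar attributes"; a fuller implementation would also compare other relationship attributes.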
CN202011141247.7A 2020-10-22 2020-10-22 Method and system for converting non-standard address into standard address based on knowledge base reasoning Active CN112347222B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011141247.7A CN112347222B (en) 2020-10-22 2020-10-22 Method and system for converting non-standard address into standard address based on knowledge base reasoning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011141247.7A CN112347222B (en) 2020-10-22 2020-10-22 Method and system for converting non-standard address into standard address based on knowledge base reasoning

Publications (2)

Publication Number Publication Date
CN112347222A CN112347222A (en) 2021-02-09
CN112347222B CN112347222B (en) 2022-03-18

Family

ID=74359804

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011141247.7A Active CN112347222B (en) 2020-10-22 2020-10-22 Method and system for converting non-standard address into standard address based on knowledge base reasoning

Country Status (1)

Country Link
CN (1) CN112347222B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112818072A (en) * 2021-03-09 2021-05-18 携程旅游信息技术(上海)有限公司 Tourism knowledge map updating method, system, equipment and storage medium
CN112906394A (en) * 2021-03-18 2021-06-04 北京字节跳动网络技术有限公司 Address recognition method, device, equipment and storage medium
CN113822057B (en) * 2021-08-06 2022-10-18 北京百度网讯科技有限公司 Location information determination method, location information determination device, electronic device, and storage medium
CN113505190B (en) * 2021-09-10 2021-12-17 南方电网数字电网研究院有限公司 Address information correction method, device, computer equipment and storage medium
CN113987114B (en) * 2021-09-17 2023-04-07 上海燃气有限公司 Address matching method and device based on semantic analysis and electronic equipment
CN115879456A (en) * 2021-09-26 2023-03-31 中兴通讯股份有限公司 Topology restoration method, device, server and storage medium of resource
CN113836357B (en) * 2021-10-12 2022-09-16 北京商越网络科技有限公司 Address database data processing method and control system based on text similarity calculation
CN114117004B (en) * 2021-11-24 2023-06-30 北京百度网讯科技有限公司 Address recognition method, address recognition device, electronic equipment and storage medium
CN114168705B (en) * 2021-12-03 2022-11-11 南京大峡谷信息科技有限公司 Chinese address matching method based on address element index
CN114363264B (en) * 2021-12-22 2024-03-15 中科曙光(南京)计算技术有限公司 Service reservation method
CN117708262A (en) * 2024-02-02 2024-03-15 北京友友天宇系统技术有限公司 Method and device for carrying out data association on multidimensional and multi-source data and electronic equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2992482A1 (en) * 2013-04-29 2016-03-09 Siemens Aktiengesellschaft Data unification device and method for unifying unstructured data objects and structured data objects into unified semantic objects
US9535902B1 (en) * 2013-06-28 2017-01-03 Digital Reasoning Systems, Inc. Systems and methods for entity resolution using attributes from structured and unstructured data
CN104679850B (en) * 2015-02-13 2018-05-29 深圳市华傲数据技术有限公司 Address structure method and device
CN107194011A (en) * 2017-06-23 2017-09-22 重庆邮电大学 A kind of position prediction system and method based on social networks
CN111144117B (en) * 2019-12-26 2023-08-29 同济大学 Method for disambiguating Chinese address of knowledge graph

Also Published As

Publication number Publication date
CN112347222A (en) 2021-02-09

Similar Documents

Publication Publication Date Title
CN112347222B (en) Method and system for converting non-standard address into standard address based on knowledge base reasoning
Mustière et al. Matching networks with different levels of detail
CN110020433B (en) Industrial and commercial high-management name disambiguation method based on enterprise incidence relation
CN113434623B (en) Fusion method based on multi-source heterogeneous space planning data
CN107679221B (en) Time-space data acquisition and service combination scheme generation method for disaster reduction task
CN109657074B (en) News knowledge graph construction method based on address tree
CN107526786A (en) The method and system that place name address date based on multi-source data is integrated
CN108388559A (en) Name entity recognition method and system, computer program of the geographical space under
Du et al. A method for matching crowd‐sourced and authoritative geospatial data
CN112612863B (en) Address matching method and system based on Chinese word segmentation device
CN112988715B (en) Construction method of global network place name database based on open source mode
CN116028645B (en) Urban municipal infrastructure emergency knowledge graph determination method, system and equipment
CN111813819B (en) Space-time big data-based place name and address online matching method
CN114661744B (en) Terrain database updating method and system based on deep learning
CN114168705B (en) Chinese address matching method based on address element index
Zhou et al. A points of interest matching method using a multivariate weighting function with gradient descent optimization
Liu et al. M: N Object matching on multiscale datasets based on MBR combinatorial optimization algorithm and spatial district
CN114820960B (en) Method, device, equipment and medium for constructing map
CN114860891A (en) Method and device for constructing space-time map of intelligent pipe network
CN112269845B (en) Method for quickly matching electronic road map and bus route facing to different source data
CN111325235B (en) Multilingual-oriented universal place name semantic similarity calculation method and application thereof
Loai Ali et al. Towards rule-guided classification for volunteered geographic information
CN114595302A (en) Method, device, medium, and apparatus for constructing multi-level spatial relationship of spatial elements
CN111444299A (en) Chinese address extraction method based on address tree model
Olteanu et al. Matching imperfect spatial data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant