CN112528174A - Address finishing and complementing method based on knowledge graph and multiple matching and application

Info

Publication number
CN112528174A
Authority
CN
China
Prior art keywords
address
matching
place name
word segmentation
name
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011361104.7A
Other languages
Chinese (zh)
Inventor
温金明
林佳铎
黄斐然
罗伟其
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan University
University of Jinan
Original Assignee
Jinan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan University
Priority to CN202011361104.7A
Publication of CN112528174A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an address correction and completion method based on a knowledge graph and multiple matching, and an application thereof. The method comprises the following steps: performing word segmentation on input address text data with a word segmentation tool, constructing an address noun dictionary for matching-based segmentation, and recombining the matched tokens according to a place name rule; acquiring administrative division data, constructing an address knowledge graph with a database management tool, acquiring old-name and alias information for place names, and associating each old name or alias with its corresponding place name in the constructed address knowledge graph; and, according to the characteristics of address composition, constructing a plurality of matching rules and applying the corresponding rule to correct and complete each address, wherein the matching rules comprise a preceding-level missing matching rule, a preceding-level full-missing matching rule for duplicate names, a preceding adjacent-level missing matching rule for duplicate names, and an old-name and alias correction matching rule. The invention thereby corrects and completes addresses with missing or even wrong information correctly, efficiently, and in a standard form.

Description

Address finishing and complementing method based on knowledge graph and multiple matching and application
Technical Field
The invention relates to the technical field of knowledge graphs, and in particular to an address correction and completion method based on a knowledge graph and multiple matching, and an application thereof.
Background
With the rapid growth of information on the internet in recent years, user information has also grown explosively, and address data in particular (especially in e-commerce) makes up part of that user data. Because a large amount of address data is entered manually by users as free text, it inevitably reflects each user's personal writing habits. Address data obtained from users is therefore often not in a complete, canonical form, which makes it difficult for anyone who later uses the data (e.g., a courier) to locate the address geographically. Correcting and completing the incomplete, non-standard address data produced by user habits, so as to obtain complete, accurate, and standard address data, is therefore an urgent task with important practical applications.
Most previous address completion methods are based on simple table lookup, and have the following pain points: 1. Each pair of associated names occupies one row of mapping in the data table. Since one address name may be associated with many other address names, this produces a large amount of redundant data. For example, Guangzhou has 11 districts, and a mapping such as (Tianhe District, Guangzhou City) is created for each of them; each mapping occupies its own row, so "Guangzhou City" appears at least 11 times in the table. The larger the address data set, the more redundant data is generated. 2. Because complex associations between addresses cannot be represented well, address completion is easily left incomplete. 3. In practice, user-entered address data strongly reflects user habits, which easily leads to missing address information, old names, and aliases, so that matching and completion fail.
There is also an address completion method built on Trie trees, whose pain points are: 1. A large number of Trie trees must be constructed and matching is done by depth-first traversal, so matching is slow. 2. Although it is also a method in the knowledge graph field, it does not make good use of the rich semantics of a knowledge graph. 3. It cannot handle complicated situations such as duplicate address names, aliases, and old names.
In addition, address data has a hierarchical structure, which in China is reflected mainly in the administrative divisions: 1. the country is divided into provinces, autonomous regions, and municipalities directly under the central government; 2. provinces and autonomous regions are divided into autonomous prefectures, counties, autonomous counties, and cities; 3. municipalities directly under the central government and larger cities are divided into districts and counties; 4. autonomous prefectures are divided into counties, autonomous counties, and cities; 5. counties and autonomous counties are divided into townships, ethnic townships, and towns; 6. streets (subdistricts) may be set up under counties, districts, townships, and towns; 7. a street may have residents' committees under it. In general there are five main levels: province, city, district, street, and residents' committee. This five-level structure is only a general pattern and is not absolute. For example, Beijing is itself a provincial-level unit, so its subordinate units skip the city level and go directly to the district level. Likewise, some city-level units have only district-level units under them, with towns under those districts; some city-level units have both districts and towns directly under them; and some city-level units have only towns. Therefore, although China's administrative divisions are hierarchical, they are relatively complex, and the administrative level of a place cannot be determined directly from keywords such as "city", "district", "town", or "county".
Furthermore, for historical reasons some place names have changed through mergers, splits, or promotion to a higher administrative level. Out of habit, however, many users continue to use the historical old names for some time; aliases and even duplicate names also occur. All of these make it harder to look up place names for matching.
Disclosure of Invention
In order to overcome the defects and shortcomings of the prior art, the invention provides an address correction and completion method based on a knowledge graph and multiple matching, which differs from existing methods of this kind.
It is a second object of the present invention to provide an address correction and completion system based on a knowledge graph and multiple matching.
A third object of the present invention is to provide a storage medium.
It is a fourth object of the invention to provide a computing device.
In order to achieve the purpose, the invention adopts the following technical scheme:
an address correction and completion method based on a knowledge graph and multiple matching comprises the following steps:
address text segmentation and preliminary matching: performing word segmentation on input address text data with a word segmentation tool, constructing an address noun dictionary for matching-based segmentation, and recombining the matched tokens according to a place name rule;
constructing an address knowledge graph: acquiring administrative division data, constructing an address knowledge graph with a database management tool, acquiring old-name and alias information for place names, and associating each old name or alias with its corresponding place name in the constructed address knowledge graph;
establishing a plurality of matching models based on the address knowledge graph: according to the characteristics of address composition, constructing a plurality of matching rules and applying the corresponding rule to correct and complete each address, wherein the matching rules comprise a preceding-level missing matching rule, a preceding-level full-missing matching rule for duplicate names, a preceding adjacent-level missing matching rule for duplicate names, and an old-name and alias correction matching rule.
As a preferred technical solution, the word segmentation tool is the open-source word segmentation tool jieba.
The address noun dictionary is provided with a dictionary set of easily mis-segmented words; the dictionary set is read during word segmentation, each place name in it is checked in turn against the input address text data, and any place name that is present is split out.
As a preferred technical solution, the specific steps of matching and recombining according to the place name rule comprise: for the matched segmentation result, judging whether the end of each token matches a noun preset by the place name rule and, if so, merging the tokens of the address text data into a complete place name.
As a preferred technical solution, the specific steps of constructing the address knowledge graph by using the database management tool include:
according to the administrative division data, associating each address entity with the address entity one level above it by depth-first traversal, thereby associating place names that stand in a superior-subordinate relation.
As a preferred technical solution, the database management tool employs Neo4j database.
As a preferred technical solution, correcting and completing the address with the corresponding matching rule specifically comprises:
preceding-level missing matching: when the upper-level place names of a place name are detected to be missing, matching the place name in the constructed address knowledge graph, finding its upper-level place names, and completing them;
preceding-level full-missing matching for duplicate names: when a place name is detected to be a duplicate name and all its preceding-level place names are missing, constructing a relation pair from the place name and its next-level place name, matching the pair in the constructed address knowledge graph, finding the preceding-level place names, and completing them;
preceding adjacent-level missing matching for duplicate names: when a place name is detected to be a duplicate name and its adjacent upper-level place name is missing, constructing a relation pair from the place name and the higher-level place name that is present, matching the pair in the constructed address knowledge graph, finding the missing preceding-level place name, and completing it;
old-name and alias correction matching: when a place name is detected to be an old name or an alias, matching it in the constructed address knowledge graph, finding the corresponding current place name, and correcting it.
In order to achieve the second object, the present invention adopts the following technical solutions:
a system for address correction and completion based on a knowledge graph and multiple matching, comprising: an address text segmentation and preliminary matching module, an address knowledge graph construction module, and a matching model construction module;
the address text segmentation and preliminary matching module is used for segmenting and preliminarily matching the address text: performing word segmentation on input address text data with a word segmentation tool, constructing an address noun dictionary for matching-based segmentation, and recombining the matched tokens according to a place name rule;
the address knowledge graph construction module is used for constructing the address knowledge graph: acquiring administrative division data, constructing the address knowledge graph with a database management tool, acquiring old-name and alias information for place names, and associating each old name or alias with its corresponding place name in the constructed address knowledge graph;
the matching model construction module is used for establishing a plurality of matching models based on the address knowledge graph: according to the characteristics of address composition, constructing a plurality of matching rules and applying the corresponding rule to correct and complete each address, wherein the matching rules comprise a preceding-level missing matching rule, a preceding-level full-missing matching rule for duplicate names, a preceding adjacent-level missing matching rule for duplicate names, and an old-name and alias correction matching rule.
In order to achieve the third object, the present invention adopts the following technical solutions:
a storage medium storing a program which, when executed by a processor, implements the address trimming completion method based on the knowledge-graph and multiple matching as described above.
In order to achieve the fourth object, the present invention adopts the following technical means:
a computing device comprises a processor and a memory for storing a program executable by the processor, wherein the processor executes the program stored in the memory to realize the address trimming completion method based on the knowledge graph and the multiple matching.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) The invention effectively solves the problem that address text shaped by user habits is prone to missing address information, and even to old names and aliases that make identification difficult, so that address data can be completed and corrected efficiently and accurately.
(2) The invention combines the jieba tokenizer with a scheme for dividing address data according to the characteristics of administrative divisions, which solves the word segmentation problem for address data and quickly and accurately segments a string of address data into individual place names.
Drawings
FIG. 1 is the overall framework of the address correction and completion method based on a knowledge graph and multiple matching of the present invention;
FIG. 2 is the flow of preliminary word segmentation and preliminary matching of address text according to the present invention;
FIG. 3 is a visualization of the basic address knowledge graph constructed according to the present invention;
FIG. 4 is a schematic diagram of old-name and alias association according to the present invention;
FIG. 5 is a schematic diagram of the basic architecture of the multiple matching system of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1
As shown in FIG. 1, this embodiment provides an address correction and completion method based on a knowledge graph and multiple matching, comprising address text segmentation and preliminary matching, address knowledge graph construction, and establishment of multiple matching models based on the knowledge graph. The specific steps are as follows:
S1: Address text segmentation and preliminary matching
The jieba word segmentation tool is used to segment the address text. To address segmentation errors and over-fragmented segmentation, a user-defined place name dictionary is introduced so that whole place names are matched, which resolves the segmentation errors. A matching rule is also defined according to Chinese address naming conventions, so that the address text can be divided into several independent, complete place names.
As shown in FIG. 2, the specific steps of address text segmentation and preliminary matching are as follows:
S11: Word segmentation
Given address text data, word segmentation is performed on the input with the open-source word segmentation tool jieba. For example, feeding the address text "Dalian City Zhongshan District Mud Hollow Bridge Street No. 709" into the jieba tokenizer yields ["Dalian City", "Zhong", "shan District", "Mud", "Hollow Bridge", "Street", "709", "No."].
S12: Custom address noun dictionary
Because jieba is a general-purpose Chinese tokenizer and is not designed specifically for address text, it cannot always segment the place names in an address completely and correctly. For example, for the input "Baoji City Chencang District Muyi Town No. 901", the result is ["Baoji City", "Chencang", "District Mu", "yi Town", "901", "No."]. The tokenizer has evidently mistaken "District Mu" for a word, whereas the desired result is ["Baoji City", "Chencang District", "Muyi Town", "No. 901"].
An address noun dictionary therefore needs to be customized. It mainly contains common words that are easily mis-segmented; for example, "Baoshan City Industrial Park" is easily mis-segmented because of the character for "city" and is split into ["Baoshan City", "Industrial Park"]. The invention defines a dict file as the dictionary set and writes the easily mis-segmented words into it. During segmentation the dictionary set is read, each place name in it is checked in turn against the address data, and any place name that is present is split out directly.
Preliminary matching of the data in this way avoids segmentation errors. For example, adding the word "Chencang District" to the custom dictionary ensures that this special address noun is matched in full in the address text.
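The dictionary pre-matching described in S12 can be sketched in plain Python as follows. This is a minimal illustration and not the patent's implementation: the function name `split_on_dictionary` is hypothetical, the Chinese strings are an assumed reading of the translated Baoji example, and a regular-expression split stands in for whatever matching the actual program performs.

```python
import re

def split_on_dictionary(text, dictionary):
    """Split `text` so that every dictionary place name becomes its own token.

    Longer names are tried first so that a short entry cannot break up a
    longer one that contains it.
    """
    terms = sorted(dictionary, key=len, reverse=True)
    if not terms:
        return [text]
    # A capturing group makes re.split keep the matched place names.
    pattern = "(" + "|".join(re.escape(t) for t in terms) + ")"
    return [piece for piece in re.split(pattern, text) if piece]

# Assumed Chinese form of the example address and dictionary entries.
tokens = split_on_dictionary("宝鸡市陈仓区慕仪镇901号", {"陈仓区", "慕仪镇"})
print(tokens)
```

When jieba itself is used, the comparable mechanism is a user dictionary loaded with `jieba.load_userdict(path)` or extended with `jieba.add_word(word)`; the standalone function above only mimics the effect for illustration.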
S13: Reprocessing according to the place name rule
After steps S11 and S12, the segmentation result may still be very fragmented, as in the example from step S11: ["Dalian City", "Zhong", "shan District", "Mud", "Hollow Bridge", "Street", "709", "No."].
Observation shows that independent place names in Chinese addresses end in characteristic ways, for example with "province", "city", "district", "town", "village", or "street". A matching rule is therefore designed according to Chinese place naming conventions to match and merge the segmentation results into independent, complete place names.
The matching rule first checks whether each token in the preliminary segmentation result ends with a noun such as "province", "city", "district", "town", "autonomous region", "autonomous prefecture", "county", or "street". If a token ends with one of these nouns, all tokens from just after the previous successful match up to the current token are concatenated into one complete place name, which solves the over-fragmentation problem.
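The suffix-based recombination in S13 can be sketched as below. The suffix list, the function name `merge_by_suffix`, and the shortened token list (the street part of the Dalian example is left out) are assumptions made for illustration; the patent does not give the exact rule set.

```python
# Assumed suffix list: Chinese endings for autonomous region, autonomous
# prefecture, province, city, district, county, town, and street.
SUFFIXES = ("自治区", "自治州", "省", "市", "区", "县", "镇", "街道")

def merge_by_suffix(tokens):
    """Concatenate tokens until one ends with a place-name suffix."""
    merged, buffer = [], []
    for tok in tokens:
        buffer.append(tok)
        if tok.endswith(SUFFIXES):
            # Everything since the previous match forms one place name.
            merged.append("".join(buffer))
            buffer = []
    if buffer:
        # Trailing tokens with no suffix (e.g. a house number) are kept together.
        merged.append("".join(buffer))
    return merged

print(merge_by_suffix(["大连市", "中", "山区", "709", "号"]))
```

Keeping the unmatched tail as a single token is a design choice of this sketch, so that the house number survives as one unit.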
S2: Constructing the address knowledge graph
Address data is obtained and an address knowledge graph is constructed with Neo4j, as follows: administrative division data is acquired and a basic address knowledge graph is constructed from it according to the subordination relations of the administrative divisions, as shown in FIG. 3; old-name and alias information for place names is then obtained and, as shown in FIG. 4, the old names and aliases are associated with the corresponding place names in the basic address knowledge graph. The specific steps are:
S21: Construction of the basic address knowledge graph
The latest Chinese administrative division data is obtained from the National Bureau of Statistics. According to this data, China's administrative divisions fall roughly into five levels: province, city, district, street, and residents' committee. Specifically: 1. the country is divided into provinces, autonomous regions, and municipalities directly under the central government; 2. provinces and autonomous regions are divided into autonomous prefectures, counties, autonomous counties, and cities; 3. municipalities directly under the central government and larger cities are divided into districts and counties; 4. autonomous prefectures are divided into counties, autonomous counties, and cities; 5. counties and autonomous counties are divided into townships, ethnic townships, and towns; 6. streets (subdistricts) may be set up under counties, districts, townships, and towns; 7. a street may have residents' committees under it. In general there are five main levels: province, city, district, street, and residents' committee.
This five-level structure is only a general pattern and is not absolute. For example, Beijing is itself a provincial-level unit, so its subordinate units skip the city level and go directly to the district level. Likewise, some city-level units have only district-level units under them, with towns under those districts; some city-level units have both districts and towns directly under them; and some city-level units have only towns.
Although the address hierarchy is relatively complex, the administrative division data obtained from the National Bureau of Statistics has a strong superior-subordinate structure. For the present invention, only the first four levels are needed.
Thus, the present invention constructs the address knowledge graph from the administrative division data of the first four levels.
The main construction method uses the four-level administrative divisions: a program traverses them depth-first and associates each address entity with the address entity one level above it, for example constructing the relation triple ['Guangzhou City', 'belongs to', 'Guangdong Province']. The association is made with CQL (Cypher) statements of the Neo4j database; the construction statement is as follows:
MATCH (a:Level1 {name: 'address name 1'}), (b:Level2 {name: 'address name 2'}) MERGE (a)-[:BELONG_TO]->(b);
Here the labels Level1 and Level2 stand for the administrative level to which each place name belongs; for example, 'Guangdong Province' belongs to the province level and 'Guangzhou City' to the city level. With this CQL statement, a pair of place names in a superior-subordinate relation can be associated. Part of the resulting four-level address knowledge graph is shown in FIG. 3.
From these superior-subordinate relations, the first four levels of the structure are obtained.
The Neo4j database was chosen to build and store this address data. Neo4j is a high-performance NoSQL graph database that stores structured data as a network (graph) rather than in tables, which makes it well suited to constructing a knowledge graph with superior-subordinate relations.
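As a rough sketch of the depth-first construction in S21, the following Python walks a nested dictionary of divisions and emits one Cypher association statement per child and parent pair. The `DIVISIONS` sample, the `LEVELS` label names, and the `BELONG_TO` relationship spelling are illustrative assumptions, not values taken from the patent's data.

```python
# Hypothetical sample of administrative division data as a nested dict.
DIVISIONS = {"广东省": {"广州市": {"天河区": {}, "越秀区": {}}}}

# Assumed node labels for the four levels used by the method.
LEVELS = ["Province", "City", "District", "Street"]

def walk(tree, depth=0, parent=None):
    """Depth-first traversal yielding (child, child_depth, parent, parent_depth)."""
    for name, children in tree.items():
        if parent is not None:
            yield (name, depth, parent, depth - 1)
        yield from walk(children, depth + 1, name)

def to_cypher(child, child_depth, parent, parent_depth):
    """Render the association statement for one child/parent pair."""
    return (
        f"MATCH (a:{LEVELS[child_depth]} {{name: '{child}'}}), "
        f"(b:{LEVELS[parent_depth]} {{name: '{parent}'}}) "
        f"MERGE (a)-[:BELONG_TO]->(b);"
    )

pairs = list(walk(DIVISIONS))
for pair in pairs:
    print(to_cypher(*pair))
```

The statements assume both nodes already exist, because MATCH does not create them. String interpolation is used here only to show the shape of the statement; a real program would normally pass the names as query parameters.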
S22: Association of old names and aliases
For historical reasons and out of habit, some users do not strictly follow address conventions when writing addresses. In particular, when a place name has recently changed because of a merger, an upgrade, or a similar event, users often keep using the old name for some time out of habit.
For example, Youhao District in Yichun City, Heilongjiang Province, was once named "Sweet-Green District". If a user still writes "Sweet-Green District", this causes trouble for whoever has to find the address.
Similarly, it is troublesome for a person looking for the address if the user habitually refers to a place by an alias. For example, Guangzhou City is also known as "Sheep City" and "Flower City", and locals habitually refer to "Aquilaria sinensis Street" simply as "Aquilaria sinensis".
Old names and aliases are therefore associated with the corresponding place names in the basic address knowledge graph, so that a place name found during matching to be an old name or an alias can be corrected immediately.
The association is made with CQL statements of the Neo4j database, as follows:
alias association statement: MATCH (a:Alias {name: 'address name'}), (b:CountyLevel {name: 'address name'}) MERGE (a)-[:ALIAS_NAME]->(b);
old-name association statement: MATCH (a:OldName {name: 'address name'}), (b:CountyLevel {name: 'address name'}) MERGE (a)-[:OLD_NAME]->(b);
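How such associations are used for correction can be sketched without a database. In this minimal illustration the association triples mirror the direction of the statements above (alias or old name pointing to the current name); the relationship spellings `ALIAS_NAME` and `OLD_NAME`, the function name, and the old-name entry are assumptions, and the old-name pair is a made-up placeholder.

```python
# (source name, relation, current place name)
ASSOCIATIONS = [
    ("羊城", "ALIAS_NAME", "广州市"),   # "Sheep City" is an alias of Guangzhou
    ("花城", "ALIAS_NAME", "广州市"),   # "Flower City" is an alias of Guangzhou
    ("旧名区", "OLD_NAME", "现名区"),   # hypothetical old/current pair
]

LOOKUP = {src: (rel, dst) for src, rel, dst in ASSOCIATIONS}

def correct_place_name(name):
    """Return (current_name, relation); relation is None if no fix was needed."""
    rel, current = LOOKUP.get(name, (None, name))
    return current, rel

print(correct_place_name("羊城"))
print(correct_place_name("广州市"))
```

Returning the relation alongside the corrected name lets a caller report whether the input was an alias, an old name, or already the current name.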
S3: Establishing multiple matching models based on the knowledge graph
Because of users' writing habits and similar issues, most address texts are not complete, standard, and correct. Several matching rules therefore need to be constructed from the generated address knowledge graph and the correct written form of an address, so that each place name obtained from address segmentation can be effectively matched and then completed or corrected. As shown in FIG. 5, the main flow of the knowledge-graph-based multiple matching models is as follows:
S31: Preceding-level missing matching
When the upper-level place names of a place name are detected to be missing, the place name is matched in the constructed address knowledge graph, and its upper-level place names are found and completed.
Take the segmentation result ["Chencang District", "Muyi Town", "No. 901"] as input. "Chencang District" is input first and matched in the address knowledge graph; it is a third-level place name and there is only one match, so it is filled into the third-level position of the final result.
If checking then finds that the provincial-level and city-level place names preceding the third level are both missing from the final result, and the matching result shows that "Chencang District" is unique in the address knowledge graph, the preceding levels of "Chencang District" are looked up with a Neo4j CQL statement. The preceding levels obtained are "Baoji City" and "Shaanxi Province", and these two place names are filled into the city-level and provincial-level positions of the final result respectively, completing the missing information.
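The S31 rule can be illustrated with a small in-memory stand-in for the graph. This is a sketch under stated assumptions: the `NODES` table, its integer ids, and the function names are hypothetical, and the Chinese names are an assumed reading of the translated Baoji example. In the described method the lookup is a Neo4j query rather than a dictionary scan.

```python
# Hypothetical stand-in for the Neo4j graph: node id -> (name, parent id).
NODES = {
    1: ("陕西省", None),
    2: ("宝鸡市", 1),
    3: ("陈仓区", 2),
    4: ("慕仪镇", 3),
}

def find_by_name(name):
    """Return the ids of all nodes carrying this place name."""
    return [nid for nid, (n, _) in NODES.items() if n == name]

def complete_upper_levels(name):
    """Return the full chain from the top level down to `name`.

    Only applies when the name is unique in the graph; otherwise None is
    returned so that a duplicate-name rule can take over.
    """
    matches = find_by_name(name)
    if len(matches) != 1:
        return None
    chain, nid = [], matches[0]
    while nid is not None:
        n, nid = NODES[nid]   # step to the parent node
        chain.append(n)
    return chain[::-1]

print(complete_upper_levels("陈仓区"))
```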
S32: Preceding-level full-missing matching under duplicate names
When matching a place name against the address knowledge graph returns several results (that is, the name is a duplicate), and the preceding-level place names are detected to be missing while a next-level place name is present, the relation pair formed by the place name and its next-level place name can be matched in the constructed address knowledge graph, and the preceding-level place names found and completed.
Take ["Heping District", "Changbai Street", "No. 15"] as an example of the matching process. Both Tianjin and Shenyang have a Heping District, so matching "Heping District" alone against the address knowledge graph returns multiple results, and it cannot be determined which one is correct.
In this example the upper levels are missing, but the next level, "Changbai Street", is present, so a relation pair combining "Heping District" and its subordinate "Changbai Street" is constructed and matched in the address knowledge graph.
This combination eliminates the duplicates: the pair Heping District and Changbai Street matches only under Shenyang City, Liaoning Province. "Liaoning Province" and "Shenyang City" can therefore be filled into the corresponding levels of the final result, achieving preceding-level full-missing matching under duplicate names.
The specific mode is that a matched cql statement is constructed, and the specific statement is as follows:
match (n)<-[:`Belong to`]-({name: 'place name to be matched'})<-[:`Belong to`]-({name: 'next-level place name of the place name to be matched'}) return n;
With this CQL statement, the specific place name can be matched in the constructed address knowledge graph, and its upper-level place name can be matched as well.
S33: Adjacent preceding-level missing match with duplicate names
As shown in fig. 5, when a place name is matched against the address knowledge graph and multiple results are found (that is, the name is duplicated), and it is detected that the adjacent preceding-level place name is missing while a place name further up is present, matching can be performed in the constructed address knowledge graph using the relation pair of the place name and that higher-level place name, so that the missing adjacent preceding-level place name is found and completed.
Take ["Jilin Province", "Chaoyang District", "Huxi Subdistrict", "No. 15"] as an example. Both Beijing and Changchun City in Jilin Province have a Chaoyang District. If "Chaoyang District" is matched alone, duplicate results are returned and the correct one cannot be determined.
In this example a preceding level is present, so the relation pair of "Jilin Province" and "Chaoyang District" is constructed and matched in the address knowledge graph. The result shows that within Jilin Province only Changchun City has a Chaoyang District, so "Changchun City" is filled into the city-level position of the final result, which achieves completion.
Specifically, a matching CQL statement is constructed as follows:
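The disambiguation by next-level place name can be illustrated with a small Python sketch. It is an assumption-laden stand-in for the graph query: the dictionary `NODES` and the function `resolve_with_child` are hypothetical, and the toy data contains only the nodes needed for the example.

```python
from typing import List, Optional

# Hypothetical toy graph: node id -> (place name, id of the node it belongs to).
NODES = {
    1: ("Tianjin", None),
    2: ("Heping District", 1),
    3: ("Liaoning Province", None),
    4: ("Shenyang City", 3),
    5: ("Heping District", 4),
    6: ("Changbai Subdistrict", 5),
}


def resolve_with_child(name: str, child_name: str) -> Optional[List[str]]:
    """Pick the duplicated place that owns the given next-level place,
    then return its upper levels from the top down."""
    candidates = [
        parent_id
        for child, parent_id in NODES.values()
        if child == child_name
        and parent_id is not None
        and NODES[parent_id][0] == name
    ]
    if len(candidates) != 1:
        return None  # the (place, next-level place) pair is unknown or still ambiguous
    chain = []
    parent = NODES[candidates[0]][1]
    while parent is not None:
        chain.append(NODES[parent][0])
        parent = NODES[parent][1]
    return list(reversed(chain))


print(resolve_with_child("Heping District", "Changbai Subdistrict"))
```

The pair of names plays the role of the relation pair in the text: only one "Heping District" node has "Changbai Subdistrict" beneath it, so the ambiguity disappears.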
match ({name: 'preceding-level place name of the place name to be matched'})<-[:`Belong to`]-(n)<-[:`Belong to`]-({name: 'place name to be matched'}) return n;
With this CQL statement, the specific place name can be matched in the constructed address knowledge graph, and the missing upper-level place name can be matched as well.
S34: Old name and alias matching
Similarly, when querying the address knowledge graph reveals that a place name is an old name or an alias, matching is performed in the constructed address knowledge graph according to the old name or alias, the current actual place name is found, and the address is corrected.
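The two-hop pattern used in this step (known upper level, unknown middle level, known duplicated place name) can be mimicked in plain Python. The sketch below is illustrative only; `NODES` and `find_missing_middle` are hypothetical names, and the dictionary is a stand-in for the Neo4j graph.

```python
from typing import List

# Hypothetical toy graph: node id -> (place name, id of the node it belongs to).
NODES = {
    1: ("Beijing", None),
    2: ("Chaoyang District", 1),
    3: ("Jilin Province", None),
    4: ("Changchun City", 3),
    5: ("Chaoyang District", 4),
    6: ("Jilin City", 3),
}


def find_missing_middle(upper_name: str, name: str) -> List[str]:
    """Return every place n such that name belongs to n and n belongs to upper_name."""
    result = []
    for node_name, parent_id in NODES.values():
        if node_name != name or parent_id is None:
            continue
        middle_name, grandparent_id = NODES[parent_id]
        if grandparent_id is not None and NODES[grandparent_id][0] == upper_name:
            result.append(middle_name)
    return result


print(find_missing_middle("Jilin Province", "Chaoyang District"))
```

A result list with exactly one entry means the missing level is resolved; an empty or longer list would mean the known upper level is not enough to disambiguate.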
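As a rough illustration of old name and alias correction, the lookup can be thought of as a mapping from outdated or informal names to current ones. The mapping below is an assumption for demonstration purposes and is not taken from the patent: Xiangfan City was renamed Xiangyang City in 2010, and "Yangcheng" is a common alias of Guangzhou. The names `CURRENT_NAME` and `correct_names` are hypothetical.

```python
from typing import Dict, List

# Illustrative alias table (assumed data): old name or alias -> current place name.
CURRENT_NAME: Dict[str, str] = {
    "Xiangfan City": "Xiangyang City",  # old name
    "Yangcheng": "Guangzhou City",      # alias
}


def correct_names(parts: List[str]) -> List[str]:
    """Replace every old name or alias with the current place name; keep other parts."""
    return [CURRENT_NAME.get(part, part) for part in parts]


print(correct_names(["Hubei Province", "Xiangfan City"]))
```

In a graph database the same association would more naturally be stored as a relationship between the alias node and the current place name node, as the construction step describes.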
Example 2
This embodiment provides an address trimming and completion system based on a knowledge graph and multiple matching, comprising: an address text word segmentation and preliminary matching module, an address knowledge graph construction module, and a matching model construction module;
in this embodiment, the address text word segmentation and preliminary matching module performs word segmentation and preliminary matching on the address text: it segments the input address text data with a word segmentation tool, constructs an address noun dictionary for matching-based word segmentation, and recombines the matched segments according to the place name rules;
in this embodiment, the address knowledge graph construction module constructs the address knowledge graph: it acquires administrative division data for addresses, builds the address knowledge graph with a database management tool, acquires old name or alias information for place names, and associates that information with the corresponding place names in the constructed address knowledge graph;
in this embodiment, the matching model construction module builds multiple matching models based on the address knowledge graph: it constructs multiple matching rules according to the characteristics of address composition and corrects and completes the address with the corresponding matching rule, where the matching rules comprise a preceding-level missing matching rule, a full preceding-level missing matching rule for duplicate names, an adjacent preceding-level missing matching rule for duplicate names, and an old name and alias correction matching rule.
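How a matching model construction module might choose among the four rules can be sketched as a small dispatcher. This is one possible reading of the rule conditions, not the patent's own logic; the function `select_rule`, its parameters, and the returned labels are all hypothetical.

```python
from typing import List, Optional


def select_rule(
    levels: List[Optional[str]],
    index: int,
    match_count: int,
    is_alias: bool = False,
) -> str:
    """Choose a matching rule for the place name at levels[index].

    levels      -- address levels from top to bottom, None where a level is missing
    match_count -- number of nodes with this name found in the knowledge graph
    is_alias    -- True if the graph marks the name as an old name or alias
    """
    before = levels[:index]
    has_next = index + 1 < len(levels) and levels[index + 1] is not None
    if is_alias:
        return "old name and alias correction"
    if not before or all(part is not None for part in before):
        return "no completion needed"
    if match_count == 1:
        return "preceding-level missing"
    if match_count > 1:
        if all(part is None for part in before) and has_next:
            return "full preceding-level missing, duplicate names"
        if before[-1] is None and any(part is not None for part in before):
            return "adjacent preceding-level missing, duplicate names"
    return "unresolved"


print(select_rule([None, None, "Heping District", "Changbai Subdistrict"], 2, 2))
```

Keeping the selection separate from the graph queries makes each rule testable on its own, which is a design choice of this sketch rather than something the embodiment prescribes.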
Example 3
This embodiment provides a storage medium, which may be a ROM, a RAM, a magnetic disk, an optical disk, or the like. The storage medium stores one or more programs, and when the programs are executed by a processor, the address trimming and completion method based on the knowledge graph and multiple matching of embodiment 1 is implemented.
Example 4
This embodiment provides a computing device, which may be a desktop computer, a notebook computer, a smart phone, a PDA handheld terminal, a tablet computer, or another terminal device with a display function. The computing device includes a processor and a memory; the memory stores one or more programs, and when the processor executes the programs stored in the memory, the address trimming and completion method based on the knowledge graph and multiple matching of embodiment 1 is implemented.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (10)

1. An address trimming and completion method based on a knowledge graph and multiple matching, characterized by comprising the following steps:
address text word segmentation and preliminary matching: performing word segmentation on input address text data with a word segmentation tool, constructing an address noun dictionary for matching-based word segmentation, and performing matching and recombination according to place name rules;
constructing an address knowledge graph: acquiring administrative division data for addresses, constructing the address knowledge graph with a database management tool, acquiring old name or alias information for place names, and associating the old name or alias information with the corresponding place names in the constructed address knowledge graph;
establishing multiple matching models based on the address knowledge graph: constructing multiple matching rules according to the characteristics of address composition, and correcting and completing the address with the corresponding matching rules, wherein the matching rules comprise a preceding-level missing matching rule, a full preceding-level missing matching rule for duplicate names, an adjacent preceding-level missing matching rule for duplicate names, and an old name and alias correction matching rule.
2. The address trimming and completion method based on a knowledge graph and multiple matching according to claim 1, wherein the word segmentation tool is the open-source word segmentation tool jieba.
3. The address trimming and completion method based on a knowledge graph and multiple matching according to claim 1, wherein the address noun dictionary is provided with a dictionary set of incorrectly segmented words; the dictionary set is read during the word segmentation operation, the input address text data is checked in turn for each place name in the dictionary set, and any place name found is segmented out.
4. The address trimming and completion method based on a knowledge graph and multiple matching according to claim 1, wherein the matching and recombination according to the place name rules specifically comprises: determining, for each word in the matched word segmentation result, whether the end of the word matches a noun preset by the place name rules, and if so, combining the word segmentation results of the address text data into a complete place name.
5. The address trimming and completion method based on a knowledge graph and multiple matching according to claim 1, wherein constructing the address knowledge graph with the database management tool specifically comprises:
according to the administrative division data, associating each address entity with the address entity one level above it by depth-first traversal, so that place names are linked by their upper-level and lower-level affiliations.
6. The address trimming and completion method based on a knowledge graph and multiple matching according to claim 1, wherein the database management tool is a Neo4j database.
7. The address trimming and completion method based on a knowledge graph and multiple matching according to claim 1, wherein correcting and completing the address with the corresponding matching rules comprises the following steps:
preceding-level missing matching: detecting that the upper-level place name of a place name is missing, matching in the constructed address knowledge graph according to the place name, and finding and completing the upper-level place name;
full preceding-level missing matching for duplicate names: detecting that the place name is duplicated and the preceding-level place names are missing, constructing a relation pair combining the place name and its next-level place name, matching in the constructed address knowledge graph, and finding and completing the preceding-level place names;
adjacent preceding-level missing matching for duplicate names: detecting that the place name is duplicated and the adjacent upper-level place name is missing, constructing a relation pair combining the place name and a higher-level place name, matching in the constructed address knowledge graph, and finding and completing the missing upper-level place name;
old name and alias correction matching: detecting that the place name is an old name or an alias, matching in the constructed address knowledge graph according to the old name or alias, finding the corresponding current place name, and correcting it.
8. An address trimming and completion system based on a knowledge graph and multiple matching, comprising: an address text word segmentation and preliminary matching module, an address knowledge graph construction module, and a matching model construction module;
the address text word segmentation and preliminary matching module is used for performing word segmentation and preliminary matching on the address text, performing word segmentation on input address text data with a word segmentation tool, constructing an address noun dictionary for matching-based word segmentation, and performing matching and recombination according to place name rules;
the address knowledge graph construction module is used for constructing an address knowledge graph, acquiring administrative division data for addresses, constructing the address knowledge graph with a database management tool, acquiring old name or alias information for place names, and associating the old name or alias information with the corresponding place names in the constructed address knowledge graph;
the matching model construction module is used for establishing multiple matching models based on the address knowledge graph, constructing multiple matching rules according to the characteristics of address composition, and correcting and completing the address with the corresponding matching rules, wherein the matching rules comprise a preceding-level missing matching rule, a full preceding-level missing matching rule for duplicate names, an adjacent preceding-level missing matching rule for duplicate names, and an old name and alias correction matching rule.
9. A storage medium storing a program, wherein the program, when executed by a processor, implements the address trimming and completion method based on a knowledge graph and multiple matching according to any one of claims 1 to 7.
10. A computing device comprising a processor and a memory for storing a processor-executable program, wherein the processor, when executing the program stored in the memory, implements the address trimming and completion method based on a knowledge graph and multiple matching according to any one of claims 1 to 7.
CN202011361104.7A 2020-11-27 2020-11-27 Address finishing and complementing method based on knowledge graph and multiple matching and application Pending CN112528174A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011361104.7A CN112528174A (en) 2020-11-27 2020-11-27 Address finishing and complementing method based on knowledge graph and multiple matching and application

Publications (1)

Publication Number Publication Date
CN112528174A true CN112528174A (en) 2021-03-19

Family

ID=74994429

Country Status (1)

Country Link
CN (1) CN112528174A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination