CN112163070A - Location name matching method and device, electronic equipment and machine-readable storage medium - Google Patents


Info

Publication number
CN112163070A
Authority
CN
China
Prior art keywords
place name
name
place
similarity
names
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011033886.1A
Other languages
Chinese (zh)
Other versions
CN112163070B (en)
Inventor
周紫维
张凌
周红昭
黄跃东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision System Technology Co Ltd
Original Assignee
Hangzhou Hikvision System Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision System Technology Co Ltd
Priority to CN202011033886.1A
Publication of CN112163070A
Application granted
Publication of CN112163070B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a place name matching method, a place name matching device, electronic equipment and a machine-readable storage medium, wherein the method comprises the following steps: determining a first similarity of the first place name and the second place name; exchanging a first special name in the first place name with a second special name in the second place name to obtain a third place name and a fourth place name; calling a map service based on the third place name and the fourth place name respectively to obtain a fifth place name matched with the third place name and a sixth place name matched with the fourth place name; determining a second similarity of the fifth place name and the sixth place name; and matching the first place name and the second place name based on the first similarity and the second similarity. The method can improve the accuracy of place name matching.

Description

Location name matching method and device, electronic equipment and machine-readable storage medium
Technical Field
The present application relates to the field of data mining technologies, and in particular, to a method and an apparatus for matching names of places, an electronic device, and a machine-readable storage medium.
Background
With the increasing amount of data, how to effectively merge data becomes a popular research direction.
Taking place name data as an example, current place name data merging schemes mainly match place names manually and merge the place name data based on the matching result, which is inefficient.
Disclosure of Invention
In view of the above, the present application provides a method and an apparatus for matching a place name, an electronic device, and a machine-readable storage medium.
According to a first aspect of embodiments of the present application, there is provided a location name matching method, including:
determining a first similarity of the first place name and the second place name;
exchanging a first special name in the first place name with a second special name in the second place name to obtain a third place name and a fourth place name;
calling a map service based on the third place name and the fourth place name respectively to obtain a fifth place name matched with the third place name and a sixth place name matched with the fourth place name;
determining a second similarity of the fifth place name and the sixth place name;
and matching the first place name and the second place name based on the first similarity and the second similarity.
According to a second aspect of embodiments of the present application, there is provided a place name matching apparatus, including:
a determining unit, configured to determine a first similarity between the first place name and the second place name;
a swapping unit, configured to swap a first special name in the first place name with a second special name in the second place name to obtain a third place name and a fourth place name;
an obtaining unit, configured to call a map service based on the third place name and the fourth place name respectively, to obtain a fifth place name matched with the third place name and a sixth place name matched with the fourth place name;
the determining unit is further configured to determine a second similarity between the fifth place name and the sixth place name;
a matching unit, configured to match the first place name and the second place name based on the first similarity and the second similarity.
According to a third aspect of the embodiments of the present application, there is provided an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
and a processor for implementing the location name matching method of the first aspect when executing the program stored in the memory.
According to a fourth aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored therein a computer program, which when executed by a processor, implements the location name matching method of the first aspect.
According to the place name matching method of the present application, on the one hand, a first similarity between a first place name and a second place name is determined; on the other hand, the first special name in the first place name and the second special name in the second place name are swapped to obtain a third place name and a fourth place name, a map service is called based on the third place name and the fourth place name respectively to obtain a fifth place name matched with the third place name and a sixth place name matched with the fourth place name, and a second similarity between the fifth place name and the sixth place name is determined. The first place name and the second place name are then matched based on the first similarity and the second similarity. Swapping the special names expands the place name data and improves information utilization, which in turn improves the accuracy of place name matching.
Drawings
FIG. 1 is a schematic flow chart illustrating a location name matching method according to an exemplary embodiment of the present application;
FIG. 2 is a schematic flow chart illustrating matching of a first place name and a second place name according to an exemplary embodiment of the present application;
FIG. 3 is a schematic flow chart illustrating a location name matching method according to an exemplary embodiment of the present application;
FIG. 4 is a schematic structural diagram of a location name matching apparatus according to an exemplary embodiment of the present application;
FIG. 5 is a schematic diagram of a hardware structure of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
In order to make those skilled in the art better understand the technical solutions provided in the embodiments of the present application, a brief description will be given below of some technical terms involved in the embodiments of the present application.
Universal name dictionary: a dictionary that lists universal (generic) names. Which universal names it contains depends on the focus of the application.
In the embodiment of the present application, the universal name dictionary may include a common universal name dictionary and a domain universal name dictionary, where:
Common universal name dictionary: for place name matching, the common universal name dictionary contains the provinces, cities, counties, districts, streets, communities and the like that correspond to administrative divisions. The corresponding material is downloaded from the Internet, and the abbreviations of these place names are collated alongside them.
Domain universal name dictionary: place names are matched by place classification, so each place classification must be associated with its place names. Dictionaries for different place types are built separately for different application industries. For example, an internet bar or internet cafe is an entertainment place and a hospital is a medical place, so "internet bar" and "internet cafe" are universal names in the universal name dictionary for entertainment places. Summarizing the place types of each industry together with their domain universal names yields the domain universal name dictionary, which assists place name matching.
Place classification: a classification of places established from a given perspective, such as supermarkets, farmers' markets, railway stations, and government agencies. It can be drawn up with reference to national standards, systematic collation, and the like.
Word segmentation: here, Chinese word segmentation, that is, the process of splitting a sequence of Chinese characters into individual words. More generally, word segmentation recombines a continuous character sequence into a word sequence according to a given specification. For example, the sentence "I love Beijing Tiananmen" is segmented as I / love / Beijing / Tiananmen. Common Chinese word segmentation tools include jieba and SnowNLP.
Text similarity: the degree of similarity between two texts. It is widely used in search engines, recommendation systems, paper identification, machine translation, automatic response, named entity recognition, spelling correction, and similar fields.
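The word segmentation described above can be illustrated with a toy example. The following is a minimal sketch only: it does not use jieba or SnowNLP, but a simple forward maximum matching over a hand-written lexicon, which is a substitute technique chosen here for self-containment. The function name `forward_max_match`, the `lexicon` contents, and the `max_len` parameter are hypothetical, and the Chinese sentence is assumed to be the original of the translated example.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy forward maximum matching: at each position, take the longest
    lexicon word; fall back to a single character if nothing matches."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy lexicon; a real tool ships a far larger dictionary.
lexicon = {"北京", "天安门"}
print("/".join(forward_max_match("我爱北京天安门", lexicon)))
```

With this toy lexicon the sentence splits into four words, mirroring the I / love / Beijing / Tiananmen example. A production system would more plausibly rely on an existing segmenter than on this greedy rule.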
In order to make the aforementioned objects, features and advantages of the embodiments of the present application more comprehensible, embodiments of the present application are described in detail below with reference to the accompanying drawings.
Referring to FIG. 1, which is a schematic flow chart of a place name matching method provided in an embodiment of the present application, the place name matching method may include the following steps:
Step S100: determine a first similarity between the first place name and the second place name.
In the embodiment of the present application, the first place name and the second place name do not refer to two fixed place names, but may refer to any two place names that need to be matched.
For example, the first place name and the second place name may be place names from two different data sources.
When a place name (the first place name or the second place name) is acquired from a data source, information such as the administrative division and the place classification corresponding to the place name may be acquired as well.
For example, when the first place name and the second place name need to be matched, the similarity between them (referred to herein as the first similarity) can, on the one hand, be determined using an existing strategy.
For example, the similarity of the first place name and the second place name may be determined based on a text similarity calculation.
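The source does not fix a particular text similarity measure. As one possible illustration, the sketch below uses `difflib.SequenceMatcher` from the Python standard library, which returns a ratio in [0, 1]. The function name `text_similarity` and the two sample place names are assumptions made for this example.

```python
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1]; 1.0 means the two strings are identical.

    SequenceMatcher.ratio() is 2*M/T, where M is the number of matched
    characters and T is the total length of both strings.
    """
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical place names from two data sources.
first_similarity = text_similarity("杭州市星空网吧", "杭州市星空缘网咖")
print(round(first_similarity, 2))
```

Any other measure (for example one computed over segmented words) could be substituted, provided both the first and the second similarity are computed on a comparable scale.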
Step S110: swap the first special name in the first place name with the second special name in the second place name to obtain a third place name and a fourth place name.
In the embodiment of the application, the universal names in a place name are identified through the universal name dictionary, and the main words that cannot be identified are regarded as the special names in the place name.
For example, to identify the universal names and special names in a place name, a universal name dictionary may be created in advance. It commonly includes provinces, cities, counties, districts, streets and the like (which may be referred to as common universal names), and universal names of places such as internet bars or internet cafes, gas stations, and hospitals (which may be referred to as domain universal names) may be added according to the place types. For any place name, the created universal name dictionary may be queried based on the place name, and a main word that is included in the place name and is not a universal name may be determined to be a special name.
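The dictionary lookup described above might be sketched as follows. This is an illustration under stated assumptions, not the method of the application: the function name `split_place_name`, the greedy longest-match scan, and the sample dictionary are all hypothetical, and every run of characters not covered by a dictionary entry is treated as a special-name candidate.

```python
def split_place_name(name, universal_names):
    """Split `name` into (universal, special) segment lists.

    Universal names are matched greedily, longest entry first; characters
    that no dictionary entry covers accumulate into special-name candidates.
    """
    by_length = sorted(universal_names, key=len, reverse=True)
    universal, special = [], []
    pending = ""  # characters not covered by any dictionary entry so far
    i = 0
    while i < len(name):
        match = next((u for u in by_length if name.startswith(u, i)), None)
        if match:
            if pending:
                special.append(pending)
                pending = ""
            universal.append(match)
            i += len(match)
        else:
            pending += name[i]
            i += 1
    if pending:
        special.append(pending)
    return universal, special


# Hypothetical dictionary: administrative names plus one domain universal name.
universal_names = {"浙江省", "杭州市", "滨江区", "网吧"}
print(split_place_name("杭州市滨江区星空网吧", universal_names))
```

For the sample name, the administrative parts and the place-type suffix are recognized as universal names and the remaining run is reported as the special name.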
Two place names with high similarity can usually be merged. If two place names can be merged, the difference between them may stem from non-standard place names in the place name source libraries; that is, different source libraries may describe the special name of the same object in different ways.
For example, for Haicawei, some place name source libraries may describe it directly as Haicawei, while others may describe it as Haicang or Haicangsan.
In this case, after the special names of the two place names are swapped, the similarity between the two new place names is usually high.
Conversely, for two place names with low similarity that cannot be merged, the similarity between the two new place names obtained after the special-name swap is generally also low.
Therefore, the special-name swap augments the place name data, improves information utilization, and improves the accuracy of place name matching.
In the embodiment of the application, the created universal name dictionary may be queried based on the first place name and the second place name respectively to determine the special names in each. The special name in the first place name (referred to herein as the first special name) and the special name in the second place name (referred to herein as the second special name) are then swapped to obtain the third place name and the fourth place name, which augments the place name data, improves information utilization, and improves the accuracy of the comparison result.
For example, assuming that the first place name consists of several universal names plus the special name Z1, and the second place name consists of several universal names plus the special name Z2, then after the special-name swap the third place name may consist of the universal names of the first place name plus the special name Z2, and the fourth place name may consist of the universal names of the second place name plus the special name Z1.
In one example, when a plurality of special names exist in the first place name, the longest of them is determined as the first special name;
when a plurality of special names exist in the second place name, the longest of them is determined as the second special name.
For example, a place name may contain a plurality of special names; in this case, the longest special name (which may be called the main special name) may be determined as the special name to be swapped.
Accordingly, when a plurality of special names exist in the first place name, the longest of them may be determined as the first special name.
Similarly, when a plurality of special names exist in the second place name, the longest of them may be determined as the second special name.
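The swap itself, including the choice of the longest special name, could look roughly like the sketch below. It assumes the special-name candidates of each place name have already been identified and that each list is non-empty; the function name `swap_special_names` and the sample names are hypothetical.

```python
def swap_special_names(name_a, specials_a, name_b, specials_b):
    """Swap the longest special name of each place name.

    Returns (third, fourth): name_a carrying name_b's special name, and
    name_b carrying name_a's. On ties, max() keeps the first candidate.
    """
    first = max(specials_a, key=len)    # first special name
    second = max(specials_b, key=len)   # second special name
    # Replace only the first occurrence so repeated substrings stay intact.
    third = name_a.replace(first, second, 1)
    fourth = name_b.replace(second, first, 1)
    return third, fourth


# Hypothetical inputs: special-name candidates are given explicitly.
third, fourth = swap_special_names(
    "杭州市星空网吧", ["星空"],
    "杭州市星空缘网咖", ["星空缘"],
)
print(third, fourth)
```

Here the third place name keeps the universal names of the first place name but carries the second special name, and vice versa for the fourth.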
Step S120: call a map service based on the third place name and the fourth place name respectively, to obtain a fifth place name matched with the third place name and a sixth place name matched with the fourth place name.
In the embodiment of the application, it is considered that the third place name and the fourth place name obtained by the special-name swap may be non-standard, that is, they may not be standard addresses, which may affect the accuracy of place name matching.
Therefore, after the third place name and the fourth place name are obtained by swapping the special names of the first place name and the second place name, the map service may be called based on the third place name and the fourth place name respectively, to obtain the standard address matching the third place name (referred to herein as the fifth place name) and the standard address matching the fourth place name (referred to herein as the sixth place name).
In one example, the step S120 of calling the map service based on the third place name and the fourth place name respectively to obtain a fifth place name matching the third place name and a sixth place name matching the fourth place name may include:
calling the map service based on the third place name and the administrative division corresponding to the third place name, and performing an address search with the third place name and its administrative division as search keywords to obtain the fifth place name;
and
calling the map service based on the fourth place name and the administrative division corresponding to the fourth place name, and performing an address search with the fourth place name and its administrative division as search keywords to obtain the sixth place name.
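The source does not name a specific map service or API, so the following sketch only shows the shape of the call. The `AddressSearch` signature, the function `lookup_standard_address`, and the in-memory `fake_search` stand-in are all hypothetical; a real deployment would wrap whatever address search interface the chosen map service offers. Returning `None` when nothing is found is likewise an assumption.

```python
from typing import Callable, List, Optional

# Hypothetical interface: takes search keywords, returns candidate addresses.
AddressSearch = Callable[[List[str]], List[str]]


def lookup_standard_address(place_name: str, division: str,
                            search: AddressSearch) -> Optional[str]:
    """Search with the place name and its administrative division as
    keywords and return the top candidate, or None if nothing is found."""
    results = search([division, place_name])
    return results[0] if results else None


def fake_search(keywords: List[str]) -> List[str]:
    """In-memory stand-in for a map service, for illustration only."""
    table = {
        ("杭州市", "星空缘网吧"): ["浙江省杭州市滨江区星空缘网吧"],
    }
    return table.get(tuple(keywords), [])


fifth = lookup_standard_address("星空缘网吧", "杭州市", fake_search)
print(fifth)
```

Injecting the search function keeps the lookup testable without network access and leaves the choice of map service open.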
Step S130: determine a second similarity between the fifth place name and the sixth place name.
In the embodiment of the present application, when the fifth place name and the sixth place name are obtained in the manner described in step S110 to step S120, the similarity (referred to as a second similarity herein) between the fifth place name and the sixth place name may be determined.
For example, the similarity between the fifth place name and the sixth place name (i.e., the second similarity) may be determined based on the text similarity calculation.
Step S140: match the first place name and the second place name based on the first similarity and the second similarity.
In this embodiment of the application, the first place name and the second place name may be matched based on the first similarity between the first place name and the second place name, and the second similarity between the fifth place name derived from the first place name and the sixth place name derived from the second place name.
It can be seen that, in the method flow shown in FIG. 1, data augmentation is achieved by swapping the special names of the two place names to be matched, and the two place names are matched based on both the similarity between the original place names and the similarity between the standard addresses obtained after the swap. This automates place name matching and improves its efficiency.
In some embodiments, as shown in fig. 2, in the step S140, matching the first place name and the second place name based on the first similarity and the second similarity may be implemented by:
Step S141: determine a weighted sum of the first similarity and the second similarity based on preset weights.
Step S142: match the first place name and the second place name based on the weighted sum.
For example, the weights of the first similarity and the second similarity may be preset (they may be set according to the actual scenario, and the values may be empirical or experimental).
When the first similarity between the first place name and the second place name and the second similarity between the fifth place name and the sixth place name have been determined in the above manner, the two similarities may be weighted using the preset weights to obtain a weighted sum, and the first place name and the second place name may be matched based on the weighted sum.
For example, $S = S_1 w_1 + S_2 w_2$, where $S_1$ is the first similarity, $S_2$ is the second similarity, $w_1$ and $w_2$ are the preset weights of the first similarity and the second similarity respectively, and $S$ is the final similarity, that is, the weighted sum of the first similarity and the second similarity.
In one example, the matching the first place name and the second place name based on the weighted sum in step S142 may include:
and when the weighted sum is greater than or equal to a preset similarity threshold value, determining that the first place name and the second place name are successfully matched.
For example, when a weighted sum of the first similarity and the second similarity is determined, the weighted sum may be compared with a preset similarity threshold, and when the weighted sum is greater than or equal to the preset similarity threshold, it is determined that the first place name and the second place name are successfully matched.
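The weighted sum and the threshold comparison can be written down directly. In the sketch below, the function names and the default values for the weights and the threshold are placeholders chosen for illustration; the source only says that they are preset empirically or experimentally.

```python
def weighted_similarity(s1: float, s2: float,
                        w1: float = 0.5, w2: float = 0.5) -> float:
    """Final similarity S = S1*w1 + S2*w2."""
    return s1 * w1 + s2 * w2


def is_match(s1: float, s2: float, w1: float = 0.5, w2: float = 0.5,
             threshold: float = 0.75) -> bool:
    """Matching succeeds when the weighted sum reaches the preset threshold."""
    return weighted_similarity(s1, s2, w1, w2) >= threshold


# Placeholder similarities for illustration.
print(weighted_similarity(0.5, 1.0))   # 0.75
print(is_match(0.5, 1.0))              # True
print(is_match(0.5, 0.5))              # False
```

The comparison uses "greater than or equal to", matching the condition stated for step S142.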
In the embodiment of the application, when it is determined that the first place name and the second place name are successfully matched, data merging may be performed on the first place name and the second place name.
Based on the processing flow aiming at the first place name and the second place name, the place names from different data sources can be matched respectively, and are merged when the matching is successful, and then a corresponding place name library is established based on the place name matching and merging processing.
In some embodiments, after determining the first similarity between the first place name and the second place name in step S100, the method may further include:
comparing the first similarity with a preset minimum similarity;
and when the first similarity is larger than or equal to the preset minimum similarity, determining to execute the operation of swapping the first special name in the first place name with the second special name in the second place name.
For example, when the similarity between the first place name and the second place name is low, the probability that they can be successfully matched is correspondingly low. In this case, to improve matching efficiency while preserving matching accuracy, a minimum similarity may be set in advance (it may be set according to the actual scenario, and the value may be empirical or experimental). When the first similarity between the first place name and the second place name is determined in the manner of step S100, it may be compared with the preset minimum similarity; when the first similarity is greater than or equal to the preset minimum similarity, the processing in steps S110 to S140 continues.
For example, when the first similarity is smaller than the preset minimum similarity, it may be determined that matching of the first place name and the second place name has failed, and the processing in steps S110 to S140 is not performed.
It should be noted that, in this embodiment of the application, when the similarity between the first place name and the second place name is greater than or equal to a preset maximum similarity (which may be set according to the actual scenario, and the value may be empirical or experimental), it may be determined that the first place name and the second place name are successfully matched without performing the processing in steps S110 to S140; the specific implementation is not described again here.
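The two shortcuts described above amount to a three-way decision on the first similarity. A minimal sketch, assuming illustrative values for the minimum and maximum similarity and hypothetical return labels:

```python
def gate_first_similarity(s1: float, min_sim: float = 0.3,
                          max_sim: float = 0.95) -> str:
    """Decide what to do after step S100.

    Returns "match" when s1 already reaches the maximum similarity,
    "no_match" when it falls below the minimum similarity, and
    "continue" when steps S110 to S140 should still be performed.
    """
    if s1 >= max_sim:
        return "match"
    if s1 < min_sim:
        return "no_match"
    return "continue"


for s1 in (0.1, 0.5, 0.99):
    print(s1, gate_first_similarity(s1))
```

Only pairs in the middle band incur the cost of the special-name swap and the map service calls.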
In addition, for two place names that can be merged, the similarity of the new place names obtained after the special-name swap is usually high. Therefore, if the similarity of the new place names obtained after the swap is too low, the probability that the two place names to be matched can be merged is correspondingly low.
Accordingly, in the embodiment of the present application, when the second similarity is too low (which may be characterized by setting a threshold, below which the second similarity is judged to be too low), the negative of the second similarity may be used to compute the final similarity, or the second similarity may be excluded from place name matching.
For example, $S = S_1 w_1 - S_2 w_2$, or $S = S_1 w_1$, or $S = S_1$.
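The three alternatives for a too-low second similarity can be captured in one function. In this sketch the `mode` labels, the `low` threshold, and the default weights are hypothetical names and values introduced for illustration.

```python
def final_similarity(s1: float, s2: float, w1: float = 0.5, w2: float = 0.5,
                     low: float = 0.25, mode: str = "negate") -> float:
    """Combine the two similarities, treating a too-low s2 specially.

    mode "negate":     S = S1*w1 - S2*w2  (s2 counts against the match)
    mode "drop":       S = S1*w1          (s2 is ignored, weight kept)
    mode "first_only": S = S1             (only the first similarity is used)
    """
    if s2 >= low:
        return s1 * w1 + s2 * w2
    if mode == "negate":
        return s1 * w1 - s2 * w2
    if mode == "drop":
        return s1 * w1
    if mode == "first_only":
        return s1
    raise ValueError(f"unknown mode: {mode}")


print(final_similarity(0.5, 0.5))                       # 0.5
print(final_similarity(0.5, 0.125))                     # 0.1875
print(final_similarity(0.5, 0.125, mode="drop"))        # 0.25
print(final_similarity(0.5, 0.125, mode="first_only"))  # 0.5
```

Whichever variant is chosen, the similarity threshold used for the final decision would need to be calibrated against it, since the three variants produce values on different scales.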
In order to enable those skilled in the art to better understand the technical solutions provided in the embodiments of the present application, the following describes the technical solutions provided in the embodiments of the present application with reference to specific application scenarios.
In this embodiment, data collected from different sources, including place names, administrative divisions, place classifications and the like, are merged to construct a corresponding place name library.
Illustratively, a universal name dictionary is established first (it commonly includes provinces, cities, counties, districts, streets and the like, and universal names of places such as internet bars or internet cafes, gas stations, and hospitals may be added according to the place types).
Secondly, one place name is selected from each of two different data sources (assume place name A1, i.e., the first place name, and place name B1, i.e., the second place name), and text similarity calculation is performed to obtain the similarity simi1 (i.e., the first similarity). The special names in A1 and B1 (the parts other than the universal names) are then swapped with each other, which achieves data augmentation.
Then, each swapped place name is combined with its administrative division to call a map service Application Programming Interface (API) and obtain the corresponding standard address, and the similarity simi2 of the two standard addresses (i.e., the second similarity) is calculated.
Finally, whether the two place names (A1 and B1) can be matched is judged according to the first similarity and the second similarity, which facilitates merging of the different data sources.
The method can improve the accuracy and effectiveness of data merging and greatly reduce labor costs, and therefore has high application value for big data.
The main technical points of the location name matching scheme provided by the embodiment of the present application are described below with reference to a location name matching flowchart shown in fig. 3.
In this embodiment, place name matching may include the following technical points: 1. establishing a universal name dictionary; 2. calculating the place name similarity simi1; 3. calculating the place name similarity simi2; 4. calculating the matching result.
The following describes each main technical point.
1. Establishing a universal name dictionary:
1.1, establishing a common universal name dictionary: for place name matching, the common universal name dictionary contains the provinces, cities, counties, districts, streets, communities and the like that correspond to administrative divisions. The corresponding material is downloaded from the Internet, and the abbreviations of these place names are collated alongside them (this avoids a drop in matching accuracy caused by non-standard place names in a place name source library).
1.2, establishing a domain universal name dictionary: place names are matched by place classification, so each place classification must be associated with its place names. Dictionaries for different place types are built separately for different application industries. For example, an internet bar or internet cafe is an entertainment place and a hospital is a medical place, so "internet bar" and "internet cafe" are universal names in the universal name dictionary for entertainment places. Summarizing the place types of each industry together with their domain universal names yields the domain universal name dictionary, which assists place name matching.
2. Calculating the similarity simi1:
For place names A1 and B1 from different data sources, data cleaning is performed first (removing special symbols and the like); word segmentation and text similarity calculation are then performed to determine the place name similarity simi1 of A1 and B1.
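The cleaning step might be as small as the following sketch. Which characters count as "special symbols" is not specified in the source; here it is assumed that everything other than letters, digits, and CJK characters is dropped. The function name `clean_place_name` is hypothetical.

```python
import re

# Assumption: anything that is not a word character (plus the underscore,
# which \w would otherwise keep) is treated as a special symbol.
_SPECIAL = re.compile(r"[\W_]+")


def clean_place_name(name: str) -> str:
    """Strip whitespace, punctuation, and other symbols from a place name."""
    return _SPECIAL.sub("", name)


print(clean_place_name(" 杭州市-星空(网吧)#1 "))
```

Cleaning both names the same way before segmentation keeps stray punctuation from lowering simi1.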
3. Calculating the similarity simi2:
3.1, augmenting the place name data: the proper names in place names A1 and B1 are swapped to augment the data.
Illustratively, calculating simi1 already uses the information in place names A1 and B1 directly, but the two names alone provide little information for comparison. Data augmentation makes better use of the available information and thereby improves the comparison result.
To swap the proper names, the universal names are recognized through the universal name dictionary, the main words that cannot be recognized as universal names are treated as proper names, and the main proper names (such as the longest proper names) are swapped to obtain the replaced place names A2 (i.e. the third place name) and B2 (i.e. the fourth place name).
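A minimal sketch of the proper-name swap, assuming the place names are already segmented into tokens and the universal name dictionary is a plain set. The function names are hypothetical, and ties between equally long candidates are resolved by taking the first one, which the source does not specify.

```python
def find_proper_name(tokens, universal_names):
    """Return the longest token that is not a universal name, or None."""
    candidates = [t for t in tokens if t not in universal_names]
    return max(candidates, key=len) if candidates else None


def swap_proper_names(tokens_a, tokens_b, universal_names):
    """Exchange the main proper names of two segmented place names.

    Returns the swapped token lists (A2 and B2 in the text). If either
    name has no proper name, both are returned unchanged.
    """
    proper_a = find_proper_name(tokens_a, universal_names)
    proper_b = find_proper_name(tokens_b, universal_names)
    if proper_a is None or proper_b is None:
        return list(tokens_a), list(tokens_b)
    swapped_a = [proper_b if t == proper_a else t for t in tokens_a]
    swapped_b = [proper_a if t == proper_b else t for t in tokens_b]
    return swapped_a, swapped_b
```

With `universal_names = {"internet", "cafe", "bar"}`, swapping `["sunshine", "internet", "cafe"]` and `["sunlight", "internet", "bar"]` gives `["sunlight", "internet", "cafe"]` and `["sunshine", "internet", "bar"]`.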
3.2, acquiring standard addresses: the map service is called with the replaced place names A2 and B2, combined with the administrative divisions of the corresponding place names, to obtain the standard addresses A3 (i.e. the fifth place name) and B3 (i.e. the sixth place name).
3.3, calculating the place name similarity simi2: word segmentation and text similarity calculation are applied to the place names A3 and B3 obtained above to obtain the place name similarity simi2.
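Steps 3.2 and 3.3 can be sketched as below. No specific map service is named in the source, so the service is modelled as a caller-supplied `geocode` function that takes a search string and returns a standard address; that interface, the query format, and the token-overlap similarity are all assumptions made for illustration.

```python
def token_similarity(text_a, text_b):
    """Jaccard similarity over lower-cased whitespace tokens (assumed measure)."""
    set_a, set_b = set(text_a.lower().split()), set(text_b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def standard_address(place_name, division, geocode):
    """Search the map service using the place name plus its administrative division."""
    return geocode(f"{division} {place_name}")


def second_similarity(name_a2, name_b2, division_a, division_b, geocode):
    """Resolve A2 and B2 to standard addresses A3 and B3, then compare them (simi2)."""
    a3 = standard_address(name_a2, division_a, geocode)
    b3 = standard_address(name_b2, division_b, geocode)
    return token_similarity(a3, b3)
```

Passing the map service in as a function keeps the sketch independent of any particular provider and makes it easy to substitute an in-memory lookup table when testing.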
4. Calculating the matching result:
The place name similarity simi1 and the place name similarity simi2 are combined (for example, as a weighted sum), the result is compared with a preset threshold (i.e. the similarity threshold), and whether place names A1 and B1 match is determined from the comparison, so that the data can be merged.
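The combination step can be sketched as a weighted sum followed by a threshold test. The default weights and threshold below are placeholder values chosen for illustration, not values given in the source.

```python
def combined_score(simi1, simi2, weight1=0.5, weight2=0.5):
    """Weighted sum of the two place name similarities."""
    return weight1 * simi1 + weight2 * simi2


def is_match(simi1, simi2, threshold=0.8, weight1=0.5, weight2=0.5):
    """Declare a match when the weighted sum reaches the similarity threshold."""
    return combined_score(simi1, simi2, weight1, weight2) >= threshold
```

In practice the weights and the threshold would be tuned on labelled pairs from the data sources being merged.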
In this embodiment, when place names A1 and B1 are successfully matched, data merging may be performed, and a corresponding place name library may further be established from the merged data.
Therefore, in this embodiment, place names are matched using natural language processing and machine learning methods. Compared with manual matching, this greatly reduces labor cost, and data augmentation makes fuller use of the available information.
The methods provided herein are described above. The following describes the apparatus provided in the present application:
Referring to fig. 4, which is a schematic structural diagram of a place name matching device provided in an embodiment of the present application, the place name matching device may include:
a determining unit 410, configured to determine a first similarity between the first place name and the second place name;
a swap unit 420, configured to swap a first proper name in the first place name with a second proper name in the second place name to obtain a third place name and a fourth place name;
an obtaining unit 430, configured to invoke a map service based on the third place name and the fourth place name, respectively, and obtain a fifth place name matching the third place name and a sixth place name matching the fourth place name;
the determining unit 410 is further configured to determine a second similarity between the fifth place name and the sixth place name;
a matching unit 440, configured to match the first place name and the second place name based on the first similarity and the second similarity.
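The cooperation of the four units can be sketched as a small class in which each unit is a pluggable function. The class name, field names, and call signatures are hypothetical; only the order of operations follows the description above.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class PlaceNameMatcher:
    determine: Callable[[str, str], float]       # determining unit 410
    swap: Callable[[str, str], Tuple[str, str]]  # swap unit 420
    obtain: Callable[[str], str]                 # obtaining unit 430
    match: Callable[[float, float], bool]        # matching unit 440

    def run(self, first: str, second: str) -> bool:
        """Match two place names using both similarities."""
        first_similarity = self.determine(first, second)
        third, fourth = self.swap(first, second)
        fifth, sixth = self.obtain(third), self.obtain(fourth)
        second_similarity = self.determine(fifth, sixth)
        return self.match(first_similarity, second_similarity)
```

Note that the determining unit is used twice, once on the original pair and once on the pair returned by the map service, which mirrors the description of unit 410.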
In some embodiments, when a plurality of proper names exist in the first place name, the longest proper name among the plurality of proper names is determined as the first proper name;
when a plurality of proper names exist in the second place name, the longest proper name in the plurality of proper names is determined as the second proper name.
In some embodiments, the obtaining unit 430 calls a map service based on the third place name and the fourth place name, respectively, and obtains a fifth place name matching the third place name and a sixth place name matching the fourth place name, including:
calling a map service based on the third place name and the administrative division corresponding to the third place name, and performing address search by taking the third place name and the administrative division corresponding to the third place name as search keywords to obtain a fifth place name;
and,
and calling a map service based on the fourth place name and the administrative division corresponding to the fourth place name, and performing address search by taking the fourth place name and the administrative division corresponding to the fourth place name as search keywords to obtain the sixth place name.
In some embodiments, the matching unit 440 matches the first place name and the second place name based on the first similarity and the second similarity, including:
determining a weighted sum of the first similarity and the second similarity based on a preset weight;
matching the first place name and the second place name based on the weighted sum.
In some embodiments, the matching unit 440 matches the first place name and the second place name based on the weighted sum, including:
and when the weighted sum is greater than or equal to a preset similarity threshold value, determining that the first place name and the second place name are successfully matched.
In some embodiments, after the determining unit 410 determines the first similarity between the first place name and the second place name, the swap unit 420 is further configured for:
comparing the first similarity with a preset minimum similarity;
and when the first similarity is greater than or equal to the preset minimum similarity, swapping the first proper name in the first place name with the second proper name in the second place name.
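The minimum-similarity check acts as a gate in front of the more expensive swap and map service steps. The sketch below assumes that a pair falling below the minimum is treated as not matching, which this passage does not state explicitly; the default values and names are placeholders.

```python
def match_with_gate(first_similarity, compute_second_similarity,
                    min_similarity=0.3, threshold=0.8, weight=0.5):
    """Skip the swap and map service steps when the first similarity is too low.

    compute_second_similarity is a zero-argument function that performs the
    proper-name swap and map service lookup and returns the second similarity.
    """
    if first_similarity < min_similarity:
        # Assumption: clearly dissimilar pairs are rejected without further work.
        return False
    second_similarity = compute_second_similarity()
    score = weight * first_similarity + (1 - weight) * second_similarity
    return score >= threshold
```

Deferring the second similarity behind a function call means the map service is only queried for pairs that pass the gate.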
In some embodiments, before swapping the first proper name in the first place name with the second proper name in the second place name, the swap unit 420 is further configured for:
creating a universal name dictionary, wherein the universal name dictionary is used for recording universal names in place names, and the universal names comprise common universal names and domain universal names;
querying the universal name dictionary based on the first place name and the second place name, respectively, to determine the proper names in the first place name and the second place name; wherein the proper names are the main words of a place name that are not universal names.
Fig. 5 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present disclosure. The electronic device may include a processor 501, a communication interface 502, a memory 503, and a communication bus 504. The processor 501, the communication interface 502 and the memory 503 are in communication with each other via a communication bus 504. Wherein, the memory 503 stores a computer program; the processor 501 may execute the above-described place name matching method by executing the program stored on the memory 503.
The memory 503 referred to herein may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and the like. For example, the memory 503 may be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disc, a DVD, etc.), or a similar storage medium, or a combination thereof.
Embodiments of the present application also provide a machine-readable storage medium, such as the memory 503 in fig. 5, storing a computer program, which can be executed by the processor 501 in the electronic device shown in fig. 5 to implement the place name matching method described above.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims (10)

1. A method for matching names of places, comprising:
determining a first similarity of the first place name and the second place name;
swapping a first proper name in the first place name with a second proper name in the second place name to obtain a third place name and a fourth place name;
calling a map service based on the third place name and the fourth place name respectively to obtain a fifth place name matched with the third place name and a sixth place name matched with the fourth place name;
determining a second similarity of the fifth place name and the sixth place name;
and matching the first place name and the second place name based on the first similarity and the second similarity.
2. The method according to claim 1, wherein when a plurality of proper names exist in the first place name, the longest proper name among the plurality of proper names is determined as the first proper name;
when a plurality of proper names exist in the second place name, the longest proper name in the plurality of proper names is determined as the second proper name.
3. The method of claim 1, wherein the invoking a map service based on the third place name and the fourth place name, respectively, and obtaining a fifth place name matching the third place name and a sixth place name matching the fourth place name comprises:
calling a map service based on the third place name and the administrative division corresponding to the third place name, and performing address search by taking the third place name and the administrative division corresponding to the third place name as search keywords to obtain a fifth place name;
and,
and calling a map service based on the fourth place name and the administrative division corresponding to the fourth place name, and performing address search by taking the fourth place name and the administrative division corresponding to the fourth place name as search keywords to obtain the sixth place name.
4. The method according to any one of claims 1-3, wherein after determining the first similarity between the first place name and the second place name, the method further comprises:
comparing the first similarity with a preset minimum similarity;
and when the first similarity is greater than or equal to the preset minimum similarity, determining to perform the operation of swapping the first proper name in the first place name with the second proper name in the second place name.
5. The method of any one of claims 1-3, wherein before swapping the first proper name in the first place name with the second proper name in the second place name, the method further comprises:
creating a universal name dictionary, wherein the universal name dictionary is used for recording universal names in place names, and the universal names comprise common universal names and field universal names;
querying the universal name dictionary based on the first place name and the second place name, respectively, to determine the proper names in the first place name and the second place name; wherein the proper names are the main words of a place name that are not universal names.
6. A place name matching apparatus, comprising:
a determining unit, configured to determine a first similarity between the first place name and the second place name;
a swap unit, configured to swap a first proper name in the first place name with a second proper name in the second place name to obtain a third place name and a fourth place name;
an obtaining unit, configured to call a map service based on the third place name and the fourth place name respectively, and obtain a fifth place name matching the third place name and a sixth place name matching the fourth place name;
the determining unit is further configured to determine a second similarity between the fifth place name and the sixth place name;
a matching unit, configured to match the first place name and the second place name based on the first similarity and the second similarity.
7. The apparatus according to claim 6, wherein when a plurality of proper names exist in the first place name, the longest proper name among the plurality of proper names is determined as the first proper name;
when a plurality of proper names exist in the second place name, the longest proper name in the plurality of proper names is determined as the second proper name.
8. The apparatus according to claim 6, wherein the obtaining unit calling a map service based on the third place name and the fourth place name, respectively, and obtaining a fifth place name matching the third place name and a sixth place name matching the fourth place name, comprises:
calling a map service based on the third place name and the administrative division corresponding to the third place name, and performing address search by taking the third place name and the administrative division corresponding to the third place name as search keywords to obtain a fifth place name;
and,
and calling a map service based on the fourth place name and the administrative division corresponding to the fourth place name, and performing address search by taking the fourth place name and the administrative division corresponding to the fourth place name as search keywords to obtain the sixth place name.
9. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method of any one of claims 1 to 5 when executing a program stored in the memory.
10. A machine-readable storage medium, characterized in that a computer program is stored in the machine-readable storage medium, and the computer program, when executed by a processor, carries out the method of any one of claims 1-5.
CN202011033886.1A 2020-09-27 2020-09-27 Place name matching method, place name matching device, electronic equipment and machine-readable storage medium Active CN112163070B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011033886.1A CN112163070B (en) 2020-09-27 2020-09-27 Place name matching method, place name matching device, electronic equipment and machine-readable storage medium

Publications (2)

Publication Number Publication Date
CN112163070A (en) 2021-01-01
CN112163070B (en) 2024-02-27

Family

ID=73861321

Country Status (1)

Country Link
CN (1) CN112163070B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103106264A (en) * 2013-01-29 2013-05-15 河南理工大学 Matching method and matching device of place names
CN109145095A (en) * 2017-06-16 2019-01-04 贵州小爱机器人科技有限公司 Information of place names matching process, information matching method, device and computer equipment
CN110399448A (en) * 2019-07-31 2019-11-01 浪潮软件集团有限公司 Chinese Place Names address searching matching process, terminal, computer readable storage medium
CN110442603A (en) * 2019-07-03 2019-11-12 平安科技(深圳)有限公司 Address matching method, apparatus, computer equipment and storage medium
CN111639493A (en) * 2020-05-22 2020-09-08 上海微盟企业发展有限公司 Address information standardization method, device, equipment and readable storage medium

Also Published As

Publication number Publication date
CN112163070B (en) 2024-02-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant