CN112163070B - Place name matching method, place name matching device, electronic equipment and machine-readable storage medium - Google Patents
- Publication number
- CN112163070B
- Authority
- CN
- China
- Prior art keywords
- place name
- name
- place
- private
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/374—Thesaurus
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application provides a place name matching method, a place name matching device, electronic equipment and a machine-readable storage medium, wherein the method comprises the following steps: determining a first similarity of the first place name and the second place name; exchanging a first private name in the first place name with a second private name in the second place name to obtain a third place name and a fourth place name; calling a map service based on the third place name and the fourth place name respectively, and acquiring a fifth place name matched with the third place name and a sixth place name matched with the fourth place name; determining a second similarity of the fifth place name and the sixth place name; and matching the first place name and the second place name based on the first similarity and the second similarity. The method can improve the accuracy of place name matching.
Description
Technical Field
The present disclosure relates to the field of data mining technologies, and in particular, to a method and apparatus for matching place names, an electronic device, and a machine-readable storage medium.
Background
As the amount of data increases, how to effectively merge data becomes a popular research direction.
Taking place name data as an example, current place name data merging schemes mainly match place names manually and then merge the place name data based on the matching results, which is inefficient.
Disclosure of Invention
In view of this, the present application provides a place name matching method, apparatus, electronic device, and machine-readable storage medium.
According to a first aspect of an embodiment of the present application, there is provided a place name matching method, including:
determining a first similarity of the first place name and the second place name;
exchanging a first private name in the first place name with a second private name in the second place name to obtain a third place name and a fourth place name;
calling a map service based on the third place name and the fourth place name respectively, and acquiring a fifth place name matched with the third place name and a sixth place name matched with the fourth place name;
determining a second similarity of the fifth place name and the sixth place name;
and matching the first place name and the second place name based on the first similarity and the second similarity.
According to a second aspect of embodiments of the present application, there is provided a place name matching device, including:
a determining unit configured to determine a first similarity of the first place name and the second place name;
the exchange unit is used for exchanging the first private name in the first place name with the second private name in the second place name so as to obtain a third place name and a fourth place name;
an obtaining unit, configured to invoke a map service based on the third place name and the fourth place name, respectively, to obtain a fifth place name that matches the third place name, and a sixth place name that matches the fourth place name;
the determining unit is further configured to determine a second similarity between the fifth place name and the sixth place name;
and the matching unit is used for matching the first place name and the second place name based on the first similarity and the second similarity.
According to a third aspect of embodiments of the present application, there is provided an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing the place name matching method of the first aspect when executing the program stored in the memory.
According to a fourth aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored therein a computer program which, when executed by a processor, implements the place name matching method of the first aspect.
According to the place name matching method of the embodiments of the present application, on the one hand, the first similarity of the first place name and the second place name is determined. On the other hand, the first private name in the first place name and the second private name in the second place name are exchanged to obtain a third place name and a fourth place name; the map service is called based on the third place name and the fourth place name respectively to obtain a fifth place name matched with the third place name and a sixth place name matched with the fourth place name; and the second similarity of the fifth place name and the sixth place name is determined. The first place name and the second place name are then matched based on the first similarity and the second similarity. Exchanging the private names augments the place name data and improves the information utilization rate, which in turn improves the accuracy of place name matching.
Drawings
FIG. 1 is a flow chart of a place name matching method according to an exemplary embodiment of the present application;
FIG. 2 is a flow diagram illustrating a matching of a first place name and a second place name according to an exemplary embodiment of the present application;
FIG. 3 is a flow chart of a place name matching method according to an exemplary embodiment of the present application;
FIG. 4 is a schematic structural diagram of a place name matching device according to an exemplary embodiment of the present application;
FIG. 5 is a schematic diagram of a hardware structure of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.
The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the present application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
In order to enable those skilled in the art to better understand the technical solutions provided in the embodiments of the present application, some technical terms related to the embodiments of the present application are briefly described below.
Common name dictionary: a dictionary of generic terms. Which terms count as common names depends on the application; in the present application, the common name dictionary contains generic terms related to place names.
In the embodiments of the present application, the common name dictionary may include a general common name dictionary and a domain common name dictionary, wherein:
General common name dictionary: used in the calculation of place name matching. It contains the province, city, county, street and community terms corresponding to administrative divisions, and is obtained by downloading the corresponding materials from the internet and collating the abbreviations of such place-name-related terms.
Domain common name dictionary: place names are matched according to place classification, so the place names must first be classified. Dictionaries for different place types are established for different application industries; for example, "internet bar" and "internet cafe" belong to entertainment places, and "hospital" belongs to medical places. The domain common name dictionary is obtained by collecting the different place types of each industry together with their domain common names (for example, "internet bar" and "internet cafe" for entertainment places), and is used to assist place name matching.
Place classification: a classification of places from a given perspective, such as supermarkets, farmers' markets, railway stations, government authorities, etc. The classification can be formulated by combining the contents of national standards, scientific construction and the like.
Word segmentation: here, Chinese word segmentation (Chinese Word Segmentation), which refers to the process of segmenting a Chinese character sequence into individual words, namely, recombining a continuous character sequence into a word sequence according to a certain specification. For example, the segmentation result of "I love Beijing Tiananmen" is I/love/Beijing/Tiananmen. Common Chinese word segmentation tools include jieba and SnowNLP.
Text similarity: the degree of similarity between two texts. It is widely applied in fields such as search engines, recommendation systems, paper authentication, machine translation, automatic response, named entity recognition and spelling correction.
In order to make the above objects, features and advantages of the embodiments of the present application more comprehensible, the following describes the technical solutions of the embodiments of the present application in detail with reference to the accompanying drawings.
Referring to FIG. 1, which is a flow chart of a place name matching method provided in an embodiment of the present application, the place name matching method may include the following steps:
Step S100, determining a first similarity of the first place name and the second place name.
In this embodiment of the present application, the first place name and the second place name do not refer to two fixed place names, but may refer to any two place names that need to be matched.
For example, the first place name and the second place name may be place names from two different data sources.
When a place name (the first place name or the second place name) is obtained from a data source, information such as the administrative division and place classification corresponding to the place name may also be obtained.
For example, when the first place name and the second place name need to be matched, on the one hand, the similarity (referred to herein as the first similarity) of the first place name and the second place name may be determined according to an existing strategy.
For example, the similarity of the first place name and the second place name may be determined based on a manner of text similarity calculation.
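The description does not tie the text similarity calculation to a particular algorithm. As a minimal sketch, assuming a character-level ratio from Python's standard `difflib` module is an acceptable stand-in, the first similarity could be computed as follows. The function name `text_similarity` and the sample names are illustrative assumptions and are not taken from the application.

```python
from difflib import SequenceMatcher


def text_similarity(name_a: str, name_b: str) -> float:
    """Return a similarity score in [0, 1] for two place names.

    Assumption: SequenceMatcher.ratio() stands in for whatever text
    similarity measure an implementation actually uses.
    """
    return SequenceMatcher(None, name_a, name_b).ratio()


# Hypothetical first and second place names from two data sources.
first_place_name = "West District Haikang Vision"
second_place_name = "West District Haikang"
first_similarity = text_similarity(first_place_name, second_place_name)
print(round(first_similarity, 3))
```

Any measure that maps a pair of names to a score in a fixed range (edit distance, token overlap, embedding cosine) could be substituted without changing the rest of the flow.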
Step S110, the first private name in the first place name is exchanged with the second private name in the second place name, so that a third place name and a fourth place name are obtained.
In the embodiment of the application, the common names in the place names are identified through the common name dictionary, and the trunk words which cannot be identified are regarded as the special names in the place names.
For example, in order to identify the common names and the private names in a place name, a corresponding common name dictionary may be created in advance. The dictionary commonly includes terms such as city, county, district and street (which may be referred to as general common names), and the common names of places (which may be referred to as domain common names), such as internet bar or internet cafe, gas station and hospital, may also be added according to the different place types. For any place name, the created common name dictionary may be queried based on the place name, and the trunk words included in the place name that do not belong to the common names may be determined as private names.
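The dictionary lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the place name is assumed to have been segmented already into tokens in which each common name stands alone, and the dictionary contents, the token lists and the function name `split_names` are hypothetical.

```python
# Hypothetical dictionary contents; a real dictionary would be built
# from administrative-division terms and per-industry place types.
GENERAL_COMMON_NAMES = {"Province", "City", "County", "District", "Street", "Community"}
DOMAIN_COMMON_NAMES = {"Internet Bar", "Internet Cafe", "Gas Station", "Hospital"}
COMMON_NAMES = GENERAL_COMMON_NAMES | DOMAIN_COMMON_NAMES


def split_names(tokens, common_names):
    """Split segmented tokens into (common names, private names).

    Tokens found in the dictionary are common names; every other
    trunk word is treated as a private name.
    """
    common = [t for t in tokens if t in common_names]
    private = [t for t in tokens if t not in common_names]
    return common, private


tokens = ["West", "District", "Haikang", "Internet Bar"]
common, private = split_names(tokens, COMMON_NAMES)
print(common)   # ['District', 'Internet Bar']
print(private)  # ['West', 'Haikang']
```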
For two place names with high similarity that can in fact be merged, the difference between them may be caused by non-standard place names in the place name source libraries; that is, different place name source libraries may describe the private name of the same object in different ways.
For example, for Haikang Vision, some place name source libraries may record it directly as "Haikang Vision", while others may record it as "Haikang" or "Haikang Phase III", etc.
In this case, the similarity between the new two place names obtained after the private name exchange is performed on the two place names is also generally higher.
Conversely, for two place names with low similarity, if the two place names cannot be merged, the similarity between the new two place names obtained after performing private name exchange on the two place names will also be low.
Therefore, the amplification of the place name data can be realized through the exchange of the special names, the utilization rate of the information is improved, and the accuracy rate of place name matching is further improved.
In this embodiment of the present application, the created common name dictionary may be queried based on the first place name and the second place name to determine the private names in the first place name and the second place name. A private name in the first place name (referred to herein as the first private name) may then be exchanged with a private name in the second place name (referred to herein as the second private name) to obtain a third place name and a fourth place name. This augments the place name data and improves the information utilization rate, thereby improving the accuracy of the data comparison result.
For example, assuming that the first place name consists of several common names plus a private name Z1, and the second place name consists of several common names plus a private name Z2, the third place name obtained by the private name exchange may consist of the common names of the first place name plus the private name Z2, and the fourth place name may consist of the common names of the second place name plus the private name Z1.
In one example, when there is a plurality of private names in the first place name, determining a longest private name among the plurality of private names as the first private name;
when there are a plurality of private names in the second place name, the longest private name among the plurality of private names is determined as the second private name.
By way of example, considering that there may be a plurality of private names in place names, in this case, the longest private name (which may be referred to as a backbone private name) among the place names may be determined as the private name for the exchange.
Accordingly, when a plurality of private names exist in the first place name, the longest private name among the plurality of private names may be determined as the first private name.
Similarly, when there are a plurality of private names in the second place name, the longest private name among the plurality of private names may be determined as the second private name.
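The private name exchange with the longest-private-name rule can be sketched as below. The sketch assumes pre-segmented token lists and a hypothetical common name set; ties between equally long private names are resolved in favor of the first one encountered, which is a choice made here for illustration and is not stated in the application.

```python
def longest_private_name(tokens, common_names):
    """Return the longest token that is not a common name, or None."""
    private = [t for t in tokens if t not in common_names]
    # max() keeps the first of several equally long candidates.
    return max(private, key=len) if private else None


def exchange_private_names(first_tokens, second_tokens, common_names):
    """Swap the backbone private names Z1 and Z2 of two place names.

    Returns the token lists of the third and fourth place names.
    If either name has no private name, both are returned unchanged.
    """
    z1 = longest_private_name(first_tokens, common_names)
    z2 = longest_private_name(second_tokens, common_names)
    if z1 is None or z2 is None:
        return list(first_tokens), list(second_tokens)
    third = [z2 if t == z1 else t for t in first_tokens]
    fourth = [z1 if t == z2 else t for t in second_tokens]
    return third, fourth


# Hypothetical dictionary and place names.
common_names = {"District", "Internet Bar", "Internet Cafe"}
first = ["West", "District", "Haikang Vision", "Internet Bar"]
second = ["West", "District", "Haikang", "Internet Cafe"]
third, fourth = exchange_private_names(first, second, common_names)
print(third)   # ['West', 'District', 'Haikang', 'Internet Bar']
print(fourth)  # ['West', 'District', 'Haikang Vision', 'Internet Cafe']
```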
Step S120, calling a map service based on the third place name and the fourth place name respectively, and acquiring a fifth place name matched with the third place name and a sixth place name matched with the fourth place name.
In the embodiment of the present application, it is considered that the third place name and the fourth place name obtained by performing private name exchange on the first place name and the second place name may be non-standard, that is, they may not be standard addresses, which may affect the accuracy of place name matching.
Therefore, after the third place name and the fourth place name are obtained by performing private name exchange on the first place name and the second place name, it is possible to call the map service based on the third place name and the fourth place name, respectively, to acquire a standard address (hereinafter referred to as a fifth place name) matching the third place name, and a standard address (hereinafter referred to as a sixth place name) matching the fourth place name.
In one example, in step S120, invoking the map service to obtain a fifth location name matching the third location name and a sixth location name matching the fourth location name based on the third location name and the fourth location name, respectively, may include:
calling a map service based on the third place name and administrative regions corresponding to the third place name, and performing address search by taking the third place name and the administrative regions corresponding to the third place name as search keywords to obtain a fifth place name;
and,
and calling a map service based on the fourth place name and the administrative division corresponding to the fourth place name, and performing address search by taking the fourth place name and the administrative division corresponding to the fourth place name as search keywords to obtain a sixth place name.
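The application does not name a specific map service or API, so the following is only a sketch against a hypothetical interface: the map service is modeled as a callable that takes a search keyword and returns candidate standard addresses, and an in-memory stub replaces the real service. The names `standard_address` and `make_fake_search`, the keyword format and the sample address are all assumptions.

```python
from typing import Callable, Optional, Sequence

# Hypothetical map service interface: keyword in, candidate addresses out.
MapSearch = Callable[[str], Sequence[str]]


def standard_address(place_name: str, division: str, search: MapSearch) -> Optional[str]:
    """Search with the place name plus its administrative division.

    Returns the first candidate as the standard address, or None when
    the service returns nothing. Taking the first hit is an assumption.
    """
    keyword = f"{division} {place_name}".strip()
    results = search(keyword)
    return results[0] if results else None


def make_fake_search(gazetteer: Sequence[str]) -> MapSearch:
    """Build an in-memory stand-in for a real map service."""
    def search(keyword: str) -> Sequence[str]:
        words = keyword.lower().split()
        return [a for a in gazetteer if all(w in a.lower() for w in words)]
    return search


gazetteer = ["West District Haikang Internet Bar, 1 Example Road"]
search = make_fake_search(gazetteer)
fifth_place_name = standard_address("Haikang Internet Bar", "West District", search)
print(fifth_place_name)
```

Passing the service in as a callable keeps the matching logic testable without network access; a production version would wrap the chosen provider's geocoding or place search endpoint behind the same signature.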
Step S130, determining a second similarity of the fifth place name and the sixth place name.
In the embodiment of the present application, when the fifth place name and the sixth place name are obtained in the manner described in step S110 to step S120, the similarity (referred to herein as the second similarity) of the fifth place name and the sixth place name may be determined.
For example, the similarity (i.e., the second similarity) of the fifth place name and the sixth place name may be determined based on the manner of text similarity calculation.
Step S140, the first place name and the second place name are matched based on the first similarity and the second similarity.
In this embodiment of the present application, the first place name and the second place name may be matched based on the first similarity between the first place name and the second place name, and the second similarity between the fifth place name obtained by converting the first place name and the sixth place name obtained by converting the second place name.
It can be seen that, in the method flow shown in FIG. 1, data amplification is achieved by performing private name exchange on the two place names to be matched, and the two place names are matched based on both the similarity between the original place names and the similarity between the standard addresses obtained after the private name exchange. Place name matching is thus automated, which improves matching efficiency; in addition, data amplification improves the information utilization rate, which improves matching accuracy.
In some embodiments, as shown in fig. 2, in step S140, matching the first place name and the second place name based on the first similarity and the second similarity may be achieved by:
Step S141, determining a weighted sum of the first similarity and the second similarity based on preset weights.
Step S142, matching the first place name and the second place name based on the weighted sum.
For example, the weights of the first similarity and the second similarity may be set in advance (may be set according to an actual scene, and may be an empirical value or an experimental value).
When the first similarity of the first place name and the second place name, and the second similarity of the fifth place name and the sixth place name are determined in the above manner, the first similarity and the second similarity may be weighted based on the preset weights to obtain a weighted sum, and the first place name and the second place name may be matched based on the weighted sum.
For example, S = S1×w1 + S2×w2, where S1 is the first similarity, S2 is the second similarity, w1 and w2 are the weights of the first similarity and the second similarity respectively, and S is the final similarity, namely the weighted sum of the first similarity and the second similarity.
In one example, in step S142, matching the first place name and the second place name based on the weighted sum may include:
and when the weighted sum is greater than or equal to a preset similarity threshold value, determining that the first place name and the second place name are successfully matched.
For example, when a weighted sum of the first similarity and the second similarity is determined, the weighted sum may be compared with a preset similarity threshold, and when the weighted sum is equal to or greater than the preset similarity threshold, the first place name and the second place name are determined to be successfully matched.
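The weighted sum and threshold comparison can be written out directly. In this sketch the default weights and the similarity threshold are placeholder values chosen for illustration; the application leaves them to be set empirically or experimentally.

```python
def weighted_similarity(s1: float, s2: float, w1: float = 0.5, w2: float = 0.5) -> float:
    """Final similarity S = S1*w1 + S2*w2."""
    return s1 * w1 + s2 * w2


def is_match(s1: float, s2: float, w1: float = 0.5, w2: float = 0.5,
             threshold: float = 0.8) -> bool:
    """Matching succeeds when the weighted sum reaches the threshold."""
    return weighted_similarity(s1, s2, w1, w2) >= threshold


# Hypothetical similarities for one pair of place names.
print(is_match(0.9, 0.9))  # True
print(is_match(0.5, 0.5))  # False
```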
In the embodiment of the application, when the first place name and the second place name are determined to be successfully matched, data combination can be performed on the first place name and the second place name.
Based on the processing flow aiming at the first place name and the second place name, the place names from different data sources can be matched respectively, and the place names are combined when the matching is successful, and then, a corresponding place name library is established based on place name matching and combining processing.
In some embodiments, after determining the first similarity between the first place name and the second place name in step S100, the method may further include:
comparing the first similarity with a preset minimum similarity;
and when the first similarity is greater than or equal to the preset minimum similarity, determining to execute the operation of exchanging the first private name in the first place name with the second private name in the second place name.
By way of example, when the similarity of the first place name and the second place name is low, the probability that the two can be successfully matched is correspondingly low. In this case, in order to improve place name matching efficiency while ensuring place name matching accuracy, a minimum similarity may be preset (it may be set according to the actual scene, and the value may be an empirical value or an experimental value). When the first similarity of the first place name and the second place name is determined in the manner of step S100, the first similarity may be compared with the preset minimum similarity, and when the first similarity is greater than or equal to the preset minimum similarity, the related processing in steps S110 to S140 described above may be continued.
For example, when the first similarity is smaller than the preset minimum similarity, it may be determined that the first place name and the second place name fail to match, and the related processing in the above-described steps S110 to S140 is not performed.
It should be noted that, in the embodiment of the present application, when the similarity between the first place name and the second place name is greater than or equal to the preset maximum similarity (which may be set according to an actual scene, and the value may be an empirical value or an experimental value), it may be determined that the first place name and the second place name are successfully matched, and the related processing in the steps S110 to S140 is not executed any more, and detailed implementation thereof will not be described herein.
In addition, for two place names that can be merged, the similarity of the new place names obtained after the private name exchange is usually also high. Conversely, if the similarity of the place names obtained after the private name exchange is too low, the probability that the two place names to be matched can be merged is correspondingly low.
Accordingly, in the embodiment of the present application, when the second similarity is too low (which may be characterized by setting a threshold, below which the second similarity is judged to be too low), the negative of the second similarity may be used in calculating the final similarity, or the second similarity may be excluded from the place name matching.
For example, S = S1×w1 - S2×w2, or S = S1×w1, or S = S1.
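The early-exit checks and the low-second-similarity handling described above can be combined into one decision function. The following is a sketch only: every numeric default is a placeholder, the function name `match_place_names` is hypothetical, and of the three options listed for a low second similarity it implements the first (subtracting the weighted second similarity). The second similarity is supplied as a callable so that the private name exchange and the map service call are skipped whenever the first similarity alone decides the outcome.

```python
from typing import Callable


def match_place_names(
    s1: float,
    compute_s2: Callable[[], float],
    *,
    min_sim: float = 0.3,    # below this, matching fails immediately
    max_sim: float = 0.95,   # at or above this, matching succeeds immediately
    low_s2: float = 0.2,     # below this, the second similarity counts against
    w1: float = 0.6,
    w2: float = 0.4,
    threshold: float = 0.6,
) -> bool:
    """Decide whether two place names match, with early exits."""
    if s1 < min_sim:
        return False
    if s1 >= max_sim:
        return True
    s2 = compute_s2()  # only now run the exchange and the map lookup
    if s2 < low_s2:
        final = s1 * w1 - s2 * w2
    else:
        final = s1 * w1 + s2 * w2
    return final >= threshold


print(match_place_names(0.7, lambda: 0.9))  # True
print(match_place_names(0.7, lambda: 0.1))  # False
```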
In order to enable those skilled in the art to better understand the technical solutions provided by the embodiments of the present application, the technical solutions provided by the embodiments of the present application are described below in connection with specific application scenarios.
In this embodiment, data collected from different sources, including place names, administrative divisions, place classifications and the like, is merged so as to construct a corresponding place name library.
For example, first, a corresponding common name dictionary is created. It commonly includes terms such as city, county and street, and the common names of places, such as internet bar or internet cafe, gas station and hospital, can be added according to the different place types.
Secondly, one place name is selected from each of two different data sources (assumed to be A1, namely the first place name, and B1, namely the second place name), and text similarity calculation is performed to obtain the similarity simi1 (namely the first similarity). The private names in A1 and B1 (any name other than a common name is regarded as a private name) are then exchanged with each other, which achieves the effect of data amplification.
Furthermore, the map service API (Application Programming Interface) is called with the replaced place names and their administrative divisions to obtain the corresponding standard addresses, and the similarity simi2 (i.e. the second similarity) of the standard addresses is calculated.
Finally, whether the two place names (A1 and B1) can be matched is judged according to the first similarity and the second similarity, which in turn helps to merge the different data sources.
The method can improve the accuracy and the effectiveness of data merging, can greatly reduce the labor cost, and has higher application value in big data.
The following describes main technical points of the place name matching scheme provided in the embodiment of the present application with reference to a place name matching flowchart shown in fig. 3.
In this embodiment, place name matching may include the following technical points: 1. establishing a common name dictionary; 2. calculating place name similarity simi1; 3. calculating place name similarity simi2; 4. calculating the matching result.
The following describes the main technical points.
1. Establishing a common name dictionary:
1.1, establishing the general common name dictionary: for the calculation of place name matching, the general common name dictionary contains the province, city, county, street and community terms corresponding to administrative divisions. It is obtained by downloading the corresponding materials from the internet, and the abbreviations of such place-name-related terms are also collated (to avoid a reduction in place name matching accuracy caused by non-standard place names in a place name source library).
1.2, establishing the domain common name dictionary: place names are matched according to place classification, so the place names must first be classified. Dictionaries for different place types are established for different application industries; for example, "internet bar" and "internet cafe" belong to entertainment places, and "hospital" belongs to medical places. The domain common name dictionary is obtained by collecting the different place types of each industry together with their domain common names (for example, "internet bar" and "internet cafe" for entertainment places), and is used to assist place name matching.
2. Calculating similarity simi1:
For place names A1 and B1 from different data sources, data cleaning is performed first (for example, removing special symbols); word segmentation and text similarity calculation are then performed to determine the place name similarity simi1 of A1 and B1.
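The cleaning, segmentation, and similarity steps could be sketched as below. The description does not name a specific segmenter or similarity measure, so this sketch assumes overlapping character bigrams as a stand-in for word segmentation and Jaccard similarity as the text similarity; all function names are hypothetical.

```python
def clean(name):
    """Data cleaning: keep only letters and digits (this drops symbols and spaces)."""
    return "".join(ch for ch in name if ch.isalnum()).lower()


def segment(name):
    """Stand-in for word segmentation: the set of overlapping character bigrams."""
    if len(name) < 2:
        return {name} if name else set()
    return {name[i:i + 2] for i in range(len(name) - 1)}


def text_similarity(a, b):
    """Jaccard similarity of the two token sets, a value in [0, 1]."""
    tokens_a, tokens_b = segment(clean(a)), segment(clean(b))
    union = tokens_a | tokens_b
    if not union:
        return 0.0  # assumption: two empty names are treated as dissimilar
    return len(tokens_a & tokens_b) / len(union)


# simi1 for two raw names that differ only in symbols and letter case
simi1 = text_similarity("Sunshine Hospital", "sunshine-hospital!")
```

Because `str.isalnum` is true for CJK characters as well, the same sketch applies unchanged to Chinese place names.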
3. Calculating similarity simi2:
3.1. Place name data augmentation: the private names in place names A1 and B1 are first exchanged, which augments the data.
Although the calculation of the place name similarity simi1 makes full use of the information in place names A1 and B1, that information is limited during data comparison. Data augmentation is therefore adopted, which can effectively improve information utilization and the data comparison result.
In the private name exchange process, the common names in each place name are identified through the common name dictionary, and the trunk words that cannot be identified are regarded as private names. The main private names (for example, the longest private name in each place name) are then exchanged to obtain the replaced place names A2 (namely, the third place name) and B2 (namely, the fourth place name).
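The exchange step can be illustrated with the sketch below. It assumes each place name has already been segmented into a list of trunk words, and the `COMMON_NAMES` set and function names are hypothetical; on a tie in length, this sketch simply takes the first longest private name.

```python
# Hypothetical common name dictionary (general and domain terms merged).
COMMON_NAMES = {"city", "district", "road", "hospital", "clinic"}


def longest_private_name(tokens, common_names):
    """Return the longest trunk word that is not a common name, or None."""
    private = [t for t in tokens if t.lower() not in common_names]
    return max(private, key=len) if private else None


def exchange_private_names(tokens_a, tokens_b, common_names=COMMON_NAMES):
    """Swap the longest private names of A1 and B1, yielding A2 and B2."""
    private_a = longest_private_name(tokens_a, common_names)
    private_b = longest_private_name(tokens_b, common_names)
    if private_a is None or private_b is None:
        # Assumption: with nothing to exchange, the names are left unchanged.
        return list(tokens_a), list(tokens_b)
    a2 = [private_b if t == private_a else t for t in tokens_a]
    b2 = [private_a if t == private_b else t for t in tokens_b]
    return a2, b2


a1 = ["Xihu", "District", "Sunshine", "Hospital"]
b1 = ["Xihu", "District", "Sunlight", "Clinic"]
a2, b2 = exchange_private_names(a1, b1)
```

Here A2 becomes "Xihu District Sunlight Hospital" and B2 becomes "Xihu District Sunshine Clinic", so each replaced name combines the private name of one source with the common names of the other.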
3.2. Obtaining the standard address of a place name: based on the replaced place names A2 and B2, each combined with the administrative division of the corresponding place name, the map service is called to obtain the corresponding standard addresses A3 (namely, the fifth place name) and B3 (namely, the sixth place name).
3.3. Calculating the place name similarity simi2: word segmentation and text similarity calculation are applied to the place names A3 and B3 obtained above to obtain the corresponding place name similarity simi2.
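Steps 3.2 and 3.3 might be wired together as in the following sketch. No particular map service is named in the description, so the `search` parameter is a hypothetical callable that stands in for a real map-service client, and the in-memory index and its addresses are invented purely for illustration. `difflib.SequenceMatcher` is used here as a simple stand-in for the segmentation-based text similarity.

```python
from difflib import SequenceMatcher


def standard_address(place_name, division, search):
    """Search with the place name plus its administrative division as keywords.

    `search` is a hypothetical callable: keyword string -> address, or None.
    """
    return search(f"{division} {place_name}")


def second_similarity(a2, b2, division_a, division_b, search):
    """Resolve A2 and B2 to standard addresses A3 and B3, then compare them."""
    a3 = standard_address(a2, division_a, search)
    b3 = standard_address(b2, division_b, search)
    if a3 is None or b3 is None:
        return 0.0  # assumption: a failed lookup contributes no similarity
    return SequenceMatcher(None, a3, b3).ratio()


# Invented stand-in for a map service; the addresses are not real data.
FAKE_INDEX = {
    "Xihu District Sunlight Hospital": "12 Example Road, Xihu District",
    "Xihu District Sunshine Clinic": "12 Example Road, Xihu District",
}


def fake_search(keywords):
    return FAKE_INDEX.get(keywords)


simi2 = second_similarity("Sunlight Hospital", "Sunshine Clinic",
                          "Xihu District", "Xihu District", fake_search)
```

When both replaced names resolve to the same standard address, simi2 is high even though the raw names differ, which is the extra evidence the augmentation step is meant to provide.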
4. Calculating the matching result:
The place name similarity simi1 and the place name similarity simi2 are combined (for example, as a weighted sum), the result is compared with a preset threshold (namely, the similarity threshold), and whether place names A1 and B1 match is determined from the comparison result, thereby achieving data merging.
In this embodiment, when place names A1 and B1 are successfully matched, data merging may be performed, and a corresponding place name library may further be established based on the merged data.
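The weighted-sum decision can be written in a few lines. The weights and the threshold below are illustrative assumptions; the description only states that they are preset.

```python
def is_match(simi1, simi2, weight1=0.5, weight2=0.5, threshold=0.7):
    """Combine both similarities and compare the score with the threshold.

    The default weights and threshold are hypothetical placeholder values.
    Returns (matched, score).
    """
    score = weight1 * simi1 + weight2 * simi2
    return score >= threshold, score


matched, score = is_match(0.6, 0.9)
```

With these assumed values, simi1 = 0.6 and simi2 = 0.9 give a score of about 0.75, which reaches the threshold, so A1 and B1 would be merged.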
Therefore, in this embodiment, place names are matched using natural language processing and machine learning methods, which greatly reduces labor cost compared with manual matching, and data augmentation makes full use of the available information.
The methods provided herein are described above. The apparatus provided in this application is described below:
Referring to Fig. 4, which is a schematic structural diagram of a place name matching device provided in an embodiment of the present application, the place name matching device may include:
a determining unit 410, configured to determine a first similarity between the first place name and the second place name;
an exchange unit 420, configured to exchange a first private name in the first place name with a second private name in the second place name to obtain a third place name and a fourth place name;
an obtaining unit 430, configured to invoke a map service based on the third place name and the fourth place name, respectively, and obtain a fifth place name matched with the third place name and a sixth place name matched with the fourth place name;
the determining unit 410 is further configured to determine a second similarity between the fifth place name and the sixth place name;
and a matching unit 440, configured to match the first place name and the second place name based on the first similarity and the second similarity.
In some embodiments, when a plurality of private names exist in the first place name, the longest private name among the plurality of private names is determined as the first private name;
and when a plurality of private names exist in the second place name, the longest private name among the plurality of private names is determined as the second private name.
In some embodiments, the obtaining unit 430 invoking a map service based on the third place name and the fourth place name, respectively, to obtain a fifth place name matching the third place name and a sixth place name matching the fourth place name includes:
calling a map service based on the third place name and an administrative division corresponding to the third place name, and performing address search by taking the third place name and the administrative division corresponding to the third place name as search keywords to obtain the fifth place name;
and
calling a map service based on the fourth place name and the administrative division corresponding to the fourth place name, and performing address search by taking the fourth place name and the administrative division corresponding to the fourth place name as search keywords to obtain the sixth place name.
In some embodiments, the matching unit 440 matching the first place name and the second place name based on the first similarity and the second similarity includes:
determining a weighted sum of the first similarity and the second similarity based on preset weights;
and matching the first place name and the second place name based on the weighted sum.
In some embodiments, the matching unit 440 matching the first place name and the second place name based on the weighted sum includes:
when the weighted sum is greater than or equal to a preset similarity threshold, determining that the first place name and the second place name are successfully matched.
In some embodiments, after the determining unit 410 determines the first similarity between the first place name and the second place name, the exchange unit 420 is further configured for:
comparing the first similarity with a preset minimum similarity;
and when the first similarity is greater than or equal to the preset minimum similarity, exchanging the first private name in the first place name with the second private name in the second place name.
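The minimum-similarity check acts as a gate in front of the more expensive exchange and map-service steps. The sketch below is one possible reading: it assumes that a pair falling below the minimum is treated as unmatched (this passage does not state what happens in that case), and the numeric defaults and the `compute_simi2` callable are hypothetical.

```python
def match_with_gate(simi1, compute_simi2, min_similarity=0.3,
                    weight1=0.5, weight2=0.5, threshold=0.7):
    """Run the exchange and map lookup only when simi1 reaches the preset minimum.

    `compute_simi2` is a hypothetical zero-argument callable that performs the
    private name exchange and map-service lookup and returns simi2.
    """
    if simi1 < min_similarity:
        return False  # assumption: clearly dissimilar pairs are not matched
    simi2 = compute_simi2()
    return weight1 * simi1 + weight2 * simi2 >= threshold


calls = []


def expensive_simi2():
    calls.append("map service called")
    return 0.9


skipped = match_with_gate(0.1, expensive_simi2)   # gate closed, no lookup
accepted = match_with_gate(0.8, expensive_simi2)  # gate open, one lookup
```

Passing the second stage as a callable means the map service is contacted only for pairs that pass the gate.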
In some embodiments, before the exchange unit 420 exchanges the first private name in the first place name with the second private name in the second place name, the following is further included:
creating a common name dictionary, wherein the common name dictionary is used for recording common names in place names, and the common names comprise general common names and domain common names;
querying the common name dictionary based on the first place name and the second place name respectively to determine private names in the first place name and the second place name; the private names are trunk words that are included in the place names and do not belong to the common names.
Fig. 5 is a schematic hardware structure of an electronic device according to an embodiment of the present application. The electronic device may include a processor 501, a communication interface 502, a memory 503, and a communication bus 504. The processor 501, the communication interface 502, and the memory 503 perform communication with each other via a communication bus 504. Wherein the memory 503 has a computer program stored thereon; the processor 501 can execute the place name matching method described above by executing a program stored on the memory 503.
The memory 503 referred to herein may be any electronic, magnetic, optical, or other physical storage device that may contain or store information, such as executable instructions, data, or the like. For example, the memory 503 may be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (e.g., hard drive), a solid state drive, any type of storage disk (e.g., optical disk, DVD, etc.), or a similar storage medium, or a combination thereof.
The present embodiments also provide a machine-readable storage medium, such as memory 503 in fig. 5, storing a computer program executable by processor 501 in the electronic device shown in fig. 5 to implement the place name matching method described above.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing description of the preferred embodiments of the present invention is not intended to limit the invention to the precise form disclosed, and any modifications, equivalents, improvements and alternatives falling within the spirit and principles of the present invention are intended to be included within the scope of the present invention.
Claims (10)
1. A place name matching method, comprising:
determining a first similarity of the first place name and the second place name;
exchanging a first private name in the first place name with a second private name in the second place name to obtain a third place name and a fourth place name; wherein a private name is a trunk word in a place name other than the common names;
calling a map service based on the third place name and the fourth place name respectively, and acquiring a fifth place name matched with the third place name and a sixth place name matched with the fourth place name;
determining a second similarity of the fifth place name and the sixth place name;
and matching the first place name and the second place name based on the first similarity and the second similarity.
2. The method of claim 1, wherein when a plurality of private names exist in the first place name, determining a longest private name among the plurality of private names as the first private name;
and when a plurality of private names exist in the second place name, determining the longest private name in the plurality of private names as the second private name.
3. The method of claim 1, wherein the invoking a map service based on the third place name and the fourth place name, respectively, obtaining a fifth place name matching the third place name, and a sixth place name matching the fourth place name, comprises:
calling a map service based on the third place name and an administrative division corresponding to the third place name, and performing address search by taking the third place name and the administrative division corresponding to the third place name as search keywords to obtain a fifth place name;
and
and calling a map service based on the fourth place name and the administrative division corresponding to the fourth place name, and performing address search by taking the fourth place name and the administrative division corresponding to the fourth place name as search keywords to obtain the sixth place name.
4. A method according to any one of claims 1-3, wherein after determining the first similarity of the first place name and the second place name, further comprising:
comparing the first similarity with a preset minimum similarity;
and when the first similarity is greater than or equal to the preset minimum similarity, determining to execute the operation of exchanging the first private name in the first place name with the second private name in the second place name.
5. A method according to any one of claims 1-3, wherein before said exchanging a first private name of said first place name with a second private name of said second place name, further comprising:
creating a common name dictionary, wherein the common name dictionary is used for recording common names in place names, and the common names comprise general common names and domain common names;
querying the common name dictionary based on the first place name and the second place name respectively to determine private names in the first place name and the second place name; the private names are trunk words that are included in the place names and do not belong to the common names.
6. A place name matching apparatus, comprising:
a determining unit configured to determine a first similarity of the first place name and the second place name;
an exchange unit, configured to exchange a first private name in the first place name with a second private name in the second place name to obtain a third place name and a fourth place name; wherein a private name is a trunk word in a place name other than the common names;
an obtaining unit, configured to invoke a map service based on the third place name and the fourth place name, respectively, to obtain a fifth place name that matches the third place name, and a sixth place name that matches the fourth place name;
the determining unit is further configured to determine a second similarity between the fifth place name and the sixth place name;
and a matching unit, configured to match the first place name and the second place name based on the first similarity and the second similarity.
7. The apparatus of claim 6, wherein when there is a plurality of private names in the first place name, determining a longest private name among the plurality of private names as the first private name;
and when a plurality of private names exist in the second place name, determining the longest private name in the plurality of private names as the second private name.
8. The apparatus according to claim 6, wherein the obtaining unit invokes a map service to obtain a fifth place name matching the third place name and a sixth place name matching the fourth place name based on the third place name and the fourth place name, respectively, comprising:
calling a map service based on the third place name and an administrative division corresponding to the third place name, and performing address search by taking the third place name and the administrative division corresponding to the third place name as search keywords to obtain a fifth place name;
and
and calling a map service based on the fourth place name and the administrative division corresponding to the fourth place name, and performing address search by taking the fourth place name and the administrative division corresponding to the fourth place name as search keywords to obtain the sixth place name.
9. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the method of any one of claims 1-5 when executing the program stored in the memory.
10. A machine-readable storage medium, characterized in that the machine-readable storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011033886.1A CN112163070B (en) | 2020-09-27 | 2020-09-27 | Place name matching method, place name matching device, electronic equipment and machine-readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112163070A (en) | 2021-01-01 |
CN112163070B (en) | 2024-02-27 |
Family
ID=73861321
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011033886.1A Active CN112163070B (en) | 2020-09-27 | 2020-09-27 | Place name matching method, place name matching device, electronic equipment and machine-readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112163070B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103106264A (en) * | 2013-01-29 | 2013-05-15 | 河南理工大学 | Matching method and matching device of place names |
CN109145095A (en) * | 2017-06-16 | 2019-01-04 | 贵州小爱机器人科技有限公司 | Information of place names matching process, information matching method, device and computer equipment |
CN110399448A (en) * | 2019-07-31 | 2019-11-01 | 浪潮软件集团有限公司 | Chinese Place Names address searching matching process, terminal, computer readable storage medium |
CN110442603A (en) * | 2019-07-03 | 2019-11-12 | 平安科技(深圳)有限公司 | Address matching method, apparatus, computer equipment and storage medium |
CN111639493A (en) * | 2020-05-22 | 2020-09-08 | 上海微盟企业发展有限公司 | Address information standardization method, device, equipment and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112163070A (en) | 2021-01-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||