OA19238A - Name matching method and apparatus. - Google Patents

Info

Publication number
OA19238A
Authority
OA
OAPI
Prior art keywords
name
matched
matching
standard
names
Application number
OA1201900200
Inventor
QingQing Sun
Original Assignee
Alibaba Group Holding Limited
Application filed by Alibaba Group Holding Limited

Abstract

Implementations of the present application disclose a method and an apparatus for matching names. The method includes the following: obtaining a name to be matched; determining a standard name set used to match the name to be matched; detecting the name to be matched to determine whether the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical; and determining a matching result of the name to be matched based on a detection result. The matching accuracy can be improved and a false alarm rate of a risk control system can be reduced by using the implementations of the present application.

Description

NAME MATCHING METHOD AND APPARATUS
TECHNICAL FIELD
[0001] The present application relates to the field of computer software technologies, and in particular, to a method and an apparatus for matching names.
BACKGROUND
[0002] Person name matching is an important technology in the risk control field. For example, a risk control system records, in a blacklist, the person names of users that have been determined to be unauthorized. Then, when performing a risk control operation, the system scans the blacklist and matches the person name of each user that currently performs a service against each person name in the blacklist. If the matching succeeds, the user can be considered an unauthorized user, and the service of the user is rejected to prevent certain risks.
[0003] Person name matching can be classified into accurate person name matching and person name fuzzy matching. Of the two, person name fuzzy matching is technically more difficult because it is hard to control a proper fuzzy degree.
[0004] In the existing technology, a string matching algorithm is usually used to perform person name fuzzy matching, and a string matching degree threshold determines the fuzzy degree. However, the string matching degree threshold is set entirely according to experience. To reduce omissions, the threshold is usually set to a relatively low value. Consequently, the matching accuracy is relatively low, and the false alarm rate of the risk control system is relatively high.
SUMMARY
[0005] Implementations of the present application provide a method and an apparatus for matching names, to alleviate the following technical problem: in the existing technology, the matching accuracy is relatively low and the system false alarm rate is relatively high when person name fuzzy matching is performed by using a string matching algorithm.
[0006] To alleviate the previous technical problem, the implementations of the present application are implemented as follows:
[0007] An implementation of the present application provides a method for matching names, including the following: obtaining a name to be matched; determining a standard name set used to match the name to be matched; detecting the name to be matched to determine whether the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical; and determining a matching result of the name to be matched based on a detection result.
[0008] An implementation of the present application provides an apparatus for matching names, including the following: an acquisition module, configured to obtain a name to be matched; a determining module, configured to determine a standard name set used to match the name to be matched; a detection module, configured to detect the name to be matched to determine whether the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical; and a matching module, configured to determine a matching result of the name to be matched based on a detection result.
[0009] The at least one technical solution used in the implementations of the present application can achieve the following beneficial effects: The name can include a person name. In practice, a person name to be matched may be different from the actual person name because of the timeliness and the uncertainty during data entry and the variability of the person name. This is also the reason for performing fuzzy matching. In the solution of the present application, for this reason, the person name to be matched is detected to determine whether the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical, and a person name matching result is determined based on a detection result. Compared with the existing technology, in which the fuzzy degree is controlled only by a string matching degree threshold set according to experience, the present application is more conducive to controlling the fuzzy degree accurately. As such, the matching accuracy can be improved, and the false alarm rate of a risk control system can be reduced. Therefore, some or all of the problems in the existing technologies can be alleviated.
BRIEF DESCRIPTION OF DRAWINGS
[0010] To describe the technical solutions in the implementations of the present application or in the existing technology more clearly, the following briefly describes the accompanying drawings required for describing the implementations or the existing technology. Apparently, the accompanying drawings in the following description merely show some implementations of the present application, and a person of ordinary skill in the art can still derive other drawings from these accompanying drawings without creative efforts.
[0011] FIG. 1 is a schematic flowchart illustrating a method for matching names, according to an implementation of the present application;
[0012] FIG. 2 is a schematic flowchart illustrating an implementation of preliminary screening in a method for matching names in an actual application scenario, according to an implementation of the present application;
[0013] FIG. 3 is a schematic flowchart illustrating an implementation of performing fuzzy matching by using an integrated algorithm in a method for matching names in an actual application scenario, according to an implementation of the present application;
[0014] FIG. 4 is a schematic flowchart illustrating an implementation of a method for matching names in an actual application scenario, according to an implementation of the present application; and
[0015] FIG. 5 is a schematic structural diagram illustrating an apparatus for matching names and corresponding to FIG. 1, according to an implementation of the present application.
DESCRIPTION OF IMPLEMENTATIONS
[0016] Implementations of the present application provide a method and an apparatus for matching names.
[0017] To make a person skilled in the art understand the technical solutions in the present application better, the following clearly and comprehensively describes the technical solutions in the implementations of the present application with reference to the accompanying drawings in the implementations of the present application. Apparently, the described implementations are merely some but not all of the implementations of the present application. All other implementations obtained by a person of ordinary skill in the art based on the implementations of the present application without creative efforts shall fall within the protection scope of the present application.
[0018] As described above, in practice, a person name to be matched may be different from the actual person name because of the timeliness and the uncertainty during data entry and the variability of the person name (which mainly refers to a change in the shape, namely, the characters, of the person name). For ease of understanding, an English person name is used as an example. Common variation types and instances of an English person name are shown in Table 1.
Table 1
Common variation type of an English person name | Instance of a person name variation type
Misspelling | Jaxson, Jakson, and Jaxon are originally expected to represent the same person name. SMYTH and SMITH are originally expected to represent the same person name. Spelling error.
Phonetic spelling difference | Michel, Michal, and Miguel are originally expected to represent the same person name. Misspelling caused by similar pronunciations.
Nickname | Mike, Mick, and Mikey represent nicknames of the same person.
Address term | Dr., Mr., etc. are terms of addressing a person, and do not affect the meaning of a person name when they appear together with the person name.
Synonym | Mohamed, Mohammed, and Mohammad are synonymous when appearing in a person name.
Abbreviation | A person name ROB is an abbreviation for ROBBIN, and ROB and ROBBIN are synonymous. A person name BOB is an abbreviation for BOBBY, and BOB and BOBBY are synonymous.
Multi-national language difference | Russian BORISOV has a different meaning from BORISOVNA, and Spanish JUANITA is synonymous with JUAN.
[0019] If only a string matching method is used to match a person name, matching may succeed for the variation type in which a certain letter is misspelled (in fact, the matching succeeds by accident, without a solid basis). However, the matching accuracy is very low for the other variation types.
[0020] In the solutions of the present application, particular detection such as abbreviation detection, address term detection, multi-language detection, or alias detection can be performed for the other variation types, so that the situation in which person names are synonymous but characters of the person names are not identical (in other words, the names are synonymous but different in shape) can be considered comprehensively, and the matching accuracy can be further improved. It is worthwhile to note that different shapes can be wrong shapes caused by misspelling; in the following implementations, however, they are mainly reasonable and correct shapes that result from the other variation types.
[0021] The solutions of the present application are not only applicable to person name matching, but also applicable to matching of names other than a person name, for example, a place name or an object name.
[0022] The solutions of the present application are described below in detail.
[0023] FIG. 1 is a schematic flowchart illustrating a method for matching names, according to an implementation of the present application. A program that executes the procedure can be installed on a device that includes but is not limited to a personal computer, a large or medium computer, a computer cluster, a mobile phone, a tablet computer, a smart wearable device, a vehicle-mounted machine, etc. The procedure can usually be used in the risk control field, and is executed by a risk control system or a related system.
[0024] The procedure shown in FIG. 1 can include the following steps:
[0025] S101. Obtain a name to be matched.
[0026] In the present implementation of the present application, the language that the name to be matched belongs to is not limited, and can be English, Russian, Spanish, or Chinese. For ease of description, the following implementation is mainly described by using an example in which the language that the name to be matched belongs to is English.
[0027] S102. Determine a standard name set used to match the name to be matched.
[0028] In the present implementation of the present application, the standard name set can be a subset screened out from a larger name set, or can be the larger name set itself. For ease of description, the screening here can be referred to as preliminary screening. For example, in the scenario described in the background, the larger name set can be a blacklist held by a risk control system.
[0029] In the previous situation, the subset can be a set that includes only names similar to the name to be matched. "Similar" here need not be strict because a series of subsequent operations further determines the similarity; that is, fine screening is performed subsequently. Preliminary screening quickly narrows the matching range, which reduces the workload of the subsequent fine screening and makes the fine screening more targeted, and is therefore conducive to improving the efficiency of the solutions of the present application.
[0030] S103. Detect the name to be matched to determine whether the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical.
[0031] In the present implementation of the present application, the situation to be detected, in which the names are synonymous but characters of the names are not identical, is mainly caused by one or more variation types in Table 1, and the detection can include at least one of abbreviation detection, address term detection, multi-language detection, or alias detection, which are described below in detail.
[0032] In the present implementation of the present application, when there are a plurality of detections, the plurality of detections can be performed successively in order. The remaining detections need not be performed if a matching result of the name to be matched can already be determined during the detection process. Alternatively, to improve the execution efficiency, the plurality of detections can be performed in parallel, and the detection results can then be summarized.
[0033] S104. Determine a matching result of the name to be matched based on a detection result.
[0034] In the present implementation of the present application, by performing step S103, the matching result of the name to be matched can be directly determined if it is determined that the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical. In this situation, the detection process in step S103 is actually the entire matching process of the name to be matched.
[0035] If it is determined that the name to be matched does not satisfy the condition that the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical, matching can be further performed on the name to be matched by using another matching method to determine the matching result of the name to be matched.
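The flow of steps S103 and S104 can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name `match_name`, the detector signature, and the `fallback` parameter are assumptions introduced here. Each detector stands for one detection (abbreviation, address term, multi-language, or alias), the detections run in order with an early exit, and another matching method is used only when no detection succeeds.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical signature: a detector takes (name_to_match, standard_name) and
# returns True when the names are judged synonymous although their characters differ.
Detector = Callable[[str, str], bool]


def match_name(
    name: str,
    standard_names: Iterable[str],
    detectors: Iterable[Detector],
    fallback: Optional[Detector] = None,
) -> List[str]:
    """Return the standard names matched by `name` (sketch of S103 and S104)."""
    standard_names = list(standard_names)
    detectors = list(detectors)
    hits = []
    for candidate in standard_names:
        # any() stops at the first detection that succeeds, so the remaining
        # detections are skipped for this candidate.
        if any(detect(name, candidate) for detect in detectors):
            hits.append(candidate)
    if hits or fallback is None:
        return hits
    # No detection succeeded: fall back to another matching method.
    return [c for c in standard_names if fallback(name, c)]
```

The detections are kept as plain callables so that they could also be dispatched in parallel and their results summarized afterwards, as the text allows.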
[0036] In the method shown in FIG. 1, the name can include a person name. In practice, a person name to be matched may be different from the actual person name because of the timeliness and the uncertainty during data entry and the variability of the person name. This is also the reason for performing fuzzy matching. In the solution of the present application, for this reason, the person name to be matched is detected to determine whether the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical, and a person name matching result is determined based on a detection result. Compared with the existing technology, in which the fuzzy degree is controlled only by a string matching degree threshold set according to experience, the present application is more conducive to controlling the fuzzy degree accurately. As such, the matching accuracy can be improved, and the false alarm rate of the risk control system can be reduced. Therefore, some or all of the problems in the existing technologies can be alleviated.
[0037] Based on the method in FIG. 1, the present implementation of the present application further provides some implementations of the method and an extension solution, which are described below.
[0038] In the present implementation of the present application, different names to be matched can differ in complexity and in the information they include. For some names to be matched that have a small volume of information or a simple information feature, the obtained matching result has little value even if matching is performed. For example, if a name is too simple and general, such as the English person name Jim, Jimmy, David, John, or Mike, it is difficult to identify a specific person even if matching succeeds.
[0039] To reduce the processing resources wasted on name matching in this situation, the name to be matched can first be filtered after being obtained, to determine whether to continue the matching. In addition, a similar problem exists if the name to be matched is included in a white list, and such a name can also be processed by using this method.
[0040] For steps S101 and S102, after the name to be matched is obtained and before the standard name set used to match the name to be matched is determined, the following steps can be further performed: obtaining a predetermined set of names that do not need to be matched; determining whether the name to be matched is included in the set of names that do not need to be matched; and if not, continuing to perform the subsequent steps. Otherwise, matching need not be performed on the name to be matched.
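A minimal sketch of this filter follows, under the assumption that a name found in the predetermined set is skipped and any other name proceeds to matching. The names `SKIP_NAMES` and `needs_matching` are hypothetical; the example entries are the generic names mentioned in the text.

```python
# Hypothetical predetermined set of names that do not need to be matched
# (overly generic names; white-listed names could be added the same way).
SKIP_NAMES = {"jim", "jimmy", "david", "john", "mike"}


def needs_matching(name: str, skip_names: set = SKIP_NAMES) -> bool:
    """Return False when the name is in the skip set, True otherwise."""
    return name.strip().lower() not in skip_names
```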
[0041] In the present implementation of the present application, an implementation of step S102 (which corresponds to the previous situation of preliminary screening) is used as an example for description. For step S102, the determining a standard name set used to match the name to be matched can include the following: determining a first name set that can be used to match the name to be matched; and determining the standard name set used to match the name to be matched by performing similarity matching on each word included in the name to be matched and each word included in a name in the first name set.
[0042] There are also a plurality of implementations of how to perform similarity matching. Word segmentation matching can be performed on the name to be matched, or full-text matching can be performed on the name to be matched, etc.
[0043] In an example of word segmentation matching, the determining the standard name set used to match the name to be matched by performing similarity matching on each word included in the name to be matched and each word included in a name in the first name set can include the following: obtaining an index of each name included in the first name set, where the index of the name is any word included in the name; segmenting the name to be matched to obtain each word included in the name to be matched; performing similarity matching on each word included in the name to be matched and each index to obtain a subset of the first name set, where the obtained subset includes the names indexed by each index that is successfully matched; and determining the standard name set used to match the name to be matched based on each subset.
[0044] The index in the previous example is pre-established. An advantage of performing word segmentation matching based on the index is that the names needed from the set can be retrieved much faster during the matching process. If word segmentation matching is not performed based on the index, word segmentation matching can still be implemented (for example, a data table that stores the set is directly queried for a needed name by using a SELECT statement), except that the efficiency may be affected.
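A word-level index of this kind can be sketched as an in-memory inverted index. This is an illustration under assumptions: the names `build_index` and `lookup` are hypothetical, and the `similar` predicate stands in for whichever similarity matching algorithm is used.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set


def build_index(names: Iterable[str]) -> Dict[str, Set[str]]:
    """Map every lower-cased word to the set of names that contain it."""
    index = defaultdict(set)
    for name in names:
        for word in name.split():
            index[word.lower()].add(name)
    return dict(index)


def lookup(word: str, index: Dict[str, Set[str]],
           similar: Callable[[str, str], bool]) -> Set[str]:
    """Return the names indexed by every index word that is similar to `word`."""
    result = set()
    for key, names in index.items():
        if similar(word.lower(), key):
            result |= names
    return result
```

With an exact-match predicate the lookup could be a direct dictionary access; the loop over keys is kept so that looser predicates (prefix, phonetic, edit distance) can be plugged in.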
[0045] Further, the performing similarity matching on each word included in the name to be matched and each index can include the following: performing similarity matching on each word included in the name to be matched and each index by using a string matching algorithm, where the string matching algorithm can include one or more algorithms, for example, a prefix tree matching algorithm, a dictionary tree matching algorithm, a string similarity matching algorithm, and a pronunciation similarity matching algorithm. Using a string matching algorithm here is merely a preferred method; another algorithm that can implement similarity matching, such as a text matching algorithm, can also be used.
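As one illustration of a prefix tree used for matching index words, the sketch below stores words in a trie and returns every stored word that starts with a given prefix. The text does not specify the exact matching rule of its prefix tree algorithm, so the prefix lookup and the class name `PrefixTree` are assumptions.

```python
class PrefixTree:
    """Minimal prefix tree (trie) over index words."""

    def __init__(self):
        self.children = {}
        self.is_word = False

    def insert(self, word: str) -> None:
        node = self
        for ch in word:
            node = node.children.setdefault(ch, PrefixTree())
        node.is_word = True

    def with_prefix(self, prefix: str) -> list:
        """Return all stored words that start with `prefix`, sorted."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        found = []
        stack = [(node, prefix)]
        while stack:  # depth-first walk of the subtree below the prefix
            current, text = stack.pop()
            if current.is_word:
                found.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(found)
```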
[0046] There may be a plurality of implementations of determining the standard name set used to match the name to be matched based on each subset. For example, if N string matching algorithms are used, for each of M words obtained through word segmentation, each string matching algorithm is used to match the word with each index to correspondingly obtain N subsets. Then a union set of the N subsets is taken. A total of M union sets are obtained, and the names whose total number of occurrences in the M union sets exceeds a specified threshold are selected to form the standard name set. For another example, N subsets are obtained, and then an intersection set of the N subsets is taken and determined as the standard name set.
[0047] The preliminary screening process is described above. For ease of understanding, an implementation of the present application further provides a schematic flowchart illustrating an implementation of the preliminary screening in the method for matching names in an actual application scenario, as shown in FIG. 2.
[0048] In the actual application scenario, assume the following: the first name set is a list of English person names; an index of each name in the list is predetermined (related information other than the corresponding name can also be indexed by using the index); each index is a word in the name that corresponds to the index; each index is included in an index table established by using the word that corresponds to the index as a primary key; and the name to be matched is the English person name Kit Wai Jackson Wong. A space is used as a delimiter to segment the person name, and the word segmentation result is shown in Table 2.
Table 2
Person name to be matched | Kit Wai Jackson Wong
After word segmentation | Word 1: Kit | Word 2: Wai | Word 3: Jackson | Word 4: Wong
[0049] In FIG. 2, M=4, and the word segmentation result is {Kit, Wai, Jackson, Wong}. Each word obtained through word segmentation is matched with each index by using a prefix tree matching algorithm, a dictionary tree matching algorithm, a string similarity matching algorithm (such as a SimString algorithm), and a pronunciation similarity matching algorithm (such as a Metaphone algorithm), to output, for a single word, the name sets that correspond to the indexes matched by the four matching algorithms: matching results 1, 2, 3, and 4.
[0050] A union set of the matching results of each word is taken to obtain a comprehensive matching result of each word.
[0051] Names included in at least two comprehensive matching results are selected to form a set, and the set is used as the generated preliminary screening result.
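The screening steps above can be sketched as follows. The function name `preliminary_screening`, the `Matcher` signature, and the `min_hits` parameter are assumptions made for illustration; each matcher stands in for one of the four matching algorithms and returns the names whose index matched a single word.

```python
from collections import Counter
from typing import Callable, Iterable, Set

# Hypothetical signature: a matcher takes one word and returns the set of
# names whose index was matched by that word.
Matcher = Callable[[str], Set[str]]


def preliminary_screening(name: str, matchers: Iterable[Matcher],
                          min_hits: int = 2) -> Set[str]:
    """Keep names that appear in at least `min_hits` per-word union sets."""
    matchers = list(matchers)
    counts = Counter()
    for word in name.split():      # a space is the delimiter, as in Table 2
        union = set()
        for match in matchers:     # N algorithms give N subsets per word
            union |= match(word)
        counts.update(union)       # each name counts once per word
    return {n for n, c in counts.items() if c >= min_hits}
```

The default `min_hits=2` mirrors the "at least two comprehensive matching results" rule in the FIG. 2 example; taking the intersection of the subsets instead would be the other variant the text mentions.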
[0052] In the present implementation of the present application, after the standard name set is determined, the name to be matched can be detected. However, one or more types of preprocessing can be further performed prior to the detection. The preprocessing is conducive to improving the reliability of the subsequent detection result. The preprocessing can include alignment processing, uppercase and lowercase unification processing, simplified-traditional character processing, etc.
[0053] Alignment processing is used as an example. For step S103, prior to the detecting the name to be matched, the following processing can be further performed: aligning each word included in a name in the standard name set with each word included in the name to be matched based on the similarity between each word included in the name in the standard name set and each word included in the name to be matched.
[0054] In practice, alignment processing based on the similarity can follow a similarity maximization principle. To be specific, the words with the maximum similarity in the two names are aligned at the same locations.
[0055] For example, assume that the name to be matched is Kate Lee Smith. The alignment result of a name such as Smith Catherine Lee in the standard name set is shown in Table 3.
Table 3
Name to be matched | Kate | Lee | Smith
Name obtained after alignment processing in the standard name set | Catherine | Lee | Smith
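The alignment in Table 3 can be reproduced with a small sketch. The word similarity below uses the ratio from Python's `difflib.SequenceMatcher` as a stand-in, which is an assumption and not the similarity measure of the present application; the function names are hypothetical. All word orders of the standard name are tried and the one with the largest total similarity is kept, which is practical only because names contain few words.

```python
from difflib import SequenceMatcher
from itertools import permutations


def similarity(a: str, b: str) -> float:
    """Stand-in word similarity in [0, 1] (difflib ratio; an assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def align(name_to_match: str, standard_name: str) -> list:
    """Reorder the words of `standard_name` so that the total similarity with
    the words of `name_to_match`, position by position, is maximal."""
    target = name_to_match.split()
    words = standard_name.split()
    k = min(len(target), len(words))
    best = max(
        permutations(words, k),
        key=lambda order: sum(similarity(t, w) for t, w in zip(target, order)),
    )
    return list(best)
```

For example, `align("Kate Lee Smith", "Smith Catherine Lee")` places Catherine opposite Kate, Lee opposite Lee, and Smith opposite Smith.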
[0056] Further, when alignment processing is performed, for step S103, the detecting the name to be matched can include the following: detecting the name to be matched based on the aligned standard name set to determine whether the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical.
[0057] In the present implementation of the present application, the detection in step S103 is a key part for improving the matching accuracy of person name fuzzy matching. For step S103, the detecting the name to be matched includes the following: performing at least one of abbreviation detection, address term detection, multi-language detection, or alias detection on the name to be matched to determine whether the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical. The several detections are separately described below.
[0058] In the present implementation of the present application, acronym detection is the most common form of abbreviation detection. In addition to the acronym detection, there is abbreviation detection with omission of a partitive. In an implementation, the performing abbreviation detection on the name to be matched can include the following: obtaining predetermined abbreviation contrast combination data, where each abbreviation contrast combination reflects an abbreviation mapping relationship between at least one word and an abbreviation of the word; detecting whether a word included in the name to be matched has the abbreviation mapping relationship with a word included in a name in the standard name set based on the abbreviation contrast combination data; and determining, based on a detection result, whether the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical.
[0059] For example, if it is predetermined that an English person name Ben Williams can be abbreviated to B. Williams, Ben Williams and B. Williams can be used as an abbreviation contrast combination. If it is detected that the name to be matched and a name in the standard name set form the abbreviation contrast combination, it can be determined that the names are synonymous but characters of the names are not identical.
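Abbreviation detection against contrast combination data can be sketched as below. The data structure and function names are hypothetical, the sample pairs are taken from the examples in the text (Ben and B., ROB and ROBBIN, BOB and BOBBY), and the names are assumed to be already aligned word by word.

```python
# Hypothetical abbreviation contrast combinations: (full word, abbreviation).
ABBREVIATIONS = {("ben", "b."), ("robbin", "rob"), ("bobby", "bob")}


def is_abbreviation_pair(a: str, b: str) -> bool:
    """True when the two words are linked by an abbreviation mapping."""
    a, b = a.lower(), b.lower()
    return (a, b) in ABBREVIATIONS or (b, a) in ABBREVIATIONS


def abbreviation_detect(name: str, standard_name: str) -> bool:
    """True when every aligned word pair is equal or an abbreviation pair,
    and at least one pair is an abbreviation pair."""
    words, std_words = name.split(), standard_name.split()
    if len(words) != len(std_words):
        return False
    pairs = list(zip(words, std_words))
    all_ok = all(w.lower() == s.lower() or is_abbreviation_pair(w, s)
                 for w, s in pairs)
    return all_ok and any(is_abbreviation_pair(w, s) for w, s in pairs)
```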
[0060] In the present implementation of the present application, the performing address term detection on the name to be matched can include the following: obtaining predetermined address term data; detecting whether the name to be matched includes the address term, where it is considered that the address term does not affect the meaning of the name that includes the address term; and determining, based on a detection result, whether the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical.
[0061] In terms of a person name, an address term is generally a word attached to at least a part of the original person name, such as an honorific title (such as Mr. or Miss.) or a title (such as Dr. or Prof.). In a person name matching environment, the address term does not affect the meaning of the at least a part of the original person name that corresponds to the address term. Therefore, if it is determined that the name to be matched includes the address term, and the parts other than the address term can match a name in the standard name set, it can be determined that the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical.
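A minimal sketch of address term detection follows. The set `ADDRESS_TERMS` holds only the examples given in the text, and the function name and the case-insensitive comparison of the remaining words are assumptions.

```python
# Hypothetical address term data, using the examples from the text.
ADDRESS_TERMS = {"dr.", "mr.", "miss.", "prof."}


def address_term_detect(name: str, standard_name: str) -> bool:
    """True when `name` contains an address term and its remaining words
    equal `standard_name` (compared case-insensitively)."""
    words = name.split()
    rest = [w for w in words if w.lower() not in ADDRESS_TERMS]
    has_term = len(rest) < len(words)
    normalized_standard = " ".join(standard_name.split()).lower()
    return has_term and " ".join(rest).lower() == normalized_standard
```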
[0062] In the present implementation of the present application, the performing alias detection on the name to be matched can include the following: obtaining predetermined alias contrast combination data, where each alias contrast combination reflects an alias mapping relationship between at least one word and an alias of the word; detecting whether a word included in the name to be matched has the alias mapping relationship with a word included in a name in the standard name set based on the alias contrast combination data; and determining, based on a detection result, whether the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical.
[0063] In practice, the alias can include a nickname (for example, Mick in Table 1 is a nickname of Mikey) of the name that corresponds to the alias, or a synonymous name of the name that corresponds to the alias in different fields. For the latter, different fields can be different regions (for example, different countries or different provinces), different languages (for example, languages of different countries or languages of different nationalities), different industries, etc.
[0064] Correspondingly, at least one of the following is performed: a nickname of the name to be matched is detected, or a synonymous name of the name to be matched in different fields is detected.
[0065] The alias also does not affect the meaning of the name that corresponds to the alias. Therefore, if it is determined that the name to be matched is an alias of a name in the standard name set, it can be determined that the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical.
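Alias detection can be sketched in the same spirit. The grouping of aliases into sets and the function names are assumptions; the sample groups reuse the nickname and synonym instances from Table 1, and the names are assumed to be already aligned word by word.

```python
# Hypothetical alias contrast combinations, grouped as sets of mutual aliases.
ALIAS_GROUPS = [
    {"mike", "mick", "mikey"},
    {"mohamed", "mohammed", "mohammad"},
]


def are_aliases(a: str, b: str) -> bool:
    """True when the two words differ and belong to the same alias group."""
    a, b = a.lower(), b.lower()
    return a != b and any(a in group and b in group for group in ALIAS_GROUPS)


def alias_detect(name: str, standard_name: str) -> bool:
    """True when every aligned word pair is equal or an alias pair,
    and at least one pair is an alias pair."""
    words, std_words = name.split(), standard_name.split()
    if len(words) != len(std_words):
        return False
    pairs = list(zip(words, std_words))
    all_ok = all(w.lower() == s.lower() or are_aliases(w, s) for w, s in pairs)
    return all_ok and any(are_aliases(w, s) for w, s in pairs)
```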
[0066] It is worthwhile to note that storage forms of the contrast combination data and the address term data described above are not limited in the present application. A common method is to store the contrast combination data and the address term data in a corresponding data table and read the data from a database when the data needs to be used.
[0067] In the present implementation of the present application, the performing multi-language detection on the name to be matched can include the following: determining a language that corresponds to the name to be matched; obtaining at least one of a spelling deformation synonym rule or a spelling deformation homonym rule of at least one of the language or another language; and detecting the name to be matched according to at least one of the spelling deformation synonym rule or the spelling deformation homonym rule to determine whether the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical.
[0068] Multi-language detection is mainly aimed at situations such as the following: English Pooh is spelled as puh in German, and the two words are synonymous when appearing in person names.
[0069] It is worthwhile to note that an algorithm used to calculate a string matching degree can be used in the détections performed previously.
[0070] In the présent implémentation of the présent application, the determining a matching resuit of the name to be matched based on a détection resuit can include the following: determining the at least one name as the matching resuit of the name to be matched if it is determined that the name to be matched is synonymous with the at least one name in the standard name set but the characters of the names are not identical; or otherwise (to be spécifie, when the matching fails through the previous détection), determining the matching resuit of the name to be matched by matching the name to be matched with a name in the standard name set by using one or more similarity algorithms.
[0071] In the present implementation of the present application, the plurality of similarity algorithms can be based on different dimensions, which helps improve the reliability of the matching result. According to such an idea, the similarity algorithm can be an algorithm (such as an n-gram algorithm) used to calculate a text matching degree, an algorithm (such as a Phonex algorithm) used to calculate a phonetic matching degree, an algorithm (such as a Jaro-Winkler algorithm) used to calculate a string matching degree, etc.
[0072] When the plurality of similarity algorithms are used, matching results that correspond to the similarity algorithms can be comprehensively measured to obtain a comprehensive matching result. A measurement method is not limited in the present application, and a common method is weighting summation.
[0073] For example, when the n-gram algorithm is used, an algorithm input is each word included in the name to be matched and each word at an aligned location of the word, and an algorithm output is a text matching degree of each pair of aligned words, and is denoted as F1.
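As a minimal sketch of a text matching degree, the following computes a Dice coefficient over character bigrams. The application does not state which n-gram measure is used, so the choice of bigrams and of the Dice coefficient, as well as the function names, are assumptions made for illustration.

```python
from collections import Counter


def ngrams(word: str, n: int = 2) -> Counter:
    """Return the multiset of character n-grams of a lowercased word."""
    w = word.lower()
    return Counter(w[i:i + n] for i in range(len(w) - n + 1))


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over n-gram multisets, in the range [0, 1]."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        # Both words are shorter than n; fall back to exact comparison.
        return 1.0 if a.lower() == b.lower() else 0.0
    overlap = sum((grams_a & grams_b).values())
    return 2 * overlap / total
```

Identical words score 1.0 and words with no shared bigram score 0.0; the value for each pair of aligned words would play the role of F1.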
[0074] When the Jaro-Winkler algorithm is used, an algorithm input is each word included in the name to be matched and each word at an aligned location of the word, and an algorithm output is a string matching degree of each pair of aligned words, and is denoted as F2.
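For reference, a compact sketch of the standard Jaro-Winkler similarity is given below. It uses the conventional prefix scale of 0.1 and a maximum prefix length of 4; how the application tunes or applies the algorithm is not stated, so this is only an illustration of how a string matching degree F2 could be obtained for one pair of aligned words.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity of two strings, in the range [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted by the length of the common prefix (max 4)."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)
```

The function is case-sensitive as written; callers that want case-insensitive matching would lowercase both words first.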
[0075] When the Phonex algorithm is used, an algorithm input is each word included in the name to be matched and each word at an aligned location of the word, and an algorithm output is a phonetic matching degree of each pair of aligned words, and is denoted as F3.
[0076] A comprehensive matching degree F of each pair of aligned words is obtained by performing weighting summation on the text matching degree, the string matching degree, and the phonetic matching degree, as shown in the following equation:
F = w1*F1 + w2*F2 + w3*F3, where w1 + w2 + w3 = 1.
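The weighting summation can be sketched as follows. The default weight values are placeholders chosen for illustration; the application does not specify them and only requires that the weights sum to 1.

```python
def comprehensive_degree(f1: float, f2: float, f3: float,
                         weights=(0.4, 0.3, 0.3)) -> float:
    """Weighted sum of text (f1), string (f2), and phonetic (f3) degrees.

    The default weights are illustrative assumptions, not values from the
    application.
    """
    w1, w2, w3 = weights
    if abs(w1 + w2 + w3 - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return w1 * f1 + w2 * f2 + w3 * f3
```

Because the weights sum to 1, the result stays in the range [0, 1] whenever each individual matching degree does.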
[0077] Based on the comprehensive matching degree F of each pair of words, a result of matching between the name to be matched and the name in the standard name set is obtained by calculating an average value. For example, for a pair of names in Table 3, an obtained matching result is shown in Table 4.
Table 4
| Detected name | Kate | Lee | Smith |
| Name in a list | Catherine | Lee | Smith |
| Comprehensive matching degree | 0.872 | 1 | 1 |
| Matching result | (0.872+1+1)/3=0.957 | | |
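The averaging step behind Table 4 can be reproduced with a few lines of code. The function name is an assumption; the input values are the per-word comprehensive matching degrees from the table.

```python
def name_matching_result(word_degrees):
    """Average the comprehensive matching degrees of the aligned word pairs."""
    if not word_degrees:
        raise ValueError("at least one aligned word pair is required")
    return sum(word_degrees) / len(word_degrees)


# Per-word degrees for "Kate Lee Smith" versus "Catherine Lee Smith".
table_4_result = name_matching_result([0.872, 1.0, 1.0])
print(round(table_4_result, 3))  # prints 0.957
```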
[0078] The detection and matching process after the preliminary screening is described above. In this process, a plurality of matching-related algorithms can be used. During implementation of the solutions of the present application, the algorithms that can be used can be integrated, and this process is a process of performing fuzzy matching by using an integrated algorithm.
[0079] In the present implementation of the present application, after the fuzzy matching is performed by using the integrated algorithm, some post filtering based on certain rules can be further performed. For example, the matching degree in the matching result is mapped to text description information, or the matching degree is properly compensated for or reduced based on a certain scenario.
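A minimal sketch of such post filtering is shown below. The thresholds, the description labels, and the function names are all assumptions made for illustration; the application does not define them.

```python
def describe_degree(degree: float) -> str:
    """Map a matching degree to text description information.

    The thresholds and labels are hypothetical.
    """
    if not 0.0 <= degree <= 1.0:
        raise ValueError("degree must be in [0, 1]")
    if degree >= 0.95:
        return "strong match"
    if degree >= 0.80:
        return "possible match"
    return "no match"


def adjust_degree(degree: float, compensation: float = 0.0) -> float:
    """Compensate (positive) or reduce (negative) a degree, clamped to [0, 1]."""
    return min(1.0, max(0.0, degree + compensation))
```

A scenario-specific rule would decide the sign and size of `compensation`; the clamp keeps the adjusted value a valid matching degree.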
[0080] Based on the previous description, more intuitively, an implementation of the present application further provides a schematic flowchart illustrating an implementation of performing fuzzy matching by using an integrated algorithm in a method for matching names in an actual application scenario, as shown in FIG. 3. In FIG. 3, a sequence of performing various detections and calculating matching degrees in various methods is merely an example, and is not for limiting the present application.
[0081] In FIG. 3, if matching of a name to be matched succeeds through any one of the previous detections, a matching result can be directly output; otherwise, the matching result of the name to be matched can be determined and output by using one or more methods for matching degree calculation (for example, text matching degree calculation, phonetic matching degree calculation, and string matching degree calculation).
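The two-stage flow described for FIG. 3 can be sketched as follows. The detectors and the scorer are passed in as plain callables, and the threshold, the score of 1.0 assigned to a detection hit, and all names are assumptions for illustration rather than details from the application.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Detector = Callable[[str, str], bool]   # True if the two names are synonymous
Scorer = Callable[[str, str], float]    # matching degree in [0, 1]


def fuzzy_match(name: str,
                standard_names: Sequence[str],
                detectors: List[Detector],
                scorer: Scorer,
                threshold: float = 0.9) -> Tuple[Optional[str], float]:
    """Return (matched name or None, matching degree)."""
    # Stage 1: any successful detection outputs a result directly.
    for candidate in standard_names:
        if any(detect(name, candidate) for detect in detectors):
            return candidate, 1.0  # assumed score for a detection hit
    # Stage 2: fall back to matching degree calculation.
    best, best_score = None, 0.0
    for candidate in standard_names:
        score = scorer(name, candidate)
        if score > best_score:
            best, best_score = candidate, score
    if best_score >= threshold:
        return best, best_score
    return None, best_score
```

Keeping the detectors and the scorer as parameters mirrors the statement that the sequence of detections and calculations in FIG. 3 is only an example.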
[0082] Certainly, various detections and matching degree calculation in various methods can also be completely performed, and then various detection results and matching degree calculation results in various methods can be comprehensively considered, to determine a matching result of a string to be matched.
[0083] Further, an implementation of the present application further provides a schematic flowchart illustrating an implementation of a method for matching names in an actual application scenario, as shown in FIG. 4. In FIG. 4, predetermining based on certain rules can include the following: determining whether a name to be matched is not included in a set of names that do not need to be matched, where a scan list index is an index of names in the standard name set.
[0084] Steps in FIG. 3 and FIG. 4 are described above in detail, and details are omitted here for simplicity.
[0085] The method for matching names provided in the present implementation of the present application is described above. As shown in FIG. 5, based on the same inventive idea, an implementation of the present application further provides a corresponding apparatus.
[0086] FIG. 5 is a schematic structural diagram illustrating an apparatus for matching names and corresponding to FIG. 1, according to an implementation of the present application. A dashed line represents an optional module, and the apparatus includes the following: an acquisition module 501, configured to obtain a name to be matched; a determining module 502, configured to determine a standard name set used to match the name to be matched; a detection module 503, configured to detect the name to be matched to determine whether the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical; and a matching module 504, configured to determine a matching result of the name to be matched based on a detection result.
[0087] Optionally, prior to determining the standard name set used to match the name to be matched, the determining module 502 is configured to obtain a predetermined set of names that do not need to be matched, and determine that the name to be matched is not included in the set of names that do not need to be matched.
[0088] Optionally, that the determining module 502 is configured to determine a standard name set used to match the name to be matched includes the following:
[0089] The determining module 502 is configured to obtain a first name set that can be used to match the name to be matched, and determine the standard name set used to match the name to be matched by performing similarity matching on each word included in the name to be matched and each word included in a name in the first name set.
[0090] Optionally, that the determining module 502 is configured to determine the standard name set used to match the name to be matched by performing similarity matching on each word included in the name to be matched and each word included in a name in the first name set includes the following:
[0091] The determining module 502 is configured to obtain an index of each name included in the first name set, where the index of the name is any word included in the name; segment the name to be matched to obtain each word included in the name to be matched; perform similarity matching on each word included in the name to be matched and each index to obtain a subset of the first name set, where the obtained subset includes a name indexed by each index that is successfully matched; and determine the standard name set used to match the name to be matched based on the subset.
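The index-based preliminary screening can be sketched as follows. Each word of a name in the first name set serves as an index for that name, and the word-level similarity test is passed in as a callable. Whitespace segmentation and all function names are assumptions for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set


def build_index(first_name_set: Iterable[str]) -> Dict[str, Set[str]]:
    """Index every name by each word it contains (lowercased)."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name in first_name_set:
        for word in name.lower().split():
            index[word].add(name)
    return index


def standard_name_set(name_to_match: str,
                      index: Dict[str, Set[str]],
                      similar: Callable[[str, str], bool]) -> Set[str]:
    """Collect the names whose index matches any word of the name to match."""
    result: Set[str] = set()
    for word in name_to_match.lower().split():  # segment the name
        for key, names in index.items():
            if similar(word, key):
                result |= names
    return result
```

The linear scan over index keys is only for clarity; a prefix tree or similar structure would typically replace it for a large list.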
[0092] Optionally, that the determining module 502 is configured to perform similarity matching on each word included in the name to be matched and each index includes the following:
[0093] The determining module 502 is configured to perform similarity matching on each word included in the name to be matched and each index by using a string matching algorithm, where the string matching algorithm includes at least one of the following: a prefix tree matching algorithm, a dictionary tree matching algorithm, a string similarity matching algorithm, or a pronunciation similarity matching algorithm.
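As one illustration of prefix tree matching, the sketch below stores index words in a simple trie and returns every stored word that starts with a given prefix. The class and method names are assumptions, and the application does not describe the tree at this level of detail.

```python
class PrefixTree:
    """Minimal prefix tree (trie) over lowercased index words."""

    def __init__(self):
        self.children = {}
        self.is_word = False

    def insert(self, word: str) -> None:
        node = self
        for ch in word.lower():
            node = node.children.setdefault(ch, PrefixTree())
        node.is_word = True

    def words_with_prefix(self, prefix: str) -> list:
        """Return all stored words that start with the prefix, sorted."""
        node = self
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return []
        found = []
        stack = [(node, prefix.lower())]
        while stack:
            current, text = stack.pop()
            if current.is_word:
                found.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(found)
```

A word of the name to be matched can then be treated as successfully matched against every index returned for it, or for a leading portion of it.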
[0094] Optionally, the apparatus further includes the following: an alignment module 505, configured to align each word included in a name in the standard name set with each word included in the name to be matched based on the similarity between each word included in the name in the standard name set and each word included in the name to be matched before the detection module 503 detects the name to be matched.
[0095] That the detection module 503 is configured to detect the name to be matched includes the following:
[0096] The detection module 503 is configured to detect the name to be matched based on the aligned standard name set to determine whether the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical.
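Word alignment based on similarity can be sketched with a greedy pairing: the most similar word pairs are fixed first, and each word is used at most once. The greedy strategy and the function name are assumptions; the application only states that alignment is based on similarity between words.

```python
from typing import Callable, List, Sequence, Tuple


def align_words(standard_words: Sequence[str],
                query_words: Sequence[str],
                similarity: Callable[[str, str], float]
                ) -> List[Tuple[str, str]]:
    """Greedily pair each standard word with its most similar query word."""
    scored = sorted(
        ((similarity(s, q), i, j)
         for i, s in enumerate(standard_words)
         for j, q in enumerate(query_words)),
        key=lambda item: -item[0],  # highest similarity first, stable on ties
    )
    used_standard, used_query, pairs = set(), set(), []
    for _, i, j in scored:
        if i in used_standard or j in used_query:
            continue
        used_standard.add(i)
        used_query.add(j)
        pairs.append((i, j))
    pairs.sort()  # restore the word order of the standard name
    return [(standard_words[i], query_words[j]) for i, j in pairs]
```

After alignment, the detections and matching degree calculations can operate on each pair of aligned words regardless of the original word order.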
[0097] Optionally, that the detection module 503 is configured to detect the name to be matched includes the following: the detection module 503 is configured to perform at least one of abbreviation detection, address term detection, multi-language detection, or alias detection on the name to be matched to determine whether the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical.
[0098] Optionally, that the detection module 503 is configured to perform abbreviation detection on the name to be matched includes the following:
[0099] The detection module 503 is configured to obtain predetermined abbreviation contrast combination data, where each abbreviation contrast combination reflects an abbreviation mapping relationship between at least one word and an abbreviation of the word; detect whether a word included in the name to be matched has the abbreviation mapping relationship with a word included in a name in the standard name set based on the abbreviation contrast combination data; and determine, based on a detection result, whether the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical.
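A minimal sketch of the abbreviation check is shown below. The table stands in for the abbreviation contrast combination data; its entries and the function name are hypothetical examples, not data from the application.

```python
# Hypothetical abbreviation contrast combinations: abbreviation -> full words.
ABBREVIATIONS = {
    "wm": {"william"},
    "chas": {"charles"},
    "jr": {"junior"},
}


def has_abbreviation_relationship(word_a: str, word_b: str,
                                  table=ABBREVIATIONS) -> bool:
    """True if one word is a listed abbreviation of the other."""
    a = word_a.lower().rstrip(".")
    b = word_b.lower().rstrip(".")
    return b in table.get(a, set()) or a in table.get(b, set())
```

The lookup is symmetric, so it does not matter whether the abbreviation appears in the name to be matched or in the name from the standard name set.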
[0100] Optionally, that the detection module 503 is configured to perform address term detection on the name to be matched includes the following:
[0101] The detection module 503 is configured to obtain predetermined address term data; detect whether the name to be matched includes the address term, where it is considered that the address term does not affect a meaning of the name that includes the address term; and determine, based on a detection result, whether the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical.
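Address term detection can be sketched by removing listed address terms before comparing names, since such terms are treated as not affecting the meaning of a name. The term list and function names are hypothetical.

```python
# Hypothetical address term data (titles and forms of address).
ADDRESS_TERMS = {"mr", "mrs", "ms", "dr", "sir"}


def strip_address_terms(name: str, terms=ADDRESS_TERMS) -> str:
    """Remove words that are listed address terms."""
    kept = [w for w in name.split() if w.lower().rstrip(".") not in terms]
    return " ".join(kept)


def synonymous_ignoring_address_terms(name_a: str, name_b: str) -> bool:
    """True if the names differ in characters but match without address terms."""
    if name_a == name_b:
        return False  # identical characters are not the case of interest here
    return (strip_address_terms(name_a).lower()
            == strip_address_terms(name_b).lower())
```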
[0102] Optionally, the alias includes a nickname of a name that corresponds to the alias, or a synonymous name of a name that corresponds to the alias in different fields. That the detection module 503 is configured to perform alias detection on the name to be matched includes the following:
[0103] The detection module 503 is configured to perform at least one of the following: detecting a nickname of the name to be matched, or detecting a synonymous name of the name to be matched in different fields.
[0104] Optionally, that the detection module 503 is configured to perform multi-language detection on the name to be matched includes the following: the detection module 503 is configured to determine a language that corresponds to the name to be matched; obtain at least one of a spelling deformation synonym rule or a spelling deformation homonym rule of the language; and detect the name to be matched according to at least one of the spelling deformation synonym rule or the spelling deformation homonym rule, to determine whether the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical.
[0105] Optionally, that the detection module 503 is configured to perform alias detection on the name to be matched includes the following:
[0106] The detection module 503 is configured to obtain predetermined alias contrast combination data, where each alias contrast combination reflects an alias mapping relationship between at least one word and an alias of the word; detect whether a word included in the name to be matched has the alias mapping relationship with a word included in a name in the standard name set based on the alias contrast combination data; and determine, based on a detection result, whether the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical.
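A name-level alias check over aligned words can be sketched as follows. The alias table stands in for the alias contrast combination data, and its entries, together with the rule that every differing word pair must be an alias pair, are assumptions for illustration.

```python
# Hypothetical alias contrast combinations: word -> known aliases.
ALIASES = {
    "catherine": {"kate", "cathy"},
    "william": {"bill", "will"},
}


def words_are_aliases(a: str, b: str, table=ALIASES) -> bool:
    """True if either word is a listed alias of the other."""
    a, b = a.lower(), b.lower()
    return b in table.get(a, set()) or a in table.get(b, set())


def names_synonymous_by_alias(name_words, standard_words,
                              table=ALIASES) -> bool:
    """True if aligned words are equal or aliases, with at least one alias."""
    if len(name_words) != len(standard_words):
        return False
    saw_alias = False
    for a, b in zip(name_words, standard_words):
        if a.lower() == b.lower():
            continue
        if words_are_aliases(a, b, table):
            saw_alias = True
        else:
            return False
    return saw_alias
```

Requiring at least one alias pair keeps the check focused on names that are synonymous while their characters are not identical.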
[0107] Optionally, that the matching module 504 is configured to determine a matching result of the name to be matched based on a detection result includes the following:
[0108] The matching module 504 is configured to determine the at least one name as the matching result of the name to be matched if the detection module 503 determines that the name to be matched is synonymous with the at least one name in the standard name set but the characters of the names are not identical; or otherwise, determine the matching result of the name to be matched by matching the name to be matched with a name in the standard name set by using one or more similarity algorithms.
[0109] Optionally, the similarity algorithm includes at least one of the following: an algorithm used to calculate a text matching degree, an algorithm used to calculate a phonetic matching degree, or an algorithm used to calculate a string matching degree.
[0110] Optionally, the name is a person name.
[0111] The apparatuses provided in the implementations of the present application are in a one-to-one correspondence with the methods. Therefore, the apparatuses and the methods have similar beneficial technical effects. The beneficial technical effects of the methods have been described above in detail, and therefore the beneficial technical effects of the apparatuses are omitted here for simplicity.
[0112] In addition, application scenarios of the previous apparatuses and methods are not limited in the present application. In addition to the risk control field (for example, fields such as anti-money laundering and user authentication) mentioned in the background, the solutions of the present application are applicable to any other fields that may need to use the name matching technology.
[0113] In the 1990s, whether a technical improvement is a hardware improvement (for example, an improvement to a circuit structure, such as a diode, a transistor, or a switch) or a software improvement (an improvement to a method procedure) can be clearly distinguished. However, as technologies develop, current improvements to many method procedures can be considered as direct improvements to hardware circuit structures. A designer usually programs an improved method procedure into a hardware circuit, to obtain a corresponding hardware circuit structure. Therefore, a method procedure can be improved by using a hardware entity module. For example, a programmable logic device (PLD) (for example, a field programmable gate array (FPGA)) is such an integrated circuit, and a logical function of the PLD is determined by a user through device programming. The designer performs programming to integrate a digital system to a PLD without requesting a chip manufacturer to design and produce an application-specific integrated circuit chip. In addition, at present, instead of manually manufacturing an integrated circuit chip, such programming is mostly implemented by using logic compiler software. The logic compiler software is similar to a software compiler used to develop and write a program. Original code needs to be written in a particular programming language for compilation. The language is referred to as a hardware description language (HDL). There are many HDLs, such as the Advanced Boolean Expression Language (ABEL), the Altera Hardware Description Language (AHDL), Confluence, the Cornell University Programming Language (CUPL), HDCal, the Java Hardware Description Language (JHDL), Lava, Lola, MyHDL, PALASM, and the Ruby Hardware Description Language (RHDL). The very-high-speed integrated circuit hardware description language (VHDL) and Verilog are most commonly used. A person skilled in the art should also understand that a hardware circuit that implements a logical method procedure can be readily obtained once the method procedure is logically programmed by using the several described hardware description languages and is programmed into an integrated circuit.
[0114] A controller can be implemented by using any appropriate method. For example, the controller can be a microprocessor or a processor and a computer readable medium storing computer readable program code (such as software or firmware) that can be executed by the microprocessor or the processor, a logic gate, a switch, an application-specific integrated circuit (ASIC), a programmable logic controller, or a built-in microcontroller. Examples of the controller include but are not limited to the following microprocessors: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. The memory controller can also be implemented as a part of the control logic of the memory. A person skilled in the art also knows that, in addition to implementing the controller by using the computer readable program code, logic programming can be performed on method steps to allow the controller to implement the same function in forms of the logic gate, the switch, the application-specific integrated circuit, the programmable logic controller, and the built-in microcontroller. Therefore, the controller can be considered as a hardware component, and an apparatus configured to implement various functions in the controller can also be considered as a structure in the hardware component. Or the apparatus configured to implement various functions can even be considered as both a software module implementing the method and a structure in the hardware component.
[0115] The system, apparatus, module, or unit illustrated in the previous implementations can be implemented by using a computer chip or an entity, or can be implemented by using a product having a certain function. A typical implementation device is a computer. The computer can be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
[0116] For ease of description, the apparatus above is described by dividing functions into various units. Certainly, when the present application is implemented, the functions of the units can be implemented in one or more pieces of software and/or hardware.
[0117] A person skilled in the art should understand that an implementation of the present disclosure can be provided as a method, a system, or a computer program product. Therefore, the present disclosure can use a form of hardware only implementations, software only implementations, or implementations with a combination of software and hardware. Moreover, the present disclosure can use a form of a computer program product that is implemented on one or more computer-usable storage media (including but not limited to a disk memory, a CD-ROM, an optical memory, etc.) that include computer-usable program code.
[0118] The present disclosure is described with reference to the flowcharts and/or block diagrams of the method, the device (system), and the computer program product based on the implementations of the present disclosure. It is worthwhile to note that computer program instructions can be used to implement each process and/or each block in the flowcharts and/or the block diagrams and a combination of a process and/or a block in the flowcharts and/or the block diagrams. These computer program instructions can be provided for a general-purpose computer, a dedicated computer, a built-in processor, or a processor of another programmable data processing device to generate a machine, so that the instructions executed by the computer or the processor of the another programmable data processing device generate a device for implementing a specified function in one or more processes in the flowcharts and/or in one or more blocks in the block diagrams.
[0119] These computer program instructions can be stored in a computer readable memory that can instruct the computer or the another programmable data processing device to work in a specific way, so that the instructions stored in the computer readable memory generate an artifact that includes an instruction device. The instruction device implements a specified function in one or more processes in the flowcharts and/or in one or more blocks in the block diagrams.
[0120] These computer program instructions can be loaded onto the computer or another programmable data processing device, so that a series of operations and steps are performed on the computer or the another programmable device, thereby generating computer-implemented processing. Therefore, the instructions executed on the computer or the another programmable device provide steps for implementing a specified function in one or more processes in the flowcharts and/or in one or more blocks in the block diagrams.
[0121] In a typical configuration, a computing device includes one or more processors (CPU), one or more input/output interfaces, one or more network interfaces, and one or more memories.
[0122] The memory can include a non-persistent memory, a random access memory (RAM), a non-volatile memory, and/or another form that are in a computer readable medium, for example, a read-only memory (ROM) or a flash memory (flash RAM). The memory is an example of the computer readable medium.
[0123] The computer readable medium includes persistent, non-persistent, movable, and unmovable media that can store information by using any method or technology. The information can be a computer readable instruction, a data structure, a program module, or other data. Examples of the computer storage medium include but are not limited to a phase-change random access memory (PRAM), a static random access memory (SRAM), a dynamic random access memory (DRAM), another type of random access memory (RAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a flash memory or another memory technology, a compact disc read-only memory (CD-ROM), a digital versatile disc (DVD) or another optical storage, a cassette magnetic tape, a magnetic tape/magnetic disk storage or another magnetic storage device, or any other non-transmission medium. The computer storage medium can be used to store information accessible by the computing device. Based on the description in the present specification, the computer readable medium does not include transitory computer readable media (transitory media) such as a modulated data signal and carrier.
[0124] It is worthwhile to further note that the terms "include", "comprise", or any other variants thereof are intended to cover a non-exclusive inclusion, so a process, a method, a product, or a device that includes a list of elements not only includes those elements but also includes other elements which are not expressly listed, or further includes elements inherent to such process, method, product, or device. Without more constraints, an element preceded by "includes a ..." does not preclude the existence of additional identical elements in the process, method, product, or device that includes the element.
[0125] A person skilled in the art should understand that an implementation of the present application can be provided as a method, a system, or a computer program product. Therefore, the present application can use a form of hardware only implementations, software only implementations, or implementations with a combination of software and hardware. In addition, the present application can use a form of a computer program product that is implemented on one or more computer-usable storage media (including but not limited to a disk memory, a CD-ROM, an optical memory, etc.) that include computer-usable program code.
[0126] The present application can be described in the general context of computer executable instructions executed by a computer, for example, a program module. Generally, the program module includes a routine, a program, an object, a component, a data structure, etc. executing a particular task or implementing a particular abstract data type. The present application can also be practiced in distributed computing environments. In the distributed computing environments, tasks are performed by remote processing devices connected through a communications network. In a distributed computing environment, the program module can be located in both local and remote computer storage media including storage devices.
[0127] The implementations of the present specification are described in a progressive way. For the same or similar parts of the implementations, references can be made to the implementations. Each implementation focuses on a difference from other implementations. Particularly, a system implementation is basically similar to a method implementation, and therefore is described briefly. For related parts, references can be made to related descriptions in the method implementation.
[0128] The previous descriptions are implementations of the present application, and are not intended to limit the present application. A person skilled in the art can make various modifications and changes to the present application. Any modification, equivalent replacement, or improvement made without departing from the spirit and principle of the present application shall fall within the scope of the claims in the present application.

Claims (15)

1. A method for matching names, the method comprising:
obtaining a name to be matched;
determining a first name set that can be used to match the name to be matched;
determining a standard name set used to match the name to be matched by performing similarity matching on each word comprised in the name to be matched and each word comprised in a name in the first name set;
detecting the name to be matched to determine whether the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical; and
determining a matching result of the name to be matched based on a detection result.
2. The method according to claim 1, wherein prior to determining the standard name set used to match the name to be matched, the method further comprises:
obtaining a set of names that do not need to be matched; and
determining that the name to be matched is not comprised in the set of names that do not need to be matched.
3. The method according to claim 1, wherein determining the standard name set comprises:
obtaining an index of each name comprised in the first name set, wherein the index of the name is any word comprised in the name;
segmenting the name to be matched to obtain each word comprised in the name to be matched;
performing similarity matching on each word comprised in the name to be matched and each index to obtain a subset of the first name set, wherein the obtained subset comprises a name indexed by each index that is successfully matched; and
determining the standard name set used to match the name to be matched based on each subset.
4. The method according to claim 3, wherein performing similarity matching on each word comprised in the name to be matched and each index comprises:
performing similarity matching on each word comprised in the name to be matched and each index by using a string matching algorithm, wherein the string matching algorithm comprises at least one of the following: a prefix tree matching algorithm, a dictionary tree matching algorithm, a string similarity matching algorithm, or a pronunciation similarity matching algorithm.
5. The method according to claim 1, wherein prior to detecting the name to be matched, the method further comprises:
aligning each word comprised in a name in the standard name set with each word comprised in the name to be matched based on the similarity degree between each word comprised in the name in the standard name set and each word comprised in the name to be matched; and
wherein detecting the name to be matched comprises:
detecting the name to be matched based on the standard name set to determine whether the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical.
6. The method according to claim 1, wherein detecting the name to be matched comprises:
performing detection of at least one of an abbreviation, an address term, a multi-language, or an alias on the name to be matched to determine whether the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical.
7. The method according to claim 6, wherein performing detection of the abbreviation on the name to be matched comprises:
obtaining abbreviation contrast combination data, wherein each abbreviation contrast combination reflects an abbreviation mapping relationship between at least one word and an abbreviation of the word;
detecting whether a word comprised in the name to be matched has the abbreviation mapping relationship with a word comprised in a name in the standard name set based on the abbreviation contrast combination data; and
determining, based on a detection result, whether the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical.
8. The method according to claim 6, wherein performing detection of the address term on the name to be matched comprises:
obtaining an address term;
detecting whether the name to be matched comprises the address term, wherein it is considered that the address term does not affect a meaning of the name that comprises the address term; and
determining, based on a detection result, whether the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical.
9. The method according to claim 6, wherein the alias comprises a nickname of a name that corresponds to the alias or a synonymous name of a name that corresponds to the alias in different fields.
10. The method according to claim 6, wherein performing detection of the alias on the name to be matched comprises at least one of the following:
detecting a nickname of the name to be matched, or detecting a synonymous name of the name to be matched in different fields.
11. The method according to claim 6, wherein performing multi-language detection on the name to be matched comprises:
determining a language that corresponds to the name to be matched;
obtaining at least one of a spelling deformation synonym rule or a spelling deformation homonym rule of at least one of the language or another language; and
detecting the name to be matched according to at least one of the spelling deformation synonym rule or the spelling deformation homonym rule to determine whether the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical.
12. The method according to claim 6, wherein performing detection of the alias on the name to be matched comprises:
obtaining alias contrast combination data, wherein each alias contrast combination reflects an alias mapping relationship between at least one word and an alias of the word;
detecting whether a word comprised in the name to be matched has the alias mapping relationship with a word comprised in a name in the standard name set based on the alias contrast combination data; and
determining, based on a detection result, whether the name to be matched is synonymous with at least one name in the standard name set but characters of the names are not identical.
13. The method according to claim 1, wherein determining the matching result of the name to be matched based on the detection result comprises:
determining the at least one name as the matching result of the name to be matched in response to determining that the name to be matched is synonymous with the at least one name in the standard name set but characters of the names are not identical; or
in response to determining that the name to be matched is not synonymous with the at least one name in the standard name set, determining the matching result of the name to be matched by matching the name to be matched with a name in the standard name set by using one or more similarity algorithms.
14. The method according to claim 13, wherein the similarity algorithm comprises at least one of: an algorithm used to calculate a text matching degree, an algorithm used to calculate a phonetic matching degree, or an algorithm used to calculate a string matching degree.
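The two-stage decision of claims 13 and 14 (synonym detection first, similarity algorithms only as a fallback) can be sketched in Python as below. This is a minimal sketch under stated assumptions: the synonym detector is supplied by the caller as `is_synonymous`, the fallback uses only a string matching degree computed with the standard library's `difflib.SequenceMatcher` (text and phonetic matching are not shown), and the 0.85 threshold and the name `match_name` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable


def match_name(
    candidate: str,
    standard_names: list[str],
    is_synonymous: Callable[[str, str], bool],
    threshold: float = 0.85,  # hypothetical cut-off, not from the application
) -> list[str]:
    """Return synonym matches if any exist; otherwise fall back to a
    string-similarity comparison against every standard name."""
    synonyms = [s for s in standard_names if is_synonymous(candidate, s)]
    if synonyms:
        return synonyms
    return [
        s for s in standard_names
        if SequenceMatcher(None, candidate.lower(), s.lower()).ratio() >= threshold
    ]
```

Keeping the synonym check ahead of the similarity check means that a name such as a nickname, which may score poorly on character similarity, can still be matched, while near-identical spellings are caught by the fallback.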
15. An apparatus for matching names, the apparatus comprising a plurality of modules configured to perform the method of any one of claims 1 to 14.
OA1201900200 2016-11-25 2017-11-17 Name matching method and apparatus. OA19238A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611055619.8 2016-11-25

Publications (1)

Publication Number Publication Date
OA19238A (en) 2020-04-24
