WO2018095281A1 - Name matching method and apparatus - Google Patents

Name matching method and apparatus

Info

Publication number
WO2018095281A1
Authority
WO
WIPO (PCT)
Prior art keywords
name
matched
matching
standard
synonymous
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2017/111604
Other languages
English (en)
French (fr)
Chinese (zh)
Inventor
孙清清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to MX2019006027A priority Critical patent/MX384762B/es
Priority to KR1020197018218A priority patent/KR102151367B1/ko
Priority to BR112019010669-3A priority patent/BR112019010669B1/pt
Priority to RU2019119526A priority patent/RU2725777C1/ru
Priority to AU2017364745A priority patent/AU2017364745C1/en
Priority to EP17874581.6A priority patent/EP3547164A4/en
Priority to JP2019528581A priority patent/JP6860668B2/ja
Priority to CA3044847A priority patent/CA3044847A1/en
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Publication of WO2018095281A1 publication Critical patent/WO2018095281A1/zh
Priority to US16/397,792 priority patent/US10726028B2/en
Priority to PH12019501163A priority patent/PH12019501163B1/en
Anticipated expiration legal-status Critical
Priority to ZA2019/04091A priority patent/ZA201904091B/en
Ceased legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2468Fuzzy queries
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/3332Query translation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3343Query execution using phonetics
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/90335Query processing
    • G06F16/90344Query processing by using string matching techniques
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/247Thesauruses; Synonyms
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Definitions

  • the present application relates to the field of computer software technologies, and in particular, to a name matching method and apparatus.
  • Person name matching is a very important technology in the field of risk control.
  • The risk control system records the names of users determined to be illegal in a blacklist. Then, when performing risk control, it matches the name of each user currently conducting business against each name in the blacklist. If a match succeeds, the user can be considered an illegal user and refused service to prevent risk.
  • the matching of person names can be divided into exact matching of person names and fuzzy matching of person names.
  • Fuzzy matching of person names is technically more difficult because the appropriate degree of fuzziness is hard to determine.
  • In the prior art, a string matching algorithm is usually used for fuzzy matching of person names; a string matching degree threshold determines the degree of fuzziness, and that threshold is set by experience.
  • In practice, the string matching degree threshold is often set low, which tends to lower matching accuracy and raise the false alarm rate of the risk control system.
  • Embodiments of the present application provide a name matching method and apparatus to solve the following technical problem: in the prior art, fuzzy matching of person names by a string matching algorithm has low matching accuracy and leads to a high system false alarm rate.
  • a determining module, which determines a set of standard names for matching the name to be matched;
  • a detecting module, which detects the to-be-matched name to determine whether it is synonymous with at least one name in the standard name set while their characters are not all the same;
  • a matching module, which determines the matching result of the to-be-matched name according to the detection result.
  • The name may include a person's name.
  • In actual applications, because of the timeliness, uncertainty, and variability of name data, the person name to be matched may differ from the actual person name. This is also the reason fuzzy matching is needed.
  • The solution of the present application targets this cause: it detects the person name to be matched to determine whether it is synonymous with at least one name in the standard name set while their characters are not all the same, and determines the matching result according to the detection result.
  • Compared with controlling the degree of fuzziness only through an empirically set string matching degree threshold, this is more favorable for controlling the degree of fuzziness precisely.
  • It can therefore improve matching accuracy and reduce the false alarm rate of the risk control system, partially or completely solving the problems in the prior art.
  • FIG. 1 is a schematic flowchart diagram of a name matching method according to an embodiment of the present application
  • FIG. 2 is a schematic flow chart of a specific implementation scheme of a primary screening in a name matching method according to an actual application scenario provided by an embodiment of the present application;
  • FIG. 3 is a schematic flowchart of a specific implementation scheme of an integrated algorithm for fuzzy matching in a name matching method according to an actual application scenario
  • FIG. 4 is a schematic flowchart of a specific implementation manner of a name matching method in an actual application scenario according to an embodiment of the present disclosure
  • FIG. 5 is a schematic structural diagram of a name matching apparatus corresponding to FIG. 1 according to an embodiment of the present application.
  • the embodiment of the present application provides a name matching method and device.
  • In actual applications, because of the timeliness, uncertainty, and variability of person name data, the person name to be matched may differ from the actual person name (mainly, the form of the name, that is, its characters, "changes").
  • the English name is used as an example, and the common types and examples of English names are shown in Table 1.
  • For these types of change, the solution of the present application can perform specific detections such as abbreviation detection, title word detection, multi-language detection, and alias detection, so as to identify cases in which person names are synonymous but their characters are not all the same (that is, synonymous but different in form), which in turn can improve matching accuracy.
  • “Different in form” may also refer to an “incorrect” difference caused by misspelling, but in the following embodiments it mainly refers to a “reasonable and correct” difference caused by the other types of change above.
  • The solution of the present application is applicable not only to matching person names but also to matching other names, such as place names and object names.
  • FIG. 1 is a schematic flowchart diagram of a name matching method according to an embodiment of the present application.
  • Devices that can execute this process include but are not limited to: personal computers, large and medium-sized computers, computer clusters, mobile phones, tablets, smart wearable devices, in-vehicle units, and the like. This process is typically used in the field of risk control, where it is executed by the risk control system or a related system.
  • the process in Figure 1 can include the following steps:
  • The language of the name to be matched is not limited; it may be English, Russian, Spanish, and so on, or Chinese.
  • The following embodiments mainly take English as the language of the name to be matched.
  • the standard name set may be a subset selected from a larger set of names, or may be directly the larger name set itself.
  • This selection may be referred to herein as “primary screening”.
  • The larger set of names may be a blacklist held by the risk control system.
  • The subset may contain only names that are similar to the name to be matched, where "similar" need not be judged strictly, because a series of subsequent operations further determines similarity; in effect, a "fine screening" follows.
  • Primary screening quickly narrows the matching range, reducing the workload of the subsequent fine screening, making it more targeted, and improving the efficiency of the solution of the present application.
  • S103: Detect the to-be-matched name to determine whether it is synonymous with at least one name in the standard name set while their characters are not all the same.
  • The “synonymous but not identical in characters” condition to be detected is mainly caused by one or more of the types of change in Table 1. The detection may specifically include at least one of abbreviation detection, title word detection, multi-language detection, alias detection, and the like, which are described in detail later.
  • When the detection includes multiple types, they may be performed sequentially in a certain order. If the matching result of the name to be matched can be determined partway through, the remaining detections need not be performed.
  • S104: Determine, according to the detection result, a matching result of the to-be-matched name.
  • If, by performing step S103, it is determined that the name to be matched is synonymous with at least one name in the standard name set while their characters are not all the same, the matching result may be determined directly from that finding. In this case, the detection in step S103 is in effect the entire matching process for the name to be matched.
  • If it is not determined that the name to be matched is synonymous with at least one name in the standard name set while their characters are not all the same, other matching methods may further be used to match the name and determine its matching result.
  • The name may include a person's name.
  • In actual applications, because of the timeliness, uncertainty, and variability of name data, the person name to be matched may differ from the actual name, which is also the reason fuzzy matching is needed.
  • The solution of the present application targets this cause: it detects the person name to be matched to determine whether the to-be-matched name is synonymous with at least one name in the standard name set while their characters are not all the same, and determines the name matching result according to the detection result.
  • Rather than relying only on a degree of fuzziness controlled by an empirically set string matching degree threshold, this approach is more favorable for controlling the degree of fuzziness precisely, and can improve matching accuracy and reduce the false alarm rate of the risk control system.
  • It can therefore partially or completely solve the problems in the prior art.
  • the embodiment of the present application further provides some specific implementation manners of the method, and an extended solution, which will be described below.
  • The complexity of different to-be-matched names may differ, and so may the amount of information they carry.
  • If a name's information features are too simple, the matching result has little value even when the match succeeds.
  • For example, for overly simple, common names such as the English names “Jim”, “Jimmy”, “David”, “John”, and “Mike”, it is difficult to identify a specific person even if matching succeeds.
  • For this reason, for steps S101 and S102, after the to-be-matched name is obtained and before the standard name set for matching it is determined, the following may also be performed: obtaining a predetermined non-matching name set, and determining whether the to-be-matched name is absent from the non-matching name set; if it is absent, the subsequent steps continue; otherwise, the to-be-matched name need not be matched.
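The pre-check against the non-matching name set can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the set contents reuse the common names quoted above, and the function name `should_match` and the whitespace and case normalization are assumptions.

```python
from typing import FrozenSet

# Hypothetical non-matching name set: names too common to identify one person.
NON_MATCHING_NAMES: FrozenSet[str] = frozenset(
    {"jim", "jimmy", "david", "john", "mike"}
)


def should_match(name: str,
                 non_matching: FrozenSet[str] = NON_MATCHING_NAMES) -> bool:
    """Return True when the name is absent from the non-matching name set."""
    # Assumed normalization: collapse whitespace and ignore case.
    normalized = " ".join(name.lower().split())
    return normalized not in non_matching
```

Only names for which `should_match` returns True would proceed to the standard-name-set step.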
  • Determining the set of standard names for matching the to-be-matched name may include: determining a first name set usable for matching the to-be-matched name; and performing similarity matching between each word in the to-be-matched name and each word in the names of the first name set to determine the standard name set.
  • This similarity matching may in turn include: obtaining an index for each name in the first name set, where an index of a name is any word contained in that name; segmenting the to-be-matched name to obtain each word it contains; performing similarity matching between each of those words and each index to obtain subsets of the first name set, each consisting of the names indexed by the successfully matched indexes; and determining the standard name set according to those subsets.
  • The index in the above example is established in advance.
  • The advantage of index-based matching of segmented words is that it speeds up retrieval of the required names from the set during matching. Without an index, matching can still be done (for example, by querying the data table storing the set directly for each segmented word), but efficiency may suffer.
  • The similarity matching may include: matching each word in the to-be-matched name against each index using a string matching algorithm.
  • One or more string matching algorithms may be used, for example, a prefix tree matching algorithm, a dictionary tree matching algorithm, a string similarity matching algorithm, a pronunciation similarity matching algorithm, and the like.
  • A string matching algorithm is only a preferred option; other algorithms capable of similarity matching, such as text matching algorithms, can also be used.
  • Determining the standard name set according to the subsets may also be implemented in various ways. For example, if N string matching algorithms are used, then for each of the M words obtained by segmentation, each algorithm matches the word against every index, yielding N subsets, and the union of these N subsets is taken. This gives M unions in total, and the names whose total number of occurrences across the M unions exceeds a set threshold are taken as the standard name set. As another example, the N subsets are obtained and their intersection is taken as the standard name set.
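The union-and-threshold variant of primary screening can be sketched as below. This is an illustration under stated assumptions: the two matchers are simple stand-ins (a prefix test instead of a prefix tree, `difflib` instead of SimString), the index is an in-memory dictionary rather than an index table, and the names `build_index`, `primary_screen`, and the threshold value are hypothetical.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher
from typing import Dict, List, Set


def build_index(names: List[str]) -> Dict[str, Set[str]]:
    """Map each lowercase word to the names that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name in names:
        for word in name.lower().split():
            index[word].add(name)
    return index


def prefix_match(word: str, key: str) -> bool:
    # Stand-in for a prefix tree lookup: one string is a prefix of the other.
    return key.startswith(word) or word.startswith(key)


def similar_match(word: str, key: str, cutoff: float = 0.8) -> bool:
    # Stand-in for a string similarity matcher.
    return SequenceMatcher(None, word, key).ratio() >= cutoff


MATCHERS = [prefix_match, similar_match]  # N = 2 algorithms in this sketch


def primary_screen(query: str, index: Dict[str, Set[str]],
                   threshold: int = 1) -> Set[str]:
    """Keep names that occur in more than `threshold` of the M per-word unions."""
    hits: Counter = Counter()
    for word in query.lower().split():          # M segmented words
        union: Set[str] = set()
        for matcher in MATCHERS:                # N subsets, merged by union
            for key, names in index.items():
                if matcher(word, key):
                    union |= names
        hits.update(union)                      # one count per word per name
    return {name for name, count in hits.items() if count > threshold}
```

A linear scan over the index keys is used here for brevity; a real prefix tree or similarity index would avoid comparing each word with every key.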
  • the "primary screening" process is described above.
  • The embodiment of the present application also provides a schematic flowchart of a specific implementation of the primary screening in the name matching method in an actual application scenario, as shown in FIG. 2.
  • In this scenario, the first name set is a list of English names, and an index is established in advance for each name in the list (information other than the corresponding name can also be retrieved through the index). Each index is a word in its corresponding name, and each index is stored in an index table whose primary key is the word corresponding to the index.
  • The name to be matched is the English name “Kit Wai Jackson Wong”, and the name is segmented using a space as the separator.
  • the result of the word segmentation is as follows: 2 is shown.
  • The word segmentation result is specifically {Kit, Wai, Jackson, Wong}. A prefix tree matching algorithm, a dictionary tree matching algorithm, a string similarity matching algorithm (such as the SimString algorithm), and a pronunciation similarity matching algorithm (such as the Metaphone algorithm) are each used to match every segmented word against every index. For a single word, the four algorithms output the sets of names corresponding to the matched indexes, denoted matching results 1, 2, 3, and 4.
  • The name to be matched may then be detected.
  • One or more kinds of pre-processing can be performed before detection, which helps the reliability of the subsequent detection results.
  • The pre-processing may include alignment processing, case normalization, Simplified/Traditional Chinese conversion, and the like.
  • Before the to-be-matched name is detected, the following may also be performed: aligning the words contained in each name in the standard name set with the words contained in the to-be-matched name, according to the similarity between them.
  • The alignment may follow the principle of similarity maximization: the words with the greatest similarity are aligned with each other.
  • In this case, detecting the to-be-matched name may include: detecting it against the aligned standard name set to determine whether it is synonymous with at least one name in the standard name set while their characters are not all the same.
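One way to picture similarity-based alignment is a greedy pairing that always takes the most similar remaining pair of words. The sketch below is an assumption about how "similarity maximization" could be realized, not the method of the application; the greedy strategy, the `difflib` similarity, and the function names are all illustrative.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Set, Tuple


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] (illustrative choice)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def align_words(query_words: List[str],
                candidate_words: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Greedily pair each query word with its most similar unused candidate word."""
    scored = sorted(
        ((similarity(q, c), qi, ci)
         for qi, q in enumerate(query_words)
         for ci, c in enumerate(candidate_words)),
        key=lambda item: (-item[0], item[1], item[2]),  # best pairs first
    )
    used_q: Set[int] = set()
    used_c: Set[int] = set()
    aligned: Dict[int, str] = {}
    for _score, qi, ci in scored:
        if qi in used_q or ci in used_c:
            continue
        used_q.add(qi)
        used_c.add(ci)
        aligned[qi] = candidate_words[ci]
    # Query words left without a partner are paired with None.
    return [(q, aligned.get(qi)) for qi, q in enumerate(query_words)]
```

A greedy pairing is not guaranteed to maximize the total similarity; an assignment algorithm could be substituted where that matters.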
  • The detection in step S103 is the key to improving the accuracy of fuzzy name matching.
  • the detecting the to-be-matched name includes:
  • Performing abbreviation detection on the to-be-matched name may include: obtaining predetermined abbreviation comparison combination data, each combination reflecting the correspondence between at least one word and its abbreviation; detecting, according to that data, whether a word in the to-be-matched name and a word in a name of the standard name set have such an abbreviation correspondence; and determining, according to the detection result, whether the to-be-matched name is synonymous with at least one name in the standard name set while their characters are not all the same.
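Abbreviation detection over aligned words can be sketched as a table lookup. The table contents, the trailing-period handling, and the function names below are hypothetical; the application only states that predetermined abbreviation comparison data is consulted.

```python
from typing import Dict, List, Set

# Hypothetical abbreviation comparison data: full word -> known abbreviations.
ABBREVIATIONS: Dict[str, Set[str]] = {
    "william": {"wm", "w"},
    "international": {"intl"},
}


def is_abbreviation_pair(a: str, b: str) -> bool:
    """True when one word is a recorded abbreviation of the other."""
    a, b = a.lower().rstrip("."), b.lower().rstrip(".")
    return b in ABBREVIATIONS.get(a, set()) or a in ABBREVIATIONS.get(b, set())


def abbreviation_synonymous(query_words: List[str],
                            candidate_words: List[str]) -> bool:
    """True when aligned names differ only by recorded abbreviations."""
    if len(query_words) != len(candidate_words):
        return False
    found_abbreviation = False
    for q, c in zip(query_words, candidate_words):
        if q.lower() == c.lower():
            continue                      # identical word, nothing to explain
        if is_abbreviation_pair(q, c):
            found_abbreviation = True     # difference explained by the table
            continue
        return False                      # unexplained difference
    return found_abbreviation             # False when characters are all the same
```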
  • Performing title word detection on the to-be-matched name may include: obtaining predetermined title word data; detecting, according to that data, whether the to-be-matched name contains a title word, where a title word is considered not to affect the meaning of the name containing it; and determining, according to the detection result, whether the to-be-matched name is synonymous with at least one name in the standard name set while their characters are not all the same.
  • A title word is generally a general form of address (for example, Mr. or Miss), a professional title (for example, Dr. or Prof.), or the like, attached to at least part of the original name.
  • A title word does not affect the meaning of the part of the original name it is attached to. Therefore, if the name to be matched contains a title word and the remainder of the name matches a name in the standard name set, it can be determined that the name to be matched is synonymous with at least one name in the standard name set while their characters are not all the same.
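The title word rule can be sketched as "strip known title words, then compare the remainder". The title word list and the function names are assumptions made for illustration.

```python
from typing import List

# Hypothetical title word data (compared without trailing periods).
TITLE_WORDS = {"mr", "mrs", "miss", "ms", "dr", "prof"}


def strip_titles(name: str) -> List[str]:
    """Return the words of the name with title words removed."""
    return [w for w in name.split()
            if w.lower().rstrip(".") not in TITLE_WORDS]


def title_synonymous(query: str, candidate: str) -> bool:
    """True when the query contains a title word and its remainder equals the candidate."""
    remainder = strip_titles(query)
    has_title = len(remainder) < len(query.split())
    same_rest = ([w.lower() for w in remainder]
                 == [w.lower() for w in candidate.split()])
    return has_title and same_rest
```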
  • Performing alias detection on the to-be-matched name may include: obtaining predetermined alias comparison combination data, each combination reflecting the correspondence between at least one word and its alias; detecting, according to that data, whether a word in the to-be-matched name and a word in a name of the standard name set have such an alias correspondence; and determining, according to the detection result, whether the to-be-matched name is synonymous with at least one name in the standard name set while their characters are not all the same.
  • An alias can be a nickname of its corresponding name (for example, Mick in Table 1 is a nickname corresponding to the name Mikey), or a synonymous name of its corresponding name in a different domain.
  • Different domains may be different regions (for example, different countries or provinces), different languages (for example, languages of different countries or of different nationalities), different industries, and the like.
  • Accordingly, alias detection may consist of nickname detection on the to-be-matched name and/or detection of synonymous names in different domains.
  • An alias does not affect the meaning of its corresponding name. Therefore, if the name to be matched is determined to be an alias of a name in the standard name set, it can be determined that the name to be matched is synonymous with at least one name in the standard name set while their characters are not all the same.
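Alias data can also be held as groups of interchangeable words rather than one-directional pairs. The sketch below assumes that representation; the group contents (apart from the Mick and Mikey example quoted above) and the function name are hypothetical.

```python
from typing import List, Set

# Hypothetical alias comparison data: each group holds interchangeable words.
ALIAS_GROUPS: List[Set[str]] = [
    {"mick", "mikey"},
    {"bill", "william"},
]


def is_alias_pair(a: str, b: str) -> bool:
    """True when the two words differ in characters but share an alias group."""
    a, b = a.lower(), b.lower()
    return a != b and any(a in group and b in group for group in ALIAS_GROUPS)
```

Grouping makes the relation symmetric, so it does not matter which of the two words appears in the name to be matched.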
  • The present application does not limit the specific storage format of the comparison combination data and the title word data; a common approach is to store them in corresponding data tables and read them from the database when needed.
  • Performing multi-language detection on the to-be-matched name may include: determining the language of the to-be-matched name; obtaining rules for spellings that differ but are synonymous within that language and/or between that language and other languages; and detecting the to-be-matched name according to those rules to determine whether it is synonymous with at least one name in the standard name set while their characters are not all the same.
  • Multi-language detection mainly targets situations such as the following: the English “Pooh” is spelled “puh” in German, and the two are regarded as synonymous in names.
  • Determining the matching result of the to-be-matched name according to the detection result may include: if the to-be-matched name is determined to be synonymous with at least one name in the standard name set while their characters are not all the same, determining the matching result according to that at least one name; otherwise (that is, when no match has yet succeeded), matching the to-be-matched name against the names in the standard name set using one or more similarity algorithms to determine the matching result.
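The two-stage decision (synonym detection first, similarity algorithms as the fallback) can be sketched as follows. Everything here is illustrative: the detector callables, the `difflib` stand-in similarity, the 0.9 threshold, and the result labels are assumptions, not values from the application.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Optional, Sequence, Tuple

Detector = Callable[[str, str], bool]


def default_similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; not one of the algorithms named in the text."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_name(query: str,
               standard_names: Sequence[str],
               detectors: List[Detector],
               similarity: Callable[[str, str], float] = default_similarity,
               threshold: float = 0.9) -> Tuple[Optional[str], str]:
    """Return (matched name or None, reason)."""
    # Stage 1: synonym detections; the first hit ends the process early.
    for candidate in standard_names:
        if any(detect(query, candidate) for detect in detectors):
            return candidate, "synonym"
    # Stage 2: fall back to similarity scoring over the standard name set.
    best, best_score = None, 0.0
    for candidate in standard_names:
        score = similarity(query, candidate)
        if score > best_score:
            best, best_score = candidate, score
    if best is not None and best_score >= threshold:
        return best, "similarity"
    return None, "no_match"
```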
  • Multiple similarity algorithms may be based on different dimensions, which helps improve the reliability of the matching result.
  • A similarity algorithm may specifically be an algorithm for calculating text matching degree (for example, an n-gram algorithm), an algorithm for calculating phonetic matching degree (for example, the Phonex algorithm), or an algorithm for calculating string matching degree (for example, the Jaro-Winkler algorithm).
  • A comprehensive matching result can be obtained by weighing the matching results of the individual similarity algorithms together. The present application does not limit the specific weighing method; a common one is weighted addition.
  • The text matching algorithm takes as input each pair of aligned words from the name to be matched and a name in the standard name set, and outputs the text matching degree of each pair, denoted F1;
  • the string matching algorithm takes the same pairs of aligned words as input and outputs the string matching degree of each pair, denoted F2;
  • the phonetic matching algorithm takes the same pairs of aligned words as input and outputs the phonetic matching degree of each pair, denoted F3;
  • the comprehensive matching degree F of each pair of aligned words is obtained, as shown in the following formula:
  • The matching result between the name to be matched and a name in the standard name set is then obtained by averaging. For example, for the pair of names in Table 3, the matching result obtained is as shown in Table 4 below.
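Weighted addition followed by averaging can be sketched as below. The weights are hypothetical placeholders (the formula and its coefficients are not reproduced in this text), and the function names are illustrative; F1, F2, and F3 are taken as already computed per aligned word pair.

```python
from typing import List, Sequence, Tuple

# Hypothetical weights for (F1 text, F2 string, F3 phonetic); they sum to 1.
WEIGHTS: Tuple[float, float, float] = (0.4, 0.3, 0.3)


def combined_score(f1: float, f2: float, f3: float,
                   weights: Sequence[float] = WEIGHTS) -> float:
    """Weighted addition of the three matching degrees of one aligned word pair."""
    w1, w2, w3 = weights
    return w1 * f1 + w2 * f2 + w3 * f3


def name_score(pair_scores: List[Tuple[float, float, float]]) -> float:
    """Average the combined scores over all aligned word pairs of two names."""
    if not pair_scores:
        return 0.0
    per_pair = [combined_score(f1, f2, f3) for f1, f2, f3 in pair_scores]
    return sum(per_pair) / len(per_pair)
```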
  • The detection and matching process after the primary screening is described above. Multiple matching-related algorithms may be used in this process, and the solution of the present application may integrate them; the process is thus one of fuzzy matching by integrated algorithms.
  • After the matching result is obtained, some post-rule filtering may further be performed, for example, mapping the matching degree in the matching result to descriptive text, or raising or lowering the matching degree as appropriate for the specific scenario.
  • the embodiment of the present application further provides a schematic flowchart of a specific implementation scheme of the fuzzy matching of the integrated algorithm in the name matching method in an actual application scenario, as shown in FIG. 3 .
  • In FIG. 3, the execution order of the detections and of the various matching calculations is merely an example and is not intended to limit the present application.
  • The matching degree may be calculated by one or more methods (for example, text matching degree calculation, phonetic matching degree calculation, and string matching degree calculation) to determine and output the matching result of the name to be matched.
  • the embodiment of the present application further provides a schematic flowchart of a specific implementation manner of a name matching method in an actual application scenario, as shown in FIG. 4 .
  • The pre-rule decision may include determining whether the to-be-matched name is absent from the non-matching name set, and the scan list index is the index of the names in the above-mentioned standard name set.
  • The steps in FIGS. 3 and 4 have been described in detail above and are not repeated here.
  • the embodiment of the present application further provides a corresponding device, as shown in FIG. 5 .
  • FIG. 5 is a schematic structural diagram of a name matching apparatus corresponding to FIG. 1 according to an embodiment of the present application, and a dotted line indicates an optional module, where the apparatus includes:
  • a determining module 502, which determines a set of standard names for matching the name to be matched;
  • a detecting module 503, which detects the to-be-matched name to determine whether it is synonymous with at least one name in the standard name set while their characters are not all the same;
  • a matching module 504, which determines, according to the detection result, a matching result of the to-be-matched name.
  • Before determining the standard name set for matching the to-be-matched name, the determining module 502 may obtain a predetermined non-matching name set and determine that the to-be-matched name is not included in the non-matching name set.
  • The determining module 502 determining a standard name set for matching the name to be matched specifically includes:
  • the determining module 502 obtains a first name set that can be used to match the to-be-matched name, and determines the standard name set by performing similarity matching between the words in the to-be-matched name and the words in the names of the first name set.
  • The determining module 502 determining the standard name set by this similarity matching specifically includes:
  • the determining module 502 obtains an index for each name in the first name set, where an index of a name is any word contained in that name; segments the to-be-matched name to obtain each word it contains; performs similarity matching between each of those words and each index to obtain subsets of the first name set, each consisting of the names indexed by the successfully matched indexes; and determines the standard name set based on those subsets.
  • the determining module 502 performs similarity matching between each word included in the name to be matched and each index, which specifically includes:
  • the determining module 502 uses a string matching algorithm to match the words included in the to-be-matched name with each index, where the string matching algorithm includes at least one of the following: a prefix tree matching algorithm, a dictionary tree matching algorithm, a string similarity matching algorithm, and a pronunciation similarity matching algorithm.
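  • the candidate selection described above can be sketched roughly as follows. This is only an illustrative sketch, not the implementation of the application: the function names, the sample names, the whitespace-based word segmentation, and the 0.8 threshold are all assumptions, and Python's `difflib.SequenceMatcher` stands in for the unspecified string similarity matching algorithm.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def build_word_index(first_name_set):
    """Index each name by every word it contains (hypothetical helper)."""
    index = defaultdict(set)
    for name in first_name_set:
        for word in name.lower().split():
            index[word].add(name)
    return index


def similar(a, b, threshold=0.8):
    """String similarity match; the 0.8 threshold is an arbitrary assumption."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def candidate_standard_names(name_to_match, index):
    """Union of the subsets indexed by words similar to a word of the input."""
    candidates = set()
    for word in name_to_match.lower().split():
        for key, names in index.items():
            if similar(word, key):
                candidates |= names
    return candidates


# Illustrative data only.
first_name_set = ["John Smith", "Jon Snow", "Maria Garcia"]
index = build_word_index(first_name_set)
standard_names = candidate_standard_names("Jhon Smith", index)
```

  • scanning every index key is linear in the number of distinct words; a prefix tree or dictionary tree, as the text mentions, would be one way to narrow that scan.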
  • the device further includes:
  • the detecting module 503 detects the to-be-matched name, and specifically includes:
  • the detecting module 503 detects the to-be-matched name according to the aligned standard name set to determine whether the to-be-matched name is synonymous with at least one name in the standard name set while its characters are not all the same.
  • the detecting module 503 detects the to-be-matched name, and specifically includes:
  • the detecting module 503 performs at least one of abbreviation detection, title word detection, multi-language detection, and another name detection on the to-be-matched name to determine whether the to-be-matched name is synonymous with at least one name in the standard name set while its characters are not all the same.
  • the detecting module 503 performs abbreviation detection on the to-be-matched name, and specifically includes:
  • the detecting module 503 obtains predetermined abbreviation comparison combination data, where each abbreviation comparison combination reflects an abbreviation correspondence between at least one word and its abbreviation; detects, according to the abbreviation comparison combination data, whether a word included in the to-be-matched name and a word included in a name in the standard name set have the abbreviation correspondence; and determines, according to the detection result, whether the to-be-matched name is synonymous with at least one name in the standard name set while its characters are not all the same.
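  • a minimal sketch of such abbreviation detection is shown below. The abbreviation pairs, the function names, and the word-by-word comparison strategy are assumptions made for illustration; the application does not specify how the comparison combination data is stored or applied.

```python
# Hypothetical abbreviation comparison data: (word, abbreviation) pairs.
ABBREVIATIONS = {("william", "wm"), ("robert", "robt"), ("limited", "ltd")}


def words_equivalent(a, b):
    """Two words are equivalent if equal or linked by an abbreviation pair."""
    a, b = a.lower().rstrip("."), b.lower().rstrip(".")
    return a == b or (a, b) in ABBREVIATIONS or (b, a) in ABBREVIATIONS


def is_abbreviation_synonym(name_to_match, standard_name):
    """True if the names differ in characters but agree word by word
    once abbreviation pairs are taken into account."""
    if name_to_match == standard_name:
        return False  # identical characters are not the synonymous case
    left, right = name_to_match.split(), standard_name.split()
    return len(left) == len(right) and all(
        words_equivalent(a, b) for a, b in zip(left, right)
    )
```

  • for example, `is_abbreviation_synonym("Wm. Smith", "William Smith")` holds under this data, while two identical strings are deliberately excluded.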
  • the detecting module 503 performs the title word detection on the to-be-matched name, and specifically includes:
  • the detecting module 503 obtains predetermined title word data; detects, according to the title word data, whether the to-be-matched name includes a title word, where a title word is considered not to affect the meaning of the name that includes it; and determines, according to the detection result, whether the to-be-matched name is synonymous with at least one name in the standard name set while its characters are not all the same.
  • the other name includes a nickname of its corresponding name, or a synonymous name of its corresponding name in a different field;
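  • the following sketch illustrates one way such title word detection could work. The title word list and the function names are hypothetical; the idea shown is simply that words assumed not to change the meaning of a name are removed before comparison.

```python
# Hypothetical title word data; a real list would be predetermined elsewhere.
TITLE_WORDS = {"mr", "mrs", "ms", "dr", "prof", "sir"}


def strip_title_words(name):
    """Drop words that are assumed not to change the meaning of the name."""
    words = [w for w in name.split() if w.lower().rstrip(".") not in TITLE_WORDS]
    return " ".join(words)


def title_synonyms(name_to_match, standard_names):
    """Standard names equal to the input once title words are removed from
    both sides, excluding names whose characters are already identical."""
    core = strip_title_words(name_to_match).lower()
    return [
        s for s in standard_names
        if s != name_to_match and strip_title_words(s).lower() == core
    ]
```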
  • the detecting module 503 performs another name detection on the to-be-matched name, and specifically includes:
  • the detecting module 503 performs nickname detection on the to-be-matched name, and/or performs synonymous name detection in different domains on the to-be-matched name.
  • the detecting module 503 performs multi-language detection on the to-be-matched name, and specifically includes:
  • the detecting module 503 determines a language corresponding to the name to be matched, obtains a spelling-variation synonym rule and/or a spelling disambiguation rule of the language, and detects the to-be-matched name according to the spelling-variation synonym rule and/or the spelling disambiguation rule to determine whether the to-be-matched name is synonymous with at least one name in the standard name set while its characters are not all the same.
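  • as a rough illustration of multi-language detection, the sketch below applies a per-language table of spelling variants before comparing names. The rule table, the language code, and the function names are assumptions; the German entries reflect the common transliteration of umlauts and ß, and language determination itself is taken as given.

```python
# Hypothetical per-language spelling-variation rules: (variant, canonical form).
SPELLING_VARIANT_RULES = {
    "de": [("ä", "ae"), ("ö", "oe"), ("ü", "ue"), ("ß", "ss")],
}


def normalize_spelling(name, language):
    """Rewrite known spelling variants of the language to one canonical form."""
    text = name.lower()
    for variant, canonical in SPELLING_VARIANT_RULES.get(language, []):
        text = text.replace(variant, canonical)
    return text


def is_spelling_synonym(name_to_match, standard_name, language):
    """True if the names differ in characters but agree after normalization."""
    return (
        name_to_match != standard_name
        and normalize_spelling(name_to_match, language)
        == normalize_spelling(standard_name, language)
    )
```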
  • the detecting module 503 performs another name detection on the to-be-matched name, and specifically includes:
  • the detecting module 503 obtains predetermined nickname comparison combination data, where each nickname comparison combination reflects a nickname correspondence between at least one word and its nickname; detects, according to the nickname comparison combination data, whether a word included in the to-be-matched name and a word included in a name in the standard name set have the nickname correspondence; and determines, according to the detection result, whether the to-be-matched name is synonymous with at least one name in the standard name set while its characters are not all the same.
  • the matching module 504 determines, according to the detection result, a matching result of the to-be-matched name, and specifically includes:
  • if the matching module 504 determines that the to-be-matched name is synonymous with at least one name in the standard name set while its characters are not all the same, the at least one name is determined as the matching result of the to-be-matched name; otherwise, the matching result of the to-be-matched name is determined by matching the to-be-matched name against the names in the standard name set using one or more similarity algorithms.
  • the similarity algorithm includes at least one of: an algorithm for calculating a text matching degree, an algorithm for calculating a voice matching degree, and an algorithm for calculating a string matching degree.
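  • the two-stage decision of the matching module can be sketched as follows. This is an assumption-laden illustration: `match_name`, the detector callables, and the 0.85 threshold are hypothetical, and `difflib.SequenceMatcher` is used only as a stand-in for whichever similarity algorithm is actually chosen.

```python
from difflib import SequenceMatcher


def match_name(name_to_match, standard_names, synonym_detectors, threshold=0.85):
    """Return synonymous standard names if any detector fires; otherwise
    fall back to a similarity score against every standard name."""
    # Stage 1: synonymous but not character-identical names win outright.
    synonyms = [
        s for s in standard_names
        if s != name_to_match and any(d(name_to_match, s) for d in synonym_detectors)
    ]
    if synonyms:
        return synonyms
    # Stage 2: similarity fallback (the threshold is an arbitrary assumption).
    scored = [
        (SequenceMatcher(None, name_to_match.lower(), s.lower()).ratio(), s)
        for s in standard_names
    ]
    return [s for score, s in sorted(scored, reverse=True) if score >= threshold]


# A toy detector that treats a leading "Dr. " as meaningless.
def drop_dr(a, b):
    return a.replace("Dr. ", "") == b
```

  • running synonym detection first means a name such as "Dr. John Smith" is resolved without relying on a similarity threshold, while a misspelling such as "John Smyth" is still caught by the fallback.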
  • the name is a person's name.
  • the device and the method provided in the embodiments of the present application have a one-to-one correspondence. Therefore, the device has beneficial technical effects similar to those of the corresponding method. Since the beneficial technical effects of the method have been described in detail above, the beneficial technical effects of the corresponding device are not described again here.
  • PLD Programmable Logic Device
  • FPGA Field Programmable Gate Array
  • HDL Hardware Description Language
  • the controller can be implemented in any suitable manner; for example, the controller can take the form of a microprocessor or processor and a computer readable medium storing computer readable program code (e.g., software or firmware) executable by the (micro)processor.
  • examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller can also be implemented as part of the memory's control logic.
  • the controller can be logically programmed to implement the same functions in the form of logic gates, switches, ASICs, programmable logic controllers, and embedded microcontrollers.
  • Such a controller can therefore be considered a hardware component, and the means for implementing various functions included therein can also be considered as a structure within the hardware component.
  • a device for implementing various functions can be considered both as a software module that implements the method and as a structure within a hardware component.
  • the system, device, module or unit illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function.
  • a typical implementation device is a computer.
  • the computer can be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
  • embodiments of the present invention can be provided as a method, system, or computer program product.
  • Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware.
  • the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • the computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising an instruction apparatus that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media include both persistent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology.
  • the information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic cassette, magnetic tape storage or other magnetic storage device or any other non-
  • embodiments of the present application can be provided as a method, system, or computer program product.
  • the present application can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment in combination of software and hardware.
  • the application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • the application can be described in the general context of computer-executable instructions executed by a computer, such as a program module.
  • program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • the present application can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are connected through a communication network.
  • program modules can be located in both local and remote computer storage media including storage devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Artificial Intelligence (AREA)
  • Automation & Control Theory (AREA)
  • Acoustics & Sound (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)
  • Alarm Systems (AREA)
  • Stored Programmes (AREA)
PCT/CN2017/111604 2016-11-25 2017-11-17 一种名称匹配方法及装置 Ceased WO2018095281A1 (zh)

Priority Applications (11)

Application Number Priority Date Filing Date Title
JP2019528581A JP6860668B2 (ja) 2016-11-25 2017-11-17 名前マッチング方法および装置
BR112019010669-3A BR112019010669B1 (pt) 2016-11-25 2017-11-17 Método implementado por computador para a correspondência de nomes, meio de armazenamento legível por computador não transitório e sistema implementado por computador
RU2019119526A RU2725777C1 (ru) 2016-11-25 2017-11-17 Способ и устройство для сопоставления имен
AU2017364745A AU2017364745C1 (en) 2016-11-25 2017-11-17 Name matching method and apparatus
EP17874581.6A EP3547164A4 (en) 2016-11-25 2017-11-17 NAME COMPENSATION PROCESS AND DEVICE
MX2019006027A MX384762B (es) 2016-11-25 2017-11-17 Método y aparato para comparar nombres.
KR1020197018218A KR102151367B1 (ko) 2016-11-25 2017-11-17 이름들을 매칭시키기 위한 방법 및 장치
CA3044847A CA3044847A1 (en) 2016-11-25 2017-11-17 Method and apparatus for matching names
US16/397,792 US10726028B2 (en) 2016-11-25 2019-04-29 Method and apparatus for matching names
PH12019501163A PH12019501163B1 (en) 2016-11-25 2019-05-24 Method and apparatus for matching names
ZA2019/04091A ZA201904091B (en) 2016-11-25 2019-06-24 Name matching method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611055619.8A CN108108373B (zh) 2016-11-25 2016-11-25 一种名称匹配方法及装置
CN201611055619.8 2016-11-25

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/397,792 Continuation US10726028B2 (en) 2016-11-25 2019-04-29 Method and apparatus for matching names

Publications (1)

Publication Number Publication Date
WO2018095281A1 true WO2018095281A1 (zh) 2018-05-31

Family

ID=62196168

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/111604 Ceased WO2018095281A1 (zh) 2016-11-25 2017-11-17 一种名称匹配方法及装置

Country Status (14)

Country Link
US (1) US10726028B2
EP (1) EP3547164A4
JP (1) JP6860668B2
KR (1) KR102151367B1
CN (1) CN108108373B
AU (1) AU2017364745C1
BR (1) BR112019010669B1
CA (1) CA3044847A1
MX (1) MX384762B
PH (1) PH12019501163B1
RU (1) RU2725777C1
TW (1) TWI724237B
WO (1) WO2018095281A1
ZA (1) ZA201904091B

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108962232B (zh) * 2018-07-16 2021-01-01 上海小蚁科技有限公司 语音识别方法及装置、存储介质、终端
CN109408561A (zh) * 2018-10-17 2019-03-01 杭州骑轻尘信息技术有限公司 业务名称匹配方法及装置
CN109189809B (zh) * 2018-10-17 2020-01-03 北京金堤科技有限公司 一种股东名称关联匹配的方法和装置
CN109472029B (zh) * 2018-11-09 2023-04-07 天津开心生活科技有限公司 药品名称处理方法与装置
CN109471960B (zh) * 2018-11-13 2020-10-13 深圳市景旺电子股份有限公司 智能识别pcb资料工具层名的方法及装置
CN109840316A (zh) * 2018-12-21 2019-06-04 上海诺悦智能科技有限公司 一种客户信息制裁名单匹配系统
GB201902772D0 (en) * 2019-03-01 2019-04-17 Palantir Technologies Inc Fuzzy searching & applications thereof
CN110909532B (zh) * 2019-10-31 2021-06-11 银联智惠信息服务(上海)有限公司 用户名称匹配方法、装置、计算机设备和存储介质
CN111092758A (zh) * 2019-12-06 2020-05-01 上海上讯信息技术股份有限公司 降低告警及恢复误报的方法、装置及电子设备
US12079282B2 (en) * 2020-03-12 2024-09-03 Oracle International Corporation Name matching engine boosted by machine learning
CN111563139B (zh) * 2020-07-15 2020-10-23 平安国际智慧城市科技股份有限公司 Ocr识别发票药品名的校验方法、装置及计算机设备
CN113268986B (zh) * 2021-05-24 2024-05-24 交通银行股份有限公司 一种基于模糊匹配算法的单位名称匹配、查找方法及装置
US20230039689A1 (en) * 2021-08-05 2023-02-09 Ebay Inc. Automatic Synonyms, Abbreviations, and Acronyms Detection
CN113822049B (zh) * 2021-09-29 2023-08-25 平安银行股份有限公司 基于人工智能的地址审核方法、装置、设备及存储介质
WO2023132029A1 (ja) * 2022-01-06 2023-07-13 日本電気株式会社 情報処理装置、情報処理方法及びプログラム
CN114595379B (zh) * 2022-01-17 2025-09-19 国投智能(厦门)信息股份有限公司 一种数据标准的智能推荐方法及装置
KR102693782B1 (ko) * 2022-05-26 2024-08-08 주식회사 카카오게임즈 닉네임 간 유사도를 이용하여 다중 접속계정을 탐지하기 위한 방법 및 장치
US12282486B2 (en) 2022-04-29 2025-04-22 Oracle International Corporation Address matching from single string to address matching score
CN114880430B (zh) * 2022-05-10 2023-07-18 马上消费金融股份有限公司 名称处理方法及装置
JP2024094499A (ja) * 2022-12-28 2024-07-10 富士通株式会社 対訳コーパス生成プログラム、対訳コーパス生成方法および情報処理装置
CN116244421A (zh) * 2023-03-03 2023-06-09 广联达科技股份有限公司 项目名称匹配的方法、装置、设备及可读存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103167056A (zh) * 2013-01-31 2013-06-19 中国科学院计算机网络信息中心 一种基于自动审核的域名注册方法
CN104331475A (zh) * 2014-11-04 2015-02-04 郑州悉知信息技术有限公司 一种信息检测方法及装置
EP2860645A1 (en) * 2013-07-09 2015-04-15 G-Cloud Technology Ltd Algorithm for fast character string matching
CN104765858A (zh) * 2015-04-21 2015-07-08 北京航天长峰科技工业集团有限公司上海分公司 公安用同义词库的构建方法及获得的公安用同义词库
CN105843950A (zh) * 2016-04-12 2016-08-10 乐视控股(北京)有限公司 敏感词过滤方法及装置

Family Cites Families (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8812300B2 (en) * 1998-03-25 2014-08-19 International Business Machines Corporation Identifying related names
US7313513B2 (en) * 2002-05-13 2007-12-25 Wordrake Llc Method for editing and enhancing readability of authored documents
US20040024760A1 (en) * 2002-07-31 2004-02-05 Phonetic Research Ltd. System, method and computer program product for matching textual strings using language-biased normalisation, phonetic representation and correlation functions
US8423563B2 (en) * 2003-10-16 2013-04-16 Sybase, Inc. System and methodology for name searches
US20060074883A1 (en) 2004-10-05 2006-04-06 Microsoft Corporation Systems, methods, and interfaces for providing personalized search and information access
US8700568B2 (en) * 2006-02-17 2014-04-15 Google Inc. Entity normalization via name normalization
US9026514B2 (en) * 2006-10-13 2015-05-05 International Business Machines Corporation Method, apparatus and article for assigning a similarity measure to names
JP2010519655A (ja) * 2007-02-26 2010-06-03 ベイシス テクノロジー コーポレーション 名前照合システムの名前インデックス付け
CN101727464B (zh) * 2008-10-29 2012-08-08 北京搜狗科技发展有限公司 获取别称匹配对的方法及装置
US20110055234A1 (en) * 2009-09-02 2011-03-03 Nokia Corporation Method and apparatus for combining contact lists
TWI443529B (zh) 2010-04-01 2014-07-01 Inst Information Industry 自動化領域名詞建置方法及系統,及其電腦程式產品
US9424556B2 (en) * 2010-10-14 2016-08-23 Nokia Technologies Oy Method and apparatus for linking multiple contact identifiers of an individual
US8468167B2 (en) * 2010-10-25 2013-06-18 Corelogic, Inc. Automatic data validation and correction
US8364692B1 (en) * 2011-08-11 2013-01-29 International Business Machines Corporation Identifying non-distinct names in a set of names
US9275339B2 (en) * 2012-04-24 2016-03-01 Raytheon Company System and method for probabilistic name matching
US9229926B2 (en) 2012-12-03 2016-01-05 International Business Machines Corporation Determining similarity of unfielded names using feature assignments
CN103970798B (zh) * 2013-02-04 2019-05-28 商业对象软件有限公司 数据的搜索和匹配
US10089302B2 (en) 2013-02-26 2018-10-02 International Business Machines Corporation Native-script and cross-script chinese name matching
CN103177122B (zh) * 2013-04-15 2017-04-26 天津理工大学 一种基于同义词的个人桌面文件搜索方法
US9691075B1 (en) * 2014-03-14 2017-06-27 Wal-Mart Stores, Inc. Name comparison
US9535903B2 (en) * 2015-04-13 2017-01-03 International Business Machines Corporation Scoring unfielded personal names without prior parsing
CN104820713B (zh) 2015-05-19 2018-02-27 苏州中炎工业科技有限公司 一种基于用户历史数据获得工业产品名称同义词的方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103167056A (zh) * 2013-01-31 2013-06-19 中国科学院计算机网络信息中心 一种基于自动审核的域名注册方法
EP2860645A1 (en) * 2013-07-09 2015-04-15 G-Cloud Technology Ltd Algorithm for fast character string matching
CN104331475A (zh) * 2014-11-04 2015-02-04 郑州悉知信息技术有限公司 一种信息检测方法及装置
CN104765858A (zh) * 2015-04-21 2015-07-08 北京航天长峰科技工业集团有限公司上海分公司 公安用同义词库的构建方法及获得的公安用同义词库
CN105843950A (zh) * 2016-04-12 2016-08-10 乐视控股(北京)有限公司 敏感词过滤方法及装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3547164A4 *

Also Published As

Publication number Publication date
EP3547164A1 (en) 2019-10-02
KR20190084319A (ko) 2019-07-16
US20190251085A1 (en) 2019-08-15
EP3547164A4 (en) 2019-10-16
JP2020501255A (ja) 2020-01-16
AU2017364745A1 (en) 2019-06-20
BR112019010669A2 (pt) 2019-10-01
MX2019006027A (es) 2019-08-14
ZA201904091B (en) 2021-05-26
TWI724237B (zh) 2021-04-11
PH12019501163A1 (en) 2020-02-24
JP6860668B2 (ja) 2021-04-21
CN108108373B (zh) 2020-09-25
MX384762B (es) 2025-03-14
BR112019010669B1 (pt) 2021-12-07
CN108108373A (zh) 2018-06-01
PH12019501163B1 (en) 2023-10-13
KR102151367B1 (ko) 2020-09-03
CA3044847A1 (en) 2018-05-31
AU2017364745B2 (en) 2020-04-09
RU2725777C1 (ru) 2020-07-06
AU2017364745C1 (en) 2020-09-10
TW201820179A (zh) 2018-06-01
US10726028B2 (en) 2020-07-28

Similar Documents

Publication Publication Date Title
WO2018095281A1 (zh) 一种名称匹配方法及装置
TWI685761B (zh) 詞向量處理方法及裝置
US11436252B2 (en) Data processing methods, apparatuses, and devices
CN109388801B (zh) 相似词集合的确定方法、装置和电子设备
JP2020510852A (ja) 音声機能制御方法および装置
WO2019105134A1 (zh) 词向量处理方法、装置以及设备
WO2019154162A1 (zh) 一种风控规则生成方法和装置
WO2019007288A1 (zh) 一种风险地址识别方法、装置以及电子设备
CN108874765B (zh) 词向量处理方法及装置
CN105630763B (zh) 用于提及检测中的消歧的方法和系统
US20180157646A1 (en) Command transformation method and system
CN107402945A (zh) 词库生成方法及装置、短文本检测方法及装置
CN107329964B (zh) 一种文本处理方法及装置
US20260073299A1 (en) Methods and apparatuses for training prompt injection detection model, storage media, and electronic devices
TW201911289A (zh) 用於分割句子的系統和方法
CN107577659A (zh) 词向量处理方法、装置以及电子设备
CN115658891B (zh) 一种意图识别的方法、装置、存储介质及电子设备
OA19238A (en) Name matching method and apparatus.
WO2025139937A1 (zh) 一种关键词的扩展方法、装置和存储介质
CN109190115A (zh) 一种文本匹配方法、装置、服务器及存储介质
CN107391591B (zh) 数据处理方法、装置及服务器
CN115222262A (zh) 数据处理方法、装置及设备
HK1248866A1 (zh) 数据应答处理方法、装置及服务器
HK1249780B (zh) 词向量处理方法、装置以及电子设备
HK1248844B (zh) 词向量处理方法、装置以及电子设备

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 17874581; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 3044847; Country of ref document: CA
ENP Entry into the national phase. Ref document number: 2019528581; Country of ref document: JP; Kind code of ref document: A
NENP Non-entry into the national phase. Ref country code: DE
REG Reference to national code. Ref country code: BR; Ref legal event code: B01A; Ref document number: 112019010669; Country of ref document: BR
ENP Entry into the national phase. Ref document number: 2017364745; Country of ref document: AU; Date of ref document: 20171117; Kind code of ref document: A
ENP Entry into the national phase. Ref document number: 20197018218; Country of ref document: KR; Kind code of ref document: A
ENP Entry into the national phase. Ref document number: 2017874581; Country of ref document: EP; Effective date: 20190625
ENP Entry into the national phase. Ref document number: 112019010669; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20190524