WO2017197802A1 - String fuzzy matching method and apparatus - Google Patents


Info

Publication number
WO2017197802A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
target
source
matching
target text
Prior art date
Application number
PCT/CN2016/096429
Inventor
曾红
Original Assignee
深圳Tcl数字技术有限公司 (Shenzhen TCL Digital Technology Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 深圳Tcl数字技术有限公司
Publication of WO2017197802A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying

Definitions

  • The present invention relates to the field of information processing technology, and in particular to a string fuzzy matching method and apparatus.
  • Existing data searches usually rely on exact matching algorithms, for example find-and-replace in text editing or index-based retrieval in a database. Their matching requirements are strict and exact, and typical implementations include backtracking matching algorithms and the KMP algorithm.
  • An exact matching algorithm can find the data being searched for in the target data only when the two are exactly identical. In some situations, people cannot give the complete data to be searched for (the string to be searched for). Because an exact matching algorithm succeeds only when the search string is exactly the same as the target string, it returns no result in such situations, and the recognition rate of the search string is low.
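For contrast, the following is a minimal sketch of the KMP algorithm named above as a typical exact matcher. It is a standard textbook KMP search, not code from this patent. The function name `kmp_find` and the sample strings are illustrative assumptions. The sketch shows the limitation described above: a query with a single wrong character (社 in place of 设) is not found at all.

```python
def kmp_find(text: str, pattern: str) -> int:
    """Return the index of the first exact occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    # fail[i] = length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it.
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    # Scan the text once, falling back through the table instead of
    # backtracking in the text.
    k = 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - len(pattern) + 1
    return -1


print(kmp_find("打开电视设置", "电视设置"))  # 2
print(kmp_find("打开电视设置", "电视社置"))  # -1: one wrong character, no match
```

The exact query succeeds, but the near-miss query fails outright. This is the recognition-rate problem the invention addresses.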
  • The main object of the present invention is to provide a string fuzzy matching method and apparatus. They solve the prior-art problem that exact matching methods have a low recognition rate when searching for strings.
  • To this end, the present invention provides a string fuzzy matching method comprising the following steps:
  • The step of acquiring, from the respective target texts, the target texts whose source matching degree is greater than or equal to the first preset threshold, and using the acquired target texts as the matching target texts, includes:
  • The present invention further provides a string fuzzy matching method comprising the following steps:
  • The present invention further provides a string fuzzy matching apparatus comprising:
  • an obtaining module, configured to obtain the number of characters matched between the source text and each target text;
  • a first calculating module, configured to calculate the source matching degree of each target text according to the number of matched characters and the number of characters in the source text;
  • the obtaining module being further configured to acquire a first preset threshold corresponding to the source text according to the number of fields of the source text; and
  • a first designating module, configured to acquire the target texts whose source matching degree is greater than or equal to the first preset threshold and use the acquired target texts as the matching target texts.
  • The invention first obtains the number of characters matched between the source text and each target text. From that number and the length of the source text, it calculates the source matching degree of each target text. It then judges, target by target, whether the source matching degree satisfies the first preset condition (being greater than or equal to the first preset threshold). Each target text that satisfies the condition is used as a matching target text. Because matching target texts are found by fuzzy matching rather than by exact search, the recognition rate of strings is effectively improved.
  • FIG. 1 is a schematic flowchart of a first embodiment of the string fuzzy matching method according to the present invention.
  • FIG. 2 is a schematic flowchart of a second embodiment of the string fuzzy matching method according to the present invention.
  • FIG. 3 is a schematic flowchart of a third embodiment of the string fuzzy matching method according to the present invention.
  • FIG. 4 is a schematic flowchart of a fourth embodiment of the string fuzzy matching method according to the present invention.
  • FIG. 5 is a schematic flowchart of a fifth embodiment of the string fuzzy matching method according to the present invention.
  • FIG. 6 is a schematic diagram of functional modules of a first embodiment of a string fuzzy matching apparatus according to the present invention.
  • FIG. 7 is a schematic diagram of functional modules of a second embodiment of a string fuzzy matching apparatus according to the present invention.
  • FIG. 8 is a schematic diagram of functional modules of a third embodiment of a string fuzzy matching apparatus according to the present invention.
  • FIG. 9 is a schematic diagram of functional modules of a fourth embodiment of a string fuzzy matching apparatus according to the present invention.
  • FIG. 10 is a schematic diagram of functional modules of a fifth embodiment of a string fuzzy matching apparatus according to the present invention.
  • The present invention provides a string fuzzy matching method.
  • FIG. 1 is a schematic flowchart of a first embodiment of the string fuzzy matching method according to the present invention.
  • In this embodiment, the string fuzzy matching method includes:
  • Step S10: obtain the number of characters matched between the source text and each target text.
  • The source text is text input by a user.
  • The source text may be voice text, Chinese text, or pinyin text.
  • Each target text is a text to be matched against the source text. The target texts may likewise be voice text, Chinese text, or pinyin text.
  • After receiving the source text input by the user, the system matches it against each locally stored target text. That is, it searches each target text for characters that match characters in the source text. It then counts, for each target text, the number of characters that match the source text.
  • Step S20: calculate the source matching degree of each target text according to the number of matched characters and the number of characters in the source text.
  • The source matching degree is the number of matched characters expressed as a percentage of the number of characters in the source text: source matching degree = number of matched characters / number of characters in the source text × 100%.
  • For example, suppose the source text has 8 characters and the target texts match it in 5, 4, 6, 1, and 0 characters respectively. Their source matching degrees are then 62.5%, 50.0%, 75.0%, 12.5%, and 0.
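The source matching degree of step S20 can be sketched in Python as follows. This is an illustrative sketch, not the patent's implementation. The function name `source_matching_degree` is hypothetical, and the degree is returned as a fraction (0.625) rather than a percentage. The sample numbers reproduce the worked example above.

```python
def source_matching_degree(matched_chars: int, source_len: int) -> float:
    """Source matching degree = matched characters / characters in the source text."""
    if source_len <= 0:
        raise ValueError("source text must contain at least one character")
    return matched_chars / source_len


# Worked example: an 8-character source text and target texts that match it
# in 5, 4, 6, 1 and 0 characters.
degrees = [source_matching_degree(m, 8) for m in (5, 4, 6, 1, 0)]
print([f"{d:.1%}" for d in degrees])
# ['62.5%', '50.0%', '75.0%', '12.5%', '0.0%']
```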
  • Step S30: obtain the first preset threshold corresponding to the source text according to the number of fields of the source text.
  • Step S40: acquire the target texts whose source matching degree is greater than or equal to the first preset threshold, and use the acquired target texts as the matching target texts.
  • Specifically, the matching target texts may be found by judging, target by target, whether the source matching degree is greater than or equal to the first preset threshold. A target text that reaches the first preset threshold is used as a matching target text. If several target texts reach it, all of them are used as matching target texts. A target text whose source matching degree is less than the first preset threshold does not match the source text.
  • The first preset threshold depends on the number of fields of the source text, where the number of fields is the number of Chinese characters in the source text. Source texts with different numbers of fields correspond to different first preset thresholds. Therefore, before any source matching degree is compared with the first preset threshold, the number of fields of the source text is determined and the corresponding threshold is obtained from it.
  • The first preset threshold may be set according to the number of fields of the source text. For example, if the source text has two or fewer fields, the first preset threshold may be set to 1. A target text then matches the source text only when its source matching degree is 100%.
  • If the source text has more than two fields, that is, more than two Chinese characters, the first preset threshold may be set to 0.67. A target text then matches the source text when its source matching degree is 67% or more. The values above may be freely set and dynamically adjusted according to actual needs, and further first preset thresholds may be configured as required; this embodiment does not limit them.
  • In short, the first preset threshold is 0.67 when the source text has more than two fields and 1 when it has two or fewer. If the user says one or two characters, all of them must match. If the user says three or more characters, at least two-thirds must match.
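Steps S30 and S40 can be sketched as follows. The sketch assumes that a field is one CJK Unified Ideograph (U+4E00 to U+9FFF) in the source text. It uses the example thresholds 1 and 0.67 from the text, and the function names and sample target texts are hypothetical. Taking 0.67 literally has one boundary effect. A three-character source matched in two characters has a degree of 2/3 ≈ 0.6667, which is just below 0.67. An implementation meant to enforce the "two-thirds" rule stated above might therefore compare against 2/3 instead.

```python
import re

# Assumption: a "field" is one CJK Unified Ideograph (U+4E00-U+9FFF).
_HANZI = re.compile(r"[\u4e00-\u9fff]")


def field_count(source_text: str) -> int:
    """Number of Chinese characters in the source text."""
    return len(_HANZI.findall(source_text))


def first_preset_threshold(source_text: str) -> float:
    """Step S30: example values from the text (1 for <= 2 fields, else 0.67)."""
    return 1.0 if field_count(source_text) <= 2 else 0.67


def select_matching_targets(source_text: str, degrees: dict) -> list:
    """Step S40: keep every target text whose source matching degree
    (a fraction in [0, 1]) is >= the first preset threshold."""
    threshold = first_preset_threshold(source_text)
    return [text for text, degree in degrees.items() if degree >= threshold]


# Degrees as step S20 would give them for the 4-character source "打开电视".
targets = {"打开电视机": 1.0, "打开电脑": 0.75, "关闭电视": 0.5}
print(select_matching_targets("打开电视", targets))  # ['打开电视机', '打开电脑']
```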
  • In this embodiment, the number of characters matched between the source text and each target text is obtained first. The source matching degree of each target text is calculated from that number. Each source matching degree is then judged in turn against the first preset condition, and each target text that satisfies it is used as a matching target text. Because this embodiment finds matching target texts by fuzzy matching rather than exact search, the recognition rate of strings is effectively improved.
  • FIG. 2 is a schematic flowchart of a second embodiment of the string fuzzy matching method according to the present invention. Based on the first embodiment, step S40 includes:
  • Step S41: determine the target text with the highest source matching degree according to the calculated source matching degree of each target text.
  • The source matching degrees of the target texts can be compared and the target text with the highest one selected. If several target texts share the highest source matching degree, all of them are selected.
  • Step S42: judge whether the source matching degree of the target text with the highest source matching degree is greater than or equal to the first preset threshold.
  • Step S43: if so, use the target text with the highest source matching degree as the matching target text.
  • Even the highest source matching degree may be low. In that case the selected target text differs substantially from the source text, may not be the target text the user wants, and should be discarded. Therefore, after the target text with the highest source matching degree is selected, its source matching degree is compared with the first preset threshold.
  • If it is greater than or equal to the first preset threshold, that target text matches the source text and is used as the matching target text. If it is less than the first preset threshold, that target text does not match the source text, so matching fails. The user can then re-enter the source text to perform the matching operation again.
  • In this embodiment, the target text with the highest source matching degree is selected first, and only its source matching degree is compared with the first preset threshold before it is used as the matching target text. Because the source matching degree of every target text does not need to be judged, the matching operation takes less time.
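Steps S41 to S43 can be sketched as below. The sketch is illustrative only: `best_source_matches` is a hypothetical name and degrees are fractions. Keeping every target text tied at the highest degree is an assumption. It is consistent with the note on ties and with the multiple-match case handled in the third embodiment. Only the single highest degree is compared with the threshold, which is the time saving the embodiment describes.

```python
def best_source_matches(degrees: dict, first_threshold: float) -> list:
    """Steps S41-S43 on a {target text: source matching degree} mapping."""
    if not degrees:
        return []
    best = max(degrees.values())      # S41: highest source matching degree
    if best < first_threshold:        # S42: one comparison instead of one per target
        return []                     # no match; the user may re-enter the source text
    # S43: every target text tied at the highest degree becomes a matching target text
    return [text for text, degree in degrees.items() if degree == best]


print(best_source_matches({"打开电视机": 1.0, "打开电脑": 0.75, "关闭电视": 0.5}, 0.67))
# ['打开电视机']
```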
  • FIG. 3 is a schematic flowchart of a third embodiment of the string fuzzy matching method according to the present invention. Based on the second embodiment, when there are multiple matching target texts, the method further includes, after step S43:
  • Step S44: calculate the target matching degree of each matching target text from the number of matched characters and the number of characters in that matching target text, and determine the target text with the highest target matching degree from the results.
  • Step S45: use the determined target text with the highest target matching degree as the final matching target text.
  • For example, suppose the matching target texts have target matching degrees of 100%, 83.3%, 62.5%, 50%, and 41.7%. The target text at 100% has the highest target matching degree and is used as the final matching target text. If several target texts share the highest target matching degree, all of them are used as final matching target texts.
  • In this embodiment, the matching target texts are further filtered by target matching degree to obtain the final matching target text. Because the final matching target text has passed two rounds of filtering, the accuracy of the obtained target text is improved.
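Step S44 can be sketched as follows. By analogy with the source matching degree, the sketch assumes that the target matching degree is the number of matched characters divided by the number of characters in the matching target text. The function names are hypothetical. The sample pairs are invented for illustration. Five matched characters against target texts of 5, 6, 8, 10, and 12 characters happen to reproduce the percentages in the example above.

```python
def target_matching_degree(matched_chars: int, target_len: int) -> float:
    """Assumed definition: matched characters / characters in the target text."""
    if target_len <= 0:
        raise ValueError("target text must contain at least one character")
    return matched_chars / target_len


def best_target_matches(candidates: dict) -> tuple:
    """Step S44 on {target text: (matched_chars, target_len)}.

    Returns (target texts tied at the highest target matching degree, that degree).
    """
    if not candidates:
        return [], 0.0
    degrees = {t: target_matching_degree(m, n) for t, (m, n) in candidates.items()}
    best = max(degrees.values())
    return [t for t, d in degrees.items() if d == best], best


# Hypothetical candidates: 5 matched characters, target lengths 5 to 12.
candidates = {f"target{n}": (5, n) for n in (5, 6, 8, 10, 12)}
print([round(target_matching_degree(m, n) * 100, 1) for m, n in candidates.values()])
# [100.0, 83.3, 62.5, 50.0, 41.7]
print(best_target_matches(candidates))  # (['target5'], 1.0)
```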
  • FIG. 4 is a schematic flowchart of a fourth embodiment of the string fuzzy matching method according to the present invention. Based on the third embodiment, step S45 includes:
  • Step S451: acquire the second preset threshold corresponding to the source text according to the first preset threshold.
  • The second preset threshold is related to the first preset threshold.
  • The second preset threshold may be set according to the first preset threshold. For example, if the first preset threshold is 1, the second preset threshold may be set to 1, so a target text matches the source text only when its target matching degree is 100%. If the first preset threshold is 0.67, the second preset threshold may be set to 0.50, so a target text matches the source text when its target matching degree is 50% or more.
  • These thresholds may be freely set and dynamically adjusted according to actual needs, and additional thresholds may be configured as required; this embodiment does not limit them.
  • Step S452: judge whether the target matching degree of the determined target text with the highest target matching degree is greater than or equal to the second preset threshold.
  • Step S453: if so, use the determined target text with the highest target matching degree as the final matching target text.
  • Even the highest target matching degree may be low. In that case the selected target text differs substantially from the source text, may not be what the user wants, and should be discarded. Therefore, after the target text with the highest target matching degree is determined, its target matching degree is compared with the second preset threshold.
  • If it is greater than or equal to the second preset threshold, that target text matches the source text and is used as the final matching target text. If it is less than the second preset threshold, that target text does not match the source text, so matching fails.
  • In this embodiment, checking the target matching degree of the target text with the highest target matching degree against the second preset threshold improves the accuracy of the obtained target text.
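Steps S451 to S453 can be sketched as follows. Only the two example pairs from the text are encoded: a first threshold of 1 maps to a second threshold of 1, and 0.67 maps to 0.50. The mapping, the function name, and the sample inputs are illustrative assumptions, since the text leaves the thresholds configurable.

```python
# Example pairs from the text: first preset threshold -> second preset threshold.
SECOND_PRESET_THRESHOLD = {1.0: 1.0, 0.67: 0.50}


def final_match(best_texts: list, best_target_degree: float, first_threshold: float) -> list:
    """Confirm the highest-target-degree text(s) against the second threshold."""
    second_threshold = SECOND_PRESET_THRESHOLD[first_threshold]  # S451
    if best_target_degree >= second_threshold:                   # S452
        return best_texts                                        # S453: final match
    return []                                                    # matching fails


print(final_match(["打开电视机"], 0.8, 0.67))  # ['打开电视机']
print(final_match(["打开电视机"], 0.4, 0.67))  # []
```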
  • FIG. 5 is a schematic flowchart of a fifth embodiment of the string fuzzy matching method according to the present invention. Based on any of the foregoing embodiments, step S10 includes:
  • Step S11: convert the source text and each target text into character information in pinyin form.
  • Step S12: obtain the number of characters in the pinyin-form character information of each target text that match the pinyin-form character information of the source text.
  • When the source text and the target texts are voice text or Chinese text, the system needs to convert them into character information in pinyin form before it can perform the matching operation.
  • Matching starts from the first character of the source text's pinyin character information, which is compared against all pinyin-form characters of the target text; if a match succeeds, the character is recorded. The second character of the source text is matched next, and so on until all characters of the source text have been matched. The number of characters matched between the target text and the source text is then counted.
  • If the source text contains several identical characters but only one such character in the target text matches them, that character counts as a single match, not as multiple matches.
  • In this embodiment, matching on pinyin-form character information improves the recognition rate of the target text.
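Steps S11 and S12 can be sketched as follows. The hanzi-to-pinyin table is a hypothetical toy stand-in for a real converter (the third-party pypinyin package is one option), and tones are dropped. Each pinyin syllable is treated as one character unit; this is an assumption, because the text does not say whether the units are syllables or letters. Counting with a multiset intersection gives the same total as the character-by-character procedure above, provided each matched target character is used only once. That is how the note on repeated characters reads.

```python
from collections import Counter

# Hypothetical toy table (tones dropped); a real system would use a full converter.
TOY_PINYIN = {"打": "da", "开": "kai", "电": "dian", "视": "shi", "是": "shi",
              "机": "ji", "脑": "nao"}


def to_pinyin(text: str) -> list:
    """Step S11: one pinyin syllable per Chinese character; other characters pass through."""
    return [TOY_PINYIN.get(ch, ch) for ch in text]


def matched_count(source_units: list, target_units: list) -> int:
    """Step S12: matched units, each target unit usable at most once.

    The multiset intersection (Counter &) keeps min(count in source, count in
    target) per unit, so a unit repeated in the source but present once in the
    target is counted once.
    """
    return sum((Counter(source_units) & Counter(target_units)).values())


# A speech-recognition error turns 视 into the homophone 是.
source, target = "打开电是", "打开电视机"
print(matched_count(list(source), list(target)))            # 3 (by characters)
print(matched_count(to_pinyin(source), to_pinyin(target)))  # 4 (by pinyin)
```

The homophone example shows why converting to pinyin first can raise the match count for voice text.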
  • The invention further provides a string fuzzy matching apparatus.
  • FIG. 6 is a schematic diagram of functional modules of a first embodiment of the string fuzzy matching apparatus according to the present invention.
  • In this embodiment, the string fuzzy matching apparatus includes an obtaining module 10, a first calculating module 20, and a first designating module 30.
  • The obtaining module 10 is configured to obtain the number of characters matched between the source text and each target text.
  • The source text is text input by a user.
  • The source text may be voice text, Chinese text, or pinyin text.
  • Each target text is a text to be matched against the source text. The target texts may likewise be voice text, Chinese text, or pinyin text.
  • After receiving the source text input by the user, the obtaining module 10 matches it against each locally stored target text. That is, it searches each target text for characters that match characters in the source text, and then counts, for each target text, the number of characters that match the source text.
  • The first calculating module 20 is configured to calculate the source matching degree of each target text according to the number of matched characters and the number of characters in the source text.
  • The first calculating module 20 may calculate the source matching degree of each target text from the number of matched characters and the number of characters in the source text. The source matching degree is the number of matched characters as a percentage of the number of characters in the source text.
  • For example, suppose the source text has 8 characters and the target texts match it in 5, 4, 6, 1, and 0 characters respectively. The source matching degrees of the target texts are then 62.5%, 50.0%, 75.0%, 12.5%, and 0.
  • The obtaining module 10 is further configured to acquire the first preset threshold corresponding to the source text according to the number of fields of the source text.
  • The first designating module 30 is configured to acquire the target texts whose source matching degree is greater than or equal to the first preset threshold and use the acquired target texts as the matching target texts.
  • Specifically, the matching target texts may be found by judging, target by target, whether the source matching degree is greater than or equal to the first preset threshold. A target text that reaches the first preset threshold is used as a matching target text. If several target texts reach it, all of them are used as matching target texts. A target text whose source matching degree is less than the first preset threshold does not match the source text.
  • The first preset threshold depends on the number of fields of the source text, where the number of fields is the number of Chinese characters in the source text. Source texts with different numbers of fields correspond to different first preset thresholds. Therefore, before any source matching degree is compared with the first preset threshold, the number of fields of the source text is determined and the corresponding threshold is obtained from it.
  • The first preset threshold may be set according to the number of fields of the source text. For example, if the source text has two or fewer fields, the first preset threshold may be set to 1.
  • A first preset threshold of 1 means a target text matches the source text only when its source matching degree is 100%. If the source text has more than two fields, that is, more than two Chinese characters, the first preset threshold may be set to 0.67. A target text then matches the source text when its source matching degree is 67% or more. The values above may be freely set and dynamically adjusted according to actual needs, and further first preset thresholds may be configured as required; this embodiment does not limit them.
  • In short, the first preset threshold is 0.67 when the source text has more than two fields and 1 when it has two or fewer. If the user says one or two characters, all of them must match. If the user says three or more characters, at least two-thirds must match.
  • In this embodiment, the number of characters matched between the source text and each target text is obtained first. The source matching degree of each target text is calculated from that number. Each source matching degree is then judged in turn against the first preset condition, and each target text that satisfies it is used as a matching target text. Because this embodiment finds matching target texts by fuzzy matching rather than exact search, the recognition rate of strings is effectively improved.
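The module layout of this first apparatus embodiment can be sketched as a single Python class, with one method per module. This is a hypothetical structure, not the patent's design, and the class and method names are invented. The field count assumes CJK Unified Ideographs. The obtaining module here follows the character-by-character procedure described for the method, removing each matched target character so that it is counted only once.

```python
from dataclasses import dataclass, field


@dataclass
class StringFuzzyMatcher:
    """Hypothetical sketch of the apparatus: modules 10, 20 and 30 as methods."""
    targets: list = field(default_factory=list)

    def obtain_matched(self, source: str, target: str) -> int:
        """Obtaining module 10: count matched characters, each target char used once."""
        remaining = list(target)
        count = 0
        for ch in source:
            if ch in remaining:
                remaining.remove(ch)
                count += 1
        return count

    def obtain_first_threshold(self, source: str) -> float:
        """Obtaining module 10: first preset threshold from the field count."""
        fields = sum(1 for ch in source if "\u4e00" <= ch <= "\u9fff")
        return 1.0 if fields <= 2 else 0.67

    def source_degree(self, source: str, target: str) -> float:
        """First calculating module 20: matched characters / source characters."""
        return self.obtain_matched(source, target) / len(source)

    def designate(self, source: str) -> list:
        """First designating module 30: targets whose degree reaches the threshold."""
        if not source:
            return []
        threshold = self.obtain_first_threshold(source)
        return [t for t in self.targets if self.source_degree(source, t) >= threshold]


matcher = StringFuzzyMatcher(["打开电视机", "打开电脑", "关闭电视"])
print(matcher.designate("打开电视"))  # ['打开电视机', '打开电脑']
```

Keeping the modules as separate methods mirrors the apparatus claims. Each method can therefore be swapped out, for example to count pinyin syllables instead of characters.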
  • FIG. 7 is a schematic diagram of functional modules of a second embodiment of a string fuzzy matching apparatus according to the present invention.
  • As shown in FIG. 7, the first designating module 30 includes a determining unit 31, a first judging unit 32, and a first designating unit 33.
  • The determining unit 31 is configured to determine the target text with the highest source matching degree according to the calculated source matching degree of each target text.
  • The source matching degrees of the target texts can be compared and the target text with the highest one selected. If several target texts share the highest source matching degree, all of them are selected.
  • The first judging unit 32 is configured to judge whether the source matching degree of the target text with the highest source matching degree is greater than or equal to the first preset threshold.
  • The first designating unit 33 is configured to use the target text with the highest source matching degree as the matching target text if its source matching degree is greater than or equal to the first preset threshold.
  • In this embodiment, the target text with the highest source matching degree is selected first, and only its source matching degree is compared with the first preset threshold before it is used as the matching target text. Because the source matching degree of every target text does not need to be judged, the matching operation takes less time.
  • FIG. 8 is a schematic diagram of functional modules of a third embodiment of the string fuzzy matching apparatus according to the present invention. Based on the second embodiment of the apparatus, when there are multiple matching target texts, the apparatus further includes a second calculating module 40 and a second designating module 50.
  • The second calculating module 40 is configured to calculate the target matching degree of each matching target text from the number of matched characters and the number of characters in that matching target text, and to determine the target text with the highest target matching degree from the results.
  • The second designating module 50 is configured to use the determined target text with the highest target matching degree as the final matching target text.
  • In this embodiment, the matching target texts are further filtered by target matching degree to obtain the final matching target text. Because the final matching target text has passed two rounds of filtering, the accuracy of the obtained target text is improved.
  • FIG. 9 is a schematic diagram of functional modules of a fourth embodiment of the string fuzzy matching apparatus according to the present invention.
  • The second designating module 50 includes an acquiring unit 51, a second judging unit 52, and a second designating unit 53.
  • The acquiring unit 51 is configured to acquire the second preset threshold corresponding to the source text according to the first preset threshold.
  • The second preset threshold is related to the first preset threshold.
  • The second preset threshold may be set according to the first preset threshold. For example, if the first preset threshold is 1, the second preset threshold may be set to 1, so a target text matches the source text only when its target matching degree is 100%. If the first preset threshold is 0.67, the second preset threshold may be set to 0.50, so a target text matches the source text when its target matching degree is 50% or more.
  • These thresholds may be freely set and dynamically adjusted according to actual needs, and additional thresholds may be configured as required; this embodiment does not limit them.
  • The second judging unit 52 is configured to judge whether the target matching degree of the determined target text with the highest target matching degree is greater than or equal to the second preset threshold.
  • The second designating unit 53 is configured to use the determined target text with the highest target matching degree as the final matching target text if its target matching degree is greater than or equal to the second preset threshold.
  • In this embodiment, checking the target matching degree of the target text with the highest target matching degree against the second preset threshold improves the accuracy of the obtained target text.
  • FIG. 10 is a schematic diagram of functional modules of a fifth embodiment of the string fuzzy matching apparatus according to the present invention.
  • Based on any of the foregoing embodiments of the string fuzzy matching apparatus, the obtaining module 10 includes a converting unit 11 and an acquiring unit 12.
  • The converting unit 11 is configured to convert the source text and each target text into character information in pinyin form.
  • The acquiring unit 12 is configured to obtain the number of characters in the pinyin-form character information of each target text that match the pinyin-form character information of the source text.
  • In this embodiment, matching on pinyin-form character information improves the recognition rate of the target text.
  • The technical solution of the present invention, in essence or in the part that contributes to the prior art, may be embodied as a software product. The product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

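A string fuzzy matching method and apparatus. The string fuzzy matching method comprises the following steps: acquiring the number of characters matched between a source text and each target text (S10); calculating a source matching degree of each target text according to the matched character count and the character count of the source text (S20); acquiring a first preset threshold corresponding to the source text according to the field count of the source text (S30); and acquiring the target texts whose source matching degree is greater than or equal to the first preset threshold, and taking the acquired target texts as matched target texts (S40). The method solves the problem of low accuracy when a matching target string is searched for by exact lookup, and improves the recognition rate of character strings.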

Description

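String Fuzzy Matching Method and Apparatus
Technical Field
The present invention relates to the field of information processing technology, and in particular to a string fuzzy matching method and apparatus.
Background
In existing data lookup, search operations usually use exact matching algorithms, for example find-and-replace in text editing or index-based retrieval in databases. These impose strict, exact matching requirements and are implemented with algorithms such as backtracking matching and the KMP algorithm. However, when data is searched with an exact matching algorithm, the data to be searched can be found in the target data only if it is exactly identical to the target data. In some situations, people often cannot accurately provide the complete data to be searched (the string to be searched). Because an exact matching algorithm succeeds only when the string to be searched is exactly the same as the target string, using it in such situations yields no search result, resulting in a low recognition rate for the search string.
Summary of the Invention
The main objective of the present invention is to provide a string fuzzy matching method and apparatus, aiming to solve the problem in the prior art that searching for strings with exact matching yields a low recognition rate.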
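To achieve the above objective, the present invention provides a string fuzzy matching method comprising the following steps:
acquiring the number of characters matched between a source text and each target text, wherein the source text and each target text are converted into character information in pinyin form, and the number of characters in the pinyin-form character information of each target text that match the pinyin-form character information of the source text is acquired;
calculating a source matching degree of each target text according to the matched character count and the character count of the source text;
acquiring a first preset threshold corresponding to the source text according to the field count of the source text;
acquiring the target texts whose source matching degree is greater than or equal to the first preset threshold, and taking the acquired target texts as matched target texts;
wherein the step of acquiring the target texts whose source matching degree is greater than or equal to the first preset threshold and taking the acquired target texts as matched target texts comprises:
determining the target text with the highest source matching degree according to the calculated source matching degrees of the target texts;
judging whether the source matching degree of the target text with the highest source matching degree is greater than or equal to the first preset threshold, and if so, taking that target text as a matched target text;
when there are multiple matched target texts, calculating a target matching degree of each matched target text according to the matched character count and the character count of that matched target text, and determining the target text with the highest target matching degree according to the calculation results;
acquiring a second preset threshold corresponding to the source text according to the first preset threshold; judging whether the target matching degree of the determined target text with the highest target matching degree is greater than or equal to the second preset threshold; and if so, taking that target text as the final matched target text.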
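To achieve the above objective, the present invention further provides a string fuzzy matching method comprising the following steps:
acquiring the number of characters matched between a source text and each target text;
calculating a source matching degree of each target text according to the matched character count and the character count of the source text;
acquiring a first preset threshold corresponding to the source text according to the field count of the source text;
acquiring the target texts whose source matching degree is greater than or equal to the first preset threshold, and taking the acquired target texts as matched target texts.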
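In addition, to achieve the above objective, the present invention further provides a string fuzzy matching apparatus, comprising:
an acquisition module, configured to acquire the number of characters matched between a source text and each target text;
a first calculation module, configured to calculate a source matching degree of each target text according to the matched character count and the character count of the source text;
the acquisition module being further configured to acquire a first preset threshold corresponding to the source text according to the field count of the source text;
a first designating module, configured to acquire the target texts whose source matching degree is greater than or equal to the first preset threshold, and take the acquired target texts as matched target texts.
The present invention acquires the number of characters matched between the source text and each target text; calculates the source matching degree of each target text from that count; judges in turn whether the source matching degree of each target text satisfies a first preset condition; and if so, takes the target texts satisfying the first preset condition as matched target texts. Because matching target texts are found by fuzzy matching rather than by exact lookup, the recognition rate of character strings is effectively improved.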
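Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a first embodiment of the string fuzzy matching method according to the present invention;
FIG. 2 is a schematic flowchart of a second embodiment of the string fuzzy matching method according to the present invention;
FIG. 3 is a schematic flowchart of a third embodiment of the string fuzzy matching method according to the present invention;
FIG. 4 is a schematic flowchart of a fourth embodiment of the string fuzzy matching method according to the present invention;
FIG. 5 is a schematic flowchart of a fifth embodiment of the string fuzzy matching method according to the present invention;
FIG. 6 is a functional module diagram of a first embodiment of the string fuzzy matching apparatus according to the present invention;
FIG. 7 is a functional module diagram of a second embodiment of the string fuzzy matching apparatus according to the present invention;
FIG. 8 is a functional module diagram of a third embodiment of the string fuzzy matching apparatus according to the present invention;
FIG. 9 is a functional module diagram of a fourth embodiment of the string fuzzy matching apparatus according to the present invention;
FIG. 10 is a functional module diagram of a fifth embodiment of the string fuzzy matching apparatus according to the present invention.
The realization of the objectives, functional features, and advantages of the present invention will be further described with reference to the embodiments and the accompanying drawings.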
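Detailed Description
It should be understood that the specific embodiments described herein are merely intended to explain the present invention, not to limit it.
In view of the above problem, the present invention provides a string fuzzy matching method.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of a first embodiment of the string fuzzy matching method according to the present invention.
In this embodiment, the string fuzzy matching method comprises:
Step S10: acquiring the number of characters matched between the source text and each target text;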
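In this embodiment, the source text is text input by the user and may be speech text, Chinese text, or pinyin text. The target texts are the texts against which the source text is matched and may likewise be speech text, Chinese text, or pinyin text. After receiving the source text input by the user, the system matches it against each target text prestored locally, looking for the characters in each target text that are identical to characters in the source text, and then counts the number of characters of each target text that match the source text.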
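A minimal Python sketch of step S10, assuming plain character-by-character comparison of Chinese text. The function names are hypothetical, and crediting each target character at most once is borrowed from the duplicate-character rule stated for the fifth embodiment; under that rule the count equals the size of the multiset intersection of the two strings.

```python
from collections import Counter
from typing import Dict, List


def count_matched_chars(source: str, target: str) -> int:
    """Count characters of `source` that also occur in `target`.

    Assumption: each character of `target` can be credited to at most one
    character of `source`, so the result is the size of the multiset
    intersection of the two strings.
    """
    overlap = Counter(source) & Counter(target)
    return sum(overlap.values())


def count_all(source: str, targets: List[str]) -> Dict[str, int]:
    """Step S10 applied to every prestored target text."""
    return {t: count_matched_chars(source, t) for t in targets}


if __name__ == "__main__":
    # Hypothetical prestored target texts.
    print(count_all("打开电视", ["打开电视机", "关闭电视", "打开空调"]))
```

`Counter.__and__` keeps the minimum count of each shared key, which is exactly the "count a repeated character only as often as the other text can supply it" behavior.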
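Step S20: calculating the source matching degree of each target text according to the matched character count and the character count of the source text;
After the number of characters of each target text that match the source text is obtained, the source matching degree of each target text can be calculated from that count and the character count of the source text. The source matching degree is the matched character count as a percentage of the source text's character count, i.e., source matching degree = matched character count / source text character count × 100%. For example, if the source text has 8 characters and the target texts match it in 5, 4, 6, 1, and 0 characters respectively, their source matching degrees are 62.5%, 50.0%, 75.0%, 12.5%, and 0, respectively.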
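The formula and the worked example can be checked with a few lines of Python. This is an illustrative sketch with a hypothetical function name; it returns a fraction rather than a percentage so that the result can be compared directly with thresholds written as 1 or 0.67.

```python
def source_match_degree(matched: int, source_len: int) -> float:
    """Source matching degree as a fraction (multiply by 100 for percent)."""
    if source_len <= 0:
        raise ValueError("source text must contain at least one character")
    return matched / source_len


if __name__ == "__main__":
    # Worked example from the text: an 8-character source text.
    degrees = [source_match_degree(m, 8) for m in (5, 4, 6, 1, 0)]
    print([f"{d:.1%}" for d in degrees])  # ['62.5%', '50.0%', '75.0%', '12.5%', '0.0%']
```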
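Step S30: acquiring the first preset threshold corresponding to the source text according to the field count of the source text;
Step S40: acquiring the target texts whose source matching degree is greater than or equal to the first preset threshold, and taking the acquired target texts as matched target texts.
After the source matching degree of each target text is obtained, the matched target texts can be screened out by judging in turn whether each target text's source matching degree is greater than or equal to the first preset threshold. If only one target text has a source matching degree greater than or equal to the first preset threshold, that target text is taken as the matched target text; if several do, all of them are taken as matched target texts. If a target text's source matching degree is less than the first preset threshold, that target text does not match the source text. In this embodiment, the first preset threshold depends on the field count of the source text, i.e., source texts with different field counts correspond to different first preset thresholds, where the field count is the number of Chinese characters in the source text. Therefore, before judging whether a target text's source matching degree is greater than or equal to the first preset threshold, the field count of the source text must first be determined, and the first preset threshold corresponding to the source text is then acquired according to that field count. Specifically, the first preset threshold may be set according to the field count of the source text. For example, if the source text has 2 or fewer fields, the first preset threshold may be set to 1, meaning a target text matches the source text only when its source matching degree is 100%; if the source text has more than 2 fields, i.e., more than 2 Chinese characters, the first preset threshold may be set to 0.67, meaning a target text matches the source text only when its source matching degree is 67% or more. It should be noted that the three values above (the field-count boundary of 2 and the thresholds 1 and 0.67) may be freely set and dynamically adjusted according to actual needs, and more first preset thresholds may also be set as needed; this embodiment imposes no limitation. For example, in a voice application, the first preset threshold is set to 0.67 when the source text has more than 2 fields and to 1 when it has 2 or fewer; that is, if the user speaks one or two characters, all of them must be matched, and if the user speaks three or more characters, more than 2/3 of them must be matched.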
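Step S30 with the example values can be sketched as follows. The function names are hypothetical, and treating "Chinese characters" as the CJK Unified Ideographs basic block (U+4E00 to U+9FFF) is an assumption of this sketch. The last line illustrates one consequence of the 0.67 value: for a three-character source, matching two characters gives 2/3 ≈ 0.667, just below 0.67, so all three must match.

```python
import re

# Assumption: "Chinese characters" = CJK Unified Ideographs basic block
# (U+4E00..U+9FFF); extension blocks are ignored in this sketch.
CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")


def field_count(source: str) -> int:
    """Field count as defined in the text: number of Chinese characters."""
    return len(CJK_CHAR.findall(source))


def first_threshold(source: str) -> float:
    """Example values from the text: 1 for <= 2 fields, 0.67 otherwise."""
    return 1.0 if field_count(source) <= 2 else 0.67


if __name__ == "__main__":
    for text in ("电视", "打开电视"):
        print(text, field_count(text), first_threshold(text))
    # Three fields: 2 of 3 matched (0.666...) falls just short of 0.67.
    print(2 / 3 >= first_threshold("打开灯"))  # False
```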
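This embodiment acquires the number of characters matched between the source text and each target text; calculates the source matching degree of each target text from that count; judges in turn whether each target text's source matching degree satisfies the first preset condition; and if so, takes the target texts satisfying the first preset condition as matched target texts. Because this embodiment finds matching target texts by fuzzy matching rather than by exact lookup, the recognition rate of character strings is effectively improved.
Further, referring to FIG. 2, FIG. 2 is a schematic flowchart of a second embodiment of the string fuzzy matching method according to the present invention. Based on the first embodiment above, step S40 comprises: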
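Step S41: determining the target text with the highest source matching degree according to the calculated source matching degrees of the target texts;
After the source matching degree of each target text is calculated, the target text with the highest source matching degree can be selected by comparing them. It should be noted that if several target texts share the highest source matching degree, all of them are selected.
Step S42: judging whether the source matching degree of the target text with the highest source matching degree is greater than or equal to the first preset threshold;
Step S43: if so, taking the target text with the highest source matching degree as the matched target text.
Even the target text with the highest source matching degree may have a very low source matching degree, which indicates that it differs greatly from the source text and is probably not what the user wants, so it should be discarded. Therefore, after that target text is selected, it can be judged whether its source matching degree is greater than or equal to the first preset threshold. If so, the target text matches the source text and is taken as the matched target text; if its source matching degree is less than the first preset threshold, it does not match the source text, meaning no target text matches the source text, and the user may input a new source text for another matching operation.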
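Steps S41 to S43 reduce to a max-then-threshold check. The sketch below uses hypothetical names and takes the degrees as a plain dictionary; it keeps every target text tied for the top score, as the text requires, and returns an empty list when even the best candidate falls below the first preset threshold.

```python
from typing import Dict, List


def select_by_source_degree(degrees: Dict[str, float], first_threshold: float) -> List[str]:
    """Steps S41-S43: keep every target text tied for the highest source
    matching degree, but only if that degree reaches the first threshold.
    An empty list means no target text matches and the user may retry."""
    if not degrees:
        return []
    best = max(degrees.values())          # S41: highest source matching degree
    if best < first_threshold:            # S42: compare with the threshold
        return []
    return [t for t, d in degrees.items() if d == best]  # S43: all ties


if __name__ == "__main__":
    # Hypothetical degrees for three prestored target texts.
    degrees = {"打开电视机": 1.0, "打开电视": 1.0, "关闭电视": 0.5}
    print(select_by_source_degree(degrees, 0.67))  # ['打开电视机', '打开电视']
```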
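In this embodiment, after the source matching degrees of the target texts are calculated, the group of target texts with the highest source matching degree is selected and then checked: if its source matching degree is greater than or equal to the first preset threshold, those target texts are taken as matched target texts. Because only the source matching degree of the selected top-ranked target texts is checked, rather than that of every target text, matching time is saved.
Further, referring to FIG. 3, FIG. 3 is a schematic flowchart of a third embodiment of the string fuzzy matching method according to the present invention. Based on the second embodiment above, when there are multiple matched target texts, the method further comprises, after step S43:
Step S44: calculating the target matching degree of each matched target text according to the matched character count and the character count of that matched target text, and determining the target text with the highest target matching degree according to the calculation results;
Step S45: taking the determined target text with the highest target matching degree as the final matched target text.
Because the above procedure may yield multiple matched target texts, a more precise match can be obtained by calculating, for each matched target text, a target matching degree from the matched character count and that target text's character count, and then determining the target text with the highest target matching degree. The target matching degree is the matched character count as a percentage of the target text's character count, i.e., target matching degree = matched character count / target text character count × 100%. For example, if each matched target text matches the source text in 5 characters and the target texts are 5, 6, 8, 10, and 12 characters long, their target matching degrees are 100%, 83.3%, 62.5%, 50%, and 41.7%, respectively, so the target text with the highest target matching degree is the one at 100%, and it is taken as the final matched target text. It should be noted that if several target texts share the highest target matching degree, all of them are taken as final matched target texts.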
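The target matching degree is the same ratio normalized by the target's length instead of the source's. A minimal sketch with hypothetical names, where each candidate carries its own (matched count, target length) pair, reproduces the worked example and steps S44 to S45:

```python
from typing import Dict, List, Tuple


def target_match_degree(matched: int, target_len: int) -> float:
    """Target matching degree as a fraction of the target text's length."""
    if target_len <= 0:
        raise ValueError("target text must contain at least one character")
    return matched / target_len


def best_by_target_degree(candidates: Dict[str, Tuple[int, int]]) -> List[str]:
    """Steps S44-S45: `candidates` maps each matched target text to
    (matched character count, target character count); all texts tied for
    the highest target matching degree are returned."""
    if not candidates:
        return []
    degrees = {t: target_match_degree(m, n) for t, (m, n) in candidates.items()}
    best = max(degrees.values())
    return [t for t, d in degrees.items() if d == best]


if __name__ == "__main__":
    # Worked example from the text: 5 matched characters, lengths 5..12.
    lengths = (5, 6, 8, 10, 12)
    print([f"{target_match_degree(5, n):.1%}" for n in lengths])
    # ['100.0%', '83.3%', '62.5%', '50.0%', '41.7%']
```

Normalizing by target length favors the shorter of two equally good candidates, which is what breaks the tie between, say, a command and a longer command that contains it.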
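In this embodiment, after the matched target texts are screened out, they are further screened by target matching degree to obtain the final matched target text. Because the final matched target text passes two rounds of screening, the accuracy of the obtained target text is improved.
Further, referring to FIG. 4, FIG. 4 is a schematic flowchart of a fourth embodiment of the string fuzzy matching method according to the present invention. Based on the third embodiment above, step S45 comprises:
Step S451: acquiring the second preset threshold corresponding to the source text according to the first preset threshold;
In this embodiment, the second preset threshold depends on the first preset threshold; specifically, it may be set according to the first preset threshold. For example, if the first preset threshold is 1, the second preset threshold may also be set to 1, meaning a target text matches the source text only when its target matching degree is 100%; if the first preset threshold is 0.67, the second preset threshold may be set to 0.50, meaning a target text matches the source text only when its target matching degree is 50% or more. It should be noted that these thresholds may be freely set and dynamically adjusted according to actual needs, and more second preset thresholds may also be set as needed; this embodiment imposes no limitation.
Step S452: judging whether the target matching degree of the determined target text with the highest target matching degree is greater than or equal to the second preset threshold;
Step S453: if so, taking the determined target text with the highest target matching degree as the final matched target text.
Even the target text with the highest target matching degree may have a very low target matching degree, which indicates that it differs greatly from the source text and is probably not what the user wants, so it should be discarded. Therefore, after that target text is determined, it can be judged whether its target matching degree is greater than or equal to the second preset threshold. If so, the target text matches the source text and is taken as the final matched target text; if its target matching degree is less than the second preset threshold, it does not match the source text, i.e., matching fails.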
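Steps S451 to S453 can be sketched as a threshold lookup followed by a final check. The names are hypothetical, and the lookup table holds only the two example pairs from the text (1 to 1, and 0.67 to 0.50); an unconfigured first threshold is treated as an error rather than guessed.

```python
from typing import Dict, List

# Example pairs from the text: first preset threshold -> second preset threshold.
SECOND_FOR_FIRST: Dict[float, float] = {1.0: 1.0, 0.67: 0.50}


def second_threshold(first: float) -> float:
    """Step S451 (lookup-table version; other mappings are possible)."""
    try:
        return SECOND_FOR_FIRST[first]
    except KeyError:
        raise ValueError(f"no second threshold configured for {first}") from None


def final_targets(target_degrees: Dict[str, float], first: float) -> List[str]:
    """Steps S452-S453: accept the top-ranked target text(s) only if their
    target matching degree reaches the second threshold; [] means failure."""
    if not target_degrees:
        return []
    best = max(target_degrees.values())
    if best < second_threshold(first):
        return []
    return [t for t, d in target_degrees.items() if d == best]


if __name__ == "__main__":
    print(final_targets({"A": 0.5, "B": 0.417}, 0.67))  # ['A']
    print(final_targets({"A": 0.417}, 0.67))            # []
```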
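In this embodiment, before the target text with the highest target matching degree is taken as the final matched target text, it is judged whether its target matching degree is greater than or equal to the second preset threshold: if so, matching succeeds; otherwise, matching fails. Checking the target matching degree of the top-ranked target text in this way improves the accuracy of the obtained target text.
Further, referring to FIG. 5, FIG. 5 is a schematic flowchart of a fifth embodiment of the string fuzzy matching method according to the present invention. Based on any of the above embodiments, step S10 comprises:
Step S11: converting the source text and each target text into character information in pinyin form;
Step S12: acquiring the number of characters in the pinyin-form character information of each target text that match the pinyin-form character information of the source text.
In this embodiment, the source text and target texts are speech text or Chinese text. After receiving the source text, the system converts the source text and each target text into pinyin-form character information for matching. It then matches the source text's pinyin characters, starting from the first character, one by one against all pinyin characters of the target text: if a character matches, it is recorded and matching moves on to the next character of the source text, repeating until every character of the source text has been processed. The number of characters of the target text that match the source text is then counted. It should be noted that if the source text contains several identical characters but the target text contains only one character matching them, that match is counted once, not multiple times.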
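The pinyin matching procedure can be sketched as below. The text does not say whether comparison is per syllable or per letter, or whether tones are kept, so this sketch assumes one toneless syllable per character. `TOY_PINYIN` is a hypothetical stand-in for a real character-to-pinyin dictionary; the loop follows the described walk over the source and consumes each matched target unit so it is never counted twice.

```python
from typing import Dict, List

# Hypothetical toy table (toneless syllables); a real system would need a
# complete character-to-pinyin dictionary.
TOY_PINYIN: Dict[str, str] = {
    "打": "da", "开": "kai", "电": "dian", "点": "dian",
    "视": "shi", "试": "shi", "机": "ji",
}


def to_pinyin(text: str) -> List[str]:
    """Step S11: one toneless syllable per character; unknown characters
    (e.g. Latin letters) are passed through unchanged."""
    return [TOY_PINYIN.get(ch, ch) for ch in text]


def count_pinyin_matches(source: str, target: str) -> int:
    """Step S12, following the procedure in the text: walk the source from
    its first character and consume one matching target unit per hit, so a
    single target unit is never counted for several identical source units."""
    remaining = to_pinyin(target)
    matched = 0
    for unit in to_pinyin(source):
        if unit in remaining:
            remaining.remove(unit)  # credit this target unit only once
            matched += 1
    return matched


if __name__ == "__main__":
    # Homophones such as 电/点 and 视/试 match after conversion.
    print(count_pinyin_matches("打开电视", "打开点试"))  # 4
    print(count_pinyin_matches("视视", "电视"))          # 1
```

This is where the recognition-rate gain comes from in a voice setting: a speech recognizer that outputs the wrong homophone still scores a full match after conversion.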
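By converting the source text and the target texts into pinyin-form character information before matching, this embodiment can improve the recognition rate of the target text.
The present invention further provides a string fuzzy matching apparatus.
Referring to FIG. 6, FIG. 6 is a functional module diagram of a first embodiment of the string fuzzy matching apparatus according to the present invention.
In this embodiment, the string fuzzy matching apparatus comprises an acquisition module 10, a first calculation module 20, and a first designating module 30.
The acquisition module 10 is configured to acquire the number of characters matched between the source text and each target text;
In this embodiment, the source text is text input by the user and may be speech text, Chinese text, or pinyin text. The target texts are the texts against which the source text is matched and may likewise be speech text, Chinese text, or pinyin text. After receiving the source text input by the user, the acquisition module 10 matches it against each target text prestored locally, looking for the characters in each target text that are identical to characters in the source text, and then counts the number of characters of each target text that match the source text.
The first calculation module 20 is configured to calculate the source matching degree of each target text according to the matched character count and the character count of the source text;
After the number of characters of each target text that match the source text is obtained, the first calculation module 20 can calculate the source matching degree of each target text from that count and the character count of the source text. The source matching degree is the matched character count as a percentage of the source text's character count, i.e., source matching degree = matched character count / source text character count × 100%. For example, if the source text has 8 characters and the target texts match it in 5, 4, 6, 1, and 0 characters respectively, their source matching degrees are 62.5%, 50.0%, 75.0%, 12.5%, and 0, respectively.
The acquisition module 10 is further configured to acquire the first preset threshold corresponding to the source text according to the field count of the source text;
The first designating module 30 is configured to acquire the target texts whose source matching degree is greater than or equal to the first preset threshold, and take the acquired target texts as matched target texts.
After the source matching degree of each target text is obtained, the matched target texts can be screened out by judging in turn whether each target text's source matching degree is greater than or equal to the first preset threshold. If only one target text has a source matching degree greater than or equal to the first preset threshold, that target text is taken as the matched target text; if several do, all of them are taken as matched target texts. If a target text's source matching degree is less than the first preset threshold, that target text does not match the source text. In this embodiment, the first preset threshold depends on the field count of the source text, i.e., source texts with different field counts correspond to different first preset thresholds, where the field count is the number of Chinese characters in the source text. Therefore, before judging whether a target text's source matching degree is greater than or equal to the first preset threshold, the field count of the source text must first be determined, and the first preset threshold corresponding to the source text is then acquired according to that field count. Specifically, the first preset threshold may be set according to the field count of the source text. For example, if the source text has 2 or fewer fields, the first preset threshold may be set to 1, meaning a target text matches the source text only when its source matching degree is 100%; if the source text has more than 2 fields, i.e., more than 2 Chinese characters, the first preset threshold may be set to 0.67, meaning a target text matches the source text only when its source matching degree is 67% or more. It should be noted that the three values above (the field-count boundary of 2 and the thresholds 1 and 0.67) may be freely set and dynamically adjusted according to actual needs, and more first preset thresholds may also be set as needed; this embodiment imposes no limitation. For example, in a voice application, the first preset threshold is set to 0.67 when the source text has more than 2 fields and to 1 when it has 2 or fewer; that is, if the user speaks one or two characters, all of them must be matched, and if the user speaks three or more characters, more than 2/3 of them must be matched.
This embodiment acquires the number of characters matched between the source text and each target text; calculates the source matching degree of each target text from that count; judges in turn whether each target text's source matching degree satisfies the first preset condition; and if so, takes the target texts satisfying the first preset condition as matched target texts. Because this embodiment finds matching target texts by fuzzy matching rather than by exact lookup, the recognition rate of character strings is effectively improved.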
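Further, referring to FIG. 7, FIG. 7 is a functional module diagram of a second embodiment of the string fuzzy matching apparatus according to the present invention. Based on the first embodiment of the apparatus above, the first designating module 30 comprises a determining unit 31, a first judging unit 32, and a first designating unit 33.
The determining unit 31 is configured to determine the target text with the highest source matching degree according to the calculated source matching degrees of the target texts;
After the source matching degree of each target text is calculated, the target text with the highest source matching degree can be selected by comparing them. It should be noted that if several target texts share the highest source matching degree, all of them are selected.
The first judging unit 32 is configured to judge whether the source matching degree of the target text with the highest source matching degree is greater than or equal to the first preset threshold;
The first designating unit 33 is configured to take the target text with the highest source matching degree as the matched target text if its source matching degree is greater than or equal to the first preset threshold.
In this embodiment, after the source matching degrees of the target texts are calculated, the group of target texts with the highest source matching degree is selected and then checked: if its source matching degree is greater than or equal to the first preset threshold, those target texts are taken as matched target texts. Because only the source matching degree of the selected top-ranked target texts is checked, rather than that of every target text, matching time is saved.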
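Further, referring to FIG. 8, FIG. 8 is a functional module diagram of a third embodiment of the string fuzzy matching apparatus according to the present invention. Based on the second embodiment of the apparatus above, when there are multiple matched target texts, the string fuzzy matching apparatus further comprises a second calculation module 40 and a second designating module 50.
The second calculation module 40 is configured to calculate the target matching degree of each matched target text according to the matched character count and the character count of that matched target text, and determine the target text with the highest target matching degree according to the calculation results;
The second designating module 50 is configured to take the determined target text with the highest target matching degree as the final matched target text.
In this embodiment, after the matched target texts are screened out, they are further screened by target matching degree to obtain the final matched target text. Because the final matched target text passes two rounds of screening, the accuracy of the obtained target text is improved.
Further, referring to FIG. 9, FIG. 9 is a functional module diagram of a fourth embodiment of the string fuzzy matching apparatus according to the present invention. Based on the third embodiment of the apparatus above, the second designating module 50 comprises an acquisition unit 51, a second judging unit 52, and a second designating unit 53.
The acquisition unit 51 is configured to acquire the second preset threshold corresponding to the source text according to the first preset threshold;
In this embodiment, the second preset threshold depends on the first preset threshold; specifically, it may be set according to the first preset threshold. For example, if the first preset threshold is 1, the second preset threshold may also be set to 1, meaning a target text matches the source text only when its target matching degree is 100%; if the first preset threshold is 0.67, the second preset threshold may be set to 0.50, meaning a target text matches the source text only when its target matching degree is 50% or more. It should be noted that these thresholds may be freely set and dynamically adjusted according to actual needs, and more second preset thresholds may also be set as needed; this embodiment imposes no limitation.
The second judging unit 52 is configured to judge whether the target matching degree of the determined target text with the highest target matching degree is greater than or equal to the second preset threshold;
The second designating unit 53 is configured to take the determined target text with the highest target matching degree as the final matched target text if its target matching degree is greater than or equal to the second preset threshold.
In this embodiment, before the target text with the highest target matching degree is taken as the final matched target text, it is judged whether its target matching degree is greater than or equal to the second preset threshold: if so, matching succeeds; otherwise, matching fails. Checking the target matching degree of the top-ranked target text in this way improves the accuracy of the obtained target text.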
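Further, referring to FIG. 10, FIG. 10 is a functional module diagram of a fifth embodiment of the string fuzzy matching apparatus according to the present invention. Based on any of the above embodiments of the apparatus, the acquisition module 10 comprises a conversion unit 11 and an acquisition unit 12.
The conversion unit 11 is configured to convert the source text and each target text into character information in pinyin form;
The acquisition unit 12 is configured to acquire the number of characters in the pinyin-form character information of each target text that match the pinyin-form character information of the source text.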
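To show how the apparatus embodiments (FIGs. 6 to 10) might fit together, here is a compact Python sketch that maps the numbered modules onto code. It is an assumption-laden illustration, not the patented implementation: class and method names are hypothetical, modules 20 to 50 are written as commented stages of one method, the thresholds are the example values from the description, a "field" is assumed to be a character in U+4E00 to U+9FFF, and the second stage runs only when several target texts tie after the first stage, as the third embodiment describes.

```python
from __future__ import annotations

import re
from collections import Counter
from typing import Callable, Dict, List

CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")  # assumed definition of a "field"


def _ties_at_max(scores: Dict[str, float]) -> List[str]:
    best = max(scores.values())
    return [k for k, v in scores.items() if v == best]


class AcquisitionModule:
    """Module 10: conversion unit 11 (`to_units`) plus acquisition unit 12
    (`matched_count`); also supplies the first preset threshold."""

    def __init__(self, to_units: Callable[[str], List[str]] = list) -> None:
        self.to_units = to_units  # swap in a pinyin converter for FIG. 10

    def matched_count(self, source: str, target: str) -> int:
        overlap = Counter(self.to_units(source)) & Counter(self.to_units(target))
        return sum(overlap.values())

    def first_threshold(self, source: str) -> float:
        return 1.0 if len(CJK_CHAR.findall(source)) <= 2 else 0.67


class FuzzyMatcher:
    """Hypothetical wiring of modules 10-50."""

    SECOND_FOR_FIRST = {1.0: 1.0, 0.67: 0.50}

    def __init__(self, acquisition: AcquisitionModule | None = None) -> None:
        self.acq = acquisition or AcquisitionModule()

    def match(self, source: str, targets: List[str]) -> List[str]:
        src_len = len(self.acq.to_units(source))
        if src_len == 0 or not targets:
            return []
        counts = {t: self.acq.matched_count(source, t) for t in targets}
        # First calculation module 20: source matching degree.
        src_deg = {t: c / src_len for t, c in counts.items()}
        # First designating module 30 (units 31-33).
        first = self.acq.first_threshold(source)
        matched = _ties_at_max(src_deg)
        if src_deg[matched[0]] < first:
            return []
        if len(matched) == 1:
            return matched
        # Second calculation module 40: target matching degree.
        tgt_deg = {t: counts[t] / len(self.acq.to_units(t)) for t in matched}
        # Second designating module 50 (units 51-53).
        final = _ties_at_max(tgt_deg)
        if tgt_deg[final[0]] < self.SECOND_FOR_FIRST[first]:
            return []
        return final


if __name__ == "__main__":
    matcher = FuzzyMatcher()
    print(matcher.match("打开电视", ["打开电视机", "打开电视", "关闭电视"]))
    # ['打开电视']
```

Passing a pinyin converter as `to_units` corresponds to the fifth embodiment (conversion unit 11); the default `list` compares raw characters.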
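By converting the source text and the target texts into pinyin-form character information before matching, this embodiment can improve the recognition rate of the target text.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments. From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, though in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc), which includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the methods described in the various embodiments of the present invention.
The above are merely preferred embodiments of the present invention and do not thereby limit its patent scope. Any equivalent structural or process transformation made using the content of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of the present invention.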

Claims (19)

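  1. A string fuzzy matching method, characterized in that the string fuzzy matching method comprises the following steps:
    acquiring the number of characters matched between a source text and each target text, wherein the source text and each target text are converted into character information in pinyin form, and the number of characters in the pinyin-form character information of each target text that match the pinyin-form character information of the source text is acquired;
    calculating a source matching degree of each target text according to the matched character count and the character count of the source text;
    acquiring a first preset threshold corresponding to the source text according to the field count of the source text;
    acquiring the target texts whose source matching degree is greater than or equal to the first preset threshold, and taking the acquired target texts as matched target texts;
    wherein the step of acquiring the target texts whose source matching degree is greater than or equal to the first preset threshold and taking the acquired target texts as matched target texts comprises:
    determining the target text with the highest source matching degree according to the calculated source matching degrees of the target texts;
    judging whether the source matching degree of the target text with the highest source matching degree is greater than or equal to the first preset threshold, and if so, taking that target text as a matched target text;
    when there are multiple matched target texts, calculating a target matching degree of each matched target text according to the matched character count and the character count of that matched target text, and determining the target text with the highest target matching degree according to the calculation results;
    acquiring a second preset threshold corresponding to the source text according to the first preset threshold; judging whether the target matching degree of the determined target text with the highest target matching degree is greater than or equal to the second preset threshold; and if so, taking that target text as the final matched target text.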
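  2. A string fuzzy matching method, characterized in that the string fuzzy matching method comprises the following steps:
    acquiring the number of characters matched between a source text and each target text;
    calculating a source matching degree of each target text according to the matched character count and the character count of the source text;
    acquiring a first preset threshold corresponding to the source text according to the field count of the source text;
    acquiring the target texts whose source matching degree is greater than or equal to the first preset threshold, and taking the acquired target texts as matched target texts.
  3. The string fuzzy matching method according to claim 2, characterized in that the step of acquiring the target texts whose source matching degree is greater than or equal to the first preset threshold and taking the acquired target texts as matched target texts comprises:
    determining the target text with the highest source matching degree according to the calculated source matching degrees of the target texts;
    judging whether the source matching degree of the target text with the highest source matching degree is greater than or equal to the first preset threshold;
    if so, taking the target text with the highest source matching degree as the matched target text.
  4. The string fuzzy matching method according to claim 3, characterized in that, when there are multiple matched target texts, after the step of taking the target text with the highest source matching degree as the matched target text, the method further comprises:
    calculating the target matching degree of each matched target text according to the matched character count and the character count of that matched target text, and determining the target text with the highest target matching degree according to the calculation results;
    taking the determined target text with the highest target matching degree as the final matched target text.
  5. The string fuzzy matching method according to claim 4, characterized in that the step of taking the determined target text with the highest target matching degree as the final matched target text comprises:
    acquiring a second preset threshold corresponding to the source text according to the first preset threshold;
    judging whether the target matching degree of the determined target text with the highest target matching degree is greater than or equal to the second preset threshold;
    if so, taking the determined target text with the highest target matching degree as the final matched target text.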
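  6. The string fuzzy matching method according to claim 2, characterized in that the step of acquiring the number of characters matched between the source text and each target text comprises:
    converting the source text and each target text into character information in pinyin form;
    acquiring the number of characters in the pinyin-form character information of each target text that match the pinyin-form character information of the source text.
  7. The string fuzzy matching method according to claim 3, characterized in that the step of acquiring the number of characters matched between the source text and each target text comprises:
    converting the source text and each target text into character information in pinyin form;
    acquiring the number of characters in the pinyin-form character information of each target text that match the pinyin-form character information of the source text.
  8. The string fuzzy matching method according to claim 4, characterized in that the step of acquiring the number of characters matched between the source text and each target text comprises:
    converting the source text and each target text into character information in pinyin form;
    acquiring the number of characters in the pinyin-form character information of each target text that match the pinyin-form character information of the source text.
  9. The string fuzzy matching method according to claim 2, characterized in that the step of acquiring the number of characters matched between the source text and each target text comprises:
    matching the received source text against each target text prestored locally, and finding the characters that match between the source text and each target text;
    counting, according to the search results, the number of characters of each target text that match the source text.
  10. The string fuzzy matching method according to claim 2, characterized in that the step of acquiring the first preset threshold corresponding to the source text according to the field count of the source text comprises:
    determining the field count of the source text, the field count being the number of Chinese characters in the source text;
    acquiring the first preset threshold corresponding to the determined field count.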
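  11. A string fuzzy matching apparatus, characterized in that the string fuzzy matching apparatus comprises:
    an acquisition module, configured to acquire the number of characters matched between a source text and each target text;
    a first calculation module, configured to calculate a source matching degree of each target text according to the matched character count and the character count of the source text;
    the acquisition module being further configured to acquire a first preset threshold corresponding to the source text according to the field count of the source text;
    a first designating module, configured to acquire the target texts whose source matching degree is greater than or equal to the first preset threshold, and take the acquired target texts as matched target texts.
  12. The string fuzzy matching apparatus according to claim 11, characterized in that the first designating module comprises:
    a determining unit, configured to determine the target text with the highest source matching degree according to the calculated source matching degrees of the target texts;
    a first judging unit, configured to judge whether the source matching degree of the target text with the highest source matching degree is greater than or equal to the first preset threshold;
    a first designating unit, configured to take the target text with the highest source matching degree as the matched target text if its source matching degree is greater than or equal to the first preset threshold.
  13. The string fuzzy matching apparatus according to claim 12, characterized in that, when there are multiple matched target texts, the string fuzzy matching apparatus further comprises:
    a second calculation module, configured to calculate the target matching degree of each matched target text according to the matched character count and the character count of that matched target text, and determine the target text with the highest target matching degree according to the calculation results;
    a second designating module, configured to take the determined target text with the highest target matching degree as the final matched target text.
  14. The string fuzzy matching apparatus according to claim 13, characterized in that the second designating module comprises:
    an acquisition unit, configured to acquire a second preset threshold corresponding to the source text according to the first preset threshold;
    a second judging unit, configured to judge whether the target matching degree of the determined target text with the highest target matching degree is greater than or equal to the second preset threshold;
    a second designating unit, configured to take the determined target text with the highest target matching degree as the final matched target text if its target matching degree is greater than or equal to the second preset threshold.
  15. The string fuzzy matching apparatus according to claim 11, characterized in that the acquisition module comprises:
    a conversion unit, configured to convert the source text and each target text into character information in pinyin form;
    an acquisition unit, configured to acquire the number of characters in the pinyin-form character information of each target text that match the pinyin-form character information of the source text.
  16. The string fuzzy matching apparatus according to claim 12, characterized in that the acquisition module comprises:
    a conversion unit, configured to convert the source text and each target text into character information in pinyin form;
    an acquisition unit, configured to acquire the number of characters in the pinyin-form character information of each target text that match the pinyin-form character information of the source text.
  17. The string fuzzy matching apparatus according to claim 13, characterized in that the acquisition module comprises:
    a conversion unit, configured to convert the source text and each target text into character information in pinyin form;
    an acquisition unit, configured to acquire the number of characters in the pinyin-form character information of each target text that match the pinyin-form character information of the source text.
  18. The string fuzzy matching apparatus according to claim 11, characterized in that the acquisition module is further configured to match the received source text against each target text prestored locally and find the characters that match between the source text and each target text, and to count, according to the search results, the number of characters of each target text that match the source text.
  19. The string fuzzy matching apparatus according to claim 11, characterized in that the acquisition module is further configured to determine the field count of the source text, the field count being the number of Chinese characters in the source text, and to acquire the first preset threshold corresponding to the determined field count.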
PCT/CN2016/096429 2016-05-20 2016-08-23 字符串模糊匹配方法及装置 WO2017197802A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610343584.1A CN106021504A (zh) 2016-05-20 2016-05-20 字符串模糊匹配方法及装置
CN201610343584.1 2016-05-20

Publications (1)

Publication Number Publication Date
WO2017197802A1 true WO2017197802A1 (zh) 2017-11-23

Family

ID=57096944

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/096429 WO2017197802A1 (zh) 2016-05-20 2016-08-23 字符串模糊匹配方法及装置

Country Status (2)

Country Link
CN (1) CN106021504A (zh)
WO (1) WO2017197802A1 (zh)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106919663A (zh) * 2017-02-14 2017-07-04 华北电力大学 电力调控系统多源异构数据融合中的字符串匹配方法
CN108572998A (zh) * 2017-03-14 2018-09-25 北京橙鑫数据科技有限公司 一种针对电子卡片数据的数据查找方法及装置
CN107123185A (zh) * 2017-06-20 2017-09-01 深圳怡化电脑股份有限公司 一种有价文件磁性字符的识别装置及方法
CN108734571A (zh) * 2018-05-29 2018-11-02 佛山市金晶微阅信息科技有限公司 一种信贷反欺诈侦测模糊匹配算法
CN109542785B (zh) * 2018-11-19 2021-07-27 北京云测网络科技有限公司 一种无效bug确定方法和装置
CN109740361B (zh) * 2018-12-29 2021-08-06 深圳Tcl新技术有限公司 数据处理方法、装置及计算机可读存储介质
CN110600003A (zh) * 2019-10-18 2019-12-20 北京云迹科技有限公司 机器人的语音输出方法、装置、机器人和存储介质
CN112215216A (zh) * 2020-09-10 2021-01-12 中国东方电气集团有限公司 一种图像识别结果的字符串模糊匹配系统及方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101236566A (zh) * 2008-03-06 2008-08-06 宇龙计算机通信科技(深圳)有限公司 一种名称查询的方法及系统
CN101984422A (zh) * 2010-10-18 2011-03-09 百度在线网络技术(北京)有限公司 一种容错文本查询的方法和设备
CN103336850A (zh) * 2013-07-24 2013-10-02 昆明理工大学 一种数据库检索系统中确定检索词的方法及装置
CN103440865A (zh) * 2013-08-06 2013-12-11 普强信息技术(北京)有限公司 语音识别的后处理方法
CN103456297A (zh) * 2012-05-29 2013-12-18 中国移动通信集团公司 一种语音识别匹配的方法和设备

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831232B (zh) * 2012-08-30 2015-12-16 山石网科通信技术有限公司 字符串的匹配方法及装置

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111191087A (zh) * 2019-12-31 2020-05-22 歌尔股份有限公司 字符匹配方法、终端设备及计算机可读存储介质
CN111191087B (zh) * 2019-12-31 2023-11-07 歌尔股份有限公司 字符匹配方法、终端设备及计算机可读存储介质

Also Published As

Publication number Publication date
CN106021504A (zh) 2016-10-12

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16902182

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTIFICATION OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (=EPO FORM 1205A DATED 12.04.2019)

122 Ep: pct application non-entry in european phase

Ref document number: 16902182

Country of ref document: EP

Kind code of ref document: A1