CN115203495A - Character string fuzzy matching method and device and electronic equipment - Google Patents


Info

Publication number
CN115203495A
Authority
CN
China
Prior art keywords
character string
character
derivative
strings
intersection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211112979.2A
Other languages
Chinese (zh)
Other versions
CN115203495B (en
Inventor
陈智隆
陈琨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huakong Tsingjiao Information Technology Beijing Co Ltd
Original Assignee
Huakong Tsingjiao Information Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huakong Tsingjiao Information Technology Beijing Co Ltd
Priority to CN202211112979.2A
Publication of CN115203495A
Application granted
Publication of CN115203495B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/903: Querying
    • G06F16/90335: Query processing
    • G06F16/90344: Query processing by using string matching techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method, an apparatus and an electronic device for fuzzy matching of character strings, relating to the field of computer technology and the field of secure multi-party computation. The method comprises: for the character strings in a first character string set, generating, according to a preset fuzzy matching rule, first derivative character strings that satisfy the preset fuzzy matching rule with respect to those character strings, to obtain a first derivative character string set consisting of the first derivative character strings and the character strings in the first character string set; performing an intersection operation on the first derivative character string set and a character string set to be intersected, to obtain the character string intersection of the two sets; and, for each intersection character string in the character string intersection, determining the character string corresponding to the intersection character string in a second character string set as the character string matched with the character string corresponding to the intersection character string in the first character string set. This scheme reduces the complexity of fuzzy matching of character strings.

Description

Character string fuzzy matching method and device and electronic equipment
Technical Field
The present application relates to the field of computer technologies and multi-party secure computing technologies, and in particular, to a method and an apparatus for fuzzy matching of character strings, and an electronic device.
Background
String fuzzy matching refers to searching a string set for strings that are close to a target string. In practical application scenarios, for example in banking and finance and in many blacklist queries, fuzzy matching often has to be performed on a plurality of character strings simultaneously. A typical scenario is as follows:
For example, there is a string array A = [Mike, mills] and another string array B = [Mik, mills, TTAA, csvvf]. The goal is to use string array A to find, in string array B, "Mik" corresponding to "Mike" and "Mills" corresponding to "Mills". This fuzzy matching scenario arises mainly in queries where the user input contains errors, such as a missing letter e or an l written as I, and where handwriting recognition makes mistakes.
At present, the existing fuzzy matching technology for plain text strings mainly comprises a string coding method and a similarity calculation method.
The string coding method encodes each string into a number and then compares the encoded values. For example, the string array A = [Mike, mills] is encoded as [0.98, 0.6] and the string array B = [Mik, mills, TTAA, csvvf] is encoded as [0.95, 0.59, 0.1, 0.2], after which the two closest encoded values are found.
The similarity calculation method computes the similarity between every two character strings and can be implemented on the basis of the Hamming distance, the Dice distance, the Jaccard distance or the edit distance. The similarity scores finally returned all lie between 0 and 1, and the closer a score is to 1, the more similar the two strings are.
In the above methods for fuzzy matching of character strings, the computational complexity is O(n^2): if one party has n character strings and the other party also has n character strings, the similarity (or the similarity of the encoded values) has to be computed in a loop over every pair of strings, so the complexity is high.
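To make the quadratic cost concrete, the following minimal sketch compares every string of one array with every string of the other using a textbook edit distance. The function names `edit_distance` and `pairwise_match` and the threshold `max_dist` are illustrative assumptions and do not come from the application.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def pairwise_match(first, second, max_dist=1):
    """Baseline: len(first) * len(second) distance computations."""
    comparisons = 0
    matches = []
    for s in first:
        for t in second:
            comparisons += 1
            if edit_distance(s, t) <= max_dist:
                matches.append((s, t))
    return matches, comparisons


# Arrays taken from the background example above.
matches, comparisons = pairwise_match(["Mike", "mills"],
                                      ["Mik", "mills", "TTAA", "csvvf"])
print(matches, comparisons)
```

The number of distance computations is the product of the two array sizes, and the result depends on the chosen threshold, which are the two drawbacks discussed in this section.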
Moreover, because a match is decided on the basis of similarity, the choice of threshold is critical, and complex cases are difficult to handle, such as the interchange of I and L or the closeness of e and ee. As a result, the intended fuzzy matching rules may not be followed accurately.
In addition, in the fields of secure multi-party computation and privacy-preserving computation, there may be application scenarios in which fuzzy matching of character strings is performed on ciphertext. The known methods all operate on plaintext and cannot perform fuzzy matching of character strings on ciphertext.
Disclosure of Invention
The embodiment of the application provides a method and a device for fuzzy matching of character strings and electronic equipment, and aims to solve the problem that in the prior art, complexity of fuzzy matching of character strings is high.
The embodiment of the application provides a method for fuzzy matching of character strings, which comprises the following steps:
generating first derivative character strings meeting the preset fuzzy matching rule with the character strings in the first character string set according to a preset fuzzy matching rule aiming at the character strings in the first character string set to obtain a first derivative character string set consisting of each first derivative character string and the character strings in the first character string set;
performing intersection operation on the first derivative character string set and a character string set to be subjected to intersection to obtain a character string intersection of the first derivative character string set and the character string set to be subjected to intersection, wherein the character string set to be subjected to intersection comprises character strings in a second character string set;
and for each intersection character string in the character string intersection, determining the character string corresponding to the intersection character string in the second character string set as the character string matched with the character string corresponding to the intersection character string in the first character string set.
Further, the character string set to be intersected is the second character string set.
Further, before performing an intersection operation on the first set of derived strings and the set of to-be-intersected strings, the method further includes:
for the character strings in the second character string set, generating, according to the preset fuzzy matching rule, second derivative character strings that satisfy the preset fuzzy matching rule with respect to those character strings, to obtain a second derivative character string set consisting of the second derivative character strings and the character strings in the second character string set, wherein the second derivative character string set is used as the character string set to be intersected.
Further, the performing an intersection operation on the first derivative string set and the to-be-intersected string set includes:
splitting the first derivative character string set and the character string set to be intersected, according to character string length, into first derivative character string subsets and to-be-intersected character string subsets, wherein all character strings contained in a given subset have the same length;
and performing intersection operation respectively for the first derivative character string subset and the character string subset to be intersected, which have the same character string length.
Further, the performing an intersection operation on the first derivative string set and the to-be-intersected string set includes:
and carrying out privacy set intersection operation on the first derivative character string set and the character string set to be intersected.
Further, the generating, according to a preset fuzzy matching rule, a first derivative character string that satisfies the preset fuzzy matching rule with the character string in the first character string set for the character string in the first character string set includes:
aiming at character strings in a first character string set, generating a first derivative character string meeting a preset fuzzy matching rule with the character strings in the first character string set by using wildcards according to the preset fuzzy matching rule, wherein the first derivative character string comprises the wildcards;
the generating a second derivative character string meeting the preset fuzzy matching rule with the character string in the second character string set according to the preset fuzzy matching rule aiming at the character string in the second character string set comprises the following steps:
and aiming at the character strings in a second character string set, generating a second derivative character string meeting the preset fuzzy matching rule with the character strings in the second character string set by using the wildcard according to the preset fuzzy matching rule, wherein the second derivative character string comprises the wildcard.
The embodiment of the present application further provides a fuzzy matching device for character strings, including:
the character string generating module is used for generating first derivative character strings meeting a preset fuzzy matching rule with the character strings in the first character string set according to the preset fuzzy matching rule aiming at the character strings in the first character string set, and obtaining a first derivative character string set consisting of all the first derivative character strings and the character strings in the first character string set;
the intersection operation module is used for performing an intersection operation on the first derivative character string set and a character string set to be intersected, to obtain the character string intersection of the two sets, wherein the character string set to be intersected comprises character strings in a second character string set;
and the character string matching module is used for determining the character string corresponding to the intersection character string in the second character string set as the character string matched with the character string corresponding to the intersection character string in the first character string set aiming at each intersection character string in the character string intersection set.
Further, the character string set to be intersected is the second character string set.
Further, the character string generating module is further configured to generate, for the character strings in the second character string set and according to the preset fuzzy matching rule, second derivative character strings that satisfy the preset fuzzy matching rule with respect to those character strings, to obtain a second derivative character string set consisting of the second derivative character strings and the character strings in the second character string set, where the second derivative character string set is used as the character string set to be intersected.
Further, the intersection operation module is specifically configured to split the first derivative character string set and the character string set to be intersected, according to character string length, into first derivative character string subsets and to-be-intersected character string subsets, where all character strings contained in a given subset have the same length;
and respectively executing intersection operation aiming at the first derivative character string subset and the character string subset to be intersected, which have the same character string length.
Further, the intersection operation module is specifically configured to perform a privacy set intersection operation on the first derivative string set and the to-be-intersected string set.
Further, the character string generating module is specifically configured to generate, for the character strings in the first character string set and according to a preset fuzzy matching rule, first derivative character strings that satisfy the preset fuzzy matching rule with respect to those character strings by using wildcards, where the first derivative character strings include the wildcards;
and aiming at the character strings in a second character string set, generating a second derivative character string meeting the preset fuzzy matching rule with the character strings in the second character string set by using the wildcard according to the preset fuzzy matching rule, wherein the second derivative character string comprises the wildcard.
Embodiments of the present application further provide an electronic device, including a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor, where the machine-executable instructions cause the processor to implement any of the above character string fuzzy matching methods.
An embodiment of the present application further provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the computer program implements any one of the above character string fuzzy matching methods.
Embodiments of the present application further provide a computer program product containing instructions, which when run on a computer, cause the computer to perform any of the above string fuzzy matching methods.
The beneficial effect of this application includes:
In the method provided by the embodiments of the present application, given a first character string set and a second character string set, the character strings in the second character string set that match character strings of the first character string set need to be found by fuzzy matching. First, for the character strings in the first character string set, first derivative character strings that satisfy a preset fuzzy matching rule with respect to those character strings are generated, yielding a first derivative character string set. An intersection operation is then performed on the first derivative character string set and a character string set to be intersected, which comprises the character strings in the second character string set, to obtain the character string intersection of the two sets. Finally, for each intersection character string in the character string intersection, the character string corresponding to the intersection character string in the second character string set is determined to be the character string matched with the character string corresponding to the intersection character string in the first character string set. With this method, the character strings in the first character string set are expanded according to the preset fuzzy matching rule to obtain the first derivative character string set, the character string intersection is obtained through a set intersection operation, and the matched character strings are then determined, which reduces the complexity of fuzzy matching of character strings.
Additional features and advantages of the present application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the present application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application and not to limit the application. In the drawings:
fig. 1 is a flowchart of a method for fuzzy matching of character strings according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating a method for fuzzy matching of strings according to another embodiment of the present application;
fig. 3 is a schematic structural diagram of a fuzzy matching apparatus for character strings according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to reduce the complexity of string fuzzy matching, embodiments of the present application provide a string fuzzy matching method, an apparatus and an electronic device. Preferred embodiments of the present application are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described herein serve only to illustrate and explain the present application and do not limit it. The embodiments of the present application and the features of the embodiments may be combined with each other provided there is no conflict.
The embodiment of the present application provides a method for fuzzy matching of character strings, as shown in fig. 1, including:
step 11, generating first derivative character strings meeting a preset fuzzy matching rule with character strings in a first character string set according to the preset fuzzy matching rule aiming at the character strings in the first character string set, and obtaining a first derivative character string set consisting of the first derivative character strings and the character strings in the first character string set;
step 12, performing intersection operation on the first derivative character string set and the character string set to be intersected to obtain a character string intersection of the first derivative character string set and the character string set to be intersected, wherein the character string set to be intersected comprises character strings in a second character string set;
and step 13, aiming at each intersection character string in the character string intersection, determining the character string corresponding to the intersection character string in the second character string set as the character string matched with the character string corresponding to the intersection character string in the first character string set.
With this fuzzy matching method, the character strings in the first character string set are expanded according to the preset fuzzy matching rule to obtain a first derivative character string set, the character string intersection is obtained through a set intersection operation, and the matched character strings are then determined. Because matching is reduced to a set intersection instead of a similarity computation for every pair of character strings, the complexity of fuzzy matching of character strings is reduced.
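Steps 11 to 13 can be sketched as follows for the simplest case, in which the preset fuzzy matching rule is "one letter is missing" and the second character string set is used directly as the set to be intersected. This is a minimal plaintext sketch; the names `deletion_derivatives` and `fuzzy_match` and the sample data are assumptions made for illustration.

```python
def deletion_derivatives(s):
    """All strings obtained from s by deleting exactly one character."""
    return {s[:i] + s[i + 1:] for i in range(len(s))}


def fuzzy_match(first, second):
    # Step 11: build the first derivative set (originals plus derivatives),
    # remembering which first string each entry came from.
    origin = {}
    for s in first:
        for d in {s} | deletion_derivatives(s):
            origin.setdefault(d, set()).add(s)

    # Step 12: intersect with the set to be intersected (here: the second set).
    intersection = origin.keys() & set(second)

    # Step 13: every intersection string is a second-set string; map it back
    # to the first-set string(s) that produced it.
    return {t: origin[t] for t in intersection}


result = fuzzy_match(["mike", "abc"], ["mik", "ac", "ttaa"])
print(result)
```

With hash-based sets, the intersection costs roughly the combined size of the two sets rather than their product, at the price of enlarging the first set by its derivatives.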
In the embodiments of the present application, the character strings of the first character string set serve as the target character strings, and the character strings matching them are searched for in the second character string set. Depending on the characteristics of the preset fuzzy matching rule, it may be sufficient to generate first derivative character strings only for the character strings in the first character string set. For example, if the preset fuzzy matching rule is that one letter is missing, no second derivative character strings need to be generated for the character strings in the second character string set, and the second character string set can be used directly as the character string set to be intersected. Likewise, if the generated first derivative character strings include all character strings that satisfy the preset fuzzy matching rule, first derivative character strings may be generated only for the character strings in the first character string set, and no second derivative character strings need to be generated for the character strings in the second character string set.
Alternatively, depending on the characteristics of the preset fuzzy matching rule, second derivative character strings that satisfy the preset fuzzy matching rule with respect to the character strings in the second character string set may also be generated according to the preset fuzzy matching rule, to obtain a second derivative character string set consisting of the second derivative character strings and the character strings in the second character string set; this second derivative character string set is then used as the character string set to be intersected. For example, if the preset fuzzy matching rule is that one letter is replaced, the second derivative character string set can be generated in this manner and intersected with the first derivative character string set.
In the embodiments of the present application, the intersection operation on the first derivative character string set and the character string set to be intersected may specifically be a Private Set Intersection (PSI) operation. This enables fuzzy matching of character strings on ciphertext: fuzzy matching is achieved while the contents of the character string sets held by the two parties remain hidden, which improves the security of data use.
The method and apparatus provided herein are described in detail below with reference to the accompanying drawings using specific embodiments.
An embodiment of the present application provides a method for fuzzy matching of character strings, as shown in fig. 2, including:
Step 21, for the character strings in the first character string set, generating, according to a preset fuzzy matching rule, first derivative character strings that satisfy the preset fuzzy matching rule with respect to those character strings, to obtain a first derivative character string set consisting of the first derivative character strings and the character strings in the first character string set.
In this step, for a character string (referred to as a first character string for convenience of description) in the first character string set, according to a preset fuzzy matching rule, all first derivative character strings that satisfy the preset fuzzy matching rule with the first character string may be generated, for example, if the preset fuzzy matching rule is to replace one letter, each position of the first character string may be replaced with another 25 letters, respectively, for one first character string, so as to obtain all first derivative character strings of the first character string.
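The exhaustive replacement described above can be sketched as follows. The function name `substitution_derivatives` is hypothetical, and a lowercase 26-letter alphabet is assumed.

```python
import string


def substitution_derivatives(s, alphabet=string.ascii_lowercase):
    """Replace each position of s with each of the other 25 letters."""
    out = set()
    for i, ch in enumerate(s):
        for c in alphabet:
            if c != ch:
                out.add(s[:i] + c + s[i + 1:])
    return out


derived = substitution_derivatives("mike")
print(len(derived))        # 4 positions x 25 letters
print("mika" in derived)
```

A string of length k therefore contributes 25k derivative strings under this rule alone, which is the growth the following paragraphs seek to avoid.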
All the first derivative character strings meeting the preset fuzzy matching rule with the first character string are generated and can be used for subsequent intersection operation, but all the first derivative character strings are generated, so that the number of the character strings in the first derivative character string set is large, and the calculation amount of the subsequent intersection operation is large.
In order to further reduce the complexity and reduce the calculation amount of the intersection operation, in this embodiment of the application, for this step, specifically, a wildcard may be used according to a preset fuzzy matching rule for a character string in the first character string set, and a first derivative character string that satisfies the preset fuzzy matching rule with the character string in the first character string set is generated, where the first derivative character string includes the wildcard.
For example, in one example of the present application, the first set of strings is [ mike, abc ], the second set of strings is [ mika, ac ], fuzzy matching rules are preset as a replacement rule and a deletion rule, the replacement rule indicates that one letter is allowed to be replaced, and the deletion rule indicates that one letter is allowed to be deleted.
For the above example, the first derivative strings may be generated using the wildcard "#" according to the replacement rule, while no wildcard is needed for the deletion rule. For the first string set [mike, abc], the first derivative string set generated according to the replacement rule and the deletion rule, containing the first derivative strings and the first strings, is [mike, #ike, m#ke, mi#e, mik#, ike, mke, mie, mik, abc, #bc, a#c, ab#, ac, ab, bc].
As can be seen from the above example, the generation of the first derived character string by using the wildcards can significantly reduce the number of character strings in the first derived character string set, thereby reducing the calculation amount of the subsequent intersection operation.
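A minimal sketch of wildcard-based derivation for the two rules of the example is given below. The wildcard symbol "#" is an assumption (the symbol is not clearly legible in the text of the example), as are the helper names.

```python
WILDCARD = "#"  # assumed wildcard symbol; any character outside the alphabet works


def wildcard_substitutions(s):
    """Replacement rule: one derivative per position, holding a wildcard there."""
    return [s[:i] + WILDCARD + s[i + 1:] for i in range(len(s))]


def deletions(s):
    """Deletion rule: one derivative per position, no wildcard needed."""
    return [s[:i] + s[i + 1:] for i in range(len(s))]


def derived_set(strings):
    out = set()
    for s in strings:
        out.add(s)                           # the original string itself
        out.update(wildcard_substitutions(s))
        out.update(deletions(s))
    return out


first_derived = derived_set(["mike", "abc"])
print(len(first_derived), sorted(first_derived))
```

Each string of length k yields only k wildcard derivatives for the replacement rule instead of 25k explicit ones. Two strings that differ in a single position then share the wildcard derivative for that position, which is why the other set must be derived with the same wildcard.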
Step 22, for the character strings in the second character string set, generating, according to the preset fuzzy matching rule, second derivative character strings that satisfy the preset fuzzy matching rule with respect to those character strings, to obtain a second derivative character string set consisting of the second derivative character strings and the character strings in the second character string set, wherein the second derivative character string set is used as the character string set to be intersected.
This step is optional and may be skipped if the first derivative character strings generated in step 21 include all character strings that satisfy the preset fuzzy matching rule.
Alternatively, depending on the characteristics of the preset fuzzy matching rule, for example when the rule is that one letter is missing, this step may also be skipped.
When this step is not executed, the second character string set is used directly as the character string set to be intersected, and the subsequent intersection operation is performed.
When the wildcard is used to generate the first derived character string in step 21, in this step, correspondingly, for the character strings in the second character string set (called as second character strings for convenience of description), according to the preset fuzzy matching rule, the wildcard is used to generate a second derived character string satisfying the preset fuzzy matching rule with the second character string, where the second derived character string includes the wildcard.
For the above example, the second derivative strings may likewise be generated using the wildcard "#" according to the replacement rule, while no wildcard is needed for the deletion rule. For the second string set [mika, ac], the second derivative string set generated according to the replacement rule and the deletion rule, containing the second derivative strings and the second strings, is [mika, #ika, m#ka, mi#a, mik#, ika, mka, mia, mik, ac].
And after the step is executed, taking the obtained second derivative character string set as a character string set to be subjected to intersection, and executing subsequent intersection operation.
Step 23, performing an intersection operation on the first derivative character string set and the character string set to be intersected, to obtain the character string intersection of the two sets, wherein the character string set to be intersected comprises the character strings in the second character string set.
In this step, in order to further reduce the amount of calculation and improve efficiency when the intersection operation is actually carried out, the first derivative character string set and the character string set to be intersected may be split by character string length into first derivative character string subsets and to-be-intersected character string subsets, such that all character strings within a given subset have the same length;
then, the intersection operation is executed respectively for the first derivative character string subset and the character string subset to be intersected, which have the same character string length, so that the number of character strings contained in the character string subset executing the intersection operation each time is small, the calculation amount can be reduced on the whole, and the efficiency is improved.
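The length-based split can be sketched as follows, using plain Python sets in place of whatever intersection protocol is actually deployed. The helper names and the wildcard symbol "#" are assumptions for illustration.

```python
from collections import defaultdict


def split_by_length(strings):
    """Group strings into subsets keyed by string length."""
    buckets = defaultdict(set)
    for s in strings:
        buckets[len(s)].add(s)
    return buckets


def intersect_by_length(first_derived, to_intersect):
    a = split_by_length(first_derived)
    b = split_by_length(to_intersect)
    result = set()
    # Only subsets holding strings of the same length can share an element.
    for length in a.keys() & b.keys():
        result |= a[length] & b[length]
    return result


first_derived = {"mike", "#ike", "m#ke", "mi#e", "mik#", "ike", "mke", "mie",
                 "mik", "abc", "#bc", "a#c", "ab#", "ac", "ab", "bc"}
to_intersect = {"mika", "#ika", "m#ka", "mi#a", "mik#", "ika", "mka", "mia",
                "mik", "ac"}
print(sorted(intersect_by_length(first_derived, to_intersect)))
```

The result equals the intersection of the two full sets. The benefit is expected mainly when each intersection is itself an expensive protocol run whose cost grows with the subset sizes.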
In some practical application scenarios, aiming at each first derivative character string subset obtained by splitting and each character string subset to be subjected to intersection, intersection operations can be sequentially executed according to the sequence of the length of the character strings from small to large or from large to small, and aiming at a first character string corresponding to an intersection character string in an intersection set, the first character string and the first derivative character string thereof are removed in the intersection operation of the next length subset, so that the calculation amount is reduced, and the efficiency is improved.
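The sequential variant with pruning might look like the sketch below, which processes lengths from small to large and drops every first string that has already been matched, together with its derivatives, from the later rounds. It assumes that one match per first string is sufficient; the names and the wildcard "#" are hypothetical.

```python
def derive(s, wildcard="#"):
    """Original string plus its wildcard-replacement and deletion derivatives."""
    subs = {s[:i] + wildcard + s[i + 1:] for i in range(len(s))}
    dels = {s[:i] + s[i + 1:] for i in range(len(s))}
    return {s} | subs | dels


def match_with_pruning(first, to_intersect):
    # derived string -> first strings it was generated from
    origin = {}
    for s in first:
        for d in derive(s):
            origin.setdefault(d, set()).add(s)

    matched = set()   # first strings that already have a match
    hits = {}         # intersection string -> first strings it matched
    lengths = sorted({len(d) for d in origin} & {len(t) for t in to_intersect})
    for n in lengths:
        # Skip derivatives whose source strings have all been matched already.
        candidates = {d for d, srcs in origin.items()
                      if len(d) == n and not srcs <= matched}
        bucket = {t for t in to_intersect if len(t) == n}
        newly = set()
        for d in candidates & bucket:
            hits[d] = origin[d] - matched
            newly |= hits[d]
        matched |= newly
    return hits


second_derived = {"mika", "#ika", "m#ka", "mi#a", "mik#", "ika", "mka", "mia",
                  "mik", "ac"}
hits = match_with_pruning(["mike", "abc"], second_derived)
print(hits)
```

In this run "abc" is matched in the length-2 round and "mike" in the length-3 round, so the length-4 round has no candidates left to intersect.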
In the embodiment of the application, under the condition that the computing resources are enough, the intersection operation can be executed in parallel aiming at each group of the first derivative character string subset and the character string subset to be intersected, which have the same length, so that the efficiency is improved.
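Parallel execution over the same-length groups can be sketched with the standard-library thread pool. Threads are used here only to show the structure: for plain in-memory sets in CPython they bring little speed-up, and the gain is expected when each group's intersection is a separate protocol run. All names are illustrative assumptions.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_by_length(strings):
    buckets = defaultdict(set)
    for s in strings:
        buckets[len(s)].add(s)
    return buckets


def parallel_intersection(first_derived, to_intersect, max_workers=4):
    a = split_by_length(first_derived)
    b = split_by_length(to_intersect)
    common_lengths = sorted(a.keys() & b.keys())
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per group of same-length subsets.
        parts = pool.map(lambda n: a[n] & b[n], common_lengths)
        return set().union(*parts)


result = parallel_intersection({"mik#", "mik", "ac", "abc", "mike"},
                               {"mik#", "mik", "ac", "mika"})
print(sorted(result))
```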
For the above example, writing the wildcard as "*", the first derivative character string set is [mike, *ike, m*ke, mi*e, mik*, ike, mke, mie, mik, abc, *bc, a*c, ab*, ac, ab, bc] and the second derivative character string set is [mika, *ika, m*ka, mi*a, mik*, ika, mka, mia, mik, ac]; performing the intersection operation on the two sets yields the character string intersection [mik*, mik, ac].
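The example sets can be reproduced with a short Python sketch. It assumes a fuzzy matching rule that the listed strings suggest: each derivative string is obtained by replacing one character with a wildcard or by deleting one character. The wildcard symbol "*" and the function names `derive` and `derived_set` are assumptions made for illustration.

```python
def derive(s, wildcard="*"):
    """Return s plus its derivative strings under the assumed rule:
    replace one character with the wildcard, or delete one character."""
    derived = {s}
    for i in range(len(s)):
        derived.add(s[:i] + wildcard + s[i + 1:])  # substitution
        derived.add(s[:i] + s[i + 1:])             # deletion
    return derived


def derived_set(strings, wildcard="*"):
    """Union of the derivative strings of every string in `strings`."""
    out = set()
    for s in strings:
        out |= derive(s, wildcard)
    return out


# First set [mike, abc] is fully expanded; on the second side only
# "mika" is expanded and "ac" is kept as is, mirroring the example.
first = derived_set(["mike", "abc"])
second = derive("mika") | {"ac"}
common = first & second
print(sorted(common))
```

Under these assumptions the intersection contains both the wildcard form and the deletion form of the shared prefix of "mike" and "mika", plus "ac".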
In the prior art, fuzzy matching of character strings can only be performed on plaintext. In the fields of secure multi-party computation and privacy-preserving computation, however, there are also application scenarios that require fuzzy matching of character strings on ciphertext. To address this, a private set intersection (PSI) operation may be performed in this step on the first derivative character string set and the character string set to be intersected, that is, the PSI operation is carried out through a two-party computation (2PC) protocol, so that information about the character string sets held by the two parties can be hidden and the security of data use is improved.
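The text does not name a particular PSI protocol. Purely as an illustration of the idea, the sketch below uses a well-known Diffie-Hellman-style blinding scheme: each party raises hashed strings to its own secret exponent, and doubly blinded values are compared. Both parties are simulated in one process, the modulus is toy-sized, and every name (`toy_psi`, `hash_to_group`, and so on) is hypothetical. It is not secure and is not presented as the protocol of the embodiment.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime; toy-sized modulus, NOT secure


def hash_to_group(s):
    """Map a string to an integer in [1, P - 1]."""
    digest = hashlib.sha256(s.encode("utf-8")).digest()
    return 1 + int.from_bytes(digest, "big") % (P - 1)


def random_exponent():
    """Secret exponent coprime to P - 1, so blinding is a bijection."""
    while True:
        e = secrets.randbelow(P - 3) + 2
        if math.gcd(e, P - 1) == 1:
            return e


def blind(items, key):
    return [pow(hash_to_group(s), key, P) for s in items]


def reblind(values, key):
    return [pow(v, key, P) for v in values]


def toy_psi(set_a, set_b):
    """Return the strings of set_a that also occur in set_b."""
    items_a = sorted(set_a)
    key_a, key_b = random_exponent(), random_exponent()
    a_blinded = blind(items_a, key_a)            # sent from A to B
    b_blinded = blind(sorted(set_b), key_b)      # sent from B to A
    a_double = reblind(a_blinded, key_b)         # B returns, order kept
    b_double = set(reblind(b_blinded, key_a))    # computed by A
    # H(x)^(ab) == H(y)^(ba) exactly when H(x) == H(y).
    return {s for s, v in zip(items_a, a_double) if v in b_double}
```

In this sketch party A learns which of its own strings are shared, while the strings themselves are only ever exchanged in blinded form.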
Step 24: for each intersection string in the character string intersection, determine the character string in the second character string set that corresponds to the intersection string as the character string matching the character string in the first character string set that corresponds to the same intersection string.
In this step, the intersection string may be a first character string and/or a second character string itself, in which case the character string corresponding to the intersection string is that first and/or second character string;
the intersection string may also be a first derivative character string and/or a second derivative character string, in which case the character string corresponding to the intersection string is the first and/or second character string from which that derivative character string was generated.
For the above example, writing the wildcard as "*", the character string intersection obtained after the intersection operation in step 23 is [mik*, mik, ac]. For the intersection strings "mik*" and "mik", the corresponding second character string in the second character string set [mika, ac] is "mika", and the corresponding first character string in the first character string set [mike, abc] is "mike", so it can be determined that the second character string "mika" matches the first character string "mike";
similarly, for the intersection string "ac" therein, the corresponding second string in the second string set [ mika, ac ] is "ac", and the corresponding first string in the first string set [ mike, abc ] is "abc", so that it can be determined that the second string "ac" matches the first string "abc".
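Step 24 amounts to tracing each intersection string back to the original strings that produced it. A minimal Python sketch follows, assuming the same derivation rule as the example (one wildcard substitution or one deletion, with "*" as the assumed wildcard symbol); `index_by_derived` and `match_strings` are hypothetical names.

```python
from collections import defaultdict


def derive(s, wildcard="*"):
    """s plus its one-substitution and one-deletion derivative strings."""
    derived = {s}
    for i in range(len(s)):
        derived.add(s[:i] + wildcard + s[i + 1:])
        derived.add(s[:i] + s[i + 1:])
    return derived


def index_by_derived(strings):
    """Map every derivative string to the original strings it came from."""
    index = defaultdict(set)
    for original in strings:
        for d in derive(original):
            index[d].add(original)
    return index


def match_strings(first_set, second_set):
    """Return (first string, second string) pairs sharing a derivative."""
    first_index = index_by_derived(first_set)
    second_index = index_by_derived(second_set)
    pairs = set()
    # The shared keys are exactly the intersection strings of step 23.
    for key in first_index.keys() & second_index.keys():
        for a in first_index[key]:
            for b in second_index[key]:
                pairs.add((a, b))
    return pairs


print(sorted(match_strings(["mike", "abc"], ["mika", "ac"])))
```

Keeping the provenance index next to the derivative strings means that the intersection itself only needs to operate on plain strings, while the mapping back to originals stays local to each side.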
By adopting the method shown in fig. 2 provided by the embodiment of the application, the intersection of the character strings is obtained through the set intersection operation, and the matched character strings are further determined.
In addition, when generating the derivative character string set, wildcards can be used, thereby further reducing the amount of calculation and improving the efficiency.
In practical applications, a longer character string may consist of several parts with specific meanings, for example a surname and a given name. Such a character string can be split into its parts, and the character string fuzzy matching method provided by the embodiment of the application can first be applied to one part and then, on the basis of that matching result, to the other part; when the other part is matched, the already matched part is attached to it. This reduces the number of derivative character strings and improves matching efficiency.
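One possible reading of this two-stage matching is sketched below in Python. It is an interpretation under stated assumptions, not the method of the embodiment: records are (surname, given name) pairs, the derivation rule is one wildcard substitution or one deletion, the wildcard symbol "*" and the separator "|" are arbitrary choices, and the names `variants` and `two_stage_match` are hypothetical.

```python
def variants(s, wildcard="*"):
    """s plus its one-substitution and one-deletion derivative strings."""
    out = {s}
    for i in range(len(s)):
        out.add(s[:i] + wildcard + s[i + 1:])
        out.add(s[:i] + s[i + 1:])
    return out


def two_stage_match(first, second, sep="|"):
    """Match (surname, given name) records in two stages."""
    # Stage 1: intersect the derivative strings of the surnames only.
    surnames_1 = set().union(*(variants(sur) for sur, _ in first))
    surnames_2 = set().union(*(variants(sur) for sur, _ in second))
    shared = surnames_1 & surnames_2

    # Stage 2: derive the given names, attaching the matched surname
    # key so that only records with a surname match can intersect.
    def composite_index(records):
        index = {}
        for sur, given in records:
            for key in variants(sur) & shared:
                for g in variants(given):
                    index.setdefault(key + sep + g, set()).add((sur, given))
        return index

    index_1 = composite_index(first)
    index_2 = composite_index(second)
    pairs = set()
    for key in index_1.keys() & index_2.keys():
        for a in index_1[key]:
            for b in index_2[key]:
                pairs.add((a, b))
    return pairs
```

Records whose surname finds no counterpart in stage 1 contribute no derivative strings at all in stage 2, which is where the reduction in derivative strings comes from in this reading.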
Based on the same inventive concept, according to the method for fuzzy matching of character strings provided in the foregoing embodiment of the present application, correspondingly, another embodiment of the present application further provides a fuzzy matching device for character strings, a schematic structural diagram of which is shown in fig. 3, and specifically includes:
the character string generating module 31 is configured to generate, according to a preset fuzzy matching rule, first derivative character strings that satisfy the preset fuzzy matching rule with the character strings in the first character string set for the character strings in the first character string set, and obtain a first derivative character string set composed of the first derivative character strings and the character strings in the first character string set;
an intersection operation module 32, configured to perform an intersection operation on the first derivative character string set and a character string set to be intersected to obtain a character string intersection of the first derivative character string set and the character string set to be intersected, where the character string set to be intersected comprises the character strings in a second character string set;
and a character string matching module 33, configured to determine, for each intersection character string in the character string intersection, a character string in the second character string set corresponding to the intersection character string as a character string that matches the character string in the first character string set corresponding to the intersection character string.
Further, the character string set to be intersected is the second character string set.
Further, the character string generating module 31 is further configured to generate, for the character strings in the second character string set and according to the preset fuzzy matching rule, second derivative character strings that satisfy the preset fuzzy matching rule with the character strings in the second character string set, to obtain a second derivative character string set composed of the second derivative character strings and the character strings in the second character string set, where the second derivative character string set is used as the character string set to be intersected.
Further, the intersection operation module 32 is specifically configured to split the first derivative character string set and the character string set to be intersected, according to character string length, into first derivative character string subsets and character string subsets to be intersected, where the character strings contained in each subset have the same length;
and to execute the intersection operation separately for each first derivative character string subset and character string subset to be intersected that have the same character string length.
Further, the intersection operation module 32 is specifically configured to perform a private set intersection (PSI) operation on the first derivative character string set and the character string set to be intersected.
Further, the character string generating module 31 is specifically configured to generate, for the character strings in the first character string set and according to the preset fuzzy matching rule, first derivative character strings that satisfy the preset fuzzy matching rule with the character strings in the first character string set by using wildcards, where the first derivative character strings contain the wildcards;
and to generate, for the character strings in the second character string set and according to the preset fuzzy matching rule, second derivative character strings that satisfy the preset fuzzy matching rule with the character strings in the second character string set by using the wildcards, where the second derivative character strings contain the wildcards.
The functions of the above modules may correspond to the corresponding processing steps in the flows shown in fig. 1 and fig. 2, and are not described herein again.
The character string fuzzy matching device provided by the embodiment of the application can be implemented by a computer program. Those skilled in the art should understand that the above module division is only one of many possible divisions; a device divided into other modules, or not divided into modules at all, falls within the scope of the present application as long as it has the above functions.
Embodiments of the present application further provide an electronic device, as shown in fig. 4, including a processor 41 and a machine-readable storage medium 42, where the machine-readable storage medium 42 stores machine-executable instructions that can be executed by the processor 41, and the machine-executable instructions cause the processor 41 to implement any of the above character string fuzzy matching methods.
An embodiment of the present application further provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the computer program implements any one of the above character string fuzzy matching methods.
Embodiments of the present application further provide a computer program product containing instructions, which when run on a computer, cause the computer to perform any of the above string fuzzy matching methods.
The machine-readable storage medium in the electronic device may include a Random Access Memory (RAM) and a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, the electronic device, the computer-readable storage medium, and the computer program product embodiment, since they are substantially similar to the method embodiment, the description is relatively simple, and in the relevant places, reference may be made to the partial description of the method embodiment.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A method for fuzzy matching of character strings, comprising:
for the character strings in a first character string set, generating, according to a preset fuzzy matching rule, first derivative character strings that satisfy the preset fuzzy matching rule with the character strings in the first character string set, to obtain a first derivative character string set consisting of the first derivative character strings and the character strings in the first character string set;
performing an intersection operation on the first derivative character string set and a character string set to be intersected to obtain a character string intersection of the first derivative character string set and the character string set to be intersected, wherein the character string set to be intersected comprises character strings in a second character string set;
and for each intersection character string in the character string intersection, determining the character string corresponding to the intersection character string in the second character string set as the character string matched with the character string corresponding to the intersection character string in the first character string set.
2. The method of claim 1, wherein the set of to-be-intersected strings is the second set of strings.
3. The method of claim 1, prior to performing an intersection operation on the first set of derivative strings and a set of to-be-intersected strings, further comprising:
for the character strings in the second character string set, generating, according to the preset fuzzy matching rule, second derivative character strings that satisfy the preset fuzzy matching rule with the character strings in the second character string set, to obtain a second derivative character string set consisting of the second derivative character strings and the character strings in the second character string set, wherein the second derivative character string set is used as the character string set to be intersected.
4. The method of claim 1, wherein performing an intersection operation on the first set of derivative strings and a set of to-be-intersected strings comprises:
splitting the first derivative character string set and the character string set to be intersected, according to character string length, into first derivative character string subsets and character string subsets to be intersected, wherein the character strings contained in each first derivative character string subset and each character string subset to be intersected have the same length;
and performing the intersection operation separately for each first derivative character string subset and character string subset to be intersected that have the same character string length.
5. The method of claim 1, wherein performing an intersection operation on the first set of derivative strings and a set of to-be-intersected strings comprises:
performing a private set intersection (PSI) operation on the first derivative character string set and the character string set to be intersected.
6. The method of claim 3, wherein the generating, for the character strings in the first character string set, a first derivative character string satisfying a preset fuzzy matching rule with the character strings in the first character string set according to the preset fuzzy matching rule comprises:
for the character strings in the first character string set, generating, according to the preset fuzzy matching rule and by using wildcards, first derivative character strings that satisfy the preset fuzzy matching rule with the character strings in the first character string set, wherein the first derivative character strings comprise the wildcards;
and the generating, for the character strings in the second character string set, of second derivative character strings that satisfy the preset fuzzy matching rule with the character strings in the second character string set according to the preset fuzzy matching rule comprises:
for the character strings in the second character string set, generating, according to the preset fuzzy matching rule and by using the wildcards, second derivative character strings that satisfy the preset fuzzy matching rule with the character strings in the second character string set, wherein the second derivative character strings comprise the wildcards.
7. A string fuzzy matching apparatus, comprising:
a character string generating module, configured to generate, for the character strings in a first character string set and according to a preset fuzzy matching rule, first derivative character strings that satisfy the preset fuzzy matching rule with the character strings in the first character string set, to obtain a first derivative character string set consisting of the first derivative character strings and the character strings in the first character string set;
an intersection operation module, configured to perform an intersection operation on the first derivative character string set and a character string set to be intersected to obtain a character string intersection of the first derivative character string set and the character string set to be intersected, wherein the character string set to be intersected comprises character strings in a second character string set;
and a character string matching module, configured to determine, for each intersection character string in the character string intersection, the character string corresponding to the intersection character string in the second character string set as the character string matching the character string corresponding to the intersection character string in the first character string set.
8. The apparatus according to claim 7, wherein the intersection operation module is specifically configured to split the first derived string set and the to-be-intersected string set into first derived string subsets and to-be-intersected string subsets according to string lengths, where the lengths of strings included in each of the first derived string subsets and the to-be-intersected string subsets are the same;
and respectively executing intersection operation aiming at the first derivative character string subset and the character string subset to be intersected, which have the same character string length.
9. An electronic device comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor, the processor being caused by the machine-executable instructions to: carrying out the method of any one of claims 1 to 6.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method of any one of claims 1 to 6.
CN202211112979.2A 2022-09-14 2022-09-14 Character string fuzzy matching method and device and electronic equipment Active CN115203495B (en)

Publications (2)

Publication Number Publication Date
CN115203495A (en) 2022-10-18
CN115203495B CN115203495B (en) 2022-11-29

Family

ID=83573573

Country Status (1)

Country Link
CN (1) CN115203495B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130066898A1 (en) * 2011-09-09 2013-03-14 Microsoft Corporation Matching target strings to known strings
CN103440865A (en) * 2013-08-06 2013-12-11 普强信息技术(北京)有限公司 Post-processing method for voice recognition
CN111079421A (en) * 2019-11-25 2020-04-28 北京小米智能科技有限公司 Text information word segmentation processing method, device, terminal and storage medium
CN112861175A (en) * 2021-02-03 2021-05-28 华控清交信息科技(北京)有限公司 Data processing method and device and data processing device
CN114357257A (en) * 2022-01-07 2022-04-15 上海盎维信息技术有限公司 Wildcard character generation method and wildcard character generation device

Also Published As

Publication number Publication date
CN115203495B (en) 2022-11-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant