
CN106933834B - Data matching method and device

Info

Publication number: CN106933834B
Application number: CN201511018347.XA
Authority: CN
Inventors: 皇甫庆彬
Original assignee: Youxinpai Beijing Information Technology Co ltd
Current assignee: Hefei Youquan Information Technology Co ltd
Priority date: 2015-12-29
Filing date: 2015-12-29
Publication date: 2020-09-08
Anticipated expiration: 2035-12-29
Also published as: CN106933834A

Abstract

The application discloses a data matching method and device. In the method, a character string to be matched and a character string set to be matched against are first obtained. Matching parameters between each character string in the set and the character string to be matched are then calculated, where the matching parameters comprise similarity and/or matching degree. Finally, a target character string that matches the character string to be matched is determined in the set according to the matching parameters. With this scheme, matched character strings need not be completely equal; whether two character strings match is decided from the matching parameters, which improves the matching rate.

Description

Data matching method and device

Technical Field

The present disclosure relates to the field of data matching technologies, and in particular, to a data matching method and apparatus.

Background

With the development of information technology, the volume of data of all kinds keeps growing. Data matching is usually required to clarify the relationships between different data. Here, data matching refers to associating data items with one another according to some intrinsic relationship.

In the prior art, data matching usually adopts an exact-match method: the characters of the two character strings being compared are checked one by one, and the match is considered successful only if the two character strings are completely equal.

However, in the course of research for the present application, the inventor found that the exact-match method yields a low matching rate: because a match is confirmed only when two character strings are completely equal, a large amount of data cannot be matched.

Disclosure of Invention

To overcome the problems in the related art, the present disclosure provides a data matching method and apparatus.

To solve this technical problem, the embodiments of the invention disclose the following technical solutions:

According to a first aspect of the embodiments of the present disclosure, there is provided a data matching method, including:

acquiring a character string to be matched and a character string set to be matched against;

respectively calculating matching parameters between each character string in the character string set and the character string to be matched, wherein the matching parameters comprise: similarity and/or matching degree;

and determining, according to the matching parameters, a target character string in the character string set that matches the character string to be matched.

Preferably, if the matching parameter is similarity, respectively calculating the matching parameters between each character string in the character string set and the character string to be matched includes:

21) selecting any character string from the character string set as a calculation character string, and acquiring the length str1₁ of the character string to be matched;

22) acquiring the length str2₁ of the calculation character string;

23) acquiring the maximum common substring of the character string to be matched and the calculation character string, calculating the length of the maximum common substring, and respectively acquiring the number of occurrences of the maximum common substring in the calculation character string and in the character string to be matched;

24) removing the maximum common substring from the calculation character string to obtain a new calculation character string, removing the maximum common substring from the character string to be matched to obtain a new character string to be matched, and returning to step 23); when the character string to be matched and the calculation character string no longer contain a maximum common substring, executing step 25);

25) calculating the similarity between the calculation character string and the character string to be matched according to the length str1₁ of the character string to be matched, the length str2₁ of the calculation character string, the length of each maximum common substring, and the number of occurrences of each maximum common substring in the calculation character string and in the character string to be matched;

26) selecting another character string from the character string set as the calculation character string, and returning to step 22), until the similarity between every character string in the character string set and the character string to be matched has been obtained.

Preferably, the similarity between the calculation character string and the character string to be matched is calculated by adopting the following formula:

where str1₁ represents the length of the character string to be matched before any maximum common substring is removed; str2₁ represents the length of the calculation character string before any maximum common substring is removed; m_i represents the length of the i-th maximum common substring obtained while calculating the matching parameters of the character string to be matched and the calculation character string; k_i1 represents the number of occurrences of the i-th maximum common substring in the character string to be matched; k_i2 represents the number of occurrences of the i-th maximum common substring in the calculation character string; a denotes the number of maximum common substrings obtained during this calculation, and n represents any value not less than a; and L represents the similarity between the character string to be matched and the calculation character string.

Preferably, if the matching parameter is matching degree, respectively calculating the matching parameters between each character string in the character string set and the character string to be matched includes:

41) selecting any character string from the character string set as a calculation character string;

42) acquiring the maximum common substring of the character string to be matched and the calculation character string, calculating the length of the maximum common substring, and respectively acquiring the number of occurrences of the maximum common substring in the calculation character string and in the character string to be matched;

43) removing the maximum common substring from the calculation character string to obtain a new calculation character string, removing the maximum common substring from the character string to be matched to obtain a new character string to be matched, and returning to step 42); when the character string to be matched and the calculation character string no longer contain a maximum common substring, executing step 44);

44) calculating the matching degree between the calculation character string and the character string to be matched according to the length of each maximum common substring and the number of occurrences of each maximum common substring in the calculation character string and in the character string to be matched;

45) selecting another character string from the character string set as the calculation character string, and returning to step 42), until the matching degree between every character string in the character string set and the character string to be matched has been obtained.

Preferably, the matching degree between the calculation character string and the character string to be matched is calculated by adopting the following formula:

where m_i represents the length of the i-th maximum common substring obtained while calculating the matching parameters of the character string to be matched and the calculation character string; k_i1 represents the number of occurrences of the i-th maximum common substring in the character string to be matched; k_i2 represents the number of occurrences of the i-th maximum common substring in the calculation character string; a denotes the number of maximum common substrings obtained during this calculation, and n represents any value not less than a; and E represents the matching degree between the character string to be matched and the calculation character string.

According to a second aspect of the embodiments of the present disclosure, there is provided a data matching apparatus including:

the acquisition module is used for acquiring a character string to be matched and a character string set to be matched;

a calculating module, configured to calculate matching parameters between each character string in the character string set and the character string to be matched, where the matching parameters include: similarity and/or matching degree;

and the determining module is used for determining a target character string matched with the character string to be matched in the character string set according to the matching parameters.

Preferably, if the matching parameter is similarity, the calculating module includes:

a first length obtaining unit, configured to select any character string from the character string set as a calculation character string, and obtain the length str1₁ of the character string to be matched;

a second length obtaining unit, configured to obtain the length str2₁ of the calculation character string;

a first maximum common substring obtaining unit, configured to obtain the maximum common substring of the character string to be matched and the calculation character string, calculate the length of the maximum common substring, and respectively obtain the number of occurrences of the maximum common substring in the calculation character string and in the character string to be matched;

a first removal unit, configured to remove the maximum common substring from the calculation character string to obtain a new calculation character string, remove the maximum common substring from the character string to be matched to obtain a new character string to be matched, and trigger the first maximum common substring obtaining unit again; when the character string to be matched and the calculation character string no longer contain a maximum common substring, the first removal unit triggers the similarity calculation unit;

a similarity calculation unit, configured to calculate the similarity between the calculation character string and the character string to be matched according to the length str1₁ of the character string to be matched, the length str2₁ of the calculation character string, the length of each maximum common substring, and the number of occurrences of each maximum common substring in the calculation character string and in the character string to be matched;

and a first selection unit, configured to select another character string from the character string set as the calculation character string and trigger the second length obtaining unit, until the similarity between every character string in the character string set and the character string to be matched has been obtained.

Preferably, the similarity calculation unit calculates the similarity between the calculation character string and the character string to be matched by using the following formula:

where str1₁ represents the length of the character string to be matched before any maximum common substring is removed; str2₁ represents the length of the calculation character string before any maximum common substring is removed; m_i represents the length of the i-th maximum common substring obtained while calculating the matching parameters of the character string to be matched and the calculation character string; k_i1 represents the number of occurrences of the i-th maximum common substring in the character string to be matched; k_i2 represents the number of occurrences of the i-th maximum common substring in the calculation character string; a denotes the number of maximum common substrings obtained during this calculation, and n represents any value not less than a; and L represents the similarity between the character string to be matched and the calculation character string.

Preferably, if the matching parameter is a matching degree, the calculating module includes:

a calculation character string obtaining unit, configured to select any character string from the character string set as a calculation character string;

a second maximum common substring obtaining unit, configured to obtain the maximum common substring of the character string to be matched and the calculation character string, calculate the length of the maximum common substring, and respectively obtain the number of occurrences of the maximum common substring in the calculation character string and in the character string to be matched;

a second removal unit, configured to remove the maximum common substring from the calculation character string to obtain a new calculation character string, remove the maximum common substring from the character string to be matched to obtain a new character string to be matched, and trigger the second maximum common substring obtaining unit again; when the character string to be matched and the calculation character string no longer contain a maximum common substring, the second removal unit triggers the matching degree calculation unit;

a matching degree calculation unit, configured to calculate the matching degree between the calculation character string and the character string to be matched according to the length of each maximum common substring and the number of occurrences of each maximum common substring in the calculation character string and in the character string to be matched;

and a second selection unit, configured to select another character string from the character string set as the calculation character string and trigger the second maximum common substring obtaining unit, until the matching degree between every character string in the character string set and the character string to be matched has been obtained.

Preferably, the matching degree calculating unit calculates the matching degree between the calculation character string and the character string to be matched by using the following formula:

The technical solutions provided by the embodiments of the present disclosure can have the following beneficial effects:

When data matching is carried out with the method and device, after the character string to be matched and the character string set to be matched against are obtained, the target character string that matches the character string to be matched is determined according to the matching parameters between each character string in the set and the character string to be matched. Matched character strings therefore need not be completely equal; whether two character strings match is decided from the matching parameters, which improves the matching rate.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.

FIG. 1 is a schematic workflow diagram illustrating a data matching method according to an example embodiment;

FIG. 2 is a schematic diagram illustrating a workflow for calculating similarity in a data matching method according to an exemplary embodiment;

FIG. 3 is a flowchart illustrating a process of calculating a degree of match in a data matching method according to an exemplary embodiment;

FIG. 4 is a schematic structural diagram illustrating a data matching apparatus according to an exemplary embodiment.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.

To solve the prior-art problems that the matching rate is low and a large amount of data cannot be matched, the application discloses a data matching method and a data matching device.

A first embodiment of the application discloses a data matching method. Referring to the workflow diagram shown in FIG. 1, the data matching method includes the following steps:

Step S11, acquiring the character string to be matched and the character string set to be matched against.

Data matching often requires comparing a number of character strings with one particular character string to determine whether each of them matches it. In this case, the particular character string is referred to as the character string to be matched, and the other character strings form the character string set to be matched against.

Step S12, respectively calculating matching parameters between each character string in the character string set and the character string to be matched, where the matching parameters include: similarity and/or matching degree.

The similarity characterizes how similar each character string in the character string set is to the character string to be matched, and the matching degree characterizes the matching information between each character string in the character string set and the character string to be matched.

Step S13, determining, according to the matching parameters, a target character string in the character string set that matches the character string to be matched. The target character string is the character string that matches the character string to be matched.

In the data matching method of the first embodiment, after the character string to be matched and the character string set to be matched against are obtained, the target character string is determined according to the matching parameters between each character string in the set and the character string to be matched. Matched character strings therefore need not be completely equal; whether two character strings match is decided from the matching parameters, which improves the matching rate.
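Steps S11 to S13 can be sketched as a small scoring pipeline. The sketch is illustrative only: the names `match_strings`, `score_fn`, `threshold` and `stand_in_score` are hypothetical, and the scorer used here is the ratio from Python's `difflib.SequenceMatcher`, a stand-in chosen for convenience and not the similarity or matching-degree formula of this application.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, List, Tuple


def match_strings(
    query: str,
    candidates: Iterable[str],
    score_fn: Callable[[str, str], float],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Score every candidate against the query and keep those at or above threshold."""
    # Step S12: compute a matching parameter for each string in the set.
    scored = [(candidate, score_fn(query, candidate)) for candidate in candidates]
    # Step S13: keep the strings whose parameter meets the (assumed) threshold.
    return [(candidate, score) for candidate, score in scored if score >= threshold]


def stand_in_score(a: str, b: str) -> float:
    """Stand-in scorer (assumption): difflib ratio, not the patent's formula."""
    return SequenceMatcher(None, a, b).ratio()


# Step S11: the string to be matched and the set to match against.
targets = match_strings("abcabcadd", ["abcabcadd", "xyz"], stand_in_score, 0.5)
print(targets)
```

Because the decision is made on a score and not on strict equality, a candidate that differs from the query in a few characters can still be selected if its score clears the chosen threshold.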

In this application, the matching parameters include: similarity and/or matching degree. If the matching parameter is similarity, referring to the workflow diagram shown in FIG. 2, respectively calculating the matching parameters between each character string in the character string set and the character string to be matched includes the following steps:

Step S21, selecting any character string from the character string set as a calculation character string, and obtaining the length str1₁ of the character string to be matched.

Step S22, obtaining the length str2₁ of the calculation character string.

Step S23, judging whether the character string to be matched and the calculation character string share an identical substring; if so, executing step S24, and if not, executing step S26.

Step S24, if the character string to be matched and the calculation character string share an identical substring, obtaining the maximum common substring of the two, calculating the length of the maximum common substring, and respectively obtaining the number of occurrences of the maximum common substring in the calculation character string and in the character string to be matched. Here, the maximum common substring is the longest identical substring of the two given character strings (i.e. the character string to be matched and the calculation character string).

Step S25, removing the maximum common substring from the calculation character string to obtain a new calculation character string, removing the maximum common substring from the character string to be matched to obtain a new character string to be matched, and then returning to step S23.

Step S26, calculating the similarity between the calculation character string and the character string to be matched according to the length str1₁ of the character string to be matched, the length str2₁ of the calculation character string, the length of each maximum common substring, and the number of occurrences of each maximum common substring in the calculation character string and in the character string to be matched, and then executing step S27.

Step S27, judging whether the character string set still contains a character string whose similarity has not been calculated; if so, executing step S28, and if not, executing step S29.

Step S28, selecting another character string from the character string set as the calculation character string, and returning to step S22. The calculation of the similarity is complete once the similarity between every character string in the character string set and the character string to be matched has been obtained.

Step S29, ending the current similarity calculation.
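The loop formed by steps S21 to S25 can be sketched as follows. This is a minimal sketch under stated assumptions: the function names `longest_common_substring` and `collect_rounds` are hypothetical, "removing the maximum common substring" is assumed to delete every non-overlapping occurrence, and when several common substrings tie for the greatest length the first one found in the character string to be matched is used. The sketch only gathers the quantities that step S26 takes as input; it does not compute the similarity itself.

```python
from typing import List, Tuple


def longest_common_substring(a: str, b: str) -> str:
    """Return the longest substring shared by a and b (dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ends at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def collect_rounds(query: str, candidate: str) -> Tuple[int, int, List[Tuple[int, int, int]]]:
    """Return (str1_1, str2_1, [(m_i, k_i1, k_i2), ...]) for one candidate."""
    len1, len2 = len(query), len(candidate)  # steps S21 and S22
    rounds = []
    while True:
        common = longest_common_substring(query, candidate)  # steps S23 and S24
        if not common:
            break  # no shared substring left: proceed to step S26
        rounds.append((len(common), query.count(common), candidate.count(common)))
        # Step S25 (assumption: every occurrence is removed).
        query = query.replace(common, "")
        candidate = candidate.replace(common, "")
    return len1, len2, rounds


print(collect_rounds("abcabcadd", "abcmnf"))  # (9, 6, [(3, 2, 1)])
```

Each pass removes at least one character from both strings, so the loop always terminates.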

Step S26 discloses calculating the matching parameter between the calculation character string and the character string to be matched according to the length str1₁ of the character string to be matched, the length str2₁ of the calculation character string, the length of each maximum common substring, and the number of occurrences of each maximum common substring in the calculation character string and in the character string to be matched. If the matching parameter is similarity, the similarity between the calculation character string and the character string to be matched is calculated using the following formula:

where L represents the similarity between the character string to be matched and the calculation character string; str1₁ represents the length of the character string to be matched before any maximum common substring is removed; and str2₁ represents the length of the calculation character string before any maximum common substring is removed.

m_i represents the length of the i-th maximum common substring obtained while calculating the matching parameters of the character string to be matched and the calculation character string. Here, m₁ is the length of the first maximum common substring obtained, i.e. the length of the maximum common substring of the character string to be matched and the calculation character string before any removal operation is performed; m₂ is the length of the maximum common substring of the new character string to be matched and the new calculation character string obtained after the first removal operation; and so on.

k_i1 represents the number of occurrences of the i-th maximum common substring in the character string to be matched; k_i2 represents the number of occurrences of the i-th maximum common substring in the calculation character string. For example, if, before any removal operation is performed, the character string to be matched is "abcabcadd" and the calculation character string is "abcmnf", then the first maximum common substring is "abc". The character string to be matched contains two occurrences of "abc", so k₁₁ is 2; the calculation character string contains one occurrence of "abc", so k₁₂ is 1.

Let a denote the number of maximum common substrings obtained while calculating the matching parameters of the character string to be matched and the calculation character string; n represents any value not less than a. For example, suppose the character string to be matched and the calculation character string share a maximum common substring before any removal operation is performed, and a maximum common substring can still be obtained between the new character string to be matched and the new calculation character string after the first removal operation, but after the second removal operation the two new character strings no longer share any common substring. The number of maximum common substrings obtained is then 2, i.e. a is 2, and n is a value not less than 2. In this case, for any i greater than 2, m_i is 0, and k_i1 and k_i2 are 0.
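The role of n can be illustrated with a short sketch. The helper name `pad_rounds` and the sample values are hypothetical, and the product m_i × k_i1 summed at the end is only an illustrative quantity, not the similarity formula of this application; it is used to show that entries with i greater than a contribute nothing because m_i, k_i1 and k_i2 are all 0.

```python
from typing import List, Tuple

Round = Tuple[int, int, int]  # (m_i, k_i1, k_i2)


def pad_rounds(rounds: List[Round], n: int) -> List[Round]:
    """Extend the a obtained rounds to n entries, using zeros for i > a."""
    a = len(rounds)
    if n < a:
        raise ValueError("n must not be less than a")
    return rounds + [(0, 0, 0)] * (n - a)


rounds = [(3, 2, 1), (2, 1, 1)]  # illustrative values with a = 2
padded = pad_rounds(rounds, 5)   # n = 5
print(padded)

# Illustrative check: a sum over i = 1..n equals the same sum over i = 1..a.
total_a = sum(m * k1 for m, k1, _ in rounds)
total_n = sum(m * k1 for m, k1, _ in padded)
print(total_a == total_n)
```

This is why n may be any value not less than a: the choice of n does not change a quantity accumulated over the rounds.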

If the matching parameter is matching degree, referring to the workflow diagram shown in FIG. 3, respectively calculating the matching parameters between each character string in the character string set and the character string to be matched includes the following steps:

Step S31, selecting any character string from the character string set as a calculation character string.

Step S32, judging whether the character string to be matched and the calculation character string share an identical substring; if so, executing step S33, and if not, executing step S35.

Step S33, if the character string to be matched and the calculation character string share an identical substring, obtaining the maximum common substring of the two, calculating the length of the maximum common substring, and respectively obtaining the number of occurrences of the maximum common substring in the calculation character string and in the character string to be matched. Here, the maximum common substring is the longest identical substring of the two given character strings (i.e. the character string to be matched and the calculation character string).

Step S34, removing the maximum common substring from the calculation character string to obtain a new calculation character string, removing the maximum common substring from the character string to be matched to obtain a new character string to be matched, and then returning to step S32.

Step S35, calculating the matching degree between the calculation character string and the character string to be matched according to the length of each maximum common substring and the number of occurrences of each maximum common substring in the calculation character string and in the character string to be matched, and then executing step S36.

Step S36, judging whether the character string set still contains a character string whose matching degree has not been calculated; if so, executing step S37, and if not, executing step S38.

Step S37, selecting another character string from the character string set as the calculation character string, and returning to step S32. The calculation of the matching degree is complete once the matching degree between every character string in the character string set and the character string to be matched has been obtained.

Step S38, ending the current matching degree calculation.

Step S35 discloses calculating the matching degree between the calculation character string and the character string to be matched according to the length of each maximum common substring and the number of occurrences of each maximum common substring in the calculation character string and in the character string to be matched. If the matching parameter is matching degree, the matching degree between the calculation character string and the character string to be matched is calculated using the following formula:

where E represents the matching degree between the character string to be matched and the calculation character string; m_i represents the length of the i-th maximum common substring obtained while calculating the matching parameters of the character string to be matched and the calculation character string. Here, m₁ is the length of the first maximum common substring obtained, i.e. the length of the maximum common substring of the character string to be matched and the calculation character string before any removal operation is performed; m₂ is the length of the maximum common substring of the new character string to be matched and the new calculation character string obtained after the first removal operation; and so on.

k_i1 represents the number of occurrences of the i-th maximum common substring in the character string to be matched; k_i2 represents the number of occurrences of the i-th maximum common substring in the calculation character string. For example, if, before any removal operation is performed, the character string to be matched is "abcabcadd" and the calculation character string is "abcmnf", then the first maximum common substring is "abc". The character string to be matched contains two occurrences of "abc", so k₁₁ is 2; the calculation character string contains one occurrence of "abc", so k₁₂ is 1.

Let a denote the number of maximum common substrings obtained while calculating the matching parameters of the character string to be matched and the calculation character string; n represents any value not less than a. For example, suppose the character string to be matched and the calculation character string share a maximum common substring before any removal operation is performed, and a maximum common substring can still be obtained between the new character string to be matched and the new calculation character string after the first removal operation, but after the second removal operation the two new character strings no longer share any common substring. The number of maximum common substrings obtained is then 2, i.e. a is 2, and n is a value not less than 2. In this case, for any i greater than 2, m_i is 0, and k_i1 and k_i2 are 0.

In addition, if the character string to be matched and the calculation character string share no identical substring before any removal operation is performed, the similarity and the matching degree between the calculation character string and the character string to be matched are generally considered to be 0.

Step S13 determines, according to the matching parameters, the target character string in the character string set that matches the character string to be matched. Specifically, a character string in the set whose similarity, matching degree, or combination of the two falls within a preset range may be determined as a target character string. Alternatively, after the matching parameters are obtained, the character strings in the set may be sorted in descending order of similarity, matching degree, or a combination of the two to obtain a new character string sequence, and the first N character strings in that sequence are determined as target character strings, where N is a preset positive integer.
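The two selection strategies of step S13 can be sketched as follows. The function names and the score values are hypothetical and purely illustrative; the scores are not computed with the formulas of this application.

```python
from typing import Dict, List


def select_in_range(scores: Dict[str, float], low: float, high: float) -> List[str]:
    """Keep the strings whose matching parameter lies within the preset range."""
    return [s for s, value in scores.items() if low <= value <= high]


def select_top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Sort by matching parameter, largest first, and keep the first N strings."""
    ranked = sorted(scores, key=lambda s: scores[s], reverse=True)
    return ranked[:n]


# Illustrative, made-up matching parameters for three candidate strings.
scores = {"abcmnf": 0.6, "xyz": 0.0, "abcabc": 0.8}

print(select_in_range(scores, 0.5, 1.0))  # ['abcmnf', 'abcabc']
print(select_top_n(scores, 1))            # ['abcabc']
```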

When calculating the matching parameters of the character string to be matched and the calculation character string, their maximum common substring must be obtained. To obtain it, the two character strings can be joined into one character string separated by a symbol, for example in the form 'character string 1/character string 2'. The character strings before and after the symbol are then matched against each other, every identical substring found while scanning from left to right is added to an array, and the array is sorted by length; the longest entry is the maximum common substring of the character string to be matched and the calculation character string.
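The collect-and-sort approach described above can be sketched in Python. This is an assumption-laden illustration: the function names are hypothetical, the two strings are passed as separate arguments rather than in the joined 'character string 1/character string 2' form, and ties between equally long substrings are resolved in favour of the one found first.

```python
def common_substrings(s1: str, s2: str) -> list:
    """Collect every distinct substring of s1 that also occurs in s2, scanning left to right."""
    found = []
    seen = set()
    for start in range(len(s1)):
        for end in range(start + 1, len(s1) + 1):
            piece = s1[start:end]
            if piece not in s2:
                break  # a longer piece from this start cannot occur in s2 either
            if piece not in seen:
                seen.add(piece)
                found.append(piece)
    return found


def maximum_common_substring(s1: str, s2: str) -> str:
    """Sort the collected substrings by length and return the longest ('' if none)."""
    # sorted() is stable, so equally long substrings keep their left-to-right order.
    candidates = sorted(common_substrings(s1, s2), key=len, reverse=True)
    return candidates[0] if candidates else ""


print(maximum_common_substring("abcabcadd", "abcmnf"))  # abc
```

This brute-force enumeration mirrors the description but is slow on long inputs; a dynamic-programming longest-common-substring routine returns the same length at lower cost.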

Accordingly, a second embodiment of the present application discloses a data matching apparatus. Referring to the schematic structural diagram shown in Fig. 4, the data matching apparatus includes: an acquisition module 100, a calculation module 200 and a determination module 300.

The acquiring module 100 is configured to acquire a character string to be matched and a character string set to be matched. In data matching, it is often necessary to match a plurality of character strings against a certain character string to determine whether each of them matches it; in this case, that certain character string is called the character string to be matched, and the plurality of character strings form the character string set to be matched;

the calculating module 200 is configured to calculate matching parameters between each character string in the character string set and the character string to be matched, where the matching parameters include similarity and/or matching quantity; the similarity represents the degree of similarity between each character string in the character string set and the character string to be matched, and the matching quantity represents the matching information between each character string in the character string set and the character string to be matched;

the determining module 300 is configured to determine, according to the matching parameters, a target character string in the character string set, the target character string being the character string that matches the character string to be matched.
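The division of work among the three modules can be pictured with the following sketch. Everything in it is an assumption made for illustration: the class and method names are hypothetical, the matching parameter is supplied by the caller as a function, and the standard-library `difflib.SequenceMatcher` ratio is used only as a stand-in score, not as the similarity or matching quantity defined in this application.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Iterable, List, Tuple

Scorer = Callable[[str, str], float]


class DataMatcher:
    """Hypothetical pipeline mirroring modules 100, 200 and 300."""

    def __init__(self, scorer: Scorer, top_n: int = 1):
        self.scorer = scorer
        self.top_n = top_n

    def acquire(self, to_match: str, candidates: Iterable[str]) -> Tuple[str, List[str]]:
        # Acquiring module: the string to be matched and the string set.
        return to_match, list(candidates)

    def calculate(self, to_match: str, candidates: List[str]) -> Dict[str, float]:
        # Calculating module: one matching parameter per candidate string.
        return {c: self.scorer(to_match, c) for c in candidates}

    def determine(self, params: Dict[str, float]) -> List[str]:
        # Determining module: keep the top-N strings by matching parameter.
        return sorted(params, key=lambda c: params[c], reverse=True)[: self.top_n]

    def match(self, to_match: str, candidates: Iterable[str]) -> List[str]:
        target, pool = self.acquire(to_match, candidates)
        return self.determine(self.calculate(target, pool))


def stand_in_score(a: str, b: str) -> float:
    """Stand-in similarity from difflib; NOT the formula of this application."""
    return SequenceMatcher(None, a, b).ratio()


matcher = DataMatcher(stand_in_score, top_n=1)
print(matcher.match("abcabcadd", ["abcmnf", "xyz", "abcabc"]))  # ['abcabc']
```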

In this application, the matching parameters include: similarity and/or amount of match. If the matching parameter is similarity, the calculating module 200 includes:

a first length obtaining unit, configured to select one of the character strings from the character string set as a calculation character string, and obtain the length str1₁ of the character string to be matched;

The first maximum common substring obtaining unit is used for obtaining the maximum common substring of the character string to be matched and the calculation character string, calculating the length of the maximum common substring, and respectively obtaining the number of the maximum common substring in the calculation character string and the character string to be matched, wherein the maximum common substring refers to the longest identical substring between two given character strings;

The similarity calculation unit calculates the similarity between the calculation character string and the character string to be matched by adopting the following formula:

where L represents the similarity of the character string to be matched and the calculation character string; str1₁ represents the length of the character string to be matched before any maximum common substring is removed; str2₁ represents the length of the calculation character string before any maximum common substring is removed.

m_i represents the length of the i-th maximum common substring obtained when calculating the matching parameters of the character string to be matched and the calculation character string. Here, m₁ denotes the length of the first maximum common substring obtained, i.e., the length of the maximum common substring between the character string to be matched and the calculation character string before any removal operation is performed; m₂ denotes the length of the maximum common substring between the new matching character string and the new calculation character string obtained after the first removal operation; and so on.

Let a be the number of maximum common substrings obtained when calculating the matching parameters of the character string to be matched and the calculation character string; n denotes any value not less than a. For example, suppose the character string to be matched and the calculation character string contain a maximum common substring before any removal operation is performed, and a maximum common substring between the new matching character string and the new calculation character string can still be obtained after the first removal operation, but no common substring remains between them after the second removal operation. Then the number of maximum common substrings obtained is 2, that is, a is 2 and n is any value not less than 2. In this case, for any i greater than 2, m_i is 0, and k_i1 and k_i2 are 0.

Further, if the matching parameter is a matching degree, the calculating module 200 includes:

The matching degree calculation unit calculates the matching degree of the calculation character string and the character string to be matched by adopting the following formula:

where L represents the similarity of the character string to be matched and the calculation character string; str1₁ represents the length of the character string to be matched before any maximum common substring is removed; str2₁ represents the length of the calculation character string before any maximum common substring is removed.

Let a be the number of maximum common substrings obtained when calculating the matching parameters of the character string to be matched and the calculation character string; n denotes any value not less than a. For example, suppose the character string to be matched and the calculation character string contain a maximum common substring before any removal operation is performed, and a maximum common substring between the new matching character string and the new calculation character string can still be obtained after the first removal operation, but no common substring remains between them after the second removal operation. Then the number of maximum common substrings obtained is 2, that is, a is 2 and n is any value not less than 2. In this case, for any i greater than 2, m_i is 0, and k_i1 and k_i2 are 0.

The second embodiment of the application discloses a data matching device which, after acquiring the character string to be matched and the character string set to be matched, determines the target character string matching the character string to be matched according to the matching parameters between each character string in the set and the character string to be matched. When this device is used for data matching, the matched character strings are not required to be completely equal; whether two character strings match can be determined from the matching parameters, which improves the matching rate.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims

1. A method of data matching, comprising:

respectively calculating matching parameters of each character string in the character string set and the character string to be matched, wherein the matching parameters comprise: similarity;

determining a target character string matched with the character string to be matched in the character string set according to the matching parameters;

if the matching parameters are similarity, the calculating the matching parameters of each character string in the character string set and the character string to be matched respectively comprises:

22) acquiring the length str2₁ of the calculation character string;

24) removing the maximum common substring contained in the calculation character string to obtain a new calculation character string, removing the maximum common substring contained in the character string to be matched to obtain a new matching character string, and returning to execute step 23) until the character string to be matched and the calculation character string no longer contain a maximum common substring, and then executing the operation of step 25);

2. The method according to claim 1, wherein the similarity between the calculation character string and the character string to be matched is calculated by adopting the following formula:

where str1₁ represents the length of the character string to be matched before any maximum common substring is removed; str2₁ represents the length of the calculation character string before any maximum common substring is removed; m_i represents the length of the i-th maximum common substring obtained when calculating the matching parameters of the character string to be matched and the calculation character string; k_i1 represents the number of occurrences of the i-th maximum common substring in the character string to be matched; k_i2 represents the number of occurrences of the i-th maximum common substring in the calculation character string; a is the number of maximum common substrings obtained when calculating the matching parameters of the character string to be matched and the calculation character string, and n represents any value not less than a; and L represents the similarity of the character string to be matched and the calculation character string.

3. A method of data matching, comprising:

respectively calculating matching parameters of each character string in the character string set and the character string to be matched, wherein the matching parameters comprise: matching amount;

if the matching parameters are matching quantities, the step of respectively calculating the matching parameters of each character string in the character string set and the character string to be matched comprises the following steps:

43) removing the maximum common substring contained in the calculation character string, obtaining a new calculation character string, removing the maximum common substring contained in the character string to be matched, obtaining a new matching character string, and returning to execute the step 42) until the character string to be matched and the calculation character string do not contain the maximum common substring, and then executing the operation of the step 44);

44) calculating the matching amount of the calculation character string and the character string to be matched according to the length of each maximum common substring and the number of the maximum common substrings in the calculation character string and the character string to be matched;

45) selecting another character string from the character string set as a calculation character string, and returning to execute the operation of the step 42) until obtaining the matching amount of all the character strings contained in the character string set and the character string to be matched.

4. The method according to claim 3, wherein the matching amount of the calculation character string and the character string to be matched is calculated by adopting the following formula:

where m_i represents the length of the i-th maximum common substring obtained when calculating the matching parameters of the character string to be matched and the calculation character string; k_i1 represents the number of occurrences of the i-th maximum common substring in the character string to be matched; k_i2 represents the number of occurrences of the i-th maximum common substring in the calculation character string; a is the number of maximum common substrings obtained when calculating the matching parameters of the character string to be matched and the calculation character string, and n represents any value not less than a; and E represents the matching amount of the character string to be matched and the calculation character string.

5. A data matching apparatus, comprising:

a calculating module, configured to calculate matching parameters between each character string in the character string set and the character string to be matched, where the matching parameters include: similarity;

the determining module is used for determining a target character string matched with the character string to be matched in the character string set according to the matching parameters;

if the matching parameter is similarity, the calculating module includes:

the first removal unit is used for removing the maximum common substring contained in the calculation character string, acquiring a new calculation character string, removing the maximum common substring contained in the character string to be matched, acquiring a new matching character string, triggering the first maximum common substring acquisition unit to execute operation until the character string to be matched and the calculation character string do not contain the maximum common substring, and triggering the similarity calculation unit to execute operation;

6. The apparatus according to claim 5, wherein the similarity calculation unit calculates the similarity between the calculation string and the string to be matched using the following formula:

7. A data matching apparatus, comprising:

a calculating module, configured to calculate matching parameters between each character string in the character string set and the character string to be matched, where the matching parameters include: matching amount;

if the matching parameter is a matching quantity, the calculation module comprises:

the second removal unit is used for removing the maximum common substring contained in the calculation character string to acquire a new calculation character string, removing the maximum common substring contained in the character string to be matched to acquire a new matching character string, and triggering the second maximum common substring acquisition unit to execute its operation until the character string to be matched and the calculation character string no longer contain a maximum common substring, and then triggering the matching amount calculation unit to execute its operation;

the matching amount calculation unit is used for calculating the matching amount of the calculation character string and the character string to be matched according to the length of each maximum common substring and the number of the maximum common substrings in the calculation character string and the character string to be matched;

and the second selection unit is used for selecting another character string from the character string set as a calculation character string and triggering the second maximum common sub-string acquisition unit to execute operation until the matching amount of all the character strings contained in the character string set and the character string to be matched is acquired.

8. The apparatus according to claim 7, wherein the matching amount calculating unit calculates the matching amount of the calculation character string and the character string to be matched using the following formula:

where m_i represents the length of the i-th maximum common substring obtained when calculating the matching parameters of the character string to be matched and the calculation character string; k_i1 represents the number of occurrences of the i-th maximum common substring in the character string to be matched; k_i2 represents the number of occurrences of the i-th maximum common substring in the calculation character string; a is the number of maximum common substrings obtained when calculating the matching parameters of the character string to be matched and the calculation character string, and n represents any value not less than a; and E represents the matching amount of the character string to be matched and the calculation character string.