CN109165326A - Character string matching method and device - Google Patents

Character string matching method and device

Publication number
CN109165326A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810936946.7A
Other languages
Chinese (zh)
Inventor
曾伟雄
薛重阳
孟庆文
王维
刘晓东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bee Wisdom (beijing) Technology Co Ltd
Original Assignee
Bee Wisdom (beijing) Technology Co Ltd

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a character string matching method and device. The method includes: after a first character string and a second character string are obtained, segmenting the two character strings into words and determining the field to which each word of the two character strings corresponds; then determining the matching degree between the two character strings according to the weight of each field; and, if the matching degree is greater than a preset threshold, determining that the two character strings match. The weight of each field can be determined from sample character strings. Setting different weights for different fields improves the accuracy of matching between character strings. Furthermore, compared with the manual comparison used in the prior art, the embodiments of the invention require no manual comparison, which reduces labor cost, simplifies the operation of matching enterprise names, and shortens matching time.

Description

Character string matching method and device
Technical field
The present invention relates to the field of data science, and in particular to a character string matching method and device.
Background
Enterprise name matching is an important technique in the field of risk control. For example, in the financial industry, and especially in the credit industry, clients are often required to fill in an enterprise name for risk management purposes, and the enterprise name filled in by the client is then matched. As one example, the enterprise name filled in by the client can be matched against the enterprise name in the client's credit report to check whether the client previously worked at that enterprise as well; alternatively, the client's enterprise name can be compared with the enterprise names of other clients to check whether the client has colleagues who are also clients of the institution.
When matching enterprise names, the prior art generally relies on manual comparison, that is, different enterprise names are matched by hand. This approach has a high labor cost, is cumbersome to operate, and takes a long time.
Accordingly, a character string matching method is needed to solve the problem that performing string matching by manual comparison in the prior art leads to high labor cost.
Summary of the invention
The embodiments of the present invention provide a character string matching method and device to solve the technical problem that performing string matching by manual comparison in the prior art leads to high labor cost.
An embodiment of the present invention provides a character string matching method, the method comprising:
obtaining a first character string and a second character string;
segmenting the first character string and the second character string respectively to obtain each word included in the first character string and each word included in the second character string;
determining, according to a preset correspondence between fields and words, the field corresponding to each word included in the first character string and the field corresponding to each word included in the second character string;
determining the matching degree between the first character string and the second character string according to each word included in the first character string and its corresponding field, each word included in the second character string and its corresponding field, and the weight of each field, where the weight of each field is determined from multiple sample character strings;
if the matching degree is determined to be greater than a preset threshold, determining that the first character string matches the second character string.
In this way, setting different weights for different fields improves the accuracy of matching between character strings. Furthermore, compared with the manual comparison used in the prior art, the embodiments of the present invention require no manual comparison, which reduces labor cost, simplifies the operation of matching enterprise names, and shortens matching time.
In one possible implementation, the weight of each field is determined as follows:
segmenting each sample character string to obtain each word included in each sample character string;
determining, according to the correspondence between fields and words, the field corresponding to each word included in each sample character string;
determining the repetition rate of each field according to the repetition rates of the words corresponding to that field;
determining the weight of each field according to the repetition rate of each field.
Because the field weights are derived from the repetition rate of each field, the weights are more accurate and better reflect the importance of each field, which in turn improves the accuracy of matching between character strings.
In one possible implementation, before determining the repetition rate of each field according to the words corresponding to each field, the method further includes:
determining, according to the words corresponding to each field, the repetition rate of any word corresponding to the field among the words corresponding to that field.
In one possible implementation, determining the weight of each field according to the repetition rate of each field includes:
determining, according to the repetition rate of each field, the discrimination among the words corresponding to that field;
determining, according to the discrimination among the words corresponding to each field, the total discrimination of all fields;
determining the weight of a field according to the discrimination of that field and the total discrimination of all fields.
An embodiment of the present invention provides a string matching device, the device comprising:
an acquiring unit, configured to obtain a first character string and a second character string;
a processing unit, configured to segment the first character string and the second character string respectively to obtain each word included in the first character string and each word included in the second character string; to determine, according to a preset correspondence between fields and words, the field corresponding to each word included in the first character string and the field corresponding to each word included in the second character string; and to determine the matching degree between the first character string and the second character string according to each word included in the first character string and its corresponding field, each word included in the second character string and its corresponding field, and the weight of each field, where the weight of each field is determined from multiple sample character strings;
a matching unit, configured to determine that the first character string matches the second character string if the matching degree is determined to be greater than a preset threshold.
In one possible implementation, the processing unit is specifically configured to:
segment each sample character string to obtain each word included in each sample character string; determine, according to the correspondence between fields and words, the field corresponding to each word included in each sample character string; determine the repetition rate of each field according to the repetition rates of the words corresponding to that field; and determine the weight of each field according to the repetition rate of each field.
In one possible implementation, before determining the repetition rate of each field according to the words corresponding to each field, the processing unit is further configured to:
determine, according to the words corresponding to each field, the repetition rate of any word corresponding to the field among the words corresponding to that field.
In one possible implementation, the processing unit is specifically configured to:
determine, according to the repetition rate of each field, the discrimination among the words corresponding to that field; determine, according to the discrimination among the words corresponding to each field, the total discrimination of all fields; and determine the weight of a field according to the discrimination of that field and the total discrimination of all fields.
An embodiment of the present application further provides a device that has the function of implementing the character string matching method described above. The function may be implemented by hardware executing corresponding software. In one possible design, the device includes a processor, a transceiver, and a memory. The memory is used to store computer-executable instructions, the transceiver is used for communication between the device and other communication entities, and the processor is connected to the memory through a bus. When the device runs, the processor executes the computer-executable instructions stored in the memory, so that the device performs the character string matching method described above.
An embodiment of the present invention further provides a computer storage medium in which a software program is stored; when read and executed by one or more processors, the software program implements the character string matching method described in the various possible implementations above.
An embodiment of the present invention further provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the character string matching method described in the various possible implementations above.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings required in the description of the embodiments are briefly introduced below.
Fig. 1 is a schematic flowchart of a character string matching method provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a method for determining the weight of a field provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a string matching device provided by an embodiment of the present invention.
Detailed description
The present application is described in detail below with reference to the accompanying drawings. The specific operation methods in the method embodiments may also be applied to the device embodiments.
When determining whether two enterprise names match, the prior art may, in addition to manual comparison, use a computer to perform full-character matching. However, this method can produce misjudgments. For example, suppose the two enterprise names to be matched are "Lenovo Company" and "Baidu Company". With existing full-character matching, because the word "Company" in the character string "Lenovo Company" is equal to the word "Company" in the character string "Baidu Company", the prior art may conclude that "Lenovo Company" and "Baidu Company" match. This conclusion is clearly wrong.
Accordingly, an embodiment of the present invention provides a character string matching method. Fig. 1 is a schematic flowchart of the character string matching method provided by an embodiment of the present invention, which includes the following steps:
Step 101: obtain a first character string and a second character string.
Step 102: segment the first character string and the second character string respectively to obtain each word included in the first character string and each word included in the second character string.
Step 103: determine, according to a preset correspondence between fields and words, the field corresponding to each word included in the first character string and the field corresponding to each word included in the second character string.
Step 104: determine the matching degree between the first character string and the second character string according to each word included in the first character string and its corresponding field, each word included in the second character string and its corresponding field, and the weight of each field.
Step 105: if the matching degree is determined to be greater than a preset threshold, determine that the first character string matches the second character string.
In this way, setting different weights for different fields improves the accuracy of matching between character strings. Furthermore, compared with the manual comparison used in the prior art, the embodiments of the present invention require no manual comparison, which reduces labor cost, simplifies the operation of matching enterprise names, and shortens matching time.
Specifically, in step 101, the matching may be applied between two character strings or among multiple character strings. A character string consists of multiple characters, and both the number of characters and the content of the character string depend on the specific requirements. For example, if enterprise-name character strings are to be matched, the obtained character strings may be "Lenovo Company", "Baidu Company", and so on.
In step 102, a character string generally includes multiple characters. Considering the relationships between characters, the character string may first be segmented into words. For example, if the first character string is "Beijing XXX Information Technology Limited Liability Company", segmentation yields the four words "Beijing", "XXX", "information technology", and "limited liability company"; if the second character string is "Shanghai XXX Information Technology Limited Liability Company", segmentation yields the four words "Shanghai", "XXX", "information technology", and "limited liability company".
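The segmentation in step 102 can be sketched as follows. The patent does not specify a segmentation algorithm, so this is only an illustrative greedy longest-match over the English rendering of the example names; the function name `segment` and the `LEXICON` contents are hypothetical, and a real implementation on Chinese text would need a proper Chinese word segmenter.

```python
def segment(name, lexicon):
    """Greedy longest-match segmentation over whitespace-separated tokens.

    Assumption: multi-word phrases listed in `lexicon` are kept together;
    any token not covered by a phrase becomes a word on its own.
    """
    tokens = name.split()
    known = {phrase.lower() for phrase in lexicon}
    max_len = max((len(phrase.split()) for phrase in lexicon), default=1)
    words, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shrink to one token.
        for size in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + size])
            if size == 1 or candidate.lower() in known:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical lexicon of multi-word phrases.
LEXICON = ["information technology", "limited liability company"]

first_words = segment(
    "Beijing XXX Information Technology Limited Liability Company", LEXICON)
second_words = segment(
    "Shanghai XXX Information Technology Limited Liability Company", LEXICON)
print(first_words)
print(second_words)
```

Both names split into four words here, matching the example in the text; only the first word differs between them.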
In step 103, the fields of a character string can be determined according to the type of character string to be matched. Taking enterprise names as the character string type, an enterprise name generally consists, in order, of an administrative division, a trade name, an industry, and an organizational form (unless laws and regulations provide otherwise). That is, an enterprise-name character string generally has four fields: administrative division, trade name, industry, and organizational form.
Further, the field corresponding to each word included in a character string can be determined according to the preset correspondence between fields and words. Specifically, the correspondence between fields and words can be determined as follows:
(1) The "administrative division" field of an enterprise-name character string may be the name or place name of the administrative division at or above the county level where the enterprise is located. In certain special cases, a country name may also be treated as an administrative division. That is, the "administrative division" field corresponds to words such as place names, administrative area names, and country names.
(2) The "trade name" field of an enterprise-name character string may consist of two or more Chinese characters. The name of a natural-person investor may be used as the trade name, but the name of an administrative division may not (except where the place name of an administrative division at or above the county level has another meaning). That is, the "trade name" field corresponds to words such as personal names and brand names.
(3) The "industry" field of an enterprise-name character string may be a term reflecting the national economic industry to which the enterprise's economic activity belongs, or the enterprise's business characteristics, and the content of the "industry" field should be consistent with the enterprise's business scope. If the enterprise's economic activities fall under different major categories of the national economic industry classification, the term for the category of its main economic activity should be used to express the industry in the enterprise name. That is, the "industry" field corresponds to words such as industry categories and business characteristics.
(4) For the "organizational form" field of an enterprise-name character string, according to the relevant laws and regulations, a corporate enterprise may apply to use "Co., Ltd.", "limited liability company", and the like as its organizational form; a non-corporate enterprise may apply to use "factory", "shop", "department", "center", and the like. That is, the "organizational form" field corresponds to words such as "Co., Ltd.", "limited liability company", "factory", "shop", "department", and "center".
For example, if the first character string is "Beijing AA Information Technology Limited Liability Company" and the second character string is "Beijing AA Technology Co., Ltd.", Table 1 shows an example of the field corresponding to each word included in the character strings. The details are given in Table 1 and are not repeated here.
Table 1: An example of the field corresponding to each word included in a character string
Character string | Administrative division | Trade name | Industry | Organizational form
First character string | Beijing | AA | Information technology | Limited liability company
Second character string | Beijing | AA | Technology | Co., Ltd.
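The assignment shown in Table 1 can be sketched as a dictionary lookup. The patent only states that a preset correspondence between fields and words exists; the `FIELD_OF_WORD` table, the function name `assign_fields`, and the choice to treat unknown words as trade names are all assumptions made for illustration.

```python
# Hypothetical preset correspondence between words and fields.
FIELD_OF_WORD = {
    "beijing": "administrative division",
    "shanghai": "administrative division",
    "information technology": "industry",
    "technology": "industry",
    "limited liability company": "organizational form",
    "co., ltd.": "organizational form",
}


def assign_fields(words, field_of_word=FIELD_OF_WORD, default_field="trade name"):
    """Map each segmented word to its field.

    Assumption: a word missing from the lookup table is treated as the
    trade name. If two words fall into the same field, the later one wins.
    """
    fields = {}
    for word in words:
        field = field_of_word.get(word.lower(), default_field)
        fields[field] = word
    return fields


first_fields = assign_fields(
    ["Beijing", "AA", "Information Technology", "Limited Liability Company"])
second_fields = assign_fields(["Beijing", "AA", "Technology", "Co., Ltd."])
print(first_fields)
print(second_fields)
```

With these inputs the two dictionaries reproduce the two rows of Table 1: the administrative division and trade name agree, while the industry and organizational form differ.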
In step 104, the weight of each field can be determined from multiple sample character strings. Specifically, Fig. 2 is a schematic flowchart of a method for determining the weight of a field provided by an embodiment of the present invention, which includes the following steps:
Step 201: segment each sample character string to obtain each word included in each sample character string.
Before step 201 is performed, data cleaning may first be applied to the sample character strings. There are many specific data cleaning methods, such as unifying characters, deleting non-Chinese characters, and deleting duplicate character strings; no limitation is imposed here.
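A minimal sketch of the cleaning step, assuming that "unifying characters" means Unicode NFKC normalization and that "Chinese characters" means the basic CJK Unified Ideographs block (U+4E00 to U+9FFF). The patent does not fix either choice, and the function name `clean_samples` is hypothetical.

```python
import re
import unicodedata

# Assumption: "Chinese character" = code points U+4E00..U+9FFF.
NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]")


def clean_samples(samples):
    """Normalize, strip non-Chinese characters, and drop duplicates.

    The order of first occurrence is preserved.
    """
    cleaned, seen = [], set()
    for raw in samples:
        text = unicodedata.normalize("NFKC", raw)  # unify character forms
        text = NON_CHINESE.sub("", text)           # delete non-Chinese characters
        if text and text not in seen:              # delete duplicate strings
            seen.add(text)
            cleaned.append(text)
    return cleaned


cleaned = clean_samples(["联想公司", " 联想公司 ", "百度（公司）", "百度公司"])
print(cleaned)
```

Note that deleting non-Chinese characters would also remove Latin-letter trade names, so whether this step is appropriate depends on the sample data.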
Further, the sample character strings may be segmented in the manner described in step 102 above, which is not repeated here.
Step 202: determine, according to the correspondence between fields and words, the field corresponding to each word included in each sample character string.
The field corresponding to each word included in a sample character string may be determined in the manner described in step 103 above, which is not repeated here.
For example, Table 2 shows an example of the fields corresponding to sample character strings. Sample 1 is "Linkage Advantage Company": the word for its trade name field is "Linkage Advantage" and the word for its organizational form field is "Company". Sample 2 is "Lenovo Group": the word for its trade name field is "Lenovo" and the word for its organizational form field is "Group". Sample 3 is "Lenovo Company": the word for its trade name field is "Lenovo" and the word for its organizational form field is "Company". Sample 4 is "Baidu Company": the word for its trade name field is "Baidu" and the word for its organizational form field is "Company".
Table 2: An example of the fields corresponding to sample character strings
Number | Sample character string | Trade name field | Organizational form field
Sample 1 | Linkage Advantage Company | Linkage Advantage | Company
Sample 2 | Lenovo Group | Lenovo | Group
Sample 3 | Lenovo Company | Lenovo | Company
Sample 4 | Baidu Company | Baidu | Company
Step 203: determine the repetition rate of each field according to the repetition rates of the words corresponding to that field.
Before step 203 is performed, the repetition rate of any word corresponding to a field among the words corresponding to that field may first be determined according to the words corresponding to each field; that is, the repetition rate of the words within each field is determined first. Specifically, the repetition rate of a word can be determined using formula (1). The formula itself is not reproduced in this text; a form consistent with the definitions below and with the values in Table 3 is:
$$C_i = \frac{n_{c_i}}{N - 1} \tag{1}$$
where $C_i$ is the repetition rate of the i-th word corresponding to the field; $n_{c_i}$ is the number of times the i-th word is repeated in the field (its occurrences beyond the first); and $N$ is the total number of words corresponding to the field.
For example, taking the sample character strings shown in Table 2, Table 3 shows an example of the repetition rate of each word in the sample character strings. In the trade name field, the repetition rate of "Linkage Advantage" is 0, that of "Lenovo" is 1/3, and that of "Baidu" is 0; in the organizational form field, the repetition rate of "Company" is 2/3 and that of "Group" is 0.
Table 3: An example of the repetition rate of each word in the sample character strings
Field | Word | Repetition rate
Trade name | Linkage Advantage | 0
Trade name | Lenovo | 1/3
Trade name | Baidu | 0
Organizational form | Company | 2/3
Organizational form | Group | 0
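The word-level repetition rates of Table 3 can be reproduced with the short sketch below. It assumes that a word's repetition rate is its number of occurrences beyond the first divided by the total number of words in the field minus one, which is one reading that agrees with the worked values; the function name `word_repetition_rates` is hypothetical.

```python
from collections import Counter
from fractions import Fraction


def word_repetition_rates(words):
    """Repetition rate of each distinct word within one field.

    Assumption: rate = (occurrences - 1) / (total words - 1).
    """
    total = len(words)
    if total < 2:
        # A single word cannot repeat; avoid dividing by zero.
        return {word: Fraction(0) for word in words}
    counts = Counter(words)
    return {word: Fraction(count - 1, total - 1)
            for word, count in counts.items()}


# Words per field, taken from the four samples of Table 2.
trade_names = ["Linkage Advantage", "Lenovo", "Lenovo", "Baidu"]
org_forms = ["Company", "Group", "Company", "Company"]

trade_rates = word_repetition_rates(trade_names)
org_rates = word_repetition_rates(org_forms)
print(trade_rates)
print(org_rates)
```

Exact fractions are used instead of floats so the results can be compared directly with the values quoted in the text.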
Further, there are many ways to determine the repetition rate of each field. In one example, the average of the repetition rates of the words corresponding to a field is used as the repetition rate of that field. Specifically, the repetition rate of a field can be determined using formula (2):
$$Z_j = \frac{1}{N_j} \sum_{i=1}^{N_j} C_i \tag{2}$$
where $Z_j$ is the repetition rate of the j-th field; $C_i$ is the repetition rate of the i-th word corresponding to the j-th field; and $N_j$ is the total number of words corresponding to the j-th field.
For example, taking the repetition rates of the words shown in Table 3, Table 4 shows an example of the repetition rates of the fields of the sample character strings. The average is taken over all four words of each field, so a word that occurs more than once is counted once per occurrence: the repetition rate of the trade name field is (0 + 1/3 + 1/3 + 0)/4 = 1/6, and the repetition rate of the organizational form field is (2/3 + 0 + 2/3 + 2/3)/4 = 1/2.
Table 4: An example of the repetition rates of the fields of the sample character strings
 | Trade name field | Organizational form field
Repetition rate of the field | 1/6 | 1/2
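The field-level averages in Table 4 can be checked with the sketch below. It assumes the average runs over every word occurrence in the field, so a repeated word contributes once per occurrence, which is the reading that reproduces 1/6 and 1/2; the function name `field_repetition_rate` is hypothetical.

```python
from collections import Counter
from fractions import Fraction


def field_repetition_rate(words):
    """Average word repetition rate over all word occurrences of a field.

    Assumption: word rate = (occurrences - 1) / (total words - 1), and
    each occurrence of a word contributes its rate to the average.
    """
    total = len(words)
    if total < 2:
        return Fraction(0)
    counts = Counter(words)
    rates = [Fraction(counts[word] - 1, total - 1) for word in words]
    return sum(rates, Fraction(0)) / total


trade_name_rate = field_repetition_rate(
    ["Linkage Advantage", "Lenovo", "Lenovo", "Baidu"])
org_form_rate = field_repetition_rate(
    ["Company", "Group", "Company", "Company"])
print(trade_name_rate, org_form_rate)
```

Averaging over distinct words instead would give 1/9 for the trade name field, which does not agree with Table 4.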
In other possible examples, the repetition rate of a field may also be determined in other ways, for example according to the repetition rates of the words corresponding to the field and a preset coefficient; no limitation is imposed here.
Step 204: determine the weight of each field according to the repetition rate of each field.
In the embodiments of the present invention, there are many ways to determine the weight of a field. In one possible implementation, the discrimination among the words corresponding to each field is first determined according to the repetition rate of that field; the total discrimination of all fields is then determined according to the discrimination among the words corresponding to each field; and the weight of a field is finally determined according to the discrimination of that field and the total discrimination of all fields.
Specifically, the repetition rate of a field is positively correlated with the repetition rates of the words corresponding to that field. In other words, the higher the repetition rate of a field, the more often its words repeat and the harder it is to tell those words apart, so the repetition rate of a field is negatively correlated with the discrimination among the words corresponding to that field. The discrimination among the words corresponding to a field can be determined using formula (3):
$$Q_j = 1 - Z_j \tag{3}$$
where $Q_j$ is the discrimination among the words corresponding to the j-th field, and $Z_j$ is the repetition rate of the j-th field.
For example, taking the repetition rates of the fields shown in Table 4, Table 5 shows an example of the discrimination of the fields of the sample character strings. The discrimination of the trade name field is 5/6, and the discrimination of the organizational form field is 1/2.
Table 5: An example of the discrimination of the fields of the sample character strings
 | Trade name field | Organizational form field
Repetition rate of the field | 1/6 | 1/2
Discrimination of the field | 5/6 | 1/2
Further, the total discrimination of all fields may be the sum of the discriminations of the individual fields. For the character strings shown in Table 5, the total discrimination of all fields is 5/6 + 1/2 = 8/6.
Further, the greater the discrimination of a field, the larger the weight that can be assigned to it. Specifically, the weight of a field can be determined using formula (4):
$$W_j = \frac{Q_j}{\sum_j Q_j} \tag{4}$$
where $W_j$ is the weight of the j-th field; $Q_j$ is the discrimination among the words corresponding to the j-th field; and $\sum_j Q_j$ is the total discrimination of all fields.
For example, taking the discrimination of the fields shown in Table 5, Table 6 shows an example of the weights of the fields of the sample character strings. The weight of the trade name field is 5/8, and the weight of the organizational form field is 3/8.
Table 6: An example of the weights of the fields of the sample character strings
 | Trade name field | Organizational form field
Repetition rate of the field | 1/6 | 1/2
Discrimination of the field | 5/6 | 1/2
Weight of the field | 5/8 | 3/8
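The step from field repetition rates to field weights can be sketched as below: each field's discrimination is one minus its repetition rate, and each weight is that discrimination divided by the sum over all fields. The function name `field_weights` and the dictionary layout are hypothetical.

```python
from fractions import Fraction


def field_weights(field_repetition):
    """Turn per-field repetition rates into normalized weights.

    discrimination = 1 - repetition rate
    weight         = discrimination / sum of all discriminations
    """
    discrimination = {field: 1 - rate
                      for field, rate in field_repetition.items()}
    total = sum(discrimination.values(), Fraction(0))
    if total == 0:
        raise ValueError("all fields have zero discrimination")
    return {field: value / total for field, value in discrimination.items()}


weights = field_weights({
    "trade name": Fraction(1, 6),
    "organizational form": Fraction(1, 2),
})
print(weights)
```

Because the weights are normalized by the total discrimination, they always sum to 1, as 5/8 + 3/8 does in the example.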
In other possible implementations, the weight of a field may also be determined in other ways; for example, a person skilled in the art may determine the weight of a field based on experience and actual conditions in combination with the repetition rate of the field.
Determining the field weights as described in steps 201 to 204 above makes the weights more accurate and better reflects the importance of each field, which in turn improves the accuracy of matching between character strings.
In the embodiments of the present invention, the matching degree between the first character string and the second character string can be determined using the field weights determined above. Specifically, according to each word included in the first character string and its corresponding field, and each word included in the second character string and its corresponding field, it is first determined whether the words of the first character string and the second character string that belong to the same field are identical; if they are, the matching degree between the first character string and the second character string is determined according to the weight of that field.
For example, suppose the first character string is "A Company" and the second character string is "A Co., Ltd.". In the first character string, "A" corresponds to the trade name field and "Company" corresponds to the organizational form field; in the second character string, "A" corresponds to the trade name field and "Co., Ltd." corresponds to the organizational form field. Between the two character strings, the words of the trade name field are identical (both "A"), while the words of the organizational form field differ. If the weight of the trade name field is 5/8 and the weight of the organizational form field is 3/8, the matching degree between the first character string and the second character string can be determined to be 5/8.
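The worked example can be sketched as follows. The text only shows a case where a single field agrees; summing the weights when several fields agree is an assumption of this sketch, and the function name `matching_degree` is hypothetical.

```python
from fractions import Fraction


def matching_degree(fields_a, fields_b, weights):
    """Add up the weights of the fields whose words are identical.

    Assumption: when several fields agree, their weights are summed.
    A field missing from either string never counts as a match.
    """
    degree = Fraction(0)
    for field, weight in weights.items():
        word_a = fields_a.get(field)
        word_b = fields_b.get(field)
        if word_a is not None and word_a == word_b:
            degree += weight
    return degree


weights = {"trade name": Fraction(5, 8), "organizational form": Fraction(3, 8)}
first = {"trade name": "A", "organizational form": "Company"}
second = {"trade name": "A", "organizational form": "Co., Ltd."}

degree = matching_degree(first, second, weights)
print(degree)
```

Applied to "Lenovo Company" and "Baidu Company", the same function would credit only the organizational form field, giving 3/8 rather than a full match.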
In step 105, the preset threshold may be determined by a person skilled in the art based on experience and actual conditions; no limitation is imposed here.
For example, if the preset threshold is set to 1/2, the first character string is "A Company", and the second character string is "A Co., Ltd.", then, as determined above, the matching degree between the first character string and the second character string is 5/8. This matching degree is greater than the preset threshold (1/2), so the first character string is determined to match the second character string.
Further, if the matching degree is determined to be less than or equal to the preset threshold, it can be determined that the first character string does not match the second character string.
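The decision rule of step 105 is a strict comparison, which the following small sketch makes explicit; the function name `is_match` is hypothetical, and the threshold 1/2 is the example value from the text.

```python
from fractions import Fraction


def is_match(degree, threshold):
    """Two strings match only if the degree is strictly above the threshold."""
    return degree > threshold


THRESHOLD = Fraction(1, 2)  # example value used in the text
print(is_match(Fraction(5, 8), THRESHOLD))  # "A Company" vs "A Co., Ltd."
print(is_match(Fraction(1, 2), THRESHOLD))  # equal to the threshold
```

A matching degree exactly equal to the threshold is treated as a mismatch, in line with the "less than or equal to" case described above.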
Based on the same inventive concept, Fig. 3 is a schematic structural diagram of a character string matching device provided by an embodiment of the present invention. As shown in Fig. 3, the device includes an acquiring unit 301, a processing unit 302 and a matching unit 303, wherein:
The acquiring unit 301 is configured to obtain a first character string and a second character string;
The processing unit 302 is configured to segment the first character string and the second character string respectively, to obtain each word included in the first character string and each word included in the second character string; to determine, according to a preset correspondence between fields and words, the field corresponding to each word included in the first character string and the field corresponding to each word included in the second character string; and to determine the matching degree of the first character string and the second character string according to each word included in the first character string and its corresponding field, each word included in the second character string and its corresponding field, and the weight value of each field, where the weight value of each field is determined according to multiple sample character strings;
The matching unit 303 is configured to determine that the first character string matches the second character string if it is determined that the matching degree is greater than the preset threshold.
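The division of work among the three units can be sketched in a few lines. Everything named here is an assumption made for illustration: the class `StringMatcher`, the whitespace segmenter, the field names, the sample weights, and in particular the scoring rule (a field contributes its weight when both strings carry the same words for it), which is not the formula given in this document.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Callable, Dict, List, Set


@dataclass
class StringMatcher:
    segment: Callable[[str], List[str]]  # hypothetical word segmenter
    field_of_word: Dict[str, str]        # preset correspondence: word -> field
    field_weights: Dict[str, Fraction]   # weight value of each field
    threshold: Fraction                  # preset threshold

    def words_by_field(self, text: str) -> Dict[str, Set[str]]:
        """Segment a string and group its words under their fields."""
        grouped: Dict[str, Set[str]] = {}
        for word in self.segment(text):
            field = self.field_of_word.get(word)
            if field is not None:  # words without a known field are ignored here
                grouped.setdefault(field, set()).add(word)
        return grouped

    def matching_degree(self, first: str, second: str) -> Fraction:
        """Processing unit (assumed rule): sum the weights of fields whose words agree."""
        a, b = self.words_by_field(first), self.words_by_field(second)
        return sum(
            (w for f, w in self.field_weights.items() if f in a and a[f] == b.get(f)),
            Fraction(0),
        )

    def matches(self, first: str, second: str) -> bool:
        """Matching unit: a match requires a degree strictly above the threshold."""
        return self.matching_degree(first, second) > self.threshold


# Hypothetical configuration and inputs (the role of the acquiring unit).
matcher = StringMatcher(
    segment=str.split,
    field_of_word={"beijing": "region", "acme": "name", "technology": "industry",
                   "ltd": "org_type", "company": "org_type"},
    field_weights={"region": Fraction(1, 10), "name": Fraction(6, 10),
                   "industry": Fraction(2, 10), "org_type": Fraction(1, 10)},
    threshold=Fraction(1, 2),
)
first, second = "beijing acme technology ltd", "beijing acme technology company"
print(matcher.matching_degree(first, second))  # 9/10
print(matcher.matches(first, second))          # True
```

In this sketch only the organization-type field differs between the two strings, so its low weight has little effect on the result, which is the kind of behavior that per-field weights are meant to produce.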
In one possible implementation, the processing unit 302 is specifically configured to:
segment each sample character string to obtain each word included in each sample character string; determine, according to the correspondence between fields and words, the field corresponding to each word included in each sample character string; determine the repetition rate of each field according to the repetition rates of the words corresponding to that field; and determine the weight value of each field according to the repetition rate of that field.
In one possible implementation, before determining the repetition rate of each field according to the multiple words corresponding to that field, the processing unit 302 is further configured to:
determine, according to the words corresponding to each field, the repetition rate of any one word corresponding to the field among the words corresponding to that field.
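One way to read the per-word repetition rate is as the share of a field's word occurrences, across the samples, that a given word accounts for. The sketch below takes that reading as an assumption; the function name `word_repetition_rates` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def word_repetition_rates(tagged_words: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Map each field to {word: share of that field's occurrences taken by the word}."""
    counts: Dict[str, Counter] = {}
    for field, word in tagged_words:
        counts.setdefault(field, Counter())[word] += 1
    return {
        field: {word: n / sum(counter.values()) for word, n in counter.items()}
        for field, counter in counts.items()
    }


# Hypothetical (field, word) pairs obtained by segmenting sample strings.
samples = [
    ("org_type", "company"), ("org_type", "company"),
    ("org_type", "ltd"), ("org_type", "company"),
    ("name", "acme"), ("name", "globex"),
]
rates = word_repetition_rates(samples)
print(rates["org_type"])  # {'company': 0.75, 'ltd': 0.25}
print(rates["name"])      # {'acme': 0.5, 'globex': 0.5}
```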
In one possible implementation, the processing unit 302 is specifically configured to:
determine, according to the repetition rate of each field, the discrimination between the multiple words corresponding to that field; determine, according to the discrimination between the multiple words corresponding to each field, the total discrimination corresponding to all fields; and determine the weight value of each field according to the discrimination of that field and the total discrimination corresponding to all fields.
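The chain from repetition rate to discrimination to weight value can be sketched as below. The formulas are assumptions chosen for illustration, not taken from this document: discrimination is taken as one minus the field's repetition rate, total discrimination as the sum over all fields, and the weight as each field's share of that total. The function name `field_weights` and the input rates are hypothetical.

```python
from fractions import Fraction
from typing import Dict


def field_weights(field_repetition: Dict[str, Fraction]) -> Dict[str, Fraction]:
    """Turn per-field repetition rates into weights that sum to 1."""
    # Assumed definition: a field whose words repeat often discriminates poorly.
    discrimination = {f: 1 - r for f, r in field_repetition.items()}
    total = sum(discrimination.values(), Fraction(0))
    if total == 0:
        raise ValueError("no field discriminates between the samples")
    # Assumed definition: weight = field discrimination / total discrimination.
    return {f: d / total for f, d in discrimination.items()}


# Hypothetical repetition rates for three fields.
repetition = {
    "region": Fraction(3, 4),
    "name": Fraction(1, 10),
    "org_type": Fraction(9, 10),
}
weights = field_weights(repetition)
print(weights["region"], weights["name"], weights["org_type"])  # 1/5 18/25 2/25
```

Under these assumptions a rarely repeated field such as the enterprise name receives most of the weight, while a highly repeated field such as the organization type receives little.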
An embodiment of the present application further provides a device that has the function of implementing the character string matching method described above. The function may be implemented by hardware executing corresponding software. In one possible design, the device includes a processor, a transceiver and a memory. The memory is used to store computer-executable instructions, the transceiver is used to enable the device to communicate with other communication entities, and the processor is connected to the memory through a bus. When the device runs, the processor executes the computer-executable instructions stored in the memory, so that the device performs the character string matching method described above.
An embodiment of the present invention further provides a computer storage medium. The storage medium stores a software program which, when read and executed by one or more processors, implements the character string matching method described in the various possible implementations above.
An embodiment of the present invention further provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the character string matching method described in the various possible implementations above.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, the device (system), and the computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction means, and the instruction means implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include them.

Claims (10)

1. A character string matching method, characterized in that the method comprises:
obtaining a first character string and a second character string;
segmenting the first character string and the second character string respectively, to obtain each word included in the first character string and each word included in the second character string;
determining, according to a preset correspondence between fields and words, the field corresponding to each word included in the first character string and the field corresponding to each word included in the second character string;
determining the matching degree of the first character string and the second character string according to each word included in the first character string and its corresponding field, each word included in the second character string and its corresponding field, and the weight value of each field, wherein the weight value of each field is determined according to multiple sample character strings;
if it is determined that the matching degree is greater than a preset threshold, determining that the first character string matches the second character string.
2. The method according to claim 1, characterized in that the weight value of each field is determined in the following manner:
segmenting each sample character string to obtain each word included in each sample character string;
determining, according to the correspondence between fields and words, the field corresponding to each word included in each sample character string;
determining the repetition rate of each field according to the repetition rates of the words corresponding to that field;
determining the weight value of each field according to the repetition rate of that field.
3. The method according to claim 2, characterized in that, before determining the repetition rate of each field according to the multiple words corresponding to that field, the method further comprises:
determining, according to the words corresponding to each field, the repetition rate of any one word corresponding to the field among the words corresponding to that field.
4. The method according to claim 2, characterized in that determining the weight value of each field according to the repetition rate of that field comprises:
determining, according to the repetition rate of each field, the discrimination between the multiple words corresponding to that field;
determining, according to the discrimination between the multiple words corresponding to each field, the total discrimination corresponding to all fields;
determining the weight value of each field according to the discrimination of that field and the total discrimination corresponding to all fields.
5. A character string matching device, characterized in that the device comprises:
an acquiring unit, configured to obtain a first character string and a second character string;
a processing unit, configured to segment the first character string and the second character string respectively, to obtain each word included in the first character string and each word included in the second character string; to determine, according to a preset correspondence between fields and words, the field corresponding to each word included in the first character string and the field corresponding to each word included in the second character string; and to determine the matching degree of the first character string and the second character string according to each word included in the first character string and its corresponding field, each word included in the second character string and its corresponding field, and the weight value of each field, wherein the weight value of each field is determined according to multiple sample character strings;
a matching unit, configured to determine that the first character string matches the second character string if it is determined that the matching degree is greater than a preset threshold.
6. The device according to claim 5, characterized in that the processing unit is specifically configured to:
segment each sample character string to obtain each word included in each sample character string; determine, according to the correspondence between fields and words, the field corresponding to each word included in each sample character string; determine the repetition rate of each field according to the repetition rates of the words corresponding to that field; and determine the weight value of each field according to the repetition rate of that field.
7. The device according to claim 6, characterized in that, before determining the repetition rate of each field according to the multiple words corresponding to that field, the processing unit is further configured to:
determine, according to the words corresponding to each field, the repetition rate of any one word corresponding to the field among the words corresponding to that field.
8. The device according to claim 6, characterized in that the processing unit is specifically configured to:
determine, according to the repetition rate of each field, the discrimination between the multiple words corresponding to that field; determine, according to the discrimination between the multiple words corresponding to each field, the total discrimination corresponding to all fields; and determine the weight value of each field according to the discrimination of that field and the total discrimination corresponding to all fields.
9. A computer-readable storage medium, characterized in that the storage medium stores instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 4.
10. A computer device, characterized by comprising:
a memory, configured to store program instructions;
a processor, configured to call the program instructions stored in the memory and to execute, according to the obtained program, the method according to any one of claims 1 to 4.
CN201810936946.7A 2018-08-16 2018-08-16 A kind of character string matching method and device Pending CN109165326A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810936946.7A CN109165326A (en) 2018-08-16 2018-08-16 A kind of character string matching method and device


Publications (1)

Publication Number Publication Date
CN109165326A true CN109165326A (en) 2019-01-08

Family

ID=64896089

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810936946.7A Pending CN109165326A (en) 2018-08-16 2018-08-16 A kind of character string matching method and device

Country Status (1)

Country Link
CN (1) CN109165326A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110427991A (en) * 2019-07-22 2019-11-08 联动优势科技有限公司 A kind of character string matching method and device
CN110750509A (en) * 2019-10-24 2020-02-04 赛诺贝斯(北京)营销技术股份有限公司 Enterprise name duplicate checking method and device, equipment and medium
CN111104795A (en) * 2019-11-19 2020-05-05 平安金融管理学院(中国·深圳) Company name matching method and device, computer equipment and storage medium
CN112954387A (en) * 2021-01-26 2021-06-11 广州欢网科技有限责任公司 Method, system and readable storage medium for updating and optimizing television program list
CN113343076A (en) * 2021-04-23 2021-09-03 山东师范大学 Innovative technology recommendation method and system based on feature matching degree
CN113553360A (en) * 2021-07-30 2021-10-26 北京金堤征信服务有限公司 Multi-enterprise relationship analysis method, device, electronic equipment, storage medium and computer program
CN114297461A (en) * 2021-12-10 2022-04-08 北京羽乐创新科技有限公司 Company information matching method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103309857A (en) * 2012-03-06 2013-09-18 腾讯科技(深圳)有限公司 Method and equipment for determining classified linguistic data
CN103761341A (en) * 2014-02-21 2014-04-30 北京嘉和美康信息技术有限公司 Information matching method and device
CN104268137A (en) * 2013-07-31 2015-01-07 深圳市华傲数据技术有限公司 Method and device for matching pharmaceutical name data
CN104778171A (en) * 2014-01-10 2015-07-15 携程计算机技术(上海)有限公司 Character string matching system and method
CN106033416A (en) * 2015-03-09 2016-10-19 阿里巴巴集团控股有限公司 A string processing method and device
CN106650803A (en) * 2016-12-09 2017-05-10 北京锐安科技有限公司 Method and device for calculating similarity between strings
CN106951415A (en) * 2017-04-01 2017-07-14 银联智策顾问(上海)有限公司 A kind of name of firm searching method and device
CN108363729A (en) * 2018-01-12 2018-08-03 中国平安人寿保险股份有限公司 A kind of string comparison method, device, terminal device and storage medium




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190108