CN106502995B - Hierarchical information intelligent identification method and device

Info

Publication number
CN106502995B
Authority
CN
China
Prior art keywords
matching degree
character string
module
candidate item
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611079236.4A
Other languages
Chinese (zh)
Other versions
CN106502995A (en
Inventor
林利炜
孙玉友
靳谊
余良美
毕彦斌
于颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
FUJIAN RONGJI SOFTWARE Co Ltd
Original Assignee
FUJIAN RONGJI SOFTWARE Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by FUJIAN RONGJI SOFTWARE Co Ltd
Priority to CN201611079236.4A
Publication of CN106502995A
Application granted
Publication of CN106502995B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis

Abstract

A hierarchical information intelligent identification method, comprising the following steps: receiving a character string to be identified; calculating the keywords of the character string from the character string to be identified; obtaining the corresponding candidate items according to the keywords of the character string; calculating the matching degree of each candidate item; and sorting the character string to be identified according to the calculated matching degree. The invention provides a simple method that reverse-identifies hierarchical options from a character string, so that the options do not have to be selected again.

Description

Hierarchical information intelligent identification method and device
Technical field
The present invention relates to the field of information identification within the computer field, and in particular to a method and device for intelligently identifying hierarchically structured information in forms.
Background
When a computer is used to fill in a form containing hierarchically structured information, data matching is needed to convert the data in reverse. For example, entering "China Fujian Fuzhou Minqing County" requires data matching, because the administrative division it belongs to is typically presented as a three-level or four-level drop-down structure.
When all the data entered into the system are standardized, they can be matched exactly. For some administrative or information collection work, however, much of the existing legacy data consists of written descriptions, and there is no way to require the large body of users to update the corresponding values or reselect the options. The conversion must therefore be performed in reverse from the existing written material.
Summary of the invention
For this reason, a simple method is needed that can reverse-identify hierarchical options from a character string without requiring the options to be selected again.
To achieve the above object, the inventors provide a hierarchical information intelligent identification method comprising the following steps: receiving a character string to be identified; calculating the keywords of the character string from the character string to be identified; obtaining the corresponding candidate items according to the keywords of the character string; calculating the matching degree of each candidate item; and sorting the character string to be identified according to the calculated matching degree.
Further, the method also includes the step of generating a hierarchical data option library, which contains the option text and the former names or abbreviations corresponding to the option text.
Further, the method also includes the step of determining whether the candidate items conflict, which specifically includes: sorting the candidate items by matching degree in descending order, and determining whether the first-ranked and second-ranked matching degrees are the same; if they are the same, setting the matching degree of the candidate items that share that matching degree to zero.
Further, the method also includes the step of defining a matching degree threshold, screening the character strings to be identified whose matching degree is within the matching degree threshold, adding them to the result set, and returning the result set.
Specifically, the matching degree of each candidate item is calculated as follows:
Matching degree: result = 100 * (log(1 + A) / log(1 + B)),
where:
Variable A = Y*m + Z*n;
Variable B = X*p + C*q;
C = 2*X - 3;
C denotes the maximum possible number of matches; if X = 1, C is forced to 0;
X is the keyword count, Y is the match count, and Z is the consecutive match count;
m, n, p and q are user-defined coefficients.
A hierarchical information intelligent identification device comprises a receiving module, a keyword computing module, a candidate item matching module and a sorting module.
The receiving module is used to receive a character string to be identified;
the keyword computing module is used to calculate the keywords of the character string from the character string to be identified;
the candidate item matching module is used to obtain the corresponding candidate items according to the keywords of the character string and to calculate the matching degree of each candidate item;
the sorting module is used to sort the character string to be identified according to the calculated matching degree.
Further, the device also includes an option library module, which is used to generate a hierarchical data option library containing the option text and the former names or abbreviations corresponding to the option text.
Further, the device also includes a conflict judgment module.
The conflict judgment module is used to determine whether the candidate items conflict;
specifically, it sorts the candidate items by matching degree in descending order and determines whether the first-ranked and second-ranked matching degrees are the same; if they are the same, it sets the matching degree of the candidate items that share that matching degree to zero.
Further, the device also includes a threshold matching module and a result return module. The threshold matching module is used to define a matching degree threshold, screen the character strings to be identified whose matching degree is greater than or equal to the matching degree threshold, and add them to the result set; the result return module is used to return the result set.
Specifically, the matching degree of each candidate item is calculated as follows:
Matching degree: result = 100 * (log(1 + A) / log(1 + B)),
where:
Variable A = Y*m + Z*n;
Variable B = X*p + C*q;
C = 2*X - 3;
C denotes the maximum possible number of matches; if X = 1, C is forced to 0;
X is the keyword count, Y is the match count, and Z is the consecutive match count;
m, n, p and q are user-defined coefficients.
Unlike the prior art, the above technical solution allows a user to reverse-identify hierarchical options from a character string without having to select them again.
It allows a higher-level device to adjust its strategy according to its own needs without modifying this device: the higher-level device obtains the values it wants by changing the threshold.
It allows the text being identified to contain abbreviations and alternative names.
It allows part of the hierarchy to be omitted from the text being identified, for example giving only the province and the county. The county or district may even be written on its own (provided there is no duplicate name), or the full name may be written.
1. Dictionary tree: also called a word lookup tree or trie, it is a tree structure and a variant of the hash tree. Its advantage is that it uses the common prefixes of character strings to reduce query time and minimize unnecessary string comparisons, so its query efficiency is higher than that of a hash tree.
2. Matching degree threshold: used to set the precision of option identification required by different systems or modules. The higher the matching degree, the higher the precision; when the matching degree is 100, the resulting option must match exactly.
Description of the drawings
Fig. 1 is a flow chart of the hierarchical information intelligent identification method according to a specific embodiment of the invention;
Fig. 2 is a flow chart of the hierarchical option identification device according to a specific embodiment of the invention;
Fig. 3 is a module diagram of the hierarchical information intelligent identification device according to a specific embodiment of the invention;
Description of reference numerals:
300, receiving module;
302, keyword computing module;
304, candidate item matching module;
306, sorting module;
308, option library module;
310, conflict judgment module;
312, threshold matching module;
314, result return module.
Detailed description of the embodiments
To explain the technical content, structural features, objects and effects of the technical solution in detail, specific embodiments are described below with reference to the accompanying drawings.
Referring to Fig. 1, a flow chart of the hierarchical information intelligent identification method of the present invention, the method may begin at step S102, receiving a character string to be identified; S104, calculating the keywords of the character string from the character string to be identified; S106, obtaining the corresponding candidate items according to the keywords of the character string, calculating the matching degree of each candidate item, and sorting the character string to be identified according to the calculated matching degree. Specifically, in this embodiment the character string to be identified is a character string carrying hierarchically structured information, such as international codes, administrative divisions, corporate department hierarchies, native places or merchandise categories. In the method, the equipment receives the character string to be identified entered by the user, calculates the keywords in it with a dictionary tree algorithm, selects several candidate items from the option library by keyword, and then calculates the matching degree of each candidate item. Specifically, the matching degree can be calculated as follows:
Matching degree: result = 100 * (log(1 + A) / log(1 + B)),
where:
Variable A = Y*m + Z*n;
Variable B = X*p + C*q;
C = 2*X - 3;
C denotes the maximum possible number of matches; if X = 1, C is forced to 0;
X is the keyword count, Y is the match count, and Z is the consecutive match count;
m, n, p and q are user-defined coefficients whose values can be weighted according to actual needs, for example m = n = 1 and p = q = 0.5. This meets the differing needs of text matching at different levels.
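The formula above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `matching_degree` and the input check are not from the patent, and the coefficients are assumed to be positive so that both logarithms are defined.

```python
import math

def matching_degree(x, y, z, m, n, p, q):
    """Evaluate 100 * log(1 + A) / log(1 + B) for one candidate item.

    x: keyword count, y: match count, z: consecutive match count.
    m, n, p, q: user-defined coefficients (assumed positive here).
    """
    if x < 1:
        raise ValueError("at least one keyword is required")
    c = 0 if x == 1 else 2 * x - 3   # C is forced to 0 when X = 1
    a = y * m + z * n                # variable A
    b = x * p + c * q                # variable B
    return 100 * math.log(1 + a) / math.log(1 + b)
```

Because the formula is a ratio of two logarithms, the base of the logarithm does not change the result.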
In some other embodiments, as shown in Fig. 1, the method may also include step S100, generating a hierarchical data option library that contains the option text and the former names or abbreviations corresponding to the option text. Building this library makes the matching results more accurate.
Some further embodiments shown in Fig. 1 also include step S108, determining whether the candidate items conflict. Specifically, the candidate items are sorted by matching degree in descending order, and the first-ranked and second-ranked matching degrees are compared; if they are the same, the matching degree of the candidate items that share that matching degree is set to zero. By default only one result is generally returned, so two maximum values constitute a conflict. This step resolves the maximum-value conflict when identical matching degrees occur and reduces the amount of computation the system needs.
Some other embodiments also include step S110, defining a matching degree threshold, screening the character strings to be identified whose matching degree is within the matching degree threshold, adding them to the result set, and returning the result set.
With the above method, the present invention achieves reverse identification of hierarchical text options from a character string and solves the problem of intelligently identifying hierarchical information.
Fig. 2 is the flow chart of the hierarchical option identification device, which can be divided into the following stages:
1. Prepare the hierarchical data option library. Prepare the complete option library for the data field of the form, together with the former names and abbreviations corresponding to the option text, to form a complete option library.
2. Build the dictionary tree from the option library (see term explanation 1).
3. Input the character string to be identified. This character string is the option to be identified; the candidate items corresponding to it can be calculated from the character string.
4. Calculate the keywords from the dictionary tree and the character string to be identified.
5. Obtain several candidate items from the keywords of the character string and the option library.
6. Candidate matching degree calculation sub-process.
6.1. Take the candidate items matched in the previous step.
6.2. Define the matching degree coefficients. These coefficients may be adjusted during operation.
6.3. Calculate the matching degree of each candidate item.
6.4. Sort the candidate results by matching degree, highest first.
6.5. Determine whether the candidate items conflict. If there is no conflict, or the highest candidate matching degree is 0, return the candidate result set; if there is a conflict, repeat 6.2-6.5 to screen again until a result is obtained.
7. Specify the matching degree threshold (see term explanation 2). The threshold is customized according to the matching degree and precision that the particular application requires.
8. Iterate over the candidate items and check whether the matching degree of each one meets the requirement; add those that do to the result set and discard those that do not.
9. Return the result set (including option code value, option text and matching degree).
The process is illustrated below with reference to Fig. 2, taking administrative divisions as an example.
Step 1
Prepare the complete administrative division library; China has about 3,000 administrative divisions.
350102 Gulou District, Gulou
650100 Urumqi City, Urumqi, Wu Shi
and so on.
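One possible in-memory shape for such an option library is sketched below. The `Option` class, the `build_name_index` helper and the two sample records are illustrative assumptions rather than the patent's data format; the point is only that every name (option text, former name or abbreviation) maps back to the option code.

```python
from dataclasses import dataclass, field

@dataclass
class Option:
    code: str                                     # option code value
    text: str                                     # official option text
    aliases: list = field(default_factory=list)   # former names / abbreviations

    def names(self):
        return [self.text, *self.aliases]

# Hypothetical sample records following the two entries listed above.
OPTION_LIBRARY = [
    Option("350102", "Gulou District", ["Gulou"]),
    Option("650100", "Urumqi City", ["Urumqi", "Wu Shi"]),
]

def build_name_index(options):
    """Map every name or alias to the list of option codes that use it."""
    index = {}
    for opt in options:
        for name in opt.names():
            index.setdefault(name, []).append(opt.code)
    return index
```

A name shared by several options (such as a district name that exists in more than one city) would simply map to several codes.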
Step 2
Build the dictionary tree from the administrative division library (see term explanation 1), using all the data prepared in step 1.
Step 3
Input the character string to be identified.
For example, someone writes their native place as "Fujian Province Fuzhou City Gulou District Shanghai Street No. 102".
Step 4
Calculate the keywords from the dictionary tree and the character string to be identified.
For the character string "Fujian Province Fuzhou City Gulou District Shanghai Street No. 102", the keywords obtained are Fujian Province, Fuzhou City, Gulou District and Shanghai.
This step applies the dictionary tree algorithm to the dictionary tree prepared in step 2. During matching, the dictionary tree matches the longest string possible: when both [Taihe County] and [He County] exist, the longer keyword is identified first.
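The longest-match behaviour described in this step can be sketched with a small character trie. This is a minimal illustration, not the patent's implementation: the class and function names are hypothetical, and the English names stand in for the original Chinese text.

```python
class Trie:
    def __init__(self):
        self.children = {}
        self.terminal = False   # True if a stored name ends at this node

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.terminal = True

    def longest_match(self, text, start):
        """Return the end index of the longest stored name starting at
        `start`, or -1 if no stored name starts there."""
        node, end = self, -1
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.terminal:
                end = i + 1     # remember the longest hit so far
        return end

def extract_keywords(trie, text):
    """Scan left to right, always taking the longest stored name."""
    keywords, i = [], 0
    while i < len(text):
        end = trie.longest_match(text, i)
        if end == -1:
            i += 1              # no name starts here; skip one character
        else:
            keywords.append(text[i:end])
            i = end
    return keywords

trie = Trie()
for name in ["Fujian", "Fujian Province", "Fuzhou City", "Gulou District", "Shanghai"]:
    trie.insert(name)
keywords = extract_keywords(
    trie, "Fujian Province Fuzhou City Gulou District Shanghai Street No. 102")
```

With both "Fujian" and "Fujian Province" stored, the scan keeps the longer name, which mirrors the preference for the longer keyword described above.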
Step 5
Obtain several candidate items from the keywords of the character string and the administrative division library.
The following candidate items are found:
350000 Fujian Province
350100 Fuzhou City
320106 Gulou District (the Gulou District of Nanjing)
320302 Gulou District (the Gulou District of Xuzhou City, Jiangsu Province)
350102 Gulou District (the Gulou District of Fuzhou City, Fujian Province)
410204 Gulou District (the Gulou District of Kaifeng, Henan Province)
310000 Shanghai
Step 6
6.1. Take the candidate items matched in the previous step.
350000 Fujian Province
350100 Fuzhou City
320106 Gulou District (the Gulou District of Nanjing)
320302 Gulou District (the Gulou District of Xuzhou City, Jiangsu Province)
350102 Gulou District (the Gulou District of Fuzhou City, Fujian Province)
410204 Gulou District (the Gulou District of Kaifeng, Henan Province)
310000 Shanghai
6.2. Define the matching degree coefficients
Matching degree: result = 100 * (log(1 + A) / log(1 + B)) (the logarithm is base 10 here, and the result is truncated to an integer)
where:
Variable A = Y*1 + Z*0.5;
// 1 and 0.5 are both coefficients; consecutive matches are considered less important, so they carry a smaller weight.
Variable C = 2*X - 3 (for X greater than 1; if X = 1, C is forced to 0);
// C denotes the maximum possible number of matches
Variable B = X*1.0 + C*0.5;
X: keyword count,
Y: match count,
Z: consecutive match count.
6.3. Calculate the matching degree of each candidate item.
According to the formula, the match count and the consecutive match count of each candidate item are needed.
350000: match count 1 (matches the first keyword), no consecutive matches
350100: match count 2 (matches the first and second keywords), consecutive match count 1 (Fujian Province-Fuzhou City)
320106, 320302, 410204: match count 1 (each matches only the third keyword), no consecutive matches
350102: match count 3 (matches the first three keywords), consecutive match count 3 (Fujian Province-Fuzhou City, Fuzhou City-Gulou District, Fujian Province-Gulou District)
310000: match count 1 (matches only the fourth keyword), no consecutive matches
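The counts listed above can be reproduced with the sketch below, which rests on two assumptions that the text does not state outright. First, the six-digit codes are assumed to follow the usual province/city/county layout (two digits each), so a candidate's parents can be derived from its code. Second, the consecutive match count is read as the number of pairs among the matched keywords, which is one interpretation consistent with the three pairs listed for 350102. The helper names are hypothetical.

```python
from itertools import combinations

def ancestor_chain(code):
    """Codes of a division and its parents, e.g. 350102 -> 350100 -> 350000
    (assumed layout: 2 digits province, 2 digits city, 2 digits county)."""
    return {code, code[:4] + "00", code[:2] + "0000"}

def count_matches(candidate_code, keyword_codes):
    """keyword_codes holds one list of option codes per keyword, in input order.
    Returns (match count Y, consecutive match count Z)."""
    chain = ancestor_chain(candidate_code)
    matched = [i for i, codes in enumerate(keyword_codes) if chain & set(codes)]
    y = len(matched)
    z = len(list(combinations(matched, 2)))  # assumed: pairs of matched keywords
    return y, z

# Codes reachable from each of the four keywords in the example.
KEYWORD_CODES = [
    ["350000"],                                  # Fujian Province
    ["350100"],                                  # Fuzhou City
    ["320106", "320302", "350102", "410204"],    # Gulou District
    ["310000"],                                  # Shanghai
]
```

Under these assumptions, `count_matches("350102", KEYWORD_CODES)` gives a match count of 3 and a consecutive match count of 3, in line with the listing above.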
The three parameters are as follows
(assuming option 350102 is selected):
X: keyword count: 4,
Y: match count: 3,
Z: consecutive match count: 3.
The calculation proceeds as follows:
Variable A = Y*1 + Z*0.5; A = 4.5;
Variable C = 2*X - 3 (for X greater than 1; if X = 1, C is forced to 0); C = 5;
Variable B = X*1.0 + C*0.5; B = 6.5;
result = 100 * (log(1 + A) / log(1 + B)) = 100 * (log(5.5) / log(7.5)), truncated to an integer;
// The result generally lies between 0 and 100; a complete match gives 100.
In this example (option 350102), the matching degree obtained is 84.
The other options are calculated in the same way.
6.4. Sort the candidate results by matching degree, highest first.
The results are sorted, packaged and returned as output.
The result set includes the option code value, the option text and the corresponding matching degree.
For example, the candidate items from step 6.3 give the following final result:
350102 Gulou District 84
350100 Fuzhou City 62
350000 Fujian Province 34
320106 Gulou District 34
320302 Gulou District 34
410204 Gulou District 34
310000 Shanghai 34
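The ranked list above can be recomputed from the match counts of step 6.3 with a short script. It is a sketch for checking the arithmetic: the `score` helper and the tuple layout are assumptions, while the coefficients (1, 0.5, 1.0, 0.5), the base-10 logarithm and the truncation to an integer follow step 6.2.

```python
import math

def score(x, y, z):
    """Matching degree with the coefficients of step 6.2, truncated to an integer."""
    c = 0 if x == 1 else 2 * x - 3
    a = y * 1 + z * 0.5
    b = x * 1.0 + c * 0.5
    return int(100 * math.log10(1 + a) / math.log10(1 + b))

X = 4  # four keywords in the example input
# (code, option text, match count Y, consecutive match count Z)
candidates = [
    ("350000", "Fujian Province", 1, 0),
    ("350100", "Fuzhou City", 2, 1),
    ("320106", "Gulou District", 1, 0),
    ("320302", "Gulou District", 1, 0),
    ("350102", "Gulou District", 3, 3),
    ("410204", "Gulou District", 1, 0),
    ("310000", "Shanghai", 1, 0),
]
results = [(code, text, score(X, y, z)) for code, text, y, z in candidates]
results.sort(key=lambda r: r[2], reverse=True)  # highest matching degree first
```

Python's sort is stable, so candidates with equal matching degree keep their original relative order.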
6.5. Determine whether the candidate items conflict.
If there is only one candidate result, skip this step.
(Sub-step A) Sort the candidate items by matching degree in descending order.
(Sub-step B) Check whether the first-ranked and second-ranked matching degrees are the same; if they are, there is ambiguity and the conflict must be handled.
The example above does not conflict, so a different example is used here:
Suppose the original text is: Jiangsu Province Gulou District
We know that there are four Gulou Districts, and that both Nanjing and Xuzhou City in Jiangsu have one.
Under the algorithm of step 6,
320106 Gulou District (the Gulou District of Nanjing)
320302 Gulou District (the Gulou District of Xuzhou City, Jiangsu Province)
would both receive the full score of 100, but we find that the score is duplicated. This shows that "Gulou District" is ambiguous here.
(Sub-step C) We then set the matching degree of all candidate items related to Gulou District to 0.
We then sort again and continue from sub-step A.
We find that only Jiangsu Province is ranked first, with a matching degree that is not high. There is no duplicate, so the check stops.
The code of Jiangsu Province is returned. According to the calculation formula, 320000 Jiangsu Province is returned with a matching degree of 34.
Exception (1) to sub-step B
If all the scores have been set to 0, the check is forcibly interrupted and one result is returned at random, with a score of 0.
Exception (2) to sub-step B
If the first and second scores are the same but the two items have a containment relationship, they are not set to zero. An example is "Hunan Changsha".
Changsha City contains Changsha County. We only move the result with the smaller scope to the front. The distinguishing feature is that the two codes have a containment relationship.
430100 Changsha City, Changsha
430121 Changsha County, Changsha
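The conflict check of sub-steps A to C and its two exceptions can be sketched as a loop. This is an illustration under assumptions: containment is inferred from the code layout (a code ending in 0000 or 00 is taken to contain the codes sharing its prefix), the function names are hypothetical, and the ordering of two candidates that contain one another is left unchanged. The matching degrees in the sample data are the figures quoted in the text, not recomputed.

```python
def contains(parent, child):
    """True if `parent` is an ancestor of `child` under the assumed code layout."""
    if parent == child:
        return False
    if parent.endswith("0000"):
        return child[:2] == parent[:2]
    if parent.endswith("00"):
        return child[:4] == parent[:4]
    return False

def resolve_conflicts(results):
    """results: list of (code, text, matching degree)."""
    results = sorted(results, key=lambda r: r[2], reverse=True)   # sub-step A
    while len(results) > 1:
        first, second = results[0], results[1]
        if first[2] != second[2]:
            break    # sub-step B: no tie at the top, nothing to resolve
        if first[2] == 0:
            break    # exception (1): everything has been zeroed, stop checking
        if contains(first[0], second[0]) or contains(second[0], first[0]):
            break    # exception (2): containment relationship, not zeroed
        top = first[2]
        # sub-step C: zero every candidate that shares the conflicting degree
        results = [(c, t, 0 if d == top else d) for c, t, d in results]
        results.sort(key=lambda r: r[2], reverse=True)
    return results

jiangsu = resolve_conflicts([
    ("320000", "Jiangsu Province", 34),
    ("320106", "Gulou District", 100),
    ("320302", "Gulou District", 100),
])
```

For the "Jiangsu Province Gulou District" example, the two Gulou District candidates are zeroed and Jiangsu Province ends up ranked first.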
Step 7: specify the matching degree threshold, which can be a minimum value or a maximum value.
The higher-level device that uses this device decides how to set the matching degree threshold according to its business requirements.
If any identification at all is acceptable, set the minimum value to 0; any returned result then counts as identified.
If a result such as that for "Jiangsu Province Gulou District" is not acceptable, the score must be greater than 34.
If the fault-tolerant result for "Fujian Province Huzhou City Cangshan District" is to be accepted, the score must be less than or equal to 34.
If the result for "Fujian Province Fuzhou City Gulou District Shanghai Street No. 102" is to be accepted, the score must be less than or equal to 84.
If only exact matches count, set the score directly to 100.
Given the returned result set and the defined matching degree threshold, the matching degrees in the returned result set are greater than or equal to this threshold.
In this example, returned results are required to be greater than 80; that is, the minimum threshold is set to 80.
Step 8: iterate over the candidate items and check whether the matching degree of each one meets the requirement; add it to the result set if it does, and discard it if it does not.
For example:
Candidate item 350102 Gulou District 84: the matching degree of 84 meets the requirement that returned results be greater than 80, so it is added to the result set.
Candidate item 350100 Fuzhou City 62: the matching degree of 62 does not meet the requirement that returned results be greater than 80, so it is rejected.
Step 9: return the result set (including option code value, option text and matching degree).
In this example only one option meets the threshold, so its code value, option text and matching degree are returned as follows:
350102 Gulou District 84.
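Steps 7 to 9 amount to a filter over the ranked candidates. The sketch below assumes the "greater than or equal to" comparison given in step 7 and uses a hypothetical function name; the candidate tuples repeat the ranked list from step 6.4.

```python
def filter_by_threshold(results, threshold):
    """Keep candidates whose matching degree is at least `threshold`.
    results: list of (option code value, option text, matching degree)."""
    return [r for r in results if r[2] >= threshold]

ranked = [
    ("350102", "Gulou District", 84),
    ("350100", "Fuzhou City", 62),
    ("350000", "Fujian Province", 34),
    ("320106", "Gulou District", 34),
    ("320302", "Gulou District", 34),
    ("410204", "Gulou District", 34),
    ("310000", "Shanghai", 34),
]
result_set = filter_by_threshold(ranked, 80)
```

A caller that wants only exact matches would pass 100, and a caller that accepts any identification would pass 0.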
The above steps thus complete the intelligent identification of hierarchical information.
The embodiment shown in Fig. 3 discloses a hierarchical information intelligent identification device comprising a receiving module 300, a keyword computing module 302, a candidate item matching module 304 and a sorting module 306.
The receiving module 300 is used to receive a character string to be identified;
the keyword computing module 302 is used to calculate the keywords of the character string from the character string to be identified;
the candidate item matching module 304 is used to obtain the corresponding candidate items according to the keywords of the character string and to calculate the matching degree of each candidate item;
the sorting module 306 is used to sort the character string to be identified according to the calculated matching degree. This module arrangement achieves intelligent identification of hierarchical information.
A further embodiment also includes an option library module 308, which is used to generate a hierarchical data option library containing the option text and the former names or abbreviations corresponding to the option text. With this module the device identifies hierarchical information more effectively.
Further, the device also includes a conflict judgment module 310.
The conflict judgment module is used to determine whether the candidate items conflict;
specifically, it sorts the candidate items by matching degree in descending order and determines whether the first-ranked and second-ranked matching degrees are the same; if they are the same, it sets the matching degree of the candidate items that share that matching degree to zero.
Further, the device also includes a threshold matching module 312 and a result return module 314. The threshold matching module is used to define a matching degree threshold, screen the character strings to be identified whose matching degree is within the matching degree threshold, and add them to the result set; the result return module is used to return the result set. The present invention thereby achieves reverse identification of hierarchical text options from a character string and solves the problem of intelligently identifying hierarchical information.
Specifically, the matching degree of each candidate item is calculated as follows:
Matching degree: result = 100 * (log(1 + A) / log(1 + B)),
where:
Variable A = Y*m + Z*n;
Variable B = X*p + C*q;
C = 2*X - 3;
C denotes the maximum possible number of matches; if X = 1, C is forced to 0;
X is the keyword count, Y is the match count, and Z is the consecutive match count;
m, n, p and q are user-defined coefficients.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any variants of them are intended to cover non-exclusive inclusion, so that a process, method, article or terminal device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, method, article or terminal device. Without further limitation, an element defined by the phrase "including ..." or "comprising ..." does not exclude the presence of additional elements in the process, method, article or terminal device that includes it. In addition, in this document, "greater than", "less than", "exceeding" and the like are understood to exclude the stated number, while "above", "below", "within" and the like are understood to include it.
Those skilled in the art will understand that the above embodiments may be provided as a method, a device or a computer program product. The embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. All or part of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware; the program may be stored in a storage medium readable by computer equipment and is used to execute all or part of the steps of those methods. The computer equipment includes, but is not limited to: personal computers, servers, general-purpose computers, special-purpose computers, network devices, embedded devices, programmable devices, intelligent mobile terminals, smart home devices, wearable smart devices, in-vehicle smart devices and the like. The storage medium includes, but is not limited to: RAM, ROM, magnetic disks, magnetic tape, optical discs, flash memory, USB flash drives, removable hard disks, memory cards, memory sticks, network server storage, network cloud storage and the like.
The above embodiments are described with reference to flow charts and/or block diagrams of the methods, equipment (systems) and computer program products according to the embodiments. It should be understood that each flow and/or block in the flow charts and/or block diagrams, and combinations of flows and/or blocks, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of computer equipment to produce a machine, so that the instructions executed by the processor produce a device for implementing the functions specified in one or more flows of the flow charts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a memory readable by computer equipment that can direct the computer equipment to operate in a specific manner, so that the instructions stored in that memory produce an article of manufacture including an instruction device, which implements the functions specified in one or more flows of the flow charts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto computer equipment, so that a series of operating steps is executed on the computer equipment to produce computer-implemented processing; the instructions executed on the computer equipment thus provide steps for implementing the functions specified in one or more flows of the flow charts and/or one or more blocks of the block diagrams.
Although the above embodiments have been described, those skilled in the art can make additional changes and modifications to them once they grasp the basic inventive concept. The above is therefore only an embodiment of the present invention and does not limit its scope of patent protection. Any equivalent structure or equivalent process transformation made using the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present invention.

Claims (8)

1. A hierarchical information intelligent identification method, characterized by comprising the following steps: receiving a character string to be identified; calculating the keywords of the character string from the character string to be identified; obtaining the corresponding candidate items according to the keywords of the character string; calculating the matching degree of each candidate item; and sorting the character string to be identified according to the calculated matching degree,
wherein the matching degree of each candidate item is calculated as follows:
Matching degree: result = 100 * (log(1 + A) / log(1 + B)),
where:
Variable A = Y*m + Z*n;
Variable B = X*p + C*q;
C = 2*X - 3;
C denotes the maximum possible number of matches; if X = 1, C is forced to 0;
X is the keyword count, Y is the match count, and Z is the consecutive match count;
m, n, p and q are user-defined coefficients.
2. The hierarchical information intelligent identification method according to claim 1, characterized by further comprising the step of generating a data hierarchy option library, the data hierarchy option library comprising option text and the former name or abbreviation corresponding to the option text.
3. The hierarchical information intelligent identification method according to claim 1, characterized by further comprising the step of judging whether the candidate items conflict, which specifically comprises: arranging the candidate items in descending order of matching degree; judging whether the matching degrees of the first-ranked and second-ranked candidate items are the same; and, if the matching degrees are the same, setting the matching degree of the candidate items having the same matching degree to zero.
4. The hierarchical information intelligent identification method according to claim 1, characterized by further comprising the steps of defining a matching degree threshold, screening the character strings to be identified whose matching degree is greater than or equal to the matching degree threshold, adding them to a result set, and returning the result set.
5. A hierarchical information intelligent identification device, characterized by comprising a receiving module, a keyword calculation module, a candidate item matching module, and a sorting module, wherein:
the receiving module is used for receiving a character string to be identified;
the keyword calculation module is used for calculating the keywords of the character string from the character string to be identified;
the candidate item matching module is used for obtaining corresponding candidate items according to the keywords of the character string and calculating the matching degree of each candidate item;
the sorting module is used for sorting the character string to be identified according to the calculated matching degree; and
the candidate item matching module calculates the matching degree of each candidate item as follows:
Matching degree: result = 100 * (log(1 + A) / log(1 + B)),
wherein:
variable A = Y*m + Z*n;
variable B = X*p + C*q;
C = 2*X - 3;
C denotes the maximum possible number of matches, and C is forced to 0 if X = 1;
X is the number of keywords, Y is the number of matches, and Z is the number of consecutive matches;
m, n, p, and q are user-defined coefficients.
6. The hierarchical information intelligent identification device according to claim 5, characterized by further comprising an option library module, the option library module being used for generating a data hierarchy option library, the data hierarchy option library comprising option text and the former name or abbreviation corresponding to the option text.
7. The hierarchical information intelligent identification device according to claim 5, characterized by further comprising a conflict judgment module, wherein:
the conflict judgment module is used for judging whether the candidate items conflict; and
the conflict judgment module is specifically further used for arranging the candidate items in descending order of matching degree, judging whether the matching degrees of the first-ranked and second-ranked candidate items are the same, and, if the matching degrees are the same, setting the matching degree of the candidate items having the same matching degree to zero.
8. The hierarchical information intelligent identification device according to claim 5, characterized by further comprising a threshold matching module and a result return module, the threshold matching module being used for defining a matching degree threshold, screening the character strings to be identified whose matching degree is greater than or equal to the matching degree threshold, and adding them to a result set, and the result return module being used for returning the result set.
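
The scoring, conflict, and threshold steps recited in the claims can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the default coefficient values of 1.0 for m, n, p, and q, and the representation of scored items as (label, matching degree) pairs are assumptions made purely for illustration. The natural logarithm is used; because the formula is a ratio of two logarithms, the choice of base does not affect the result.

```python
import math
from typing import List, Tuple

Scored = Tuple[str, float]  # (label, matching degree); hypothetical representation


def match_degree(x: int, y: int, z: int,
                 m: float = 1.0, n: float = 1.0,
                 p: float = 1.0, q: float = 1.0) -> float:
    """result = 100 * log(1 + A) / log(1 + B), using the claim's symbols.

    x: number of keywords (X), y: number of matches (Y),
    z: number of consecutive matches (Z).
    The defaults for m, n, p, q are assumed values, not taken from the patent.
    """
    if x < 1:
        raise ValueError("keyword count X must be at least 1")
    c = 0 if x == 1 else 2 * x - 3   # C is forced to 0 when X = 1
    a = y * m + z * n                # variable A
    b = x * p + c * q                # variable B
    if b <= 0:
        raise ValueError("coefficients must make B positive")
    return 100.0 * math.log(1 + a) / math.log(1 + b)


def resolve_conflict(scored: List[Scored]) -> List[Scored]:
    """Sort by matching degree, highest first; if the top two are equal,
    set the matching degree of every item sharing that value to zero."""
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    if len(ranked) >= 2 and ranked[0][1] == ranked[1][1]:
        tied = ranked[0][1]
        ranked = [(label, 0.0 if score == tied else score)
                  for label, score in ranked]
    return ranked


def filter_by_threshold(scored: List[Scored], threshold: float) -> List[Scored]:
    """Keep items whose matching degree is greater than or equal to the threshold."""
    return [item for item in scored if item[1] >= threshold]
```

With the assumed unit coefficients, a single keyword that is fully matched (X = 1, Y = 1, Z = 0) gives A = B = 1 and therefore a matching degree of 100. The sketch leaves zeroed items in their sorted position rather than re-sorting them, since the claims do not specify what happens after the matching degree is set to zero.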
CN201611079236.4A 2016-11-30 2016-11-30 A kind of hierarchical information intelligent identification Method and device Active CN106502995B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611079236.4A CN106502995B (en) 2016-11-30 2016-11-30 A kind of hierarchical information intelligent identification Method and device

Publications (2)

Publication Number Publication Date
CN106502995A CN106502995A (en) 2017-03-15
CN106502995B true CN106502995B (en) 2019-10-15

Family

ID=58327950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611079236.4A Active CN106502995B (en) 2016-11-30 2016-11-30 A kind of hierarchical information intelligent identification Method and device

Country Status (1)

Country Link
CN (1) CN106502995B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110147433B (en) * 2019-05-21 2021-01-29 北京鸿联九五信息产业有限公司 Text template extraction method based on dictionary tree

Citations (7)

Publication number Priority date Publication date Assignee Title
CN101206121A (en) * 2006-09-20 2008-06-25 高德软件有限公司 Placename retrieval device
CN101393544A (en) * 2008-10-07 2009-03-25 南京师范大学 Chinese address semantic parsing method facing address encode
CN102637180A (en) * 2011-02-14 2012-08-15 汉王科技股份有限公司 Character post processing method and device based on regular expression
JP2012155356A (en) * 2011-01-21 2012-08-16 Zenrin Datacom Co Ltd Address search device and address search method
CN104484790A (en) * 2014-12-26 2015-04-01 清华大学深圳研究生院 Address match method and device of logistics business
JP2015162004A (en) * 2014-02-26 2015-09-07 日本電信電話株式会社 Inter-development document trace link generation support device and method and program
CN105786800A (en) * 2016-03-23 2016-07-20 苏州数字地图信息科技股份有限公司 Police standard address acquiring method and system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1473639A1 (en) * 2002-02-04 2004-11-03 Celestar Lexico-Sciences, Inc. Document knowledge management apparatus and method
CN102129431B (en) * 2010-01-13 2014-04-02 阿里巴巴集团控股有限公司 Search method and system applied to online trading platform
CN102880302A (en) * 2012-07-17 2013-01-16 重庆优腾信息技术有限公司 Word identification method, device and system on basis of multi-word continuous input
WO2015112755A1 (en) * 2014-01-22 2015-07-30 AI Squared Emphasizing a portion of the visible content elements of a markup language document
CN105630751A (en) * 2015-12-28 2016-06-01 厦门优芽网络科技有限公司 Method and system for rapidly comparing text content

Non-Patent Citations (2)

Title
一种基于规则的模糊中文地址分词匹配方法;程昌秀 等;《地理与地理信息科学》;20110531;第27卷(第3期);第26-29页 *
一种适于地理编码的地址数据规范化方法;彭颖霞 等;《测绘科学技术学报》;20131231;第30卷(第5期);第521-524页 *

Also Published As

Publication number Publication date
CN106502995A (en) 2017-03-15

Similar Documents

Publication Publication Date Title
CN101694668B (en) Method and device for confirming web structure similarity
CN106156082B (en) A kind of ontology alignment schemes and device
US8683389B1 (en) Method and apparatus for dynamic information visualization
CN102023989B (en) Information retrieval method and system thereof
CN111460311A (en) Search processing method, device and equipment based on dictionary tree and storage medium
Binev et al. Fast high-dimensional approximation with sparse occupancy trees
CN106528648B (en) In conjunction with the distributed RDF keyword proximity search method of Redis memory database
US20090327259A1 (en) Automatic concept clustering
US20080082531A1 (en) Clustering system and method
CN106598949B (en) A kind of determination method and device of word to text contribution degree
KR102371437B1 (en) Method and apparatus for recommending entity, electronic device and computer readable medium
CN105956148A (en) Resource information recommendation method and apparatus
US11928879B2 (en) Document analysis using model intersections
CN110532547A (en) Building of corpus method, apparatus, electronic equipment and medium
CN108875065B (en) Indonesia news webpage recommendation method based on content
WO2014050774A1 (en) Document classification assisting apparatus, method and program
CN115618113A (en) Search recall method and system based on knowledge graph representation learning
CN108182182A (en) Document matching process, device and computer readable storage medium in translation database
CN114706987B (en) Text category prediction method, device, equipment, storage medium and program product
WO2020228536A1 (en) Icon generation method and apparatus, method for acquiring icon, electronic device, and storage medium
CN105243064A (en) Subgraph matching method and device
CN109308311A (en) A kind of multi-source heterogeneous data fusion system
US11163831B2 (en) Organizing hierarchical data for improved data locality
US20120076416A1 (en) Determining correlations between slow stream and fast stream information
CN106502995B (en) A kind of hierarchical information intelligent identification Method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant