CN109871536A - Place name identification method and apparatus - Google Patents

Info

Publication number
CN109871536A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910087977.4A
Other languages
Chinese (zh)
Other versions
CN109871536B (en)
Inventor
陈奇宁
牟小峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Beijing Mininglamp Software System Co ltd
Priority to CN201910087977.4A
Publication of CN109871536A
Application granted
Publication of CN109871536B
Legal status: Active

Abstract

The invention discloses a place name identification method and apparatus. The method comprises: segmenting a character string to be identified based on a place name dictionary and on language rules respectively, each with its corresponding word segmentation method, to obtain a first segmentation result list and a second segmentation result list; and identifying the place names in the character string to be identified according to the first segmentation result list and the second segmentation result list. The method can improve the accuracy of place name identification.

Description

Place name identification method and apparatus
Technical field
The present invention relates to, but is not limited to, the technical field of natural language processing, and in particular to a place name identification method and apparatus.
Background art
With the continuous development of Internet information technology, demand for information services based on place name information keeps growing. Place name identification is the key to obtaining place name information. At present, domestic administrative divisions are clearly graded and most standard place names carry an obvious suffix (for example, Beijing, Heilongjiang Province, etc.), so prior-art place name identification methods identify place names according to place name word-usage rules. In real text, however, many place names do not appear in standard form, and the following problems commonly arise:
1. The place name is abbreviated and lacks a suffix (for example, Heilongjiang, Inner Mongolia, Kaifeng); the suffix is not a dedicated place name suffix, such as "inner", "door", etc. (for example, "in safety", "being good for moral door"); or the suffix is not a place name suffix at all (for example, "on Xiao Yingzi", "lid level ground");
2. One word has several uses: besides serving as place names, some words may also be common nouns (for example, "southern exposure" is usually a common noun, but when "Beijing southern exposure" appears in text, "southern exposure" is likely to be a place name);
3. The place name is made up of commonly used words (for example, "Gejiu" may be Gejiu in Yunnan, or it may be two commonly used words inside "Gejiu school bag").
When such place names appear in text, existing place name identification techniques have difficulty recognizing them accurately, and the misidentification rate is relatively high.
Summary of the invention
To solve the above technical problem, the present invention provides a place name identification method and apparatus that can improve the accuracy of place name identification.
An embodiment of the present invention provides a place name identification method, comprising:
Segmenting a character string to be identified based on a place name dictionary and on language rules respectively, each with its corresponding word segmentation method, to obtain a first segmentation result list and a second segmentation result list;
Identifying the place names in the character string to be identified according to the first segmentation result list and the second segmentation result list.
The place name identification method provided in this embodiment segments the character string to be identified based on a place name dictionary and on language rules respectively, each with its corresponding word segmentation method, obtains a first segmentation result list and a second segmentation result list, and identifies the place names in the character string from the two lists, which improves the accuracy of place name identification.
In an exemplary embodiment, identifying the place names in the character string according to the first segmentation result list and the second segmentation result list comprises:
Traversing the first segmentation result list to obtain the candidate place names in the character string to be identified;
Verifying the candidate place names according to the first segmentation result list and the second segmentation result list;
Judging, according to the verification result, whether each candidate place name is a place name.
In an exemplary embodiment, verifying the candidate place name according to the first segmentation result list and the second segmentation result list comprises:
Obtaining, in the first segmentation result list, the length of the candidate place name and the length of the segment preceding the candidate place name;
Obtaining, in the second segmentation result list, the length of the candidate place name and the length of the segment preceding the candidate place name;
The candidate place name is verified successfully when both of the following conditions are met, and verification fails when either condition is not met:
The length of the candidate place name in the first segmentation result list matches the length of the candidate place name in the second segmentation result list;
The length of the segment preceding the candidate place name in the first segmentation result list matches the length of the segment preceding the candidate place name in the second segmentation result list.
In an exemplary embodiment, identifying, according to the verification result, whether the candidate place name is a place name comprises:
Obtaining the confidence of the successfully verified candidate place name;
Calculating, according to the second segmentation result list, the place name context probability of the successfully verified candidate place name;
Identifying whether the candidate place name is a place name according to the confidence of the successfully verified candidate place name and the place name context probability.
In an exemplary embodiment, obtaining the confidence of the successfully verified candidate place name comprises:
Looking up, according to a preset place name classification, the category to which the successfully verified candidate place name belongs;
Obtaining the confidence of the successfully verified candidate place name according to the correspondence between place name categories and confidence.
In an exemplary embodiment, identifying whether the candidate place name is a place name according to the confidence of the successfully verified candidate place name and the place name context probability comprises:
When the sum of the confidence of the successfully verified candidate place name and the place name context probability is greater than or equal to a preset threshold, identifying the successfully verified candidate place name as a place name.
In an exemplary embodiment, after identifying the place names in the character string to be identified according to the first segmentation result list and the second segmentation result list, the method further comprises:
Matching the text adjacent to each identified place name against a preset road number and number recognition pattern;
Identifying text that matches the road number and number recognition pattern as a place name;
Traversing the character string to be identified and merging the identified place names according to preset rules to obtain complete place names.
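The post-processing steps above might look roughly like the following sketch. The patent does not disclose the actual recognition pattern or merge rule, so the regular expression `ROAD_NUMBER` (a short road name ending in 路, 街 or 巷, followed by digits and 号) and the rule "merge spans that touch or overlap" are assumptions made only for illustration, as are the function names.

```python
import re

# Hypothetical pattern: up to four Chinese characters, a road suffix,
# then a number followed by the character used for house numbers.
ROAD_NUMBER = re.compile(r"[\u4e00-\u9fff]{1,4}?(?:路|街|巷)\d+号")


def extend_with_road_number(text, end):
    """Return the new end offset if a road/number pattern starts right at `end`."""
    match = ROAD_NUMBER.match(text, end)
    return match.end() if match else end


def merge_spans(spans):
    """Merge (start, end) spans that touch or overlap into complete place names."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(end, merged[-1][1]))
        else:
            merged.append((start, end))
    return merged


text = "北京市中山路12号附近"        # illustrative input, not taken from the patent
place = (0, 3)                       # span of the already identified place name
road = (place[1], extend_with_road_number(text, place[1]))
complete = merge_spans([place, road])
full_name = text[complete[0][0]:complete[0][1]]
```

Here the adjacent text is only examined to the right of the identified place name; a fuller implementation would presumably also handle text on the left and several patterns.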
In an exemplary embodiment, segmenting the character string to be identified based on a place name dictionary and on language rules respectively, each with its corresponding word segmentation method, to obtain a first segmentation result list and a second segmentation result list comprises:
Segmenting the character string to be identified based on the place name dictionary using forward maximum matching to obtain the first segmentation result list; and obtaining the second segmentation result list based on language rules using conditional random field (CRF) segmentation.
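As a rough illustration of forward maximum matching against a place name dictionary, the sketch below greedily takes the longest dictionary entry at each position and falls back to a single character when nothing matches. The function name, the toy dictionary and the sample string (Chinese for "an old school bag", whose middle two characters also spell the city name Gejiu) are assumptions chosen for this example; the CRF segmenter is not sketched here.

```python
def forward_max_match(text, dictionary):
    """Segment `text` by taking the longest dictionary entry at each position."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink it until an entry matches;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy dictionary containing only the place name Gejiu.
first_list = forward_max_match("一个旧书包", {"个旧"})
```

Because only place names are in the dictionary, everything else falls apart into single characters, which is consistent with the shape of the first segmentation result list shown later in the description.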
An embodiment of the present invention also provides a place name identification apparatus, comprising a memory and a processor;
The memory is configured to store computer-readable instructions;
The processor is configured to execute the computer-readable instructions to perform the following operations:
Segmenting a character string to be identified based on a place name dictionary and on language rules respectively, each with its corresponding word segmentation method, to obtain a first segmentation result list and a second segmentation result list;
Identifying the place names in the character string to be identified according to the first segmentation result list and the second segmentation result list.
The place name identification apparatus provided in this embodiment segments the character string to be identified based on a place name dictionary and on language rules respectively, each with its corresponding word segmentation method, obtains a first segmentation result list and a second segmentation result list, and identifies the place names in the character string from the two lists, which improves the accuracy of place name identification.
In an exemplary embodiment, identifying the place names in the character string according to the first segmentation result list and the second segmentation result list comprises:
Traversing the first segmentation result list to obtain the candidate place names in the character string to be identified;
Verifying the candidate place names according to the first segmentation result list and the second segmentation result list;
Judging, according to the verification result, whether each candidate place name is a place name.
In an exemplary embodiment, verifying the candidate place name according to the first segmentation result list and the second segmentation result list comprises:
Obtaining, in the first segmentation result list, the length of the candidate place name and the length of the segment preceding the candidate place name;
Obtaining, in the second segmentation result list, the length of the candidate place name and the length of the segment preceding the candidate place name;
The candidate place name is verified successfully when both of the following conditions are met, and verification fails when either condition is not met:
The length of the candidate place name in the first segmentation result list matches the length of the candidate place name in the second segmentation result list;
The length of the segment preceding the candidate place name in the first segmentation result list matches the length of the segment preceding the candidate place name in the second segmentation result list.
In an exemplary embodiment, identifying, according to the verification result, whether the candidate place name is a place name comprises:
Obtaining the confidence of the successfully verified candidate place name;
Calculating, according to the second segmentation result list, the place name context probability of the successfully verified candidate place name;
Identifying whether the candidate place name is a place name according to the confidence of the successfully verified candidate place name and the place name context probability.
In an exemplary embodiment, obtaining the confidence of the successfully verified candidate place name comprises:
Looking up, according to a preset place name classification, the category to which the successfully verified candidate place name belongs;
Obtaining the confidence of the successfully verified candidate place name according to the correspondence between place name categories and confidence.
In an exemplary embodiment, identifying whether the candidate place name is a place name according to the confidence of the successfully verified candidate place name and the place name context probability comprises:
When the sum of the confidence of the successfully verified candidate place name and the place name context probability is greater than or equal to a preset threshold, identifying the successfully verified candidate place name as a place name.
In an exemplary embodiment, after identifying the place names in the character string to be identified according to the first segmentation result list and the second segmentation result list, the operations further comprise:
Matching the text adjacent to each identified place name against a preset road number and number recognition pattern;
Identifying text that matches the road number and number recognition pattern as a place name;
Traversing the character string to be identified and merging the identified place names according to preset rules to obtain complete place names.
In an exemplary embodiment, segmenting the character string to be identified based on a place name dictionary and on language rules respectively, each with its corresponding word segmentation method, to obtain a first segmentation result list and a second segmentation result list comprises:
Segmenting the character string to be identified based on the place name dictionary using forward maximum matching to obtain the first segmentation result list; and obtaining the second segmentation result list based on language rules using conditional random field (CRF) segmentation.
The place name identification method and apparatus provided by the present invention segment the character string to be identified based on a place name dictionary and on language rules respectively, each with its corresponding word segmentation method, to obtain a first segmentation result list and a second segmentation result list; verify the candidate place names against the two lists; and identify the place names in the character string by combining place name confidence with context probability, which improves the accuracy of place name identification.
Other features and advantages of the present invention will be set forth in the description that follows, and in part will become apparent from the description or be understood by practicing the invention. The objectives and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the description, the claims and the drawings.
Brief description of the drawings
The drawings provide a further understanding of the technical solution of the present invention and constitute a part of the specification. Together with the embodiments of the present application they serve to explain the technical solution of the present invention, and they do not limit it.
Fig. 1 is a flowchart of the place name identification method in an exemplary embodiment of the present invention;
Fig. 2 is a schematic diagram of the place name identification apparatus in an exemplary embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, embodiments of the present invention are described in detail below with reference to the drawings. It should be noted that, provided there is no conflict, the embodiments in the present application and the features in the embodiments may be combined with one another arbitrarily.
Fig. 1 is a flowchart of the place name identification method in an exemplary embodiment of the present invention. As shown in Fig. 1, this exemplary embodiment provides a place name identification method comprising steps S101 to S102:
Step S101: segment the character string to be identified based on a place name dictionary and on language rules respectively, each with its corresponding word segmentation method, to obtain a first segmentation result list and a second segmentation result list.
In this exemplary embodiment, when place name identification is required, the character string to be identified is obtained first. The string may come from any source that contains text, such as a short message, a WeChat message or a web page. It may also be obtained in ways other than the sources listed above; the present application places no limitation on this.
In this exemplary embodiment, the place name dictionary may include place names, road names and village names organized by administrative division, together with the abbreviated place names obtained by processing their suffixes, for example, Beijing in its full and abbreviated forms, and Inner Mongolia Autonomous Region together with its abbreviations. The place name dictionary may take the form of a Chinese dictionary, an address database and so on; the present application places no limitation on this. The language rules may include syntax, grammar, semantics and the like.
Step S102: identify the place names in the character string to be identified according to the first segmentation result list and the second segmentation result list.
The place name identification method provided in this exemplary embodiment segments the character string to be identified based on a place name dictionary and on language rules respectively, each with its corresponding word segmentation method, obtains a first segmentation result list and a second segmentation result list, and identifies the place names in the character string from the two lists, which improves the accuracy of place name identification.
In an exemplary embodiment, identifying the place names in the character string to be identified according to the first segmentation result list and the second segmentation result list comprises:
Traversing the first segmentation result list to obtain the candidate place names in the character string to be identified;
Verifying the candidate place names according to the first segmentation result list and the second segmentation result list;
Judging, according to the verification result, whether each candidate place name is a place name.
A candidate place name that fails verification may be identified as a common noun (not a place name), and subsequent judgment is carried out on the candidate place names that are verified successfully.
It should be noted that the first segmentation result list is obtained by segmenting based on the place name dictionary. It therefore contains every word in the character string to be identified that could be a place name (that is, every word with a matching entry in the place name dictionary), and the words in the first segmentation result list that may be place names are taken as candidate place names. For example, segmenting the sentence "At a restaurant in Beijing 'southern exposure', they found the suspect carrying a 'Gejiu' school bag" based on the place name dictionary gives the following first segmentation result list (each entry glosses one segment of the Chinese text):
[he | | in | Beijing | southern exposure | | one | family | meal | shop | , | hair | existing | dislike | doubt | people | back | | one | Gejiu | book | packet]
Segmenting the same sentence based on language rules gives the following second segmentation result list:
[they | in | Beijing | southern exposure | | one | restaurant | , | discovery | suspect | carry | one | old | school bag]
Traversing the first segmentation result list above yields the candidate place names: Beijing, "southern exposure" and Gejiu.
The candidate place names Beijing, "southern exposure" and Gejiu then need to be verified against the first and second segmentation results, and whether each of them is a place name is judged from the verification result.
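Candidate extraction from the first list amounts to a dictionary lookup over its segments. A minimal sketch, assuming the segments are plain strings and the dictionary is a Python set (both names are hypothetical, and the Chinese tokens are illustrative input rather than text quoted from the patent):

```python
def candidate_place_names(first_list, place_dictionary):
    """Return (index, segment) pairs for segments that have a dictionary entry."""
    return [(i, seg) for i, seg in enumerate(first_list) if seg in place_dictionary]


first_list = ["他", "们", "在", "北京", "一", "个旧", "书", "包"]
candidates = candidate_place_names(first_list, {"北京", "个旧"})
```

At this stage every dictionary hit is kept, including the false positive; filtering happens only in the verification step.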
In another exemplary embodiment, the step of identifying the place names in the character string to be identified according to the first segmentation result list and the second segmentation result list may be implemented in other ways, such as but not limited to: comparing the first segmentation result list with the second segmentation result list, and identifying the place names in the character string to be identified according to the comparison result.
In an exemplary embodiment, verifying the candidate place name according to the first segmentation result list and the second segmentation result list comprises:
Obtaining, in the first segmentation result list, the length of the candidate place name and the length of the segment preceding the candidate place name;
Obtaining, in the second segmentation result list, the length of the candidate place name and the length of the segment preceding the candidate place name;
The candidate place name is verified successfully when both of the following conditions are met, and verification fails when either condition is not met:
The length of the candidate place name in the first segmentation result list matches the length of the candidate place name in the second segmentation result list;
The length of the segment preceding the candidate place name in the first segmentation result list matches the length of the segment preceding the candidate place name in the second segmentation result list.
In this exemplary embodiment, the length of a segment may be the sum of the lengths of the words it contains. For example, assuming each character has length 1, in the first segmentation result list [he | | in | Beijing | southern exposure | | one | family | meal | shop | , | hair | existing | dislike | doubt | people | back | | one | Gejiu | book | packet] the segment preceding the candidate place name "Beijing" contains 3 words and has length 3; in the second segmentation result list [they | in | Beijing | southern exposure | | one | restaurant | , | discovery | suspect | carry | one | old | school bag] there are 2 words before the candidate place name "Beijing", with a total length of 3 (the first of them, "they", is two characters long in the Chinese text). Segment length may also be defined in other ways; the present application places no limitation on this.
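The two length conditions can be checked by locating the candidate in each list and comparing its character offset and its own length. The sketch below assumes one character has length 1, looks only at the first occurrence of the candidate in each list (a simplification), and uses hypothetical function names and illustrative Chinese tokens.

```python
def locate(segments, candidate):
    """Return (length of preceding text, length of candidate), or None if absent."""
    offset = 0
    for seg in segments:
        if seg == candidate:
            return offset, len(seg)
        offset += len(seg)  # each character counts as length 1
    return None


def verify_candidate(candidate, first_list, second_list):
    """Verification succeeds only if both lengths agree across the two lists."""
    in_first = locate(first_list, candidate)
    in_second = locate(second_list, candidate)
    return in_first is not None and in_first == in_second


first_list = ["他", "们", "在", "北京", "一", "个旧", "书", "包"]
second_list = ["他们", "在", "北京", "一个", "旧", "书包"]
beijing_ok = verify_candidate("北京", first_list, second_list)
gejiu_ok = verify_candidate("个旧", first_list, second_list)
```

With these inputs the preceding length is 3 in both lists for the first candidate, while the second candidate never appears as a segment of the second list, so it fails verification.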
In an exemplary embodiment, verifying the candidate place name according to the first segmentation result list and the second segmentation result list may be implemented in other ways, such as but not limited to:
Comparing the first segmentation result list with the second segmentation result list to obtain the candidate place names that appear in both lists and the candidate place names that appear in the first segmentation result list but not in the second segmentation result list;
Determining that the candidate place names that appear in both segmentation result lists are successfully verified candidate place names;
Determining that the candidate place names that appear in the first segmentation result list but not in the second segmentation result list have failed verification.
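This alternative reduces to a membership test against the second list. A minimal sketch with a hypothetical function name and illustrative tokens:

```python
def verify_by_membership(candidates, second_list):
    """Split candidates into those present in the second list and those absent."""
    second = set(second_list)
    verified = [c for c in candidates if c in second]
    failed = [c for c in candidates if c not in second]
    return verified, failed


verified, failed = verify_by_membership(
    ["北京", "个旧"],
    ["他们", "在", "北京", "一个", "旧", "书包"],
)
```

Unlike the length-based check, this version ignores position, so a candidate that occurs elsewhere in the second list would also pass.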
In an exemplary embodiment, identifying, according to the verification result, whether the candidate place name is a place name comprises:
Obtaining the confidence of the successfully verified candidate place name;
Calculating, according to the second segmentation result list, the place name context probability of the successfully verified candidate place name;
Identifying whether the candidate place name is a place name according to the confidence of the successfully verified candidate place name and the place name context probability.
In this exemplary embodiment, the confidence of a candidate place name may be obtained by querying a preset place name confidence table. In another exemplary embodiment, it may be obtained from a preset correspondence between candidate place name types and confidence. In other embodiments it may be obtained by other means; the present application places no limitation on this.
In this exemplary embodiment, the context probability of a candidate place name is obtained by looking up the candidate's context probability in the People's Daily corpus. The People's Daily annotated corpus carries part-of-speech tags, entity tags and so on for the different words. For example:
During/t in 1997 and/the c New Year's Day of/t in 1998/t ,/w Spring Festival/t/and f ,/w fund/n ,/w is special/and d protects for/v Barrier/v difficulty/the area a/n and/c difficulty/a/u is basic/a life/vn./w they/r also/d sets an example by personally taking part/i ,/w in person/d / v expression of sympathy/v worker/n the masses/n is visited with/c difficulty/a worker/n family/n ,/w, just/two/m times/q of d to/v Beijing/ns investigate/v enterprise/n ,/w visits/v, and again/d emphasizes/vd points out/v ,/w we/r says/v leader/n topic/n from/p it is basic/n is upper/f Saying/v ,/w is exactly/and d is right/the p people/n masses/n/u relationship/vn problem/n./w is at different levels/r cadre/n especially/d is/v leader/n cadre/n.
When calculating the context probability of each candidate place name, the one segment to the left and the one segment to the right of the place name are taken as the context of the candidate place name, and the context probability is calculated by the following formula:
$$P(NS \mid C_{lr}) = \frac{\mathrm{Count}(C_{lr}, NS)}{\mathrm{Count}(C_{lr})}$$
where $C_{lr}$ denotes the left and right context of the candidate place name, $NS$ is the candidate place name, and $\mathrm{Count}(C_{lr}, NS)$ and $\mathrm{Count}(C_{lr})$ denote, respectively, the number of times the context occurs together with the candidate place name and the number of times the context occurs in the training corpus. If a context word is a punctuation mark, it is uniformly labeled PUNC; if it is a number, it is uniformly labeled NUM; if it is a place name, it is uniformly labeled NS.
For example, for "Beijing" in the passage above, the segment on its left is "to" and the segment on its right is "investigate", so the context of "Beijing" is [to, investigate]. In the formula above, the numerator $\mathrm{Count}(C_{lr}, NS)$ is the number of times [to Beijing investigate] occurs in the whole corpus, and the denominator is the number of times [to ... investigate] occurs in the whole corpus, where the ellipsis stands for any word, for example [to the Antarctic investigate], [to Shanghai investigate], [to Sina investigate], etc. In the People's Daily corpus, the ratio of these two counts gives a context probability of 0.66 for "Beijing" in the passage above.
The same candidate place name has different context probabilities in different contexts; for example, the context probability of "Beijing" in "in Beijing railway station" is 0.82.
Besides looking up the candidate's context probability in the People's Daily corpus, the context probability of a candidate place name may also be obtained from other corpora, for example the Microsoft corpus, the Chinese Academy of Social Sciences corpus, the CTB corpus, etc.; the present application places no limitation on this.
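The ratio described above can be computed by counting over a tagged corpus. The sketch below is one possible reading: sentences are lists of (word, tag) pairs, the tags `w`, `m` and `ns` (as seen in the corpus excerpt) are assumed to mark punctuation, numbers and place names, and only positions with both a left and a right neighbour are counted. The three-sentence corpus is made up for illustration, so its result has nothing to do with the 0.66 reported for the real corpus.

```python
def context_label(item):
    """Map a (word, tag) pair to its context label, unifying special classes."""
    word, tag = item
    if tag == "w":
        return "PUNC"
    if tag == "m":
        return "NUM"
    if tag == "ns":
        return "NS"
    return word


def context_probability(corpus, place, left, right):
    """Count(context with place) / Count(context), for one-segment contexts."""
    with_place = total = 0
    for sentence in corpus:
        for i in range(1, len(sentence) - 1):
            same_left = context_label(sentence[i - 1]) == left
            same_right = context_label(sentence[i + 1]) == right
            if same_left and same_right:
                total += 1
                if sentence[i][0] == place:
                    with_place += 1
    return with_place / total if total else 0.0


# Made-up toy corpus using English stand-ins for the Chinese segments.
toy_corpus = [
    [("to", "v"), ("Beijing", "ns"), ("investigate", "v")],
    [("to", "v"), ("Beijing", "ns"), ("investigate", "v")],
    [("to", "v"), ("Shanghai", "ns"), ("investigate", "v")],
]
prob = context_probability(toy_corpus, "Beijing", "to", "investigate")
```

In practice the counts would be precomputed once over the whole corpus and stored in a lookup table rather than recounted per query.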
In an exemplary embodiment, obtaining the confidence of the successfully verified candidate place name comprises:
Looking up, according to a preset place name classification, the category to which the successfully verified candidate place name belongs;
Obtaining the confidence of the successfully verified candidate place name according to the correspondence between place name categories and confidence.
In this exemplary embodiment, the nouns in the place name dictionary are divided into four classes according to place name confidence:
Nouns in the first class are unambiguous place names: whenever they appear they are essentially always place names, for example, Beijing, Harbin, etc.;
Nouns in the second class are place names in most cases but are occasionally used as common nouns in text, for example, Datong, Nanping, etc.;
Nouns in the third class are roughly equally likely to be place names or common nouns (non-place-name nouns), for example, "steady hillock", "big newly", etc.;
Nouns in the fourth class are common nouns in most cases but are occasionally place names in text, for example, Taikan, Xinghua, etc.
The four-class classification above may be obtained through suffix filtering, general-dictionary filtering or manual screening. In other embodiments, other ways of classifying place names may also be used; the present application places no limitation on this.
According to preset rules, the confidence of first-class place names is set to 1.0, that of second-class place names to 0.5, that of third-class place names to 0.4, and that of fourth-class place names to 0.1. The present application places no limitation on the confidence values.
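The category-to-confidence lookup can be sketched as two small tables. The confidence values are the ones given in the text; the table and function names are hypothetical, and the category table holds a single entry taken from the first-class example, since the patent does not publish its full classification.

```python
# Confidence per class, as stated in the description.
CATEGORY_CONFIDENCE = {1: 1.0, 2: 0.5, 3: 0.4, 4: 0.1}

# Hypothetical excerpt of the place name classification.
PLACE_CATEGORY = {"Beijing": 1}


def place_confidence(place):
    """Return the confidence for a place name, or None if it is unclassified."""
    category = PLACE_CATEGORY.get(place)
    return CATEGORY_CONFIDENCE.get(category)


beijing_confidence = place_confidence("Beijing")
```

Returning `None` for unclassified names is a choice made for this sketch; the text does not say how such names are handled.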
In an exemplary embodiment, the certainty factor of the successfully verified candidate place name may be obtained in other ways, for example but not limited to: obtaining the certainty factor of the successfully verified candidate place name according to a preset correspondence between place name nouns and certainty factors, where the correspondence contains the certainty factor of each place name noun in the place name dictionary.
In an exemplary embodiment, identifying whether the candidate place name is a place name according to the certainty factor of the successfully verified candidate place name and the place name context probability comprises:
When the sum of the certainty factor of the successfully verified candidate place name and the place name context probability is greater than or equal to a preset threshold, identifying the successfully verified candidate place name as a place name.
In another embodiment, identifying whether the candidate place name is a place name according to the certainty factor of the successfully verified candidate place name and the place name context probability may be implemented in other ways, for example but not limited to:
Comparing the certainty factor of the successfully verified candidate place name with a preset certainty factor threshold, and comparing the context probability of the successfully verified candidate place name with a preset context probability threshold; when the certainty factor and the context probability are greater than the preset certainty factor threshold and the preset context probability threshold respectively, identifying the successfully verified candidate place name as a place name.
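The two decision rules above can be written as a pair of small predicates. This is a minimal sketch: the function names are hypothetical, the default sum threshold of 0.6 is the value used later in application example one, and the two separate thresholds are left as required parameters because the text gives no values for them.

```python
def is_place_by_sum(certainty, context_prob, threshold=0.6):
    """Rule 1: accept when certainty + context probability >= threshold."""
    return certainty + context_prob >= threshold


def is_place_by_both(certainty, context_prob, certainty_threshold, context_threshold):
    """Rule 2: accept only when each score is strictly greater than its own threshold."""
    return certainty > certainty_threshold and context_prob > context_threshold
```

Rule 1 lets a high certainty factor compensate for a weak context (and vice versa), whereas rule 2 requires both signals to clear their bars independently.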
In an exemplary embodiment, after identifying the place name in the character string to be identified according to the first segmentation result list and the second segmentation result list, the method further comprises:
Identifying the text adjacent to the identified place name using preset road number and house number recognition patterns;
Identifying text that matches the road number and house number recognition patterns as a place name;
Traversing the character string to be identified and merging the identified place names according to a preset rule to obtain a complete place name.
In this exemplary embodiment, the recognition patterns for road numbers and house numbers may be a combination of a digit string pattern and a suffix pattern.
Specifically, the patterns may be combined according to common place name usage. The digit string pattern covers Arabic numerals, Chinese numerals, letters, and commonly used ordinal labels such as "first", "second", and "third". House numbers and road numbers have specific suffixes. Road number suffixes include: number, lane, shelves, area, group, and team. House number suffixes include: building (in several variant forms), unit, row, room, and so on.
In a concrete implementation, the digit string pattern and the suffix pattern may be combined. A road number may be a single-segment combination, that is: digit string pattern + suffix pattern. A house number may be a four-segment, three-segment, two-segment, or single-segment combination. A two-segment combination is: digit string pattern + suffix pattern + digit string pattern + suffix pattern.
The text fragment following an identified place name is matched against the different patterns. For example, for "they live at Yunqu Garden Community, Building No. 3, Unit 2", the place name is identified first, giving "they live at {NS Yunqu Garden Community} Building No. 3, Unit 2". The house number pattern is then matched against the text fragment after {NS Yunqu Garden Community}, namely "Building No. 3, Unit 2", which is identified as a house number, so the final result is "they live at {NS Yunqu Garden Community, Building No. 3, Unit 2}".
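The "digit string pattern + suffix pattern" combination can be sketched with a regular expression. Everything concrete here is an assumption made for illustration: the Chinese suffix set (号楼, 单元, 栋, 幢, 排, 室), the characters accepted as a digit string, the function name `extend_place_name`, and the Chinese form of the sample address (云趣园小区3号楼二单元, an assumed reconstruction of the translated example). The pattern allows one to four segments, as described for house numbers; a road number pattern would be the same idea with a single segment and its own suffix set.

```python
import re

# Digit string: Arabic numerals, letters, Chinese numerals, and ordinal labels (assumed set).
DIGITS = "[0-9A-Za-z一二三四五六七八九十百零甲乙丙丁]+"
# House number suffixes (assumed set): building, unit, row, room.
SUFFIXES = "(?:号楼|单元|栋|幢|排|室)"
# One to four consecutive "digit string + suffix" segments.
HOUSE_NUMBER = re.compile(f"(?:{DIGITS}{SUFFIXES}){{1,4}}")


def extend_place_name(place, following_text):
    """Append a house number that directly follows an identified place name.

    Returns the (possibly extended) place name and the remaining text.
    """
    match = HOUSE_NUMBER.match(following_text)
    if match is None:
        return place, following_text
    return place + match.group(0), following_text[match.end():]
```

For instance, `extend_place_name("云趣园小区", "3号楼二单元")` absorbs both segments into the place name, while text that does not start with a house number is returned unchanged.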
In an exemplary embodiment, segmenting the character string to be identified using corresponding segmentation modes based on the place name dictionary and the language rules respectively, to obtain the first segmentation result list and the second segmentation result list, comprises:
Segmenting the character string to be identified using forward maximum matching segmentation based on the place name dictionary to obtain the first segmentation result list; and obtaining the second segmentation result list using conditional random field (CRF) segmentation based on the language rules.
In other embodiments, the first and second segmentation result lists may be obtained using other segmentation modes, for example, N-gram segmentation, hidden Markov model (HMM) segmentation, or maximum entropy (ME) segmentation; this application places no limitation on the segmentation mode.
The above place name identification method is further described below with concrete application examples.
Application example one:
Step 1: Obtain the character string to be identified, "他们在北京朝阳的一家饭店，发现嫌疑人背了一个旧书包" ("In a restaurant in Chaoyang, Beijing, they found that the suspect was carrying an old school bag"). Based on the place name dictionary, segment the string using forward maximum matching: words that appear in the place name dictionary are cut out as words, and characters that do not belong to any dictionary word are cut into single characters. The first segmentation result list is: [他|们|在|北京|朝阳|的|一|家|饭|店|，|发|现|嫌|疑|人|背|了|一|个旧|书|包]. Here 北京 is Beijing, 朝阳 is Chaoyang, and 个旧 is Gejiu, a city name that the dictionary matches inside 一个旧书包 ("an old school bag"). For the same string, segmentation based on language rules using the CRF mode gives the second segmentation result list: [他们|在|北京|朝阳|的|一家|饭店|，|发现|嫌疑人|背了|一个|旧|书包].
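The dictionary-based pass of Step 1 can be sketched as a small forward maximum matching routine. The function name `forward_max_match` is hypothetical, and the Chinese sentence and the three-entry dictionary (北京 Beijing, 朝阳 Chaoyang, 个旧 Gejiu) are an assumed reconstruction of the translated example rather than text confirmed by the source.

```python
def forward_max_match(text, dictionary):
    """Segment text left to right, preferring the longest dictionary word at each position.

    Characters that do not start a dictionary word are emitted as single-character tokens.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink it down to two characters.
        for size in range(min(max_len, len(text) - i), 1, -1):
            if text[i:i + size] in dictionary:
                tokens.append(text[i:i + size])
                i += size
                break
        else:
            # No dictionary word starts here: cut out a single character.
            tokens.append(text[i])
            i += 1
    return tokens


PLACE_DICTIONARY = {"北京", "朝阳", "个旧"}
SENTENCE = "他们在北京朝阳的一家饭店，发现嫌疑人背了一个旧书包"
first_list = forward_max_match(SENTENCE, PLACE_DICTIONARY)
```

Because the matcher only knows place names, it happily cuts 个旧 out of 一个旧书包; catching that false hit is the job of the later verification against the second list.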
Step 2: Traverse the first segmentation result list. Single characters that were cut out are directly identified as common words (non-place-name nouns), and the place name nouns cut out in the first segmentation result list, "北京" (Beijing), "朝阳" (Chaoyang), and "个旧" (Gejiu), are taken as candidate place names.
Step 3: With the length of each character defined as 1, verify the candidate place names "北京", "朝阳", and "个旧", as follows:
Verification of "北京":
In the first segmentation result list, the length of the candidate place name "北京" is 2, and the segment before it consists of 3 words with a total length of 3. In the second segmentation result list, the length of "北京" is 2, and the segment before it consists of 2 words with a total length of 3. The length of the candidate place name and the length of the segment before it are equal in both lists, so "北京" is verified successfully.
Verification of "朝阳": the process is the same as for "北京" and is not repeated here; "朝阳" is verified successfully.
Verification of "个旧": compute, in the first segmentation result list, the length of the segment between the previous candidate place name and the next one, that is, the segment between "朝阳" and "个旧", "[的|一|家|饭|店|，|发|现|嫌|疑|人|背|了|一]", which has 14 words and a length of 14. In the second segmentation result list, the segment after "朝阳" is "[的|一家|饭店|，|发现|嫌疑人|背了]", 7 words with a length of 13; adding the next word "一个" gives a length of 15. No word boundary in the second list falls at length 14, so the lengths of the segments before "个旧" in the two lists do not match and verification of "个旧" fails. "个旧" is therefore identified as a non-place name, a combination of common characters.
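The length comparison in Step 3 can be sketched by converting each token list into character offsets. This is one possible reading of the verification, stated as an assumption: a candidate passes when the second list contains a token with the same start offset (equal preceding-segment length) and the same end offset (equal word length). The function names are hypothetical, and the two Chinese token lists are an assumed reconstruction of the translated example.

```python
def spans(tokens):
    """Return the (start, end) character offsets of each token, one character = length 1."""
    result = []
    position = 0
    for token in tokens:
        result.append((position, position + len(token)))
        position += len(token)
    return result


def verify_candidates(first_list, second_list):
    """Check every multi-character token of the first list against the second list.

    Returns (candidate, verified) pairs in string order.
    """
    second_spans = set(spans(second_list))
    results = []
    for token, span in zip(first_list, spans(first_list)):
        if len(token) > 1:  # multi-character tokens are dictionary hits, i.e. candidates
            results.append((token, span in second_spans))
    return results


FIRST = ["他", "们", "在", "北京", "朝阳", "的", "一", "家", "饭", "店", "，",
         "发", "现", "嫌", "疑", "人", "背", "了", "一", "个旧", "书", "包"]
SECOND = ["他们", "在", "北京", "朝阳", "的", "一家", "饭店", "，",
          "发现", "嫌疑人", "背了", "一个", "旧", "书包"]
verification = verify_candidates(FIRST, SECOND)
```

Using absolute offsets rather than the distance from the previous candidate gives the same answer, since both lists cover the same characters; 个旧 starts at offset 21, which lies inside the second list's token 一个 (offsets 20 to 22), so it fails.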
Step 4: According to the correspondence between place name categories and certainty factors, obtain the certainty factors of the successfully verified "北京" (Beijing) and "朝阳" (Chaoyang), which are 1.0 and 0.5 respectively.
Step 5: According to the second segmentation result list, obtain the contexts of "北京" and "朝阳" respectively, and query the People's Daily corpus to obtain their context probabilities, which are 0.6 and 0.4 respectively.
Step 6: The sums of the certainty factor and the context probability for "北京" and "朝阳" are 1.6 and 0.9 respectively. Both are greater than the threshold 0.6, so "北京" and "朝阳" in the above character string are identified as place names.
Application example two:
Step 1: Obtain the character string to be identified: "My home is at Beijing, Haidian District, Liudaokou, Jingshu Dongli, Building x, Unit x, xxx".
Steps 2 to 6: Identify the place names "Beijing", "Haidian District", "Liudaokou", and "Jingshu Dongli" in the character string to be identified. The identification process is the same as in application example one and is not repeated here.
Step 7: Apply the road number and house number recognition patterns to the text "Building x, Unit x, xxx" adjacent to the identified place name "Jingshu Dongli". This text satisfies the road number and house number recognition conditions, so "Building x, Unit x, xxx" is identified as a place name.
Step 8: Merge the identified place names in order from large scale to small scale, giving the place name identification result "{NS Beijing, Haidian District, Liudaokou, Jingshu Dongli, Building x, Unit x, xxx}".
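The merge in Step 8 can be sketched as joining runs of adjacent identified place names into one. This sketch assumes that the names already appear in large-to-small order in the string, as is usual for Chinese addresses, and does not reorder them; the preset merging rule itself is not spelled out in the text. The function name, the (text, is_place) representation, and the Chinese form of the sample address are all assumptions made for illustration.

```python
def merge_adjacent_places(tagged):
    """Merge consecutive place-name fragments into a single place name.

    tagged is a list of (text, is_place) pairs in string order.
    """
    merged = []
    for text, is_place in tagged:
        if is_place and merged and merged[-1][1]:
            # The previous fragment is also a place name: extend it.
            merged[-1] = (merged[-1][0] + text, True)
        else:
            merged.append((text, is_place))
    return merged


fragments = [
    ("我家在", False),         # "my home is at"
    ("北京市", True),          # Beijing
    ("海淀区", True),          # Haidian District
    ("六道口", True),          # Liudaokou
    ("静淑东里", True),        # Jingshu Dongli
    ("x号楼x单元xxx", True),   # Building x, Unit x, xxx
]
complete = merge_adjacent_places(fragments)
```

Non-place text between two place names breaks the run, so two unrelated addresses in one sentence stay separate.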
Fig. 2 shows a place name identification apparatus according to an embodiment of the present invention, comprising a memory 10 and a processor 20;
Wherein:
The memory 10 is configured to store computer-readable instructions;
The processor 20 is configured to execute the computer-readable instructions to perform the following operations: segmenting a character string to be identified using corresponding segmentation modes based on a place name dictionary and language rules respectively, to obtain a first segmentation result list and a second segmentation result list;
Identifying the place name in the character string to be identified according to the first segmentation result list and the second segmentation result list.
In an exemplary embodiment, identifying the place name in the character string according to the first segmentation result list and the second segmentation result list comprises:
Traversing the first segmentation result list to obtain the candidate place names in the character string to be identified;
Verifying the candidate place name according to the first segmentation result list and the second segmentation result list;
Judging, according to the verification result, whether the candidate place name is a place name.
In an exemplary embodiment, verifying the candidate place name according to the first segmentation result list and the second segmentation result list comprises:
Obtaining, in the first segmentation result list, the length of the candidate place name and the length of the segment before the candidate place name;
Obtaining, in the second segmentation result list, the length of the candidate place name and the length of the segment before the candidate place name;
When both of the following conditions are met, the candidate place name is verified successfully; when either condition is not met, verification of the candidate place name fails:
The length of the candidate place name in the first segmentation result list matches the length of the candidate place name in the second segmentation result list;
The length of the segment before the candidate place name in the first segmentation result list matches the length of the segment before the candidate place name in the second segmentation result list.
In an exemplary embodiment, identifying whether the candidate place name is a place name according to the verification result comprises:
Obtaining the certainty factor of the successfully verified candidate place name;
Calculating the place name context probability of the successfully verified candidate place name according to the second segmentation result list;
Identifying whether the candidate place name is a place name according to the certainty factor of the successfully verified candidate place name and the place name context probability.
In an exemplary embodiment, obtaining the certainty factor of the successfully verified candidate place name comprises:
Querying, according to a preset place name classification, the category to which the successfully verified candidate place name belongs;
Obtaining the certainty factor of the successfully verified candidate place name according to the correspondence between place name categories and certainty factors.
In an exemplary embodiment, identifying whether the candidate place name is a place name according to the certainty factor of the successfully verified candidate place name and the place name context probability comprises:
When the sum of the certainty factor of the successfully verified candidate place name and the place name context probability is greater than or equal to a preset threshold, identifying the successfully verified candidate place name as a place name.
In an exemplary embodiment, after identifying the place name in the character string to be identified according to the first segmentation result list and the second segmentation result list, the operations further comprise:
Identifying the text adjacent to the identified place name using preset road number and house number recognition patterns;
Identifying text that matches the road number and house number recognition patterns as a place name;
Traversing the character string to be identified and merging the identified place names according to a preset rule to obtain a complete place name.
In an exemplary embodiment, segmenting the character string to be identified using corresponding segmentation modes based on the place name dictionary and the language rules respectively, to obtain the first segmentation result list and the second segmentation result list, comprises:
Segmenting the character string to be identified using forward maximum matching segmentation based on the place name dictionary to obtain the first segmentation result list; and obtaining the second segmentation result list using conditional random field (CRF) segmentation based on the language rules.
For other implementation details of the apparatus embodiment, refer to the method embodiments above.
Those of ordinary skill in the art will appreciate that all or some of the steps of the above method may be completed by a program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. Optionally, all or some of the steps of the above embodiments may also be implemented using one or more integrated circuits. Accordingly, each module/unit in the above embodiments may be implemented in the form of hardware or in the form of a software functional module. The present invention is not limited to any particular combination of hardware and software.
The above are only preferred embodiments of the present invention. The invention may of course have other embodiments; without departing from the spirit and essence of the invention, those skilled in the art may make various corresponding changes and modifications in accordance with the invention, and all such changes and modifications shall fall within the scope of protection of the appended claims.

Claims (16)

1. A place name identification method, characterized by comprising:
Segmenting a character string to be identified using corresponding segmentation modes based on a place name dictionary and language rules respectively, to obtain a first segmentation result list and a second segmentation result list;
Identifying the place name in the character string to be identified according to the first segmentation result list and the second segmentation result list.
2. The method according to claim 1, characterized in that identifying the place name in the character string according to the first segmentation result list and the second segmentation result list comprises:
Traversing the first segmentation result list to obtain the candidate place names in the character string to be identified;
Verifying the candidate place name according to the first segmentation result list and the second segmentation result list;
Judging, according to the verification result, whether the candidate place name is a place name.
3. The method according to claim 1, characterized in that verifying the candidate place name according to the first segmentation result list and the second segmentation result list comprises:
Obtaining, in the first segmentation result list, the length of the candidate place name and the length of the segment before the candidate place name;
Obtaining, in the second segmentation result list, the length of the candidate place name and the length of the segment before the candidate place name;
When both of the following conditions are met, the candidate place name is verified successfully; when either condition is not met, verification of the candidate place name fails:
The length of the candidate place name in the first segmentation result list matches the length of the candidate place name in the second segmentation result list;
The length of the segment before the candidate place name in the first segmentation result list matches the length of the segment before the candidate place name in the second segmentation result list.
4. The method according to claim 3, characterized in that identifying whether the candidate place name is a place name according to the verification result comprises:
Obtaining the certainty factor of the successfully verified candidate place name;
Calculating the place name context probability of the successfully verified candidate place name according to the second segmentation result list;
Identifying whether the candidate place name is a place name according to the certainty factor of the successfully verified candidate place name and the place name context probability.
5. The method according to claim 4, characterized in that obtaining the certainty factor of the successfully verified candidate place name comprises:
Querying, according to a preset place name classification, the category to which the successfully verified candidate place name belongs;
Obtaining the certainty factor of the successfully verified candidate place name according to the correspondence between place name categories and certainty factors.
6. The method according to claim 4 or 5, characterized in that identifying whether the candidate place name is a place name according to the certainty factor of the successfully verified candidate place name and the place name context probability comprises:
When the sum of the certainty factor of the successfully verified candidate place name and the place name context probability is greater than or equal to a preset threshold, identifying the successfully verified candidate place name as a place name.
7. The method according to claim 1, characterized in that, after identifying the place name in the character string to be identified according to the first segmentation result list and the second segmentation result list, the method further comprises:
Identifying the text adjacent to the identified place name using preset road number and house number recognition patterns;
Identifying text that matches the road number and house number recognition patterns as a place name;
Traversing the character string to be identified and merging the identified place names according to a preset rule to obtain a complete place name.
8. The method according to claim 1 or 2, characterized in that segmenting the character string to be identified using corresponding segmentation modes based on the place name dictionary and the language rules respectively, to obtain the first segmentation result list and the second segmentation result list, comprises:
Segmenting the character string to be identified using forward maximum matching segmentation based on the place name dictionary to obtain the first segmentation result list; and obtaining the second segmentation result list using conditional random field (CRF) segmentation based on the language rules.
9. A place name identification apparatus, comprising a memory and a processor, characterized in that:
The memory is configured to store computer-readable instructions;
The processor is configured to execute the computer-readable instructions to perform the following operations:
Segmenting a character string to be identified using corresponding segmentation modes based on a place name dictionary and language rules respectively, to obtain a first segmentation result list and a second segmentation result list;
Identifying the place name in the character string to be identified according to the first segmentation result list and the second segmentation result list.
10. The apparatus according to claim 9, characterized in that identifying the place name in the character string according to the first segmentation result list and the second segmentation result list comprises:
Traversing the first segmentation result list to obtain the candidate place names in the character string to be identified;
Verifying the candidate place name according to the first segmentation result list and the second segmentation result list;
Judging, according to the verification result, whether the candidate place name is a place name.
11. The apparatus according to claim 9, characterized in that verifying the candidate place name according to the first segmentation result list and the second segmentation result list comprises:
Obtaining, in the first segmentation result list, the length of the candidate place name and the length of the segment before the candidate place name;
Obtaining, in the second segmentation result list, the length of the candidate place name and the length of the segment before the candidate place name;
When both of the following conditions are met, the candidate place name is verified successfully; when either condition is not met, verification of the candidate place name fails:
The length of the candidate place name in the first segmentation result list matches the length of the candidate place name in the second segmentation result list;
The length of the segment before the candidate place name in the first segmentation result list matches the length of the segment before the candidate place name in the second segmentation result list.
12. The apparatus according to claim 11, characterized in that identifying whether the candidate place name is a place name according to the verification result comprises:
Obtaining the certainty factor of the successfully verified candidate place name;
Calculating the place name context probability of the successfully verified candidate place name according to the second segmentation result list;
Identifying whether the candidate place name is a place name according to the certainty factor of the successfully verified candidate place name and the place name context probability.
13. The apparatus according to claim 12, characterized in that obtaining the certainty factor of the successfully verified candidate place name comprises:
Querying, according to a preset place name classification, the category to which the successfully verified candidate place name belongs;
Obtaining the certainty factor of the successfully verified candidate place name according to the correspondence between place name categories and certainty factors.
14. The apparatus according to claim 12 or 13, characterized in that identifying whether the candidate place name is a place name according to the certainty factor of the successfully verified candidate place name and the place name context probability comprises:
When the sum of the certainty factor of the successfully verified candidate place name and the place name context probability is greater than or equal to a preset threshold, identifying the successfully verified candidate place name as a place name.
15. The apparatus according to claim 9, characterized in that, after identifying the place name in the character string to be identified according to the first segmentation result list and the second segmentation result list, the operations further comprise:
Identifying the text adjacent to the identified place name using preset road number and house number recognition patterns;
Identifying text that matches the road number and house number recognition patterns as a place name;
Traversing the character string to be identified and merging the identified place names according to a preset rule to obtain a complete place name.
16. The apparatus according to claim 9 or 10, characterized in that segmenting the character string to be identified using corresponding segmentation modes based on the place name dictionary and the language rules respectively, to obtain the first segmentation result list and the second segmentation result list, comprises:
Segmenting the character string to be identified using forward maximum matching segmentation based on the place name dictionary to obtain the first segmentation result list; and obtaining the second segmentation result list using conditional random field (CRF) segmentation based on the language rules.
CN201910087977.4A 2019-01-29 2019-01-29 Place name recognition method and device Active CN109871536B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910087977.4A CN109871536B (en) 2019-01-29 2019-01-29 Place name recognition method and device

Publications (2)

Publication Number Publication Date
CN109871536A true CN109871536A (en) 2019-06-11
CN109871536B CN109871536B (en) 2022-12-30

Family

ID=66918288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910087977.4A Active CN109871536B (en) 2019-01-29 2019-01-29 Place name recognition method and device

Country Status (1)

Country Link
CN (1) CN109871536B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103186524A (en) * 2011-12-30 2013-07-03 高德软件有限公司 Address name identification method and device
CN103324607A (en) * 2012-03-20 2013-09-25 北京百度网讯科技有限公司 Method and device for word segmentation of Thai texts
CN107305540A (en) * 2016-04-20 2017-10-31 顺丰科技有限公司 Address cutting recognition methods
CN108595435A (en) * 2018-05-03 2018-09-28 鹏元征信有限公司 A kind of organization names identifying processing method, intelligent terminal and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
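王凡秀 (Wang Fanxiu): "基于条件随机场的中文地名识别" (Chinese place name recognition based on conditional random fields), 《中国西部科技》 (Science and Technology of West China) *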

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329469A (en) * 2020-11-05 2021-02-05 新华智云科技有限公司 Administrative region entity identification method and system
CN112329469B (en) * 2020-11-05 2023-12-19 新华智云科技有限公司 Administrative region entity identification method and system

Also Published As

Publication number Publication date
CN109871536B (en) 2022-12-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant