CN102915299A - Word segmentation method and device - Google Patents

Word segmentation method and device

Info

Publication number
CN102915299A
CN102915299A CN2012104075296A CN201210407529A CN102915299A CN 102915299 A CN102915299 A CN 102915299A CN 2012104075296 A CN2012104075296 A CN 2012104075296A CN 201210407529 A CN201210407529 A CN 201210407529A CN 102915299 A CN102915299 A CN 102915299A
Authority
CN
China
Prior art keywords
matching result
value
phrase
numerical value
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012104075296A
Other languages
Chinese (zh)
Other versions
CN102915299B (en
Inventor
李成华
王勇进
王峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Group Co Ltd
Original Assignee
Hisense Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hisense Group Co Ltd
Priority to CN201210407529.6A (CN102915299B)
Priority to CN201510179584.8A (CN104765838A)
Priority to CN201510179858.3A (CN104765724A)
Publication of CN102915299A
Application granted
Publication of CN102915299B
Legal status: Active
Anticipated expiration

Landscapes

  • Character Discrimination (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a word segmentation method, which is used for improving word segmentation accuracy. The method comprises the following steps of: acquiring a character string to be processed; matching the character string to be processed with a universal dictionary library according to a forward maximum matching method, thus obtaining a first matching result; matching the character string to be processed with the universal dictionary library according to a reverse maximum matching method, thus obtaining a second matching result; and judging whether the first matching result is consistent with the second matching result, and if so, outputting the first matching result or the second matching result to serve as a word segmentation result. The invention also discloses a device for implementing the method.

Description

Word segmentation method and device
Technical field
The present invention relates to the field of word segmentation, and in particular to a word segmentation method and device.
Background art
With the spread of networks and the maturing of electronic technology, televisions are gradually becoming high-definition, networked, and intelligent.
Searching for video on demand over the Internet has become a major requirement and application of smart televisions. To find exactly the video content a user wants among the massive number of videos on the Internet, text information must be extracted effectively; how to do so has therefore become an important problem in the field of information retrieval. As a key technique in information processing and retrieval, Chinese word segmentation has attracted wide attention. Different applications in different fields place increasingly high demands on segmentation, and the quality of the segmentation technique directly affects the results of information processing and retrieval.
The prior art contains many word segmentation methods; those based on character strings are common because they are relatively simple.
Existing string-based word segmentation methods broadly include the forward maximum matching method and the reverse maximum matching method. For example, one string-based method applies the forward maximum matching method or the reverse maximum matching method to perform mechanical segmentation of the character string to be segmented, and then performs place-name and street-name recognition on the single characters left unrecognized. Its purpose is to identify place names, street names, and the like, and to expand the place-name lexicon.
In the course of implementing the technical solutions of the embodiments of this application, the inventors found that the prior art has at least the following technical problems:
1. Existing word segmentation systems use only one segmentation method (the forward maximum matching method or the reverse maximum matching method). The segmentation process is relatively coarse, so the results obtained are not accurate enough, which reduces segmentation accuracy.
2. Existing word segmentation methods address only the place-name field and still cannot effectively recognize character strings from other fields.
Summary of the invention
The embodiments of the present invention provide a word segmentation method and device that address the technical problem of low segmentation accuracy in the prior art and achieve the technical effect of improving segmentation accuracy.
One aspect of the present invention provides a word segmentation method comprising the following steps:
obtaining a character string to be processed;
matching the character string to be processed against a universal dictionary library according to the forward maximum matching method to obtain a first matching result, and matching the character string to be processed against the universal dictionary library according to the reverse maximum matching method to obtain a second matching result;
judging whether the first matching result is consistent with the second matching result;
when they are consistent, outputting the first matching result or the second matching result as the word segmentation result.
Another aspect of the present invention provides a word segmentation device comprising:
an acquisition module, configured to obtain a character string to be processed;
a matching module, configured to match the character string to be processed against a universal dictionary library according to the forward maximum matching method to obtain a first matching result, and to match the character string to be processed against the universal dictionary library according to the reverse maximum matching method to obtain a second matching result;
a judgment module, configured to judge whether the first matching result is consistent with the second matching result;
an output module, configured to output the first matching result or the second matching result as the word segmentation result when they are consistent.
The word segmentation method in the embodiments of the present invention comprises: obtaining a character string to be processed; matching the character string to be processed against a universal dictionary library according to the forward maximum matching method to obtain a first matching result, and matching it against the universal dictionary library according to the reverse maximum matching method to obtain a second matching result; judging whether the first matching result is consistent with the second matching result; and, when they are consistent, outputting the first matching result or the second matching result as the word segmentation result.
In the embodiments of the present invention, the forward maximum matching method and the reverse maximum matching method are each applied to the same character string to be processed. After matching is complete, the result is output directly if the two matching results are identical. Because two matching methods are used and their results are compared before output, segmentation accuracy is improved. In addition, if the matching results differ, the embodiments can disambiguate them, so that the result obtained is as accurate as possible and segmentation accuracy is safeguarded in more than one way.
Brief description of the drawings
Fig. 1 is the main flowchart of the word segmentation method in an embodiment of the present invention;
Fig. 2 is a detailed structural diagram of the word segmentation device in an embodiment of the present invention.
Detailed description of the embodiments
The word segmentation method in the embodiments of the present invention comprises: obtaining a character string to be processed; matching the character string to be processed against a universal dictionary library according to the forward maximum matching method to obtain a first matching result, and matching it against the universal dictionary library according to the reverse maximum matching method to obtain a second matching result; judging whether the first matching result is consistent with the second matching result; and, when they are consistent, outputting the first matching result or the second matching result as the word segmentation result.
In the embodiments of the present invention, the forward maximum matching method and the reverse maximum matching method are each applied to the same character string to be processed. After matching is complete, the result is output directly if the two matching results are identical. Because two matching methods are used and their results are compared before output, segmentation accuracy is improved. In addition, if the matching results differ, the embodiments can disambiguate them, so that the result obtained is as accurate as possible and segmentation accuracy is safeguarded in more than one way.
Referring to Fig. 1, the word segmentation method in the embodiments of the present invention may comprise the following steps:
Step 101: Obtain a character string to be processed.
In the embodiments of the present invention, a passage of text may be obtained first, after which a dictionary may be loaded. In the prior art, the loaded dictionary is an ordinary universal dictionary library. In the embodiments of the present invention, a special dictionary library may additionally be built. It may be a special dictionary library for any field, for example the film and television field, the construction field, or the electrical field. The embodiments are described taking a special dictionary library for the film and television field as an example. This library may contain various kinds of information related to film and television, such as actor names, director names, titles, genres, and languages. Searching and matching in this library lets the word segmentation device perform better in the video search field.
In the embodiments of the present invention, a stop-word extension dictionary library may also be built. It contains various kinds of words, such as modal particles and conjunctions, all of which contribute nothing to understanding a whole sentence. For example, take the sentence "I and you go to have a meal together." The subjects are "I" and "you", the predicate is "go", and the object is "have a meal". The word "and" is a conjunction that carries no meaning for understanding the whole sentence, so "and" can be included in the stop-word extension dictionary library.
In the embodiments of the present invention, the special dictionary library and the stop-word extension dictionary library that have been built may both be included in the universal dictionary library. The universal dictionary library described here therefore differs from that of the prior art: it is a universal dictionary library that contains the special dictionary library and the stop-word extension dictionary library. Since the embodiments take the film and television field as the example, the universal dictionary library may be one that contains the film and television special dictionary library and the stop-word extension dictionary library.
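The composition of the universal dictionary library can be sketched in Python as follows. This is only an illustration under assumptions: the source does not specify a storage format, so plain sets are used, and every name and entry here (`GENERAL_WORDS`, `SPECIAL_LIBRARY`, `STOP_WORDS`, `build_universal_dictionary`, and the placeholder words) is hypothetical.

```python
# Illustrative placeholder entries; none of these word lists come from the source.
GENERAL_WORDS = {"我", "一个", "个人", "人", "吃饭"}

# Hypothetical special dictionary library for the film and television field,
# keyed by category so that the categories remain available later.
SPECIAL_LIBRARY = {
    "actor name": {"演员甲"},
    "director name": {"导演乙"},
    "title": {"影片丙"},
}

# Hypothetical stop-word extension dictionary library (conjunctions, particles).
STOP_WORDS = {"和", "吗"}


def build_universal_dictionary(general, special, stop_words):
    """Merge the general words, every special category, and the stop words."""
    universal = set(general)
    for entries in special.values():
        universal.update(entries)
    universal.update(stop_words)
    return universal


universal = build_universal_dictionary(GENERAL_WORDS, SPECIAL_LIBRARY, STOP_WORDS)
print(sorted(universal))
```

Keeping the special library keyed by category, rather than flattening it once and discarding the keys, is a design choice made here so that category lookups remain possible after segmentation.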
After the universal dictionary library containing the special dictionary library and the stop-word extension dictionary library has been loaded, the obtained passage may first be roughly split according to punctuation and similar information into a plurality of sentences. Each sentence may serve as a character string to be processed.
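A minimal sketch of the rough split, assuming that it simply breaks the passage at punctuation marks. The delimiter set and the function name `rough_split` are assumptions; the source only says the split is made "according to punctuation and similar information".

```python
import re

# Hypothetical delimiter set; the source does not list the punctuation used.
SENTENCE_DELIMITERS = "。！？；，,.!?;\n"


def rough_split(passage):
    """Split a passage at runs of delimiters and drop empty pieces."""
    pattern = "[" + re.escape(SENTENCE_DELIMITERS) + "]+"
    pieces = re.split(pattern, passage)
    return [piece.strip() for piece in pieces if piece.strip()]


sentences = rough_split("我一个人吃饭。你呢？")
print(sentences)
```

Each element of `sentences` would then be treated as one character string to be processed.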
Step 102: Match the character string to be processed against the universal dictionary library according to the forward maximum matching method to obtain a first matching result, and match the character string to be processed against the universal dictionary library according to the reverse maximum matching method to obtain a second matching result.
In the embodiments of the present invention, the character string to be processed may first be matched according to the forward maximum matching method to obtain the first matching result, which may consist of first phrases whose number is the first value. The string may then be matched according to the reverse maximum matching method to obtain the second matching result, which may consist of second phrases whose number is the second value. That is, the first value is the number of first phrases contained in the first matching result and can be determined from it, and the second value is the number of second phrases contained in the second matching result and can be determined from it. A phrase in the embodiments of the present invention may be a multi-character word or a single character.
Alternatively, the character string to be processed may first be matched according to the reverse maximum matching method to obtain the second matching result, which may consist of second phrases whose number is the second value, and then be matched according to the forward maximum matching method to obtain the first matching result, which may consist of first phrases whose number is the first value.
Alternatively, the character string to be processed may be matched according to the forward maximum matching method and the reverse maximum matching method at the same time to obtain the first matching result and the second matching result respectively. That is, in the embodiments of the present invention the two methods may be applied to the character string to be processed in any order.
The forward maximum matching method (MM) may proceed as follows:
First, a maximum word length is set. It must not exceed the length of the character string to be processed and is preferably less than that length. In general it can be set empirically. For example, if the maximum word length is n, then n characters are taken from the left of the character string to be processed and matched against the universal dictionary library. If the library contains this entry, the match succeeds, the n characters are cut from the string, and the next n characters are taken from the left of the remaining string and matched, until the whole string has been processed. If the match fails, the last of the n characters is removed and the remaining n-1 characters are matched against the entries in the universal dictionary library. If the match still fails, the last character is removed again and matching is repeated, and so on. If the length of the character string to be processed is m, n should be a natural number greater than 1 and not greater than m.
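The procedure just described can be sketched in Python as follows. This is a sketch rather than the patented implementation: the function name `forward_max_match`, the use of a set for the dictionary, and the rule that a single unmatched character is emitted as its own phrase are assumptions.

```python
def forward_max_match(text, dictionary, max_len):
    """Segment text from left to right, preferring the longest dictionary entry."""
    phrases = []
    start = 0
    while start < len(text):
        # Take up to max_len characters from the left of the remaining text.
        size = min(max_len, len(text) - start)
        # Drop the last character of the candidate until it matches.
        while size > 1 and text[start:start + size] not in dictionary:
            size -= 1
        # Assumption: a single character is emitted even if it is not an entry.
        phrases.append(text[start:start + size])
        start += size
    return phrases


# Illustrative dictionary entries, chosen only for this demonstration.
demo_dictionary = {"我", "一", "一个", "个人", "人", "吃饭"}
print(forward_max_match("我一个人吃饭", demo_dictionary, 5))
```

A set gives constant-time lookups on average, which matters because every shortening step performs one lookup.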
The basic principle of the reverse maximum matching method (RMM) is the same as that of the forward maximum matching method; the difference is that segmentation proceeds in the opposite direction. Scanning may start from the end of the character string to be processed. Each time, the characters at the very end, up to the maximum word length, are taken as the matching field. If matching fails, the first character of the matching field is removed and matching continues.
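A matching sketch of the reverse direction, under the same assumptions as any illustrative implementation of this kind: the name `reverse_max_match` is hypothetical, the dictionary is a set, and a single unmatched character is emitted as its own phrase.

```python
def reverse_max_match(text, dictionary, max_len):
    """Segment text from right to left, preferring the longest dictionary entry."""
    phrases = []
    end = len(text)
    while end > 0:
        # Take up to max_len characters from the end of the remaining text.
        size = min(max_len, end)
        # Drop the first character of the candidate until it matches.
        while size > 1 and text[end - size:end] not in dictionary:
            size -= 1
        # Assumption: a single character is emitted even if it is not an entry.
        phrases.append(text[end - size:end])
        end -= size
    # Phrases were collected from the end, so restore reading order.
    phrases.reverse()
    return phrases


# Illustrative dictionary entries, chosen only for this demonstration.
demo_dictionary = {"我", "一", "一个", "个人", "人", "吃饭"}
print(reverse_max_match("我一个人吃饭", demo_dictionary, 5))
```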
The forward maximum matching method is illustrated below.
For example, the character string to be processed is "我一个人吃饭" ("I eat alone").
In the first step, the maximum word length is set to 5. The first 5 characters cut out are "我一个人吃". These 5 characters are matched against the universal dictionary library and no match is found, so the last character is removed, giving "我一个人". These 4 characters are matched against the library and no match is found, so the last character is removed, giving "我一个". These 3 characters are matched and no match is found, so the last character is removed, giving "我一". These 2 characters are matched and no match is found, so the last character is removed, giving "我". This 1 character is matched against the library and the match succeeds.
In the second step, the remaining string is cut, giving "一个人吃饭". These 5 characters are matched against the universal dictionary library and no match is found, so the last character is removed, giving "一个人吃". These 4 characters are matched and no match is found, so the last character is removed, giving "一个人". These 3 characters are matched and no match is found, so the last character is removed, giving "一个". These 2 characters are matched against the library and the match succeeds.
In the third step, the remaining string is cut, giving "人吃饭". These 3 characters are matched against the universal dictionary library and no match is found, so the last character is removed, giving "人吃". These 2 characters are matched and no match is found, so the last character is removed, giving "人". This 1 character is matched against the library and the match succeeds.
In the fourth step, the remaining string is cut, giving "吃饭". These 2 characters are matched against the universal dictionary library and the match succeeds.
The segmentation result obtained by applying the forward maximum matching method to "我一个人吃饭" is therefore 我/一个/人/吃饭, that is, four phrases, two of which are single characters.
Applying the reverse maximum matching method to the same sentence gives the segmentation result 我/一/个人/吃饭.
After the character string to be processed has been matched according to the forward maximum matching method, the first matching result is obtained; it consists of first phrases whose number is the first value, and in the above example the first value is 4. After the string has been matched according to the reverse maximum matching method, the second matching result is obtained; it consists of second phrases whose number is the second value, and in the above example the second value is 4.
Step 103: Judge whether the first matching result is consistent with the second matching result.
In the embodiments of the present invention, after the first matching result and the second matching result are obtained, it may be judged whether they are consistent. Consistent here means not only that the number of phrases is the same but also that the phrase contents are exactly the same. For example, for the sentence "我一个人吃饭" ("I eat alone"), the first matching result obtained by the forward maximum matching method is 我/一个/人/吃饭, whereas the second matching result obtained by the reverse maximum matching method may be 我/一/个人/吃饭. The first value is 4 and the second value is also 4. Although the two values are equal, the phrases obtained are not identical, so the first matching result and the second matching result are judged to be inconsistent.
For example, judging whether the first matching result is consistent with the second matching result may specifically be:
Judge whether the first value equals the second value.
When the first value and the second value are not equal, this indicates that there is an ambiguity between the first matching result and the second matching result.
When the first value equals the second value, judge whether the first phrases and the second phrases are identical, that is, whether their contents are exactly the same. For example, if the first value is 4 and the first phrases are 我/一个/人/吃饭, while the second value is 4 and the second phrases are 我/一/个人/吃饭, then although the first value equals the second value, the contents of the first phrases and the second phrases are not exactly the same, so the first phrases and the second phrases are not identical. If instead the first value is 4 and the first phrases are 我/一个/人/吃饭, and the second value is 4 and the second phrases are also 我/一个/人/吃饭, then the first phrases and the second phrases can be determined to be identical.
When the first phrases and the second phrases are identical, there is no ambiguity between the first matching result and the second matching result. When they are not identical, there is an ambiguity between the first matching result and the second matching result.
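The two-stage consistency judgment can be sketched as follows. The function name `is_consistent` and the representation of a matching result as a list of strings are assumptions made for illustration.

```python
def is_consistent(first_result, second_result):
    """Return True only if both results have the same phrases in the same order."""
    first_value = len(first_result)    # number of first phrases
    second_value = len(second_result)  # number of second phrases
    # Stage 1: different phrase counts already indicate an ambiguity.
    if first_value != second_value:
        return False
    # Stage 2: equal counts still require identical phrase contents.
    return all(a == b for a, b in zip(first_result, second_result))


forward = ["我", "一个", "人", "吃饭"]
reverse = ["我", "一", "个人", "吃饭"]
print(is_consistent(forward, reverse))  # False: same count, different phrases
print(is_consistent(forward, forward))  # True
```

In Python the whole check reduces to `first_result == second_result`; the two stages are written out here only to mirror the order of the judgments in the text.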
Preferably, in the embodiments of the present invention, the universal dictionary library containing the special dictionary library may be loaded before step 101, and the special dictionary library may be classified into categories before the universal dictionary library is loaded. Then, after it has been judged whether the first matching result is consistent with the second matching result, the phrases contained in the first matching result or the second matching result may each be matched, category by category, against the phrases in the classified special dictionary library. This is possible because the matching result to be output can be determined once the consistency judgment has been made. If the matching result to be output is the first matching result, the phrases contained in the first matching result may each be matched, category by category, against the phrases in the classified special dictionary library; if it is the second matching result, the phrases contained in the second matching result may be matched in the same way.
Step 104: When they are consistent, output the first matching result or the second matching result as the word segmentation result.
If the first matching result is judged to be consistent with the second matching result, that is, the first value equals the second value and the contents of the first phrases and the second phrases are identical, then the first matching result or the second matching result may be output as the word segmentation result.
In the embodiments of the present invention, if the first matching result is judged to be inconsistent with the second matching result, the first matching result and the second matching result may be disambiguated, and the first matching result or the second matching result selected by disambiguation is output as the word segmentation result.
In the embodiment of the invention, the process that ambiguity is eliminated can be as follows:
Can judge at first whether described the first numerical value and described second value be unequal, if judge that definite described the first numerical value and described second value are unequal, can continue then to judge that whether described the first numerical value is greater than described second value, determine that described the first numerical value is greater than described second value if judge, what then can determine needs output is a described second value phrase, the phrase that namely obtains according to reverse maximum matching method, if and judgement determines that described the first numerical value is less than described second value, what then can determine needs output is described the first numerical value phrase, the phrase that namely obtains according to the Forward Maximum Method method.
And if judge that definite described the first numerical value equates with described second value, then can continue other determining step.For example, can determine to comprise a third value individual character in described the first numerical value phrase, can comprise the 4th a numerical value individual character in the described second value phrase, can continue to judge whether described third value is unequal with described the 4th numerical value.If judge that definite described third value and described the 4th numerical value are unequal, can judge that then whether described third value is greater than described the 4th numerical value, determine that described third value is greater than described the 4th numerical value if judge, what then can determine needs output is a described second value phrase, namely export the phrase that obtains according to reverse maximum matching method, if and judgement determines that described third value is less than described the 4th numerical value, what then can determine needs output is described the first numerical value phrase, namely exports the phrase that obtains according to the Forward Maximum Method method.Wherein, described third value is the quantity of the individual character that comprises in described the first matching result, described the 4th numerical value is the quantity of the individual character that comprises in described the second matching result, be that described third value can be determined according to described the first matching result, described the 4th numerical value can be determined according to described the second matching result.Described third value can be obtained according to described the first matching result, described the 4th numerical value can be obtained according to described the second matching result.
If judge and determine that described the first numerical value equates with described second value, described third value equates also that with described the 4th numerical value what then can determine needs output is described the first numerical value phrase, namely exports the phrase that obtains according to the Forward Maximum Method method.
Namely, in the embodiment of the invention, if the described second value that described the first numerical value corresponding to described the first matching result and described the second matching result are corresponding is different, that then can determine needs output is the result of phrase negligible amounts, if the described second value that described the first numerical value corresponding to described the first matching result and described the second matching result are corresponding is identical, and described third value is different from described the 4th numerical value, and that then can determine needs output is the result of individual character negligible amounts.Adopting this disposal route in the embodiment of the invention, mainly is the accuracy of eliminating in order to improve ambiguity.
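The selection rules above can be sketched as a single function. The names `disambiguate` and `count_single_chars` are hypothetical, and a matching result is assumed to be a list of strings.

```python
def count_single_chars(result):
    """Count the phrases that consist of exactly one character."""
    return sum(1 for phrase in result if len(phrase) == 1)


def disambiguate(first_result, second_result):
    """Choose between the forward (first) and reverse (second) matching results."""
    first_value, second_value = len(first_result), len(second_result)
    # Rule 1: prefer the result with fewer phrases.
    if first_value != second_value:
        return second_result if first_value > second_value else first_result
    third_value = count_single_chars(first_result)
    fourth_value = count_single_chars(second_result)
    # Rule 2: with equal phrase counts, prefer fewer single characters.
    if third_value != fourth_value:
        return second_result if third_value > fourth_value else first_result
    # Rule 3: on a complete tie, output the forward result.
    return first_result


forward = ["我", "一个", "人", "吃饭"]
reverse = ["我", "一", "个人", "吃饭"]
print(disambiguate(forward, reverse))
```

For the example sentence both results contain four phrases and two single characters, so the third rule applies and the forward result is chosen.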
In the embodiment of the present invention, ambiguity elimination is performed on the first matching result and the second matching result, so that the first matching result or the second matching result that remains after ambiguity elimination is output as the word segmentation result.
Preferably, in the embodiment of the present invention, the general dictionary library, which includes the specialized dictionary library, can be loaded before step 101, and the specialized dictionary library can be classified before the general dictionary library is loaded. In this way, after ambiguity elimination has been performed on the first matching result and the second matching result, each phrase in the word segmentation result obtained after ambiguity elimination can be matched, category by category, against the phrases in the classified specialized dictionary library. This is possible because the matching result to be output is known once ambiguity elimination is complete. For example, if the matching result to be output is the first matching result after ambiguity elimination, the phrases it contains are matched category by category against the phrases in the classified specialized dictionary library; if the matching result to be output is the second matching result after ambiguity elimination, the phrases it contains are matched in the same way.
For example, if the specialized dictionary library for the film and television domain is divided into 5 categories, namely actor name, director name, film and television title, film and television type, and film and television language, then during matching each phrase can be matched against each category in turn. Which category is matched first and which later can be set as desired, or the order can be arbitrary.
For example, suppose the specialized dictionary library for the film and television domain is divided into 5 categories, namely actor name, director name, film and television title, film and television type, and film and television language, and the configured matching order is: actor name, film and television title, director name, film and television type, film and television language. If one phrase in the word segmentation result after ambiguity elimination is "hiding", this phrase is first matched against the actor name category, where no entry matches; it is then matched against the film and television title category, where the match succeeds. The matched word segmentation result can then be output, and the output can state explicitly that this phrase is a film and television title.
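The category-by-category matching in this example can be sketched roughly as follows. It is a minimal sketch only: the dictionary contents are placeholder data, and the names `CATEGORY_ORDER`, `toy_library`, and `classify_phrase` are hypothetical rather than part of the embodiment.

```python
# Matching order from the example: actor name, title, director name, type, language.
CATEGORY_ORDER = ["actor name", "title", "director name", "type", "language"]

# Placeholder classified specialized dictionary library (illustrative data only).
toy_library = {
    "actor name": {"some actor"},
    "title": {"hiding"},
    "director name": {"some director"},
    "type": {"drama"},
    "language": {"mandarin"},
}


def classify_phrase(phrase, library, order):
    """Return the first category (in the given order) that contains the phrase.

    Returns None when no category contains the phrase.
    """
    for category in order:
        if phrase in library.get(category, set()):
            return category
    return None


print(classify_phrase("hiding", toy_library, CATEGORY_ORDER))  # title
```

Because matching stops at the first category that contains the phrase, the configured order decides the label whenever the same phrase appears in more than one category.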
In the embodiment of the present invention, before judging whether the first matching result and the second matching result are consistent, the phrases of the first type can also be deleted from both the first matching result and the second matching result according to the stop-word extension dictionary library. Because it cannot be determined before this judgment whether the first matching result or the second matching result will be output, the phrases of the first type are deleted from both results.
In the embodiment of the present invention, the phrases of the first type can alternatively be deleted, according to the stop-word extension dictionary library, from the matching result to be output after judging whether the first matching result and the second matching result are consistent, where the matching result to be output is the first matching result or the second matching result. After this judgment it is known which of the two results will be output. If the matching result to be output is the first matching result, the phrases of the first type are deleted from the first matching result according to the stop-word extension dictionary library and the second matching result need not be processed; if the matching result to be output is the second matching result, the phrases of the first type are deleted from the second matching result and the first matching result need not be processed. This also saves steps.
In the embodiment of the present invention, a phrase of the first type can be a phrase that contributes nothing to understanding the meaning of the character string to be processed. For example, a word segmentation result may consist of a modal particle followed by "/I/do not know"; the modal particle clearly contributes nothing to understanding the character string to be processed, so when it is matched against the stop-word extension dictionary library and the match succeeds, it can be deleted. Specifically, in the embodiment of the present invention, the phrases of the first type can be function-word phrases, for example auxiliary-word phrases, conjunction phrases, adverb phrases, preposition phrases, interjection phrases, onomatopoeia phrases, and so on. Preferably, the kinds of phrases included in the stop-word extension dictionary library can vary with the domain to which the character string to be processed belongs. Which kinds of phrases are included in the stop-word extension dictionary library can be determined according to actual needs, and the present invention does not limit this.
That is, in the embodiment of the present invention, each of the first phrases obtained in the first matching result can be matched against the stop-word extension dictionary library, and any phrase that matches is deleted; likewise, each of the second phrases obtained in the second matching result can be matched against the stop-word extension dictionary library, and any phrase that matches is deleted.
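The deletion step can be sketched as a simple filter. This is an illustrative sketch that assumes the stop-word extension dictionary library is held as a Python set; the function name `remove_stop_phrases` and the sample stop words are hypothetical.

```python
def remove_stop_phrases(phrases, stop_words):
    """Return the phrases that do not match any entry in the stop-word set."""
    return [p for p in phrases if p not in stop_words]


# Illustrative stop-word set containing two common Chinese function words.
stop_words = {"的", "了"}

first_result = ["我", "的", "电影"]
second_result = ["我", "的", "电影"]

# Variant 1: filter both results before the consistency judgment.
first_filtered = remove_stop_phrases(first_result, stop_words)
second_filtered = remove_stop_phrases(second_result, stop_words)
print(first_filtered)  # ['我', '电影']
```

In the second variant described in the text, the same function would be applied once, to whichever matching result has been chosen for output.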
Referring to Fig. 2, the present invention also provides a word segmentation device, which can comprise an acquisition module 201, a matching module 202, a judging module 203 and an output module 204. The device can further comprise a disambiguation module 205, a loading module 206, a classification module 207 and a processing module 208.
The acquisition module 201 can be used to acquire the character string to be processed.
The matching module 202 can be used to match the character string to be processed against the general dictionary library according to the forward maximum matching method to obtain the first matching result, and to match the character string to be processed against the general dictionary library according to the reverse maximum matching method to obtain the second matching result.
The matching module 202 can also be used to match the phrases contained in the first matching result or the second matching result, category by category, against the phrases in the classified specialized dictionary library.
The matching module 202 can also be used to match the phrases contained in the first matching result or the second matching result after ambiguity elimination, category by category, against the phrases in the classified specialized dictionary library.
The judging module 203 can be used to judge whether the first matching result is consistent with the second matching result.
The first matching result contains first phrases whose number is a first value, and the second matching result contains second phrases whose number is a second value; the first value is the number of first phrases contained in the first matching result, determined from the first matching result, and the second value is the number of second phrases contained in the second matching result, determined from the second matching result. The judging module 203 can specifically be used to: judge whether the first value equals the second value; when the first value and the second value are unequal, indicate that there is ambiguity between the first matching result and the second matching result; when the first value equals the second value, judge whether the first phrases are identical to the second phrases; when the first phrases are identical to the second phrases, indicate that there is no ambiguity between the first matching result and the second matching result; and when the first phrases are not completely identical to the second phrases, indicate that there is ambiguity between the first matching result and the second matching result.
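The judging logic described above can be sketched roughly as follows. This is only an illustrative sketch: the function name `is_consistent` and the representation of a matching result as an ordered Python list of phrases are assumptions, not part of the embodiment.

```python
def is_consistent(first_result, second_result):
    """Return True when the two matching results show no ambiguity.

    Step 1: compare the phrase counts (first value vs. second value).
    Step 2: if the counts are equal, compare the phrases position by position.
    """
    if len(first_result) != len(second_result):
        return False  # unequal counts: ambiguity
    return all(a == b for a, b in zip(first_result, second_result))


print(is_consistent(["生命", "起源"], ["生命", "起源"]))            # True
print(is_consistent(["研究生", "命", "起源"], ["研究", "生命", "起源"]))  # False
```

Comparing the counts first is a cheap early exit; for Python lists the two steps together are equivalent to `first_result == second_result`.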
The output module 204 can be used to output the first matching result or the second matching result as the word segmentation result when the two are consistent.
The output module 204 can also be used to output the first matching result or the second matching result after ambiguity elimination as the word segmentation result.
The output module 204 can specifically be used to: output the second phrases when the first value is greater than the second value; and output the first phrases when the first value is less than the second value.
The output module 204 can specifically be used to: output the second phrases when the third value is greater than the fourth value; output the first phrases when the third value is less than the fourth value; and output the first phrases when the third value equals the fourth value.
The disambiguation module 205 can be used to perform ambiguity elimination on the first matching result and the second matching result when the two are inconsistent, so that the first matching result or the second matching result after ambiguity elimination is output as the word segmentation result.
The disambiguation module 205 can specifically be used to judge, when the first value and the second value are unequal, whether the first value is greater than the second value.
The first matching result contains single characters whose number is a third value, and the second matching result contains single characters whose number is a fourth value; the third value is the number of single characters contained in the first matching result, determined from the first matching result, and the fourth value is the number of single characters contained in the second matching result, determined from the second matching result. The disambiguation module 205 can specifically be used to judge, when the first value equals the second value, whether the third value is greater than the fourth value.
The loading module 206 can be used to load the general dictionary library, which includes the specialized dictionary library.
The loading module 206 can be used to load the general dictionary library, which includes the stop-word extension dictionary library.
The classification module 207 can be used to classify the specialized dictionary library.
The processing module 208 can be used to delete the phrases of the first type from both the first matching result and the second matching result according to the stop-word extension dictionary library.
The processing module 208 can be used to delete the phrases of the first type from the matching result to be output according to the stop-word extension dictionary library, where the matching result to be output is the first matching result or the second matching result.
In the embodiment of the present invention, the phrases of the first type can be function-word phrases, for example auxiliary-word phrases, conjunction phrases, adverb phrases, preposition phrases, interjection phrases, onomatopoeia phrases, and so on. Preferably, the kinds of phrases included in the stop-word extension dictionary library can vary with the domain to which the character string to be processed belongs, and which kinds of phrases are included can be determined according to actual needs.
The word segmentation method in the embodiment of the present invention comprises: acquiring a character string to be processed; matching the character string to be processed against the general dictionary library according to the forward maximum matching method to obtain a first matching result, and matching the character string to be processed against the general dictionary library according to the reverse maximum matching method to obtain a second matching result; judging whether the first matching result is consistent with the second matching result; and, when they are consistent, outputting the first matching result or the second matching result as the word segmentation result.
In the embodiment of the present invention, the forward maximum matching method and the reverse maximum matching method are each applied to the same character string to be processed. After matching is complete, the result is output directly if the two matching results are identical. Because two matching methods are used and their results are compared before output, the accuracy of word segmentation is improved. Moreover, in the embodiment of the present invention, if the matching results differ, ambiguity elimination can be performed on them, so that the result obtained is as accurate as possible and the accuracy of word segmentation is ensured in several respects.
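The two matching passes and the consistency check can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the embodiment's implementation: the dictionary is a Python set, `max_len` (the longest window tried) is a hypothetical parameter, an unmatched single character is emitted as its own phrase, and the function names are illustrative. The sample sentence is a commonly used illustration and is not taken from this document.

```python
def forward_max_match(text, dictionary, max_len):
    """Scan left to right, taking the longest dictionary entry at each position."""
    result, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                result.append(piece)
                i += size
                break
    return result


def reverse_max_match(text, dictionary, max_len):
    """Scan right to left, taking the longest dictionary entry ending at each position."""
    result, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                result.append(piece)
                j -= size
                break
    result.reverse()  # phrases were collected from the end of the string
    return result


def segment_if_consistent(text, dictionary, max_len):
    """Return the segmentation when both passes agree, otherwise None."""
    first = forward_max_match(text, dictionary, max_len)
    second = reverse_max_match(text, dictionary, max_len)
    return first if first == second else None


words = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命起源", words, 3))  # ['研究生', '命', '起源']
print(reverse_max_match("研究生命起源", words, 3))  # ['研究', '生命', '起源']
print(segment_if_consistent("生命起源", words, 3))  # ['生命', '起源']
```

Returning `None` for the inconsistent case simply marks the point at which ambiguity elimination would take over.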
The embodiment of the present invention describes the ambiguity elimination process in detail, so that those skilled in the art can readily implement the technical solution of the present invention from the description; the disclosure is therefore sufficient. Moreover, the ambiguity elimination method of the embodiment of the present invention can improve the accuracy of word segmentation.
The embodiment of the present invention builds a dedicated specialized dictionary library against which the word segmentation result can be matched, making the output word segmentation result more targeted. The specialized dictionary library can be a specialized dictionary library for any domain, so that the word segmentation device in the embodiment of the present invention can better segment character strings to be processed from each domain. For example, if the specialized dictionary library is the one for the film and television domain, the word segmentation device is well suited to video search.
The embodiment of the present invention also builds a dedicated stop-word extension dictionary library, so that meaningless phrases can be deleted before the matching result is output. This does not affect the output word segmentation result, and it reduces subsequent operations and saves steps.
Those skilled in the art should understand that embodiments of the present invention can be provided as a method, a system, or a computer program product. Accordingly, the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (10)

1. A word segmentation method, characterized by comprising the following steps:
Acquiring a character string to be processed;
Matching the character string to be processed against a general dictionary library according to the forward maximum matching method to obtain a first matching result, and matching the character string to be processed against the general dictionary library according to the reverse maximum matching method to obtain a second matching result;
Judging whether the first matching result is consistent with the second matching result;
When they are consistent, outputting the first matching result or the second matching result as the word segmentation result.
2. The method of claim 1, characterized by further comprising, after judging whether the first matching result is consistent with the second matching result, the step of:
When they are inconsistent, performing ambiguity elimination on the first matching result and the second matching result, so as to output the first matching result or the second matching result after ambiguity elimination as the word segmentation result.
3. The method of claim 1 or 2, characterized in that the first matching result contains first phrases whose number is a first value, and the second matching result contains second phrases whose number is a second value, the first value being the number of first phrases contained in the first matching result as determined from the first matching result, and the second value being the number of second phrases contained in the second matching result as determined from the second matching result; and judging whether the first matching result is consistent with the second matching result specifically comprises:
Judging whether the first value equals the second value;
When the first value and the second value are unequal, indicating that there is ambiguity between the first matching result and the second matching result;
When the first value equals the second value, judging whether the first phrases are identical to the second phrases;
When the first phrases are identical to the second phrases, indicating that there is no ambiguity between the first matching result and the second matching result; and when the first phrases are not completely identical to the second phrases, indicating that there is ambiguity between the first matching result and the second matching result.
4. The method of claim 3, characterized in that the step of performing ambiguity elimination on the first matching result and the second matching result, so as to output the first matching result or the second matching result after ambiguity elimination as the word segmentation result, comprises:
When the first value and the second value are unequal, judging whether the first value is greater than the second value;
When the first value is greater than the second value, outputting the second phrases;
When the first value is less than the second value, outputting the first phrases.
5. The method of claim 3, characterized in that the first matching result contains single characters whose number is a third value, and the second matching result contains single characters whose number is a fourth value, the third value being the number of single characters contained in the first matching result as determined from the first matching result, and the fourth value being the number of single characters contained in the second matching result as determined from the second matching result; and the step of performing ambiguity elimination on the first matching result and the second matching result, so as to output the first matching result or the second matching result after ambiguity elimination as the word segmentation result, comprises:
When the first value equals the second value, judging whether the third value is greater than the fourth value;
When the third value is greater than the fourth value, outputting the second phrases;
When the third value is less than the fourth value, outputting the first phrases;
When the third value equals the fourth value, outputting the first phrases.
6. A word segmentation device, characterized by comprising:
An acquisition module, used to acquire a character string to be processed;
A matching module, used to match the character string to be processed against a general dictionary library according to the forward maximum matching method to obtain a first matching result, and to match the character string to be processed against the general dictionary library according to the reverse maximum matching method to obtain a second matching result;
A judging module, used to judge whether the first matching result is consistent with the second matching result;
An output module, used to output the first matching result or the second matching result as the word segmentation result when they are consistent.
7. The device of claim 6, characterized in that the device further comprises a disambiguation module, used to perform ambiguity elimination on the first matching result and the second matching result when they are inconsistent, so as to output the first matching result or the second matching result after ambiguity elimination as the word segmentation result;
The output module is also used to output the first matching result or the second matching result after ambiguity elimination as the word segmentation result.
8. The device of claim 6 or 7, characterized in that the first matching result contains first phrases whose number is a first value, and the second matching result contains second phrases whose number is a second value, the first value being the number of first phrases contained in the first matching result as determined from the first matching result, and the second value being the number of second phrases contained in the second matching result as determined from the second matching result;
The judging module is specifically used to:
Judge whether the first value equals the second value;
When the first value and the second value are unequal, indicate that there is ambiguity between the first matching result and the second matching result;
When the first value equals the second value, judge whether the first phrases are identical to the second phrases;
When the first phrases are identical to the second phrases, indicate that there is no ambiguity between the first matching result and the second matching result; and when the first phrases are not completely identical to the second phrases, indicate that there is ambiguity between the first matching result and the second matching result.
9. The device of claim 8, characterized in that the disambiguation module is specifically used to:
When the first value and the second value are unequal, judge whether the first value is greater than the second value;
The output module is specifically used to:
When the first value is greater than the second value, output the second phrases;
When the first value is less than the second value, output the first phrases.
10. The device of claim 8, characterized in that the first matching result contains single characters whose number is a third value, and the second matching result contains single characters whose number is a fourth value, the third value being the number of single characters contained in the first matching result as determined from the first matching result, and the fourth value being the number of single characters contained in the second matching result as determined from the second matching result; and the disambiguation module is specifically used to:
When the first value equals the second value, judge whether the third value is greater than the fourth value;
The output module is specifically used to:
When the third value is greater than the fourth value, output the second phrases;
When the third value is less than the fourth value, output the first phrases;
When the third value equals the fourth value, output the first phrases.
CN201210407529.6A 2012-10-23 2012-10-23 Word segmentation method and device Active CN102915299B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201210407529.6A CN102915299B (en) 2012-10-23 2012-10-23 Word segmentation method and device
CN201510179584.8A CN104765838A (en) 2012-10-23 2012-10-23 Word segmenting method and device
CN201510179858.3A CN104765724A (en) 2012-10-23 2012-10-23 Word segmenting method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210407529.6A CN102915299B (en) 2012-10-23 2012-10-23 Word segmentation method and device

Related Child Applications (2)

Application Number Title Priority Date Filing Date
CN201510179858.3A Division CN104765724A (en) 2012-10-23 2012-10-23 Word segmenting method and device
CN201510179584.8A Division CN104765838A (en) 2012-10-23 2012-10-23 Word segmenting method and device

Publications (2)

Publication Number Publication Date
CN102915299A true CN102915299A (en) 2013-02-06
CN102915299B CN102915299B (en) 2015-04-08

Family

ID=47613670

Family Applications (3)

Application Number Title Priority Date Filing Date
CN201510179584.8A Pending CN104765838A (en) 2012-10-23 2012-10-23 Word segmenting method and device
CN201210407529.6A Active CN102915299B (en) 2012-10-23 2012-10-23 Word segmentation method and device
CN201510179858.3A Pending CN104765724A (en) 2012-10-23 2012-10-23 Word segmenting method and device

Country Status (1)

Country Link
CN (3) CN104765838A (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544309A (en) * 2013-11-04 2014-01-29 北京中搜网络技术股份有限公司 Splitting method for search string of Chinese vertical search
CN103593338A (en) * 2013-11-15 2014-02-19 北京锐安科技有限公司 Information processing method and device
CN104077275A (en) * 2014-06-27 2014-10-01 北京奇虎科技有限公司 Method and device for performing word segmentation based on context
CN104461056A (en) * 2014-12-22 2015-03-25 联想(北京)有限公司 Information processing method and electronic equipment
CN105138514A (en) * 2015-08-24 2015-12-09 昆明理工大学 Dictionary-based method for maximum matching of Chinese word segmentations through successive one word adding in forward direction
CN105243055A (en) * 2015-09-28 2016-01-13 北京橙鑫数据科技有限公司 Multi-language based word segmentation method and apparatus
CN105335488A (en) * 2015-10-16 2016-02-17 中国南方电网有限责任公司电网技术研究中心 Knowledge base construction method
CN105630807A (en) * 2014-10-31 2016-06-01 高德软件有限公司 Analysis method and apparatus for associative relationships between unknown roads and known roads
CN106202040A (en) * 2016-06-28 2016-12-07 邓力 Chinese word segmentation method for a PDA translation system
CN106649251A (en) * 2015-10-30 2017-05-10 北京国双科技有限公司 Method and device for Chinese word segmentation
CN107092590A (en) * 2017-03-17 2017-08-25 贵州恒昊软件科技有限公司 Sentence segmentation method and system
WO2018010579A1 (en) * 2016-07-13 2018-01-18 阿里巴巴集团控股有限公司 Character string segmentation method, apparatus and device
CN107680689A (en) * 2017-05-05 2018-02-09 平安科技(深圳)有限公司 Method and system for inferring potential diseases from medical text, and readable storage medium
CN108009153A (en) * 2017-12-08 2018-05-08 北京明朝万达科技股份有限公司 Search method and system based on search statement word segmentation results
CN110222335A (en) * 2019-05-20 2019-09-10 平安科技(深圳)有限公司 Text word segmentation method and device
CN112215010A (en) * 2019-07-10 2021-01-12 北京猎户星空科技有限公司 Semantic recognition method and equipment
CN112287108A (en) * 2020-10-29 2021-01-29 四川长虹电器股份有限公司 Intention recognition optimization method in field of Internet of things
WO2021127987A1 (en) * 2019-12-24 2021-07-01 深圳市优必选科技股份有限公司 Polyphonic character prediction method and disambiguation method, apparatuses, device and computer readable storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105550170B (en) * 2015-12-14 2018-10-12 北京锐安科技有限公司 Chinese word segmentation method and device
CN107220300B (en) * 2017-05-05 2018-07-20 平安科技(深圳)有限公司 Information mining method, electronic device and readable storage medium
CN113342989B (en) * 2021-05-24 2022-12-20 北京航空航天大学 Knowledge graph construction method and device of patent data, storage medium and terminal

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101122900A (en) * 2007-09-25 2008-02-13 中兴通讯股份有限公司 Words partition system and method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101042692B (en) * 2006-03-24 2010-09-22 富士通株式会社 Translation obtaining method and apparatus based on semantic prediction
CN102394061B (en) * 2011-11-08 2013-01-02 中国农业大学 Text-to-speech method and system based on semantic retrieval

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
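HFGANG: "中文分词基础原则及正向最大匹配法、逆向最大匹配法、双向最大匹配法的分析" (Basic principles of Chinese word segmentation and an analysis of the forward, reverse, and bidirectional maximum matching methods), 《新浪微博》 (Sina Weibo) *
张冬慧 等 (Zhang Donghui et al.): "文本自动分类关键技术研究" (Research on key technologies of automatic text classification), 《微计算机信息》 (Microcomputer Information) *
张旭 (Zhang Xu): "一个基于词典与统计的中文分词算法" (A Chinese word segmentation algorithm based on dictionary and statistics), 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology) *
罗杰 等 (Luo Jie et al.): "基于新的关键词提取方法的快速文本分类系统" (A fast text classification system based on a new keyword extraction method), 《计算机应用研究》 (Application Research of Computers) *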

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544309A (en) * 2013-11-04 2014-01-29 北京中搜网络技术股份有限公司 Splitting method for search string of Chinese vertical search
CN103544309B (en) * 2013-11-04 2017-03-15 北京中搜网络技术股份有限公司 Splitting method for search string of Chinese vertical search
CN103593338B (en) * 2013-11-15 2016-05-11 北京锐安科技有限公司 Information processing method and device
CN103593338A (en) * 2013-11-15 2014-02-19 北京锐安科技有限公司 Information processing method and device
CN104077275A (en) * 2014-06-27 2014-10-01 北京奇虎科技有限公司 Method and device for performing word segmentation based on context
CN105630807A (en) * 2014-10-31 2016-06-01 高德软件有限公司 Analysis method and apparatus for associative relationships between unknown roads and known roads
CN105630807B (en) * 2014-10-31 2020-02-07 高德软件有限公司 Method and device for analyzing incidence relation between unknown road and known road
CN104461056A (en) * 2014-12-22 2015-03-25 联想(北京)有限公司 Information processing method and electronic equipment
CN105138514A (en) * 2015-08-24 2015-12-09 昆明理工大学 Dictionary-based method for maximum matching of Chinese word segmentations through successive one word adding in forward direction
CN105138514B (en) * 2015-08-24 2018-11-09 昆明理工大学 Dictionary-based method for maximum matching of Chinese word segmentations through successive one word adding in forward direction
CN105243055A (en) * 2015-09-28 2016-01-13 北京橙鑫数据科技有限公司 Multi-language based word segmentation method and apparatus
CN105243055B (en) * 2015-09-28 2018-07-31 北京橙鑫数据科技有限公司 Multi-language based word segmentation method and apparatus
CN105335488A (en) * 2015-10-16 2016-02-17 中国南方电网有限责任公司电网技术研究中心 Knowledge base construction method
CN106649251A (en) * 2015-10-30 2017-05-10 北京国双科技有限公司 Method and device for Chinese word segmentation
CN106202040A (en) * 2016-06-28 2016-12-07 邓力 Chinese word segmentation method for a PDA translation system
WO2018010579A1 (en) * 2016-07-13 2018-01-18 阿里巴巴集团控股有限公司 Character string segmentation method, apparatus and device
CN107092590A (en) * 2017-03-17 2017-08-25 贵州恒昊软件科技有限公司 Sentence segmentation method and system
CN107680689A (en) * 2017-05-05 2018-02-09 平安科技(深圳)有限公司 Method and system for inferring potential diseases from medical text, and readable storage medium
CN108009153A (en) * 2017-12-08 2018-05-08 北京明朝万达科技股份有限公司 Search method and system based on search statement word segmentation results
CN110222335A (en) * 2019-05-20 2019-09-10 平安科技(深圳)有限公司 Text word segmentation method and device
WO2020232881A1 (en) * 2019-05-20 2020-11-26 平安科技(深圳)有限公司 Text word segmentation method and apparatus
CN112215010A (en) * 2019-07-10 2021-01-12 北京猎户星空科技有限公司 Semantic recognition method and equipment
WO2021127987A1 (en) * 2019-12-24 2021-07-01 深圳市优必选科技股份有限公司 Polyphonic character prediction method and disambiguation method, apparatuses, device and computer readable storage medium
CN112287108A (en) * 2020-10-29 2021-01-29 四川长虹电器股份有限公司 Intention recognition optimization method in field of Internet of things
CN112287108B (en) * 2020-10-29 2022-08-16 四川长虹电器股份有限公司 Intention recognition optimization method in field of Internet of things

Also Published As

Publication number Publication date
CN104765838A (en) 2015-07-08
CN104765724A (en) 2015-07-08
CN102915299B (en) 2015-04-08

Similar Documents

Publication Publication Date Title
CN102915299B (en) Word segmentation method and device
CN110543574B (en) Knowledge graph construction method, device, equipment and medium
CN105095204B (en) Method and device for acquiring synonyms
US20020174095A1 (en) Very-large-scale automatic categorizer for web content
CN105930362B (en) Search target identification method, device and terminal
CN111444330A (en) Method, device and equipment for extracting short text keywords and storage medium
CN102789464B (en) Natural language processing methods, devices and systems based on semantics identity
CN107844493B (en) File association method and system
CN107577663B (en) Key phrase extraction method and device
CN106777261A (en) Data query method and device based on multi-source heterogeneous data set
Sunitha et al. A study on abstractive summarization techniques in Indian languages
Ye et al. Unknown Chinese word extraction based on variety of overlapping strings
CN106055539A (en) Name disambiguation method and apparatus
CN102339294A (en) Searching method and system for preprocessing keywords
CN109522396B (en) Knowledge processing method and system for national defense science and technology field
CN109446313B (en) Sequencing system and method based on natural language analysis
US20040122660A1 (en) Creating taxonomies and training data in multiple languages
CN103914569B (en) Input prompting method and device, and method and device for creating a dictionary tree model
CN112527948A (en) Data real-time duplicate removal method and system based on sentence-level index
CN101872363B (en) Method for extracting keywords
Biba et al. Boosting text classification through stemming of composite words
Watrin et al. An N-gram frequency database reference to handle MWE extraction in NLP applications
JP2003281165A (en) Document summarization method and system
CN106776590A (en) Method and system for obtaining entry translations
JP3090233B2 (en) A method for identifying associations between complex information

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant