CN105138514B - Dictionary-based forward character-by-character maximum matching Chinese word segmentation method - Google Patents

Dictionary-based forward character-by-character maximum matching Chinese word segmentation method

Info

Publication number
CN105138514B
Authority
CN
China
Prior art keywords
word
dictionary
cutting
participle
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510522091.XA
Other languages
Chinese (zh)
Other versions
CN105138514A (en)
Inventor
彭艺
苏黎韡
邵玉斌
龙华
宋浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Abstract

The present invention relates to a dictionary-based forward character-by-character maximum matching Chinese word segmentation method, and belongs to the technical field of computer Chinese text processing. The method comprises the following steps: first, the text to be segmented is read in and coarsely cut into short texts at obvious separators such as punctuation, numbers, Western-language text, and charts; each coarsely cut short text is then taken as the object of further segmentation, and a search length for that segmentation is set; each short text is segmented by matching against the dictionary in the forward direction, adding one character at a time, until all short texts have been segmented. The invention avoids the difficulty that conventional forward maximum matching has in balancing segmentation speed against accuracy, and improves on conventional forward and reverse maximum matching algorithms in both segmentation speed and segmentation accuracy.

Description

Dictionary-based forward character-by-character maximum matching Chinese word segmentation method
Technical field
The present invention relates to a dictionary-based forward character-by-character maximum matching Chinese word segmentation method, and belongs to the technical field of computer Chinese text processing.
Background art
With the development of science and technology, human society has entered the information age. Enabling computers to "understand" human natural language and achieving free human-computer interaction has become an attractive vision. In human language, the word is the smallest meaningful linguistic unit that can be used independently. Chinese differs greatly from Western languages such as English and French: in Western languages, words are separated by explicit spaces, so a computer can easily identify words from those spaces, whereas in a Chinese sentence the words run together with no separator, which makes them much harder for a computer to process. Chinese word segmentation is the key to, and the prerequisite for, Chinese information processing. Only when segmentation is handled well can a computer understand Chinese, carry out subsequent Chinese information processing, and extract useful information from massive amounts of data to serve people. With the development of Chinese information processing, Chinese word segmentation has come into wide use and plays a key role mainly in the following three fields. 1) Computing and artificial intelligence: segmentation results are used in natural language understanding and processing research, such as semantic analysis, automatic summarization, knowledge engineering, machine translation, expert systems, and intelligent computers. 2) Information science: research combining Chinese word segmentation with automatic indexing, information retrieval, and search engines has achieved many gratifying results. 3) Chinese linguistics: segmentation is used to advance the study of the Chinese language, such as its characteristics, its comparison with other languages, and its standardization.
Chinese word segmentation is a basic link in Chinese information processing, and also a serious bottleneck restricting its development. In recent years it has attracted attention and research from many quarters, especially companies and universities, and a variety of segmentation methods have appeared: bidirectional maximum matching, character-by-character traversal, the segmentation-mark method, the word-frequency statistics method, the augmented transition network method, the bidirectional Markov chain method, fuzzy clustering, the expert system method, the minimum segmentation method, the neural network method, and others. Different methods model different aspects of human segmentation behavior and serve Chinese information processing systems with different purposes. In general, all of them are extensions and improvements of three basic methods: dictionary-based segmentation, statistics-based segmentation, and understanding-based segmentation, which represent the three main directions of current segmentation research.
Forward maximum matching (Forward Maximum Matching Method), generally abbreviated FMM: "maximum" means that the algorithm always treats the longest possible character string starting at a given Chinese character as a word, which embodies the "long word first" principle. When the string cannot be found in the dictionary (the match fails), the last character is removed and the search continues. The idea of the algorithm is as follows. Let D be the dictionary, L the length of the longest word in D, and S the string to be segmented. Each time, a substring M of length L is taken from S and matched against the words in D. If the match succeeds, M is cut out as a word and the pointer is moved past it to continue matching; otherwise the last character of M is removed and matching is repeated in the same way, until all words have been cut out. Conventional forward and reverse maximum matching algorithms require a matching length to be set in advance (also denoted M in what follows), and generally use the length of the longest word in the segmentation dictionary. Because they emphasize "long word first", every match starts from M characters. If M is too long, many lookups are needed before a word is cut out, which wastes time and slows segmentation. If M is too short, words longer than M cannot be cut out correctly, and segmentation accuracy cannot be guaranteed.
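The conventional FMM procedure described above can be sketched roughly as follows. This is a minimal illustration, not code from the patent: the function name `forward_max_match`, the three-entry demo dictionary, and the sample sentence are assumptions chosen for the example.

```python
def forward_max_match(text, dictionary, max_len):
    """Conventional forward maximum matching (FMM), as a rough sketch.

    text       -- string to be segmented
    dictionary -- set of known words (hypothetical demo data below)
    max_len    -- preset matching length, usually the longest word length
    """
    words = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, dropping the last character on each failure.
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            # A single remaining character is always emitted as a word.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                pos += size
                break
    return words


demo_dict = {"今天", "天气", "特别"}  # assumed demo entries
print(forward_max_match("今天天气特别的好", demo_dict, 4))
# -> ['今天', '天气', '特别', '的', '好']
```

With a preset length of 4, each two-character word in this sample is reached only after two failed lookups, which is the cost the background section attributes to an overly long matching length.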
To overcome these shortcomings of the conventional forward matching algorithm, a forward character-by-character maximum matching algorithm based on forward matching is proposed here, which remedies the deficiencies of the traditional algorithm.
Summary of the invention
The present invention provides a dictionary-based forward character-by-character maximum matching Chinese word segmentation method, to solve problems such as the slow speed and inaccurate results of conventional forward maximum matching segmentation. The method does not preset a maximum matching word length. It therefore avoids two problems of traditional maximum matching: when the preset maximum matching length is too long, many useless matches are performed and segmentation is slow; when it is too short, long words cannot be cut out correctly.
The technical scheme of the invention is a dictionary-based forward character-by-character maximum matching Chinese word segmentation method with the following specific steps:
Step1: read in the text to be segmented and coarsely cut the input text into short texts, using punctuation, numbers, Western-language text, and charts as separators;
Step2: take each coarsely cut short text as the object of further segmentation and set a segmentation search length L, where L is smaller than the length of the longest word in the dictionary;
Step3: take the first two characters of a coarsely cut short text and look them up in the dictionary;
If the two characters currently read are not in the dictionary, the first character is a single-character word and is cut out; the read pointer then moves back and the next two characters are taken for a new round of lookup;
If the two characters currently read are in the dictionary, the search length is extended by one character to three characters, and matching continues in the dictionary;
If the three-character string is not in the dictionary, the first two characters form a word, which is cut out as one segmentation result; the search pointer then moves back and the next two characters are taken for a new round of lookup;
If the three-character string is in the dictionary, one more character is added to form a four-character string, which is looked up in the dictionary in turn, and so on, carrying out matching and lookup to perform the segmentation;
Step4: when the search length reaches L, restart from the character following the L-th character and carry out lookup, matching, and segmentation as in Step3, until all short texts have been segmented.
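Steps 3 and 4 above can be sketched as follows. This is one possible reading of the steps, not the patent's implementation: the function name `incremental_match`, the demo dictionary, and the sample sentence are assumptions, and the search length is assumed to be at least 2.

```python
def incremental_match(text, dictionary, search_len):
    """Forward character-by-character matching on one short text (sketch).

    Starts from a two-character candidate and grows it one character at a
    time while the grown string is still in the dictionary, up to search_len.
    """
    words = []
    pos = 0
    n = len(text)
    while pos < n:
        size = 2
        # Fewer than two characters left, or the pair is unknown:
        # the first character is cut out as a single-character word.
        if pos + size > n or text[pos:pos + size] not in dictionary:
            words.append(text[pos])
            pos += 1
            continue
        # Extend by one character while the extended string is a dictionary
        # entry and the search length L has not been reached.
        while (size < search_len and pos + size < n
               and text[pos:pos + size + 1] in dictionary):
            size += 1
        words.append(text[pos:pos + size])
        pos += size
    return words


demo_dict = {"今天", "天气", "特别"}  # assumed demo entries
print(incremental_match("今天天气特别的好", demo_dict, 7))
# -> ['今天', '天气', '特别', '的', '好']
```

Because extension stops at the first string that is not found, in this sketch a longer word is reached only if each of its prefixes from two characters upward is also a dictionary entry.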
The beneficial effects of the invention are:
1. The method uses a dictionary-based lookup mechanism to match the input text to be segmented and determine the segmentation result. No maximum matching word length is preset; instead, a search length L slightly smaller than the longest entry in the dictionary is set. This avoids two problems of traditional maximum matching: when the preset maximum matching length is too long, many useless matches are performed and segmentation is slow; when it is too short, long words cannot be cut out correctly;
2. The method improves both segmentation response time and segmentation accuracy. On the test texts, the forward character-by-character matching method of the invention was compared with traditional dictionary-based forward maximum matching and reverse maximum matching segmentation, and it showed an advantage in both accuracy and segmentation time.
Description of the drawings
Fig. 1 is the flow chart of the present invention;
Fig. 2 is the flow chart of the forward character-by-character matching segmentation method of Embodiment 1;
Fig. 3 compares the accuracy of the dictionary-based forward character-by-character matching segmentation method of the invention with that of traditional dictionary-based segmentation methods.
Detailed description of the embodiments
Embodiment 1: As shown in Figs. 1-3, a dictionary-based forward character-by-character maximum matching Chinese word segmentation method comprises the following steps:
Step One, coarse cutting: markers such as punctuation marks, spaces, dates, numbers, and English letters are removed from the text to be segmented. Let the text to be processed be A; it is divided into a set of N short text sequences $S_i$ ($0 < i \le N$), that is, $A = \{S_1, S_2, S_3, \ldots, S_N\}$;
Step Two: as shown in Fig. 2, the coarsely cut short texts are read in one by one in order, each denoted $S_i$. Let each sentence sequence $S_i$ consist of m characters $W_{ij}$ ($0 < j \le m$), that is, $S_i = \langle W_{i1} W_{i2} W_{i3} \ldots W_{im} \rangle$;
Step Three: the coarsely cut text $S_i$ is segmented, as shown in Fig. 2.
1) Set a segmentation search length L that is slightly smaller than the length of the longest word in the dictionary;
2) In short text $S_i$, take the first two adjacent characters $W_{ij}W_{i(j+1)}$ in order (initially $W_{i1}W_{i2}$) and look them up in the dictionary. If the two characters currently read, $W_{ij}W_{i(j+1)}$, are not a word in the dictionary, go to 3); otherwise, go to 4);
3) If the two characters currently read, $W_{ij}W_{i(j+1)}$, are not in the dictionary, the first of them is a single-character word, and $W_{ij}$ is cut out of sentence $S_i$. Check whether the end of $S_i$ has been reached; if so, the segmentation of $S_i$ ends; otherwise set j = j+1 and go to 2) again;
4) If the two characters currently read, $W_{ij}W_{i(j+1)}$, are in the dictionary, the search length is extended by one character, that is, one character is added after $W_{ij}W_{i(j+1)}$ to give the three-character string $S_k = W_{ij}W_{i(j+1)}W_{ik}$ ($0 < k \le L$). Matching continues in the dictionary to determine whether the newly read string exists. If it exists, go to 5); otherwise, go to 6);
5) If the three-character string $S_k = W_{ij}W_{i(j+1)}W_{ik}$ exists, the pointer adds one more character after $S_k$ to form the four-character string $S_{k+1} = W_{ij}W_{i(j+1)} \ldots W_{ik}W_{i(k+1)}$, which is looked up in the dictionary. If it exists, characters continue to be added one at a time and checked, go to 7); if not, $S_k$ is cut out and put into the segmentation result;
6) If the three-character string $S_k = W_{ij}W_{i(j+1)}W_{ik}$ does not exist, the first two characters $W_{ij}W_{i(j+1)}$ form a word, and $W_{ij}W_{i(j+1)}$ is cut out of $S_i$. The search pointer then moves back, setting j = j+2, and the next two characters are taken for a new round of lookup. If j ≤ m, the current short text has not been fully segmented, go to 2); if the pointer j = m, the segmentation of short text $S_i$ ends;
7) And so on: each time the segmentation pointer moves, check whether the number of characters read so far satisfies k ≤ L. If so, continue adding one character after $S_{k+1} = W_{ij}W_{i(j+1)} \ldots W_{ik}W_{i(k+1)}$ and checking; otherwise, take two characters starting from $W_{i(k+1)}$ for the next round of lookup.
Step Four: check whether the number of texts read satisfies i ≤ N. If so, the text has not been fully segmented: the segmentation pointer is increased by one (i = i+1), and the next sentence is read in and segmented by the same lookup and matching procedure, until the entire input text has been segmented; otherwise, the segmentation of the entire text ends.
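The coarse cutting of Step One can be approximated with a regular expression. The sketch below is an assumption-laden illustration rather than the patent's procedure: it treats every maximal run of characters in the basic CJK Unified Ideographs range (U+4E00 to U+9FFF) as one short text and everything else (punctuation, spaces, digits, Latin letters) as a separator. The name `coarse_cut` and the sample input are hypothetical.

```python
import re

# Assumption: a "short text" is a maximal run of basic CJK ideographs;
# anything outside U+4E00..U+9FFF acts as a separator and is discarded.
HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")


def coarse_cut(text):
    """Split raw text into the short texts S_1 .. S_N used for fine segmentation."""
    return HAN_RUN.findall(text)


print(coarse_cut("今天天气特别的好,Hello 123。我们去公园!"))
# -> ['今天天气特别的好', '我们去公园']
```

Each returned short text would then be passed, one at a time, to the fine segmentation of Step Three.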
Embodiment 2: As shown in Figs. 1-3, a dictionary-based forward character-by-character maximum matching Chinese word segmentation method comprises the following steps:
Set a segmentation search length L slightly smaller than the length of the longest word in the dictionary. Let the string to be segmented be $S = s_1 s_2 s_3 s_4 \ldots s_i$. Starting from the head of the sentence, take the first two characters $s_1 s_2$ and check whether $s_1 s_2$ is a word in the dictionary. If not, $s_1$ is a single-character word and is cut out; the read pointer then moves back one character to the third character, and $s_2 s_3$ is looked up in the dictionary in a new round of matching. If $s_1 s_2$ is a word in the dictionary, one character is added and $s_1 s_2 s_3$ is checked. If $s_1 s_2 s_3$ is not a word in the dictionary, $s_1 s_2$ is a word and is cut out. If $s_1 s_2 s_3$ is a word in the dictionary, one more character is added and $s_1 s_2 s_3 s_4$ is looked up; if it is not a word, $s_1 s_2 s_3$ is cut out as a word, and if it is, another character is added and matching continues. And so on, until the entire sentence $S = s_1 s_2 s_3 s_4 \ldots s_i$ has been segmented.
Embodiment 3: As shown in Figs. 1-3, a dictionary-based forward character-by-character maximum matching Chinese word segmentation method comprises the following steps:
Step1: read in the text to be segmented and coarsely cut the input text into short texts, using punctuation, numbers, Western-language text, and charts as separators; for example, one resulting short text is "今天天气特别的好" ("the weather is particularly good today");
Step2: take each coarsely cut short text as the object of further segmentation and set the segmentation search length L = 7, where L is smaller than the length of the longest word in the dictionary, which here is 12;
Step3: take the first two characters of the coarsely cut short text, "今天" ("today"), and look them up in the dictionary; the match shows that "今天" exists in the dictionary, so the search length is extended by one character to the three-character string "今天天", and matching continues in the dictionary; the match shows that "今天天" does not exist, so "今天" is a word and is cut out as one segmentation result; the search pointer then moves back and the next two characters, "天气" ("weather"), are taken for a new round of lookup; the match shows that "天气" exists, so the search length is extended by one character to the three-character string "天气特", and matching continues in the dictionary; the match shows that "天气特" does not exist, so "天气" is a word and is cut out as one segmentation result; matching and lookup continue in the same way, and the segmentation result is /今天/天气/特别/的/好/; the detailed segmentation process is shown in Table 1;
Table 1. Forward character-by-character maximum matching segmentation process
Matching field | Matching process | Matching result
今天 (today) | Exists in dictionary | 今天
天气 (weather) | Exists in dictionary | 天气
特别 (particularly) | Exists in dictionary | 特别
的好 | Not in dictionary |
的 | Single-character word | 的
好 (good) | Single-character word | 好
To verify the beneficial effect of the method, it is compared with the traditional forward maximum matching segmentation method and the reverse maximum matching segmentation method (initial maximum matching length of 4 characters); the segmentation processes of the traditional forward and reverse maximum matching methods are shown in Table 2 and Table 3;
1) Forward maximum matching segmentation method:
Table 2. Forward maximum matching segmentation process
Matching field | Matching process | Matching result
今天天气 | Not in dictionary |
今天天 | Not in dictionary |
今天 (today) | Exists in dictionary | 今天
天气特别 | Not in dictionary |
天气特 | Not in dictionary |
天气 (weather) | Exists in dictionary | 天气
特别的好 | Not in dictionary |
特别的 | Not in dictionary |
特别 (particularly) | Exists in dictionary | 特别
的好 | Not in dictionary |
的 | Single-character word | 的
好 (good) | Single-character word | 好
The result of forward maximum matching is: /今天/天气/特别/的/好/
2) Reverse maximum matching segmentation method: substrings are taken from the string to be segmented from right to left and matched;
Table 3. Reverse maximum matching segmentation process
Matching field | Matching process | Matching result
特别的好 | Not in dictionary |
别的好 | Not in dictionary |
的好 | Not in dictionary |
好 (good) | Single-character word | 好
气特别的 | Not in dictionary |
特别的 | Not in dictionary |
别的 | Not in dictionary |
的 | Single-character word | 的
天气特别 | Not in dictionary |
气特别 | Not in dictionary |
特别 (particularly) | Exists in dictionary | 特别
今天天气 | Not in dictionary |
天天气 | Not in dictionary |
天气 (weather) | Exists in dictionary | 天气
今天 (today) | Exists in dictionary | 今天
The result of reverse maximum matching is: /今天/天气/特别/的/好/
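The right-to-left procedure traced in Table 3 can be sketched as follows. This is an illustrative sketch only: the function name `backward_max_match`, the demo dictionary, and the sample sentence are assumptions for the example.

```python
def backward_max_match(text, dictionary, max_len):
    """Reverse maximum matching: take candidates from the right end (sketch)."""
    words = []
    end = len(text)
    while end > 0:
        # Try the longest suffix first, dropping its first character on each failure.
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            # A single remaining character is always emitted as a word.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                end -= size
                break
    words.reverse()  # words were collected right to left
    return words


demo_dict = {"今天", "天气", "特别"}  # assumed demo entries
print(backward_max_match("今天天气特别的好", demo_dict, 4))
# -> ['今天', '天气', '特别', '的', '好']
```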
Although the segmentation processes above show that all three methods produce the same, correct result, they also show clearly that the traditional dictionary-based forward and reverse maximum matching methods repeatedly read in and try strings that are not in the dictionary, which wastes segmentation time and adds to the workload of dictionary matching and ambiguity resolution after segmentation. With the forward character-by-character maximum matching method proposed by the invention, almost every two-character word is segmented quickly and accurately in a single step, so overall segmentation efficiency is greatly improved. The results of the simulation test confirm this, as shown in Table 4 below.
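One way to make the "repeated matching" argument concrete is to count dictionary lookups. The sketch below wraps a set in a hypothetical `CountingDict` class and runs simplified versions of conventional forward maximum matching and the character-by-character method on the sample sentence. All names, the demo dictionary, and the lookup-counting convention (single characters are emitted without a lookup) are assumptions of this sketch, not measurements from the patent.

```python
class CountingDict:
    """Set wrapper that counts membership tests (hypothetical helper)."""

    def __init__(self, entries):
        self.entries = set(entries)
        self.lookups = 0

    def __contains__(self, word):
        self.lookups += 1
        return word in self.entries


def fmm(text, dictionary, max_len):
    # Conventional forward maximum matching: longest candidate first.
    words, pos = [], 0
    while pos < len(text):
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                pos += size
                break
    return words


def incremental(text, dictionary, search_len):
    # Character-by-character method: start at two characters and grow.
    words, pos, n = [], 0, len(text)
    while pos < n:
        size = 2
        if pos + size > n or text[pos:pos + size] not in dictionary:
            words.append(text[pos])
            pos += 1
            continue
        while (size < search_len and pos + size < n
               and text[pos:pos + size + 1] in dictionary):
            size += 1
        words.append(text[pos:pos + size])
        pos += size
    return words


sentence = "今天天气特别的好"
entries = {"今天", "天气", "特别"}  # assumed demo entries

d1 = CountingDict(entries)
fmm(sentence, d1, 4)
d2 = CountingDict(entries)
incremental(sentence, d2, 7)
print(d1.lookups, d2.lookups)  # -> 10 7
```

On this single sentence the sketch performs fewer lookups with the character-by-character method; how the difference scales on real text depends on the dictionary and the preset lengths.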
Table 4. Comparison of the average segmentation speed of the three methods
Segmentation method | Average segmentation speed (word/s)
Traditional forward maximum matching | 52000
Traditional reverse maximum matching | 103000
Forward character-by-character matching | 113000
The three methods were applied in the test environment of the invention, using a complete dictionary of 270,000 entries as the segmentation dictionary. The simulation experiment was run on a computer with 1 GB or more of memory, under Windows 7, using the Java development language and the MyEclipse 8.5 development tool. Articles of about 0.02M each were selected from four areas (economy, science and technology, social news, and military affairs) and segmented with the three algorithms. The results are shown in Fig. 3, where the vertical axis is segmentation accuracy and the horizontal axis is the subject area of the text. Of the three methods, the forward character-by-character matching method proposed here achieves higher accuracy than the traditional forward and reverse maximum matching methods.
The experimental results of the above embodiment (Table 4 and Fig. 3) show that the dictionary-based forward character-by-character maximum matching method of the invention significantly improves on traditional dictionary-based segmentation methods in both segmentation speed and segmentation accuracy.
The specific embodiments of the invention have been explained in detail above with reference to the drawings, but the invention is not limited to these embodiments; within the knowledge of a person of ordinary skill in the art, various changes can be made without departing from the concept of the invention.

Claims (1)

1. A dictionary-based forward character-by-character maximum matching Chinese word segmentation method, characterized in that the specific steps of the method are as follows:
Step1: read in the text to be segmented and coarsely cut the input text into short texts, using punctuation, numbers, Western-language text, and charts as separators;
Step2: take each coarsely cut short text as the object of further segmentation and set a segmentation search length L, where L is smaller than the length of the longest word in the dictionary;
Step3: take the first two characters of a coarsely cut short text and look them up in the dictionary;
If the two characters currently read are not in the dictionary, the first character is a single-character word and is cut out; the read pointer then moves back and the next two characters are taken for a new round of lookup;
If the two characters currently read are in the dictionary, the read pointer is extended by one character to three characters, and matching continues in the dictionary;
If the three-character string is not in the dictionary, the first two characters form a word, which is cut out as one segmentation result; the read pointer then moves back and the next two characters are taken for a new round of lookup;
If the three-character string is in the dictionary, one more character is added to form a four-character string, which is looked up in the dictionary in turn, and so on, carrying out matching and lookup to perform the segmentation;
Step4: when the search length reaches L, restart from the character following the L-th character and carry out lookup, matching, and segmentation as in Step3, until all short texts have been segmented.
CN201510522091.XA 2015-08-24 2015-08-24 Dictionary-based forward character-by-character maximum matching Chinese word segmentation method Active CN105138514B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201510522091.XA | 2015-08-24 | 2015-08-24 | Dictionary-based forward character-by-character maximum matching Chinese word segmentation method

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201510522091.XA | 2015-08-24 | 2015-08-24 | Dictionary-based forward character-by-character maximum matching Chinese word segmentation method

Publications (2)

Publication Number | Publication Date
CN105138514A (en) | 2015-12-09
CN105138514B (en) | 2018-11-09

Family

ID=54723865

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN201510522091.XA | Active | CN105138514B (en) | 2015-08-24 | 2015-08-24 | Dictionary-based forward character-by-character maximum matching Chinese word segmentation method

Country Status (1)

Country Link
CN (1) CN105138514B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN105975454A * | 2016-04-21 | 2016-09-28 | 广州精点计算机科技有限公司 | Chinese word segmentation method and device for webpage text
CN106126496B * | 2016-06-17 | 2019-01-18 | 联动优势科技有限公司 | Information word segmentation method and device
CN106202040A * | 2016-06-28 | 2016-12-07 | 邓力 | Chinese word segmentation method for a PDA translation system
CN107092590A * | 2017-03-17 | 2017-08-25 | 贵州恒昊软件科技有限公司 | Sentence segmentation method and system
CN108304367B * | 2017-04-07 | 2021-11-26 | 腾讯科技(深圳)有限公司 | Word segmentation method and device
CN107357784B * | 2017-07-05 | 2021-01-26 | 东南大学 | Intelligent analysis method for data model of relay protection device equipment
CN109284763A * | 2017-07-19 | 2019-01-29 | 阿里巴巴集团控股有限公司 | Method and server for generating word segmentation training data
CN107608968A * | 2017-09-22 | 2018-01-19 | 深圳市易图资讯股份有限公司 | Chinese word segmentation method and device for text big data
CN108052508B * | 2017-12-29 | 2021-11-09 | 北京嘉和海森健康科技有限公司 | Information extraction method and device
CN108363686A * | 2018-01-12 | 2018-08-03 | 中国平安人寿保险股份有限公司 | Character string word segmentation method and device, terminal device, and storage medium
CN108197315A * | 2018-02-01 | 2018-06-22 | 中控技术(西安)有限公司 | Method and apparatus for establishing a word segmentation index database
CN110688835B * | 2019-09-03 | 2023-03-31 | 重庆邮电大学 | Word feature value-based law-specific field word discovery method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN102063424A * | 2010-12-24 | 2011-05-18 | 上海电机学院 | Method for Chinese word segmentation
CN102915299A * | 2012-10-23 | 2013-02-06 | 海信集团有限公司 | Word segmentation method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Forward maximum matching Chinese word segmentation algorithm (正向最大匹配中文分词算法); Anonymous (佚名); 《http://blog.csdn.net/yangyan19870319/article/details/6399871》; 2011-05-06; pp. 1-9 *

Also Published As

Publication number Publication date
CN105138514A (en) 2015-12-09

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant