CN105138514B - Dictionary-based forward add-one-character maximum matching Chinese word segmentation method - Google Patents
- Publication number: CN105138514B (application CN201510522091.XA)
- Authority: CN (China)
- Prior art keywords: word, dictionary, segmentation, word segmentation, words
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The present invention relates to a dictionary-based forward add-one-character maximum matching Chinese word segmentation method and belongs to the technical field of computer Chinese text processing. The method comprises the following steps: first, the text to be segmented is read in and coarsely segmented into short texts at obvious separators such as punctuation, digits, Western-language text, and charts; each coarsely segmented short text is then taken as the object of further segmentation, and a search length for that further segmentation is set; finally, each short text is segmented by matching against the dictionary in the forward direction, adding one character at a time, until all short texts have been segmented. The invention avoids the difficulty that conventional forward maximum matching has in balancing segmentation speed against accuracy, and improves on conventional forward and reverse maximum matching segmentation algorithms in both segmentation speed and segmentation accuracy.
Description
Technical field
The present invention relates to a dictionary-based forward add-one-character maximum matching Chinese word segmentation method and belongs to the technical field of computer Chinese text processing.
Background technology
With the development of science and technology, human society has entered the information age. Enabling computers to "understand" human natural language and achieving free human-computer interaction has become an appealing vision. In human language, the word is the smallest meaningful linguistic unit that can be used independently. Chinese differs greatly from Western languages such as English and French: words in Western languages are separated by explicit spaces, which makes it easy for a computer to identify them, whereas the words in a Chinese sentence are written contiguously, which makes the sentence much harder for a computer to process. Chinese word segmentation is the key to and prerequisite for Chinese information processing: only when segmentation is handled well can a computer understand Chinese, carry out subsequent Chinese information processing, extract useful information from massive amounts of data to serve people, and achieve machine intelligence. With the development of Chinese information processing, Chinese word segmentation technology has been widely used and plays a key role mainly in the following three fields. 1) Computing and artificial intelligence: segmentation results are used in natural language understanding and processing research, such as semantic analysis, automatic summarization, knowledge engineering, machine translation, expert systems, and intelligent computers. 2) Information: combining Chinese word segmentation with technologies such as automatic indexing, information retrieval, and search engines has produced many gratifying results. 3) Chinese linguistics research: word segmentation is used to advance the study of the Chinese language, such as its characteristics, its comparison with other languages, and its standardization.
Chinese word segmentation is a basic link in Chinese information processing and also a serious "bottleneck" restricting its development. In recent years, Chinese word segmentation has attracted attention and research from all sectors, especially companies and universities, and a variety of segmentation methods have appeared: bidirectional maximum matching, word-by-word traversal, the segmentation-mark method, word-frequency methods, the augmented transition network method, the bidirectional Markov chain method, fuzzy clustering, the expert system method, the minimum-segmentation method, neural network methods, and others. Different methods model different aspects of human segmentation behavior and serve Chinese information processing systems with different purposes. In general, these methods are all extensions and improvements of three basic approaches: dictionary-based segmentation, statistics-based segmentation, and understanding-based segmentation, which represent the three main directions of development of current segmentation methods.
In the forward maximum matching method (Forward Maximum Matching Method, usually abbreviated FMM), "maximum" means that the algorithm always treats the longest possible character string starting at a given Chinese character as a word, that is, it gives priority to long words. When the string cannot be found in the dictionary (the match fails), the last character is removed and the search continues. The idea of the algorithm is as follows. Let D be the dictionary, L the length of the longest word in D, and S the string to be segmented. Each time, a substring M of length L is taken from S and matched against the words in D. If the match succeeds, M is cut out as a word and the pointer is moved forward past M to continue matching; otherwise the last character of M is removed and matching is repeated in the same way, until all words have been cut out. Conventional forward and reverse maximum matching algorithms require a matching length M to be set in advance (here M denotes a length, not the substring above); the length of the longest word in the segmentation dictionary is generally used. Because these algorithms emphasize long-word priority, every match starts from M characters. If M is too long, many lookups are needed before a single word is cut out, which wastes time and lowers segmentation speed. If M is too short, words longer than M cannot be cut out correctly, and segmentation accuracy cannot be guaranteed.
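The FMM procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not code from the patent: the function name `fmm_segment`, its parameter names, the sample sentence, and the three-word toy dictionary are all hypothetical.

```python
def fmm_segment(text, dictionary, max_len):
    """Conventional forward maximum matching (illustrative sketch).

    text       -- string to segment (S in the description)
    dictionary -- set of known words (D)
    max_len    -- preset matching length (M), assumed to be >= 1
    """
    words = []
    start = 0
    while start < len(text):
        # Start from the longest candidate allowed by max_len.
        size = min(max_len, len(text) - start)
        # Drop the last character until the candidate is a dictionary word
        # or only a single character is left.
        while size > 1 and text[start:start + size] not in dictionary:
            size -= 1
        words.append(text[start:start + size])
        start += size  # move the pointer past the word just cut out
    return words


toy_dictionary = {"今天", "天气", "特别"}  # hypothetical three-word dictionary
print(fmm_segment("今天天气特别的好", toy_dictionary, 4))
```

With `max_len = 4`, every two-character word in this sample costs two failed lookups (four and three characters) before it is found, which is the overhead the paragraph above attributes to a matching length that is too long.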
To overcome these shortcomings of the conventional forward matching algorithm, a forward add-one-character maximum matching algorithm is proposed here on the basis of forward matching.
Summary of the invention
The present invention provides a dictionary-based forward add-one-character maximum matching Chinese word segmentation method to solve the problems of slow segmentation and inaccurate results caused by conventional forward maximum matching segmentation. The method does not require a maximum matching word length to be preset, and thus avoids two problems of the conventional maximum matching method: when the preset maximum matching length is too long, many useless matches are performed and segmentation is slow; when it is too short, the text cannot be segmented correctly.
The technical solution of the invention is a dictionary-based forward add-one-character maximum matching Chinese word segmentation method with the following specific steps:
Step1: Read in the text to be segmented and coarsely segment it into short texts, using punctuation, digits, Western-language text, and charts as separators;
Step2: Take each coarsely segmented short text as the object of further segmentation and set a search length L for it, where L is less than the length of the longest word in the dictionary;
Step3: Take the first two characters of a coarsely segmented short text and look them up in the dictionary;
If the two characters are not in the dictionary, the first character is a single-character word and is cut out; the read pointer is then moved forward, and the next two characters are taken for a new round of lookup;
If the two characters are in the dictionary, the lookup length is extended by one character to three characters, and matching continues in the dictionary;
If the three-character string is not in the dictionary, the first two characters form a word, which is cut out as one segmentation result; the lookup pointer is then moved forward, and the next two characters are taken for a new round of lookup;
If the three-character string is in the dictionary, one more character is added to form a four-character string, which is looked up in the dictionary, and so on, matching and segmenting in this way;
Step4: When the lookup length reaches L, start again from the character following those L characters and repeat the procedure of Step3, until all short texts have been segmented.
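Step3 and Step4 can be sketched in Python as follows. This is one possible reading of the steps, not the inventors' implementation: the function name `add_one_segment`, the parameter `search_len` (standing in for L and assumed to be at least 2), the sample sentence, and the toy dictionary are assumptions made for illustration.

```python
def add_one_segment(short_text, dictionary, search_len):
    """Forward add-one-character matching on one short text (sketch).

    short_text -- a coarsely segmented short text
    dictionary -- set of known words
    search_len -- search length L, assumed to be >= 2
    """
    words = []
    start = 0
    n = len(short_text)
    while start < n:
        # A lone trailing character can only be a single-character word.
        if start + 1 >= n:
            words.append(short_text[start])
            break
        end = start + 2
        if short_text[start:end] not in dictionary:
            # Two characters are not a word: cut out the first character.
            words.append(short_text[start])
            start += 1
            continue
        # Two characters are a word: keep adding one character while the
        # longer string is still in the dictionary and shorter than L.
        while (end < n and end - start < search_len
               and short_text[start:end + 1] in dictionary):
            end += 1
        words.append(short_text[start:end])
        start = end  # restart from the character after the word
    return words


toy_dictionary = {"今天", "天气", "特别"}  # hypothetical three-word dictionary
print(add_one_segment("今天天气特别的好", toy_dictionary, 7))
```

One consequence visible in this sketch is that a longer word is only reached if each of its shorter prefixes, from two characters up, is itself in the dictionary.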
The beneficial effects of the invention are:
1. The method uses a dictionary-based lookup mechanism to match the input text and determine the segmentation result. No maximum matching word length is preset during segmentation; instead, a search length L slightly less than the length of the longest entry in the dictionary is set. This avoids two problems of the conventional maximum matching method: when the preset maximum matching length is too long, many useless matches are performed and segmentation is slow; when it is too short, the text cannot be segmented correctly;
2. The method improves both segmentation response time and segmentation accuracy. On test text, the forward add-one-character matching method of the invention was compared with conventional dictionary-based forward maximum matching and reverse maximum matching, and it showed an advantage in both accuracy and segmentation time.
Description of the drawings
Fig. 1 is the flow chart of the present invention;
Fig. 2 is a flow chart of the forward add-one-character matching segmentation method in Embodiment 1 of the invention;
Fig. 3 compares the accuracy of the dictionary-based forward add-one-character matching segmentation method of the invention with that of conventional dictionary-based segmentation methods.
Detailed description
Embodiment 1: As shown in Figs. 1-3, a dictionary-based forward add-one-character maximum matching Chinese word segmentation method comprises the following steps:
Step 1: Coarse segmentation. Marks such as punctuation, spaces, dates, digits, and English letters are removed from the text to be segmented. Let the text to be processed be $A$; it is divided into a set of $N$ short text sequences $S_i$ ($0 < i \le N$), that is, $A = \{S_1, S_2, S_3, \ldots, S_N\}$;
Step 2: As shown in Fig. 2, the coarsely segmented short texts are read in one by one in order, each denoted $S_i$. Each sentence sequence $S_i$ consists of $m$ characters $W_{ij}$ ($0 < j \le m$), that is, $S_i = \langle W_{i1} W_{i2} W_{i3} \ldots W_{im} \rangle$;
Step 3: The coarsely segmented text $S_i$ is segmented as shown in Fig. 2:
1) Set a segmentation search length $L$ that is slightly less than the length of the longest word in the dictionary;
2) In the short text $S_i$, take the two adjacent characters $W_{ij} W_{i(j+1)}$ in order (initially $W_{i1} W_{i2}$) and look them up in the dictionary. If $W_{ij} W_{i(j+1)}$ is not a word in the dictionary, go to 3); otherwise go to 4);
3) If $W_{ij} W_{i(j+1)}$ is not in the dictionary, the first of the two characters is a single-character word, and $W_{ij}$ is cut out of $S_i$. Check whether the end of $S_i$ has been reached: if so, segmentation of $S_i$ ends; otherwise set $j = j + 1$ and go back to 2);
4) If $W_{ij} W_{i(j+1)}$ is in the dictionary, extend the lookup length by one character, that is, append one character after $W_{ij} W_{i(j+1)}$ to obtain the three-character string $S_k = W_{ij} W_{i(j+1)} W_{ik}$ ($0 < k \le L$), and check whether the newly read string is in the dictionary. If it is, go to 5); otherwise go to 6);
5) If the three-character string $S_k = W_{ij} W_{i(j+1)} W_{ik}$ is in the dictionary, append one more character to obtain the four-character string $S_{k+1} = W_{ij} W_{i(j+1)} \ldots W_{ik} W_{i(k+1)}$ and check whether it is in the dictionary. If it is, continue adding one character at a time and go to 7); if it is not, cut out $S_k$ and add it to the segmentation result;
6) If the three-character string $S_k = W_{ij} W_{i(j+1)} W_{ik}$ is not in the dictionary, the first two characters $W_{ij} W_{i(j+1)}$ form a word and are cut out of $S_i$; the lookup pointer then moves forward, $j = j + 2$, and the next two characters are taken for a new round of lookup. If $j \le m$, the current short text has not been fully segmented, so go to 2); if the pointer has reached $j = m$, segmentation of the short text $S_i$ ends;
7) Continue in the same way: each time the pointer moves, check whether the number of characters read so far satisfies $k \le L$. If so, continue appending one character after $S_{k+1} = W_{ij} W_{i(j+1)} \ldots W_{ik} W_{i(k+1)}$ and checking; otherwise take two characters starting from $W_{i(k+1)}$ for the next round of lookup.
Step 4: Check whether the index of the short text being read satisfies $i \le N$. If so, the text has not been fully segmented: increase the pointer by one, $i = i + 1$, read the next sentence, and repeat the lookup and segmentation procedure above until the entire input text has been segmented; otherwise the whole text has been segmented.
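The coarse segmentation of Step 1 can be approximated with a regular expression. The sketch below assumes that a short text is any maximal run of characters in the CJK Unified Ideographs range U+4E00 to U+9FFF, so that punctuation, spaces, digits, and Latin letters all act as separators. The function name `coarse_split` and the sample text are hypothetical, and the patent does not specify how separators are detected.

```python
import re

# Assumption: everything outside the basic CJK Unified Ideographs block
# (punctuation, whitespace, digits, Latin letters) is treated as a separator.
_CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def coarse_split(text):
    """Split raw text into short texts S_1..S_N made only of Chinese characters."""
    return _CJK_RUN.findall(text)


short_texts = coarse_split("今天天气特别的好。Java 8 发布了!")
for i, short_text in enumerate(short_texts, start=1):
    # Each S_i would then be passed to the fine-grained segmentation of Step 3.
    print(i, short_text)
```

This simple rule does not recognize dates written with Chinese characters, so a fuller implementation of Step 1 would need additional handling for them.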
Embodiment 2: As shown in Figs. 1-3, a dictionary-based forward add-one-character maximum matching Chinese word segmentation method comprises the following steps:
Set a segmentation search length L that is slightly less than the length of the longest word in the dictionary. Let the string to be segmented be $S = s_1 s_2 s_3 s_4 \ldots s_i$. Starting from the beginning of the sentence, take the first two characters $s_1 s_2$ and check whether $s_1 s_2$ is a word in the dictionary. If it is not, $s_1$ is a single-character word and is cut out; the lookup pointer then moves forward by one character, and $s_2 s_3$ is taken for a new round of dictionary lookup. If $s_1 s_2$ is a word in the dictionary, add one character and check whether $s_1 s_2 s_3$ is a word. If $s_1 s_2 s_3$ is not in the dictionary, $s_1 s_2$ is a word and is cut out. If $s_1 s_2 s_3$ is a word in the dictionary, add one more character and check whether $s_1 s_2 s_3 s_4$ is in the dictionary: if it is not, $s_1 s_2 s_3$ is cut out as a word; if it is, continue adding one character and matching again. Continue in this way until the entire sentence $S = s_1 s_2 s_3 s_4 \ldots s_i$ has been segmented.
Embodiment 3: As shown in Figs. 1-3, a dictionary-based forward add-one-character maximum matching Chinese word segmentation method comprises the following steps:
Step1: Read in the text to be segmented and coarsely segment it into short texts, using punctuation, digits, Western-language text, and charts as separators; for example, one resulting short text is "今天天气特别的好" ("the weather is especially nice today");
Step2: Take the coarsely segmented short text as the object of further segmentation and set the search length L = 7, where L is less than the length of the longest word in the dictionary, which here is 12;
Step3: Take the first two characters of the short text, "今天" (today), and look them up in the dictionary. "今天" is in the dictionary, so the lookup length is extended by one character to the three characters "今天天", and matching continues in the dictionary. "今天天" is not in the dictionary, so "今天" is a word and is cut out as one segmentation result. The lookup pointer then moves forward, and the next two characters, "天气" (weather), are taken for a new round of lookup. "天气" is in the dictionary, so the lookup length is extended by one character to "天气特", and matching continues. "天气特" is not in the dictionary, so "天气" is a word and is cut out as one segmentation result. Matching and segmentation continue in the same way, and the segmentation result is /今天/天气/特别/的/好/ (today / weather / especially / particle 的 / good). The detailed segmentation process is shown in Table 1;
Table 1: Forward add-one-character maximum matching segmentation process
Matched field | Match process | Match result |
---|---|---|
今天 | In dictionary | 今天 |
天气 | In dictionary | 天气 |
特别 | In dictionary | 特别 |
的好 | Not in dictionary | |
的 | Single-character word | 的 |
好 | Single-character word | 好 |
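The walkthrough of Embodiment 3 can be traced programmatically by recording every dictionary lookup. The sketch below is an illustration only: the sample sentence 今天天气特别的好 and the three-word dictionary are assumptions reconstructed from the walkthrough (today, weather, especially), and the names `trace_add_one` and `probes` are hypothetical.

```python
def trace_add_one(short_text, dictionary, search_len):
    """Return (words, probes); probes lists each lookup as (candidate, found)."""
    words, probes = [], []
    start, n = 0, len(short_text)

    def lookup(candidate):
        found = candidate in dictionary
        probes.append((candidate, found))
        return found

    while start < n:
        if start + 1 >= n:
            # Last character on its own: single-character word, no lookup.
            words.append(short_text[start])
            break
        end = start + 2
        if not lookup(short_text[start:end]):
            words.append(short_text[start])
            start += 1
            continue
        # Extend by one character while the longer string is still a word.
        while (end < n and end - start < search_len
               and lookup(short_text[start:end + 1])):
            end += 1
        words.append(short_text[start:end])
        start = end
    return words, probes


words, probes = trace_add_one("今天天气特别的好", {"今天", "天气", "特别"}, 7)
for candidate, found in probes:
    print(candidate, "in dictionary" if found else "not in dictionary")
print("/" + "/".join(words) + "/")
```

Under these assumptions the trace records seven lookups: the two-character fields listed in Table 1 plus the three-character extension attempts described in Step3.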
To verify the beneficial effect of the method, it is compared with the conventional forward maximum matching and reverse maximum matching segmentation methods (with an initial maximum matching length of 4 characters). The segmentation processes of these two conventional methods are shown in Table 2 and Table 3;
1) Forward maximum matching segmentation:
Table 2: Forward maximum matching segmentation process
Matched field | Match process | Match result |
---|---|---|
今天天气 | Not in dictionary | |
今天天 | Not in dictionary | |
今天 | In dictionary | 今天 |
天气特别 | Not in dictionary | |
天气特 | Not in dictionary | |
天气 | In dictionary | 天气 |
特别的好 | Not in dictionary | |
特别的 | Not in dictionary | |
特别 | In dictionary | 特别 |
的好 | Not in dictionary | |
的 | Single-character word | 的 |
好 | Single-character word | 好 |
The result of forward maximum matching is: /今天/天气/特别/的/好/
2) Reverse maximum matching segmentation: substrings are taken from the string to be segmented from right to left and matched;
Table 3: Reverse maximum matching segmentation process
Matched field | Match process | Match result |
---|---|---|
特别的好 | Not in dictionary | |
别的好 | Not in dictionary | |
的好 | Not in dictionary | |
好 | Single-character word | 好 |
气特别的 | Not in dictionary | |
特别的 | Not in dictionary | |
别的 | Not in dictionary | |
的 | Single-character word | 的 |
天气特别 | Not in dictionary | |
气特别 | Not in dictionary | |
特别 | In dictionary | 特别 |
今天天气 | Not in dictionary | |
天天气 | Not in dictionary | |
天气 | In dictionary | 天气 |
今天 | In dictionary | 今天 |
The result of reverse maximum matching is: /今天/天气/特别/的/好/
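Reverse maximum matching, as compared above, can be sketched in Python as follows. The function name `rmm_segment`, the sample sentence, and the toy dictionary are assumptions for illustration; the patent gives no code for this baseline.

```python
def rmm_segment(text, dictionary, max_len):
    """Conventional reverse maximum matching (illustrative sketch).

    Candidates are taken from the right end of the text; on a failed match
    the leftmost character of the candidate is dropped.
    """
    words = []
    end = len(text)
    while end > 0:
        size = min(max_len, end)
        # Shrink the candidate from the left until it is a dictionary word
        # or only a single character is left.
        while size > 1 and text[end - size:end] not in dictionary:
            size -= 1
        words.append(text[end - size:end])
        end -= size
    words.reverse()  # words were collected right to left
    return words


toy_dictionary = {"今天", "天气", "特别"}  # hypothetical three-word dictionary
print(rmm_segment("今天天气特别的好", toy_dictionary, 4))
```

With a maximum matching length of 4, this sketch tries the same sequence of candidates as a right-to-left scan of the sample sentence, including the repeated failed lookups that the comparison points out.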
As the segmentation processes of the three methods show, the final results are identical and correct. However, the conventional dictionary-based forward and reverse maximum matching processes both contain repeated matching steps for strings that are not in the dictionary, which wastes segmentation time and increases the workload of dictionary matching and of ambiguity judgment after segmentation. With the forward add-one-character maximum matching method proposed by the invention, almost every two-character word is segmented quickly and accurately in a single step, so the overall segmentation efficiency is greatly improved. The results of the test simulation confirm this, as shown in Table 4 below.
Table 4: Average segmentation speed of the three segmentation methods
Segmentation method | Average segmentation speed (words/s) |
---|---|
Conventional forward maximum matching | 52000 |
Conventional reverse maximum matching | 103000 |
Forward add-one-character matching | 113000 |
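As a quick arithmetic check on Table 4, the relative speed of the proposed method can be computed from the reported figures. The dictionary keys below are shortened labels chosen for this example, and only the three values from the table are used.

```python
# Average segmentation speeds as reported in Table 4.
speeds = {
    "forward maximum matching": 52000,
    "reverse maximum matching": 103000,
    "forward add-one-character matching": 113000,
}

proposed = speeds["forward add-one-character matching"]

# Ratio of the proposed method's speed to each conventional method's speed.
speedup = {
    name: proposed / speed
    for name, speed in speeds.items()
    if name != "forward add-one-character matching"
}

for name, ratio in speedup.items():
    print(f"{name}: {ratio:.2f}x")
```

On these figures the proposed method is roughly 2.17 times as fast as conventional forward maximum matching and about 1.10 times as fast as reverse maximum matching.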
The three methods were applied in the experimental environment of the invention. A complete dictionary containing 270,000 entries was used as the segmentation dictionary. The hardware was a computer with at least 1 GB of memory; the software was Windows 7, the development language was Java, and the simulation was run in the MyEclipse 8.5 development tool. Articles of about 0.02M each were selected from four areas (economy, science and technology, social news, and military) and segmented with the three algorithms. The results are shown in Fig. 3, where the vertical axis is segmentation accuracy and the horizontal axis is the subject area of the text. Among the three methods, the forward add-one-character matching method proposed here achieves higher accuracy than the conventional forward and reverse maximum matching methods.
The experimental results in Table 4 and Fig. 3 show that the dictionary-based forward add-one-character maximum matching segmentation method of the invention significantly improves on conventional dictionary-based segmentation methods in both segmentation speed and segmentation accuracy.
The specific embodiments of the invention have been explained in detail above with reference to the drawings, but the invention is not limited to these embodiments; within the knowledge of a person of ordinary skill in the art, various changes can be made without departing from the spirit of the invention.
Claims (1)
1. A dictionary-based forward add-one-character maximum matching Chinese word segmentation method, characterized in that the method comprises the following specific steps:
Step1: Read in the text to be segmented and coarsely segment it into short texts, using punctuation, digits, Western-language text, and charts as separators;
Step2: Take each coarsely segmented short text as the object of further segmentation and set a search length L for it, where L is less than the length of the longest word in the dictionary;
Step3: Take the first two characters of a coarsely segmented short text and look them up in the dictionary;
If the two characters are not in the dictionary, the first character is a single-character word and is cut out; the read pointer is then moved forward, and the next two characters are taken for a new round of lookup;
If the two characters are in the dictionary, the read pointer is extended by one character to three characters, and matching continues in the dictionary;
If the three-character string is not in the dictionary, the first two characters form a word, which is cut out as one segmentation result; the read pointer is then moved forward, and the next two characters are taken for a new round of lookup;
If the three-character string is in the dictionary, one more character is added to form a four-character string, which is looked up in the dictionary, and so on, matching and segmenting in this way;
Step4: When the lookup length reaches L, start again from the character following those L characters and repeat the procedure of Step3, until all short texts have been segmented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510522091.XA CN105138514B (en) | 2015-08-24 | 2015-08-24 | Dictionary-based forward add-one-character maximum matching Chinese word segmentation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105138514A (en) | 2015-12-09 |
CN105138514B (en) | 2018-11-09 |
Family
ID=54723865
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510522091.XA Active CN105138514B (en) | 2015-08-24 | 2015-08-24 | It is a kind of based on dictionary it is positive gradually plus a word maximum matches Chinese word cutting method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105138514B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||