CN108664642A - Method for automatically acquiring part-of-speech tagging rules based on the Apriori algorithm - Google Patents
Method for automatically acquiring part-of-speech tagging rules based on the Apriori algorithm
- Publication number: CN108664642A (application CN201810466451.2A)
- Authority
- CN
- China
- Prior art keywords: frequent, candidate, item, support, speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a method for automatically acquiring part-of-speech tagging rules based on the Apriori algorithm, comprising: Step 1, inputting a transaction database; Step 2, computing the set L1 of frequent 1-itemsets; Step 3, generating the set L6 of frequent 6-itemsets; Step 4, deriving association rules from the frequent 6-itemsets.
Description
Technical field
The present invention relates to a method for automatically acquiring part-of-speech tagging rules based on the Apriori algorithm.
Background technology
Data mining (reference: Jiang Haikun. Research on the data mining process [J]. Fujian Computer, 2007, 3: 67-74) is the extraction, or "mining", of knowledge from large volumes of data. Specifically, data mining is the process of extracting implicit, previously unknown, and potentially useful knowledge and information from large, random, fuzzy, incomplete, and noisy data (reference: ZhaoHui Tang. Data Mining Principles and Applications [M]. Tsinghua University Press, 2007). Part-of-speech tagging is an important step in natural language processing; its task is to assign the correct part of speech to each word in a sentence. Errors made at this stage are amplified in subsequent processing such as syntactic analysis and machine translation (reference: Maihemuti Maimaiti. Research and Implementation of Statistics-based Uighur Part-of-Speech Tagging [D]. Xinjiang University, 2009). Many part-of-speech tagging methods exist to date, including rule-based methods, statistical methods, and combinations of the two (reference: Liu S, Chen L, et al. Automatic part-of-speech tagging for Chinese corpus. Computer Processing of Chinese and Oriental Languages, 1995, 9(1): 31-47).
Tagging rules are generally compiled by hand, but this has the following two drawbacks (reference: Li Xiaoli, Shi Zhongzhi. Acquiring part-of-speech tagging rules with data mining methods [J]. Journal of Computer Research and Development, 2000, 37(2): 1409-1414): 1. In terms of coverage, manual methods can only produce general rules and cannot produce specialized rules for individual cases, even though such specialized rules, despite their narrow scope, are an important means of improving accuracy. 2. Since the accuracy of manually obtained rules remains to be verified, and the accuracy of purely statistical methods is difficult to improve further, acquiring rules automatically and efficiently is a key problem in realizing part-of-speech tagging.
Invention content
In view of the deficiencies of the prior art, the present invention discloses a method for automatically acquiring part-of-speech tagging rules based on the Apriori algorithm, comprising the following steps:
Step 1: Input a transaction database, and let the item set contained in the transaction database be I = {A1, A2, A3, A4, A5, A6}, where A1 to A6 denote 6 subsets, i.e., the transaction database contains 6 items in total: A1 denotes the set of previous words, A2 the set of previous-word parts of speech, A3 the set of current words, A4 the set of current-word parts of speech, A5 the set of following words, and A6 the set of following-word parts of speech;
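The 6-item transactions of Step 1 can be pictured as a sliding window over a part-of-speech tagged sentence. The sketch below is illustrative only and is not code from the patent; the name `build_transactions` and the `<s>` boundary marker are assumptions.

```python
# Illustrative sketch (not from the patent): build 6-item transactions
# (previous word, previous POS, current word, current POS, next word, next POS)
# from a tagged sentence, padding the sentence edges with a boundary marker.

def build_transactions(tagged_sentence, boundary=("<s>", "<s>")):
    """tagged_sentence: list of (word, pos) pairs."""
    padded = [boundary] + list(tagged_sentence) + [boundary]
    transactions = []
    for i in range(1, len(padded) - 1):
        (pw, pt), (cw, ct), (nw, nt) = padded[i - 1], padded[i], padded[i + 1]
        transactions.append((pw, pt, cw, ct, nw, nt))
    return transactions
```

For a two-word sentence this yields one transaction per word, each recording its left and right context.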
Step 2: Scan the item set I and compute the set L1 of frequent 1-itemsets; the entries of L1 record how many times each word and each part of speech occur.
Step 3: Generate the candidate frequent-itemset collection C2 from L1 by joining and pruning, count each candidate in C2, and discard the candidates whose support is below the minimum support, thereby generating the set L2 of frequent 2-itemsets; the entries of L2 record how many times pairs of words and parts of speech occur together.
Generate the candidate frequent-itemset collection C3 from L2 by joining and pruning, count each candidate in C3, and discard the candidates below the minimum support, thereby generating the set L3 of frequent 3-itemsets;
Generate the candidate frequent-itemset collection C4 from L3 by joining and pruning, count each candidate in C4, and discard the candidates below the minimum support, thereby generating the set L4 of frequent 4-itemsets;
Generate the candidate frequent-itemset collection C5 from L4 by joining and pruning, count each candidate in C5, and discard the candidates below the minimum support, thereby generating the set L5 of frequent 5-itemsets;
Generate the candidate frequent-itemset collection C6 from L5 by joining and pruning, count each candidate in C6, and discard the candidates below the minimum support, thereby generating the set L6 of frequent 6-itemsets. (The joining and pruning methods used in Step 3 are prior art; reference: Liu S, Chen L, et al. Automatic part-of-speech tagging for Chinese corpus. Computer Processing of Chinese and Oriental Languages, 1995, 9(1): 31-47.)
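The joining and pruning that take each frequent k-itemset collection Lk to the candidate collection Ck+1 can be sketched as follows. This is the standard Apriori candidate-generation procedure, not code from the patent; the name `apriori_gen` is an assumption.

```python
# Standard Apriori candidate generation (illustrative): join frequent
# k-itemsets that share their first k-1 sorted elements, then prune any
# candidate that has an infrequent k-subset.

from itertools import combinations

def apriori_gen(frequent_k):
    """frequent_k: collection of frozensets of equal size k.
    Returns the set of candidate (k+1)-itemsets as frozensets."""
    freq = {tuple(sorted(s)) for s in frequent_k}
    k = len(next(iter(freq)))
    candidates = set()
    for a in freq:
        for b in freq:
            # join step: identical k-1 prefix, last elements in order
            if a[:k - 1] == b[:k - 1] and a[k - 1] < b[k - 1]:
                cand = a + (b[k - 1],)
                # prune step: every k-subset of the candidate must be frequent
                if all(tuple(sorted(sub)) in freq
                       for sub in combinations(cand, k)):
                    candidates.add(frozenset(cand))
    return candidates
```

For example, joining the frequent 2-itemsets {a,b}, {a,c}, {b,c} yields the single candidate {a,b,c}; if {b,c} were missing, the candidate would be pruned.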
Step 4: Derive the association rules from the frequent 6-itemsets.
Step 2 comprises:
Let Ni denote the number of occurrences of the i-th item Ai in the item set I, with i ranging from 1 to 6. The support sup(Ai) of the i-th item is computed according to the following formula:
sup(Ai) = Ni / |D|,
where |D| denotes the number of transactions contained in the transaction database. The support of each item is compared with the configured minimum support min_support (conventionally set to 10); the items whose support is below the minimum support are deleted, yielding the set L1 of frequent 1-itemsets.
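The Step 2 computation can be sketched as a single counting pass over the transactions. This sketch is illustrative, not the patent's code; note that the patent's example threshold of 10 reads as an absolute count, so the sketch filters on raw counts (dividing by the number of transactions |D| would give the fractional support sup(Ai) = Ni / |D|).

```python
# Illustrative Step 2 sketch: count every single item across all
# transactions and keep those meeting the minimum support, treated here
# as an absolute occurrence count.

from collections import Counter

def frequent_1_itemsets(transactions, min_support):
    """transactions: iterable of item tuples. Returns {item: count}."""
    counts = Counter(item for t in transactions for item in t)
    return {item: n for item, n in counts.items() if n >= min_support}
```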
Step 4 comprises: For each frequent itemset Lx, with x ranging from 1 to 6, find all of its non-empty subsets and compute the confidence of each non-empty subset a. If the ratio of the support sup(Lx) of the frequent itemset Lx to the support sup(a) of the non-empty subset a exceeds the minimum confidence (the minimum confidence is configured by the user as needed, for example 0.8), then the association rule a ==> (Lx − a) holds; otherwise there is no association rule. The association rules are the part-of-speech tagging rules.
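The rule extraction just described — emit a ==> (Lx − a) whenever sup(Lx)/sup(a) exceeds the minimum confidence — can be sketched as follows. This is an illustrative analogue, not the patent's implementation; `support` is assumed to map itemsets (frozensets) to their support counts.

```python
# Illustrative Step 4 sketch: derive association rules a ==> (Lx - a)
# from one frequent itemset, keeping rules whose confidence
# sup(Lx) / sup(a) exceeds the minimum confidence.

from itertools import combinations

def derive_rules(itemset, support, min_confidence):
    """itemset: frozenset; support: dict of frozenset -> support count."""
    rules = []
    items = list(itemset)
    for r in range(1, len(items)):          # all non-empty proper subsets
        for a in map(frozenset, combinations(items, r)):
            conf = support[itemset] / support[a]
            if conf > min_confidence:
                rules.append((a, itemset - a, conf))
    return rules
```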
The createTransRule() function is used to create the association rules; the six functions createL1(), createL2(), createL3(), createL4(), createL5() and createL6() are used to create the frequent sets, corresponding respectively to the sets L1, L2, L3, L4, L5 and L6; the getMinusCollect(String[] a, String[] Lx) function computes the difference set of a and Lx.
X => Y means that the occurrence of X also brings about the occurrence of Y. For an association rule X => Y, the support takes the form sup(X => Y) = sup(X ∪ Y), i.e., the proportion, among all transactions in the transaction set, of the transactions that contain both X and Y; the confidence takes the form conf(X => Y) = sup(X ∪ Y) / sup(X), i.e., the ratio of the number of transactions containing both X and Y to the number of transactions containing X. Support is a measure of the importance of an association rule, while confidence is a measure of its accuracy, with values ranging between 0 and 1.
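A small worked example of these two definitions, on hand-made toy transactions (not corpus data):

```python
# Toy illustration of sup(X => Y) = sup(X ∪ Y) and
# conf(X => Y) = sup(X ∪ Y) / sup(X) over five tiny transactions.

transactions = [
    {"adv", "n"}, {"adv", "n"}, {"adv", "v"}, {"det", "n"}, {"adv", "n"},
]
X, Y = {"adv"}, {"n"}

sup_X = sum(1 for t in transactions if X <= t) / len(transactions)       # 4/5
sup_XY = sum(1 for t in transactions if X | Y <= t) / len(transactions)  # 3/5
conf = sup_XY / sup_X                                                    # 0.75
```

So the rule adv => n holds in 3 of 5 transactions (support 0.6) and in 3 of the 4 transactions containing adv (confidence 0.75).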
For the acquisition of part-of-speech tagging rules, the present invention requires neither dimensional and hierarchical analysis nor divide-and-conquer methods; instead it uses the basic Apriori algorithm (Agrawal et al. first posed, in 1993, the problem of mining association rules between itemsets in customer transaction databases and designed the Apriori algorithm based on frequent-set theory; reference: Yang Guang. Studies on Association Rule Algorithms [D]. Dalian Jiaotong University, 2005). Apriori is one of the most influential algorithms for mining frequent itemsets for Boolean association rules. Its core is a recursive algorithm based on the two-phase frequent-itemset idea; the algorithm decomposes the task into two subproblems: 1. find all itemsets whose support exceeds the minimum support — these itemsets are called frequent itemsets; 2. generate association rules from the frequent itemsets according to the minimum confidence. The invention studies, from a manually annotated corpus, the influence of sequences of words and parts of speech on the part of speech of a word. This mirrors how people judge parts of speech from contextual information such as the surrounding words and their parts of speech in a corpus.
With a sufficiently large corpus, given a minimum support and a minimum confidence, the method first mines the common pattern sets exceeding the minimum support and then generates association rules; if a rule's confidence exceeds the minimum confidence, it is kept as a part-of-speech rule. If the minimum confidence is set sufficiently high, the obtained rules can serve as a supplement to probabilistic methods and thereby better solve the part-of-speech tagging problem.
Advantageous effects: For the acquisition of part-of-speech tagging rules, the present invention requires neither dimensional and hierarchical analysis nor divide-and-conquer methods. Experiments show that the automatically acquired tagging rules have good practical value and can improve the accuracy of part-of-speech tagging.
Description of the drawings
The present invention is further illustrated below with reference to the accompanying drawings and the detailed description; the above and other advantages of the invention will become apparent.
Fig. 1 is flow chart of the present invention.
Specific implementation mode
The present invention will be further described with reference to the accompanying drawings and embodiments.
As shown in Fig. 1, the invention discloses a method for automatically acquiring part-of-speech tagging rules based on the Apriori algorithm, comprising the following steps:
Step 1: Input a transaction database, and let the item set contained in the transaction database be I = {A1, A2, A3, A4, A5, A6}, where A1 to A6 denote 6 subsets, i.e., the transaction database contains 6 items in total: A1 denotes the set of previous words, A2 the set of previous-word parts of speech, A3 the set of current words, A4 the set of current-word parts of speech, A5 the set of following words, and A6 the set of following-word parts of speech;
Step 2: Scan the item set I and compute the set L1 of frequent 1-itemsets; the entries of L1 record how many times each word and each part of speech occur.
Step 3: Generate the candidate frequent-itemset collection C2 from L1 by joining and pruning, count each candidate in C2, and discard the candidates whose support is below the minimum support, thereby generating the set L2 of frequent 2-itemsets; the entries of L2 record how many times pairs of words and parts of speech occur together.
Generate the candidate frequent-itemset collection C3 from L2 by joining and pruning, count each candidate in C3, and discard the candidates below the minimum support, thereby generating the set L3 of frequent 3-itemsets;
Generate the candidate frequent-itemset collection C4 from L3 by joining and pruning, count each candidate in C4, and discard the candidates below the minimum support, thereby generating the set L4 of frequent 4-itemsets;
Generate the candidate frequent-itemset collection C5 from L4 by joining and pruning, count each candidate in C5, and discard the candidates below the minimum support, thereby generating the set L5 of frequent 5-itemsets;
Generate the candidate frequent-itemset collection C6 from L5 by joining and pruning, count each candidate in C6, and discard the candidates below the minimum support, thereby generating the set L6 of frequent 6-itemsets.
Step 4: Derive the association rules from the frequent 6-itemsets.
Step 2 comprises:
Let Ni denote the number of occurrences of the i-th item Ai in the item set I, with i ranging from 1 to 6. The support sup(Ai) of the i-th item is computed according to the following formula:
sup(Ai) = Ni / |D|,
where |D| denotes the number of transactions contained in the transaction database. The support of each item is compared with the configured minimum support min_support (conventionally set to 10); the items whose support is below the minimum support are deleted, yielding the set L1 of frequent 1-itemsets.
Step 4 comprises: For each frequent itemset Lx, with x ranging from 1 to 6, find all of its non-empty subsets and compute the confidence of each non-empty subset a. If the ratio of the support sup(Lx) of the frequent itemset Lx to the support sup(a) of the non-empty subset a exceeds the minimum confidence (the minimum confidence is configured by the user as needed, for example 0.8), then the association rule a ==> (Lx − a) holds; otherwise there is no association rule. The association rules are the part-of-speech tagging rules.
Embodiment
The following model program framework is designed:
(1) The main function is responsible for the overall operation of the program, such as invoking initialization, itemset computation, the association-rule algorithm, and the output of relevant information.
(2) The Apriori() constructor creates the graphical user interface.
(3) The print() function returns the relevant information to be output.
(4) The createTransRule() function creates the association rules.
(5) The six functions createL1(), createL2(), createL3(), createL4(), createL5() and createL6() create the frequent sets.
(6) The removeNotSupportKey() function deletes the keys whose values are below the minimum support.
(7) The findKey(Set keyset, String a, String b, String c, String d, String e, String f) function searches keyset for the key composed of a, b, c, d, e, f.
(8) The contain(Set keyset, String a, String b, String c, String d, String e, String f) function judges whether keyset contains the key composed of a, b, c, d, e, f.
(9) The getMinusCollect(String[] a, String[] L) function computes the difference set of a and L.
(10) The getSubSet(String setN[]) function obtains the subsets of setN.
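The getSubSet() helper in item (10), which feeds the rule derivation of Step 4, could look like the following sketch. The patent's implementation appears to be Java; this Python analogue, including the name `get_sub_sets` and the restriction to non-empty proper subsets, is an assumption for illustration.

```python
# Illustrative analogue of getSubSet(): enumerate all non-empty proper
# subsets of an itemset, i.e. the candidate antecedents a of the rules
# a ==> (Lx - a) in Step 4.

from itertools import combinations

def get_sub_sets(items):
    """items: iterable of distinct elements. Returns a list of frozensets."""
    items = list(items)
    return [frozenset(c)
            for r in range(1, len(items))   # sizes 1 .. len-1
            for c in combinations(items, r)]
```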
The corpus is the Uighur-language edition of the Xinjiang Daily, covering subjects such as politics, economy, sport, health, culture, art, and entertainment. Stem segmentation, affix extraction, and partial part-of-speech tagging have already been completed on the corpus.
Following the Apriori method in data mining, patterns of each length are mined in turn; for the final patterns, a minimum support and confidence are set, and the part-of-speech tagging rules are mined from them. The mined rules reveal the influence of words, parts of speech, and combinations of the two on the part of speech of the current word.
In the present embodiment, the part-of-speech tag set is Tags = {Tagi | i = 1, 2, ..., m}, the word set is Dwords = {Wordi | i = 1, 2, ..., n}, and the item set is I = Dwords ∪ Tags, where Wordi and Tagi are the i-th word and its corresponding part-of-speech tag. The tagged text is T = {(Wordi, Tagi) | Wordi ∈ Dwords, Tagi ∈ Tags}, where Tagi is the part-of-speech tag of the word Wordi in the tagged text. Patterns of several lengths are illustrated below:
Pattern one: records the number of occurrences of a single word or part of speech; the three most frequent parts of speech are n, v, and adj. Since pattern one uses no contextual information, it yields no rules.
Pattern two: describes the influence of the previous word or previous part of speech on the current part of speech.
An acquired tagging rule is: if (word1, adv) then (word2, n), where adv denotes adverb and n denotes noun; this rule says that if the part of speech of the previous word is an adverb, the part of speech of the following word is a noun.
Pattern three: describes the influence of the combination of the two preceding words or parts of speech on the part of speech of the current word.
An acquired rule is: if (part-of-speech 1, v) then (word 2, "") then (word 3, n), where v denotes verb.
Pattern six: records the number of occurrences of {"previous word", "part of speech of previous word", "current word", "part of speech of current word", "following word", "part of speech of following word"}.
Comparing patterns of different lengths clearly shows the constraining effect that words exert within a pattern.
The experimental data show that the absolute number of combinations of each pattern keeps increasing as the pattern length grows. Because more context comes into play, the support of a pattern decreases while its confidence increases, and the probability that the part of speech can be determined uniquely also increases.
Since a word together with its corresponding part of speech occurs far less often than that part of speech occurs alone, the cases in which word context constrains the part of speech are more numerous and more complex, which complicates the disambiguation of words with multiple parts of speech; at the same time a word, as a contextual factor, influences the part of speech more strongly, i.e., it constrains the part of speech more precisely. In general, words exert the larger influence on the part of speech within a pattern, so patterns containing words have the smaller support.
For experimental comparison, the present embodiment first tags the above corpus with a maximum-entropy method, reaching an accuracy of 92.01%. Using the acquired tagging rules to optimize the annotation produced by the maximum-entropy model raises the accuracy to 93.13%, better than the result of tagging with the purely statistics-based maximum-entropy method alone.
The present invention provides a method for automatically acquiring part-of-speech tagging rules based on the Apriori algorithm. There are many methods and approaches for implementing this technical scheme, and the above is only a preferred embodiment of the invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the invention, and these improvements and modifications should also be regarded as falling within the protection scope of the invention. Each component not specified in this embodiment can be implemented with the prior art.
Claims (5)
1. A method for automatically acquiring part-of-speech tagging rules based on the Apriori algorithm, characterized by comprising the following steps:
Step 1: inputting a transaction database, and setting the item set contained in the transaction database to I = {A1, A2, A3, A4, A5, A6}, where A1 to A6 denote 6 subsets, i.e., the transaction database contains 6 items in total, A1 denoting the set of previous words, A2 the set of previous-word parts of speech, A3 the set of current words, A4 the set of current-word parts of speech, A5 the set of following words, and A6 the set of following-word parts of speech;
Step 2: scanning the item set I and computing the set L1 of frequent 1-itemsets;
Step 3: generating the set L6 of frequent 6-itemsets;
Step 4: deriving association rules from the frequent 6-itemsets.
2. The method according to claim 1, characterized in that Step 2 comprises:
letting Ni denote the number of occurrences of the i-th item Ai in the item set I, with i ranging from 1 to 6, and computing the support sup(Ai) of the i-th item according to the following formula:
sup(Ai) = Ni / |D|,
where |D| denotes the number of transactions contained in the transaction database; and comparing the support of each item with the configured minimum support min_support and deleting the items whose support is below the minimum support, to obtain the set L1 of frequent 1-itemsets.
3. The method according to claim 2, characterized in that Step 3 comprises: generating the candidate frequent-itemset collection C2 from L1 by joining and pruning, counting each candidate in C2, and discarding the candidates below the minimum support, to generate the set L2 of frequent 2-itemsets;
generating the candidate frequent-itemset collection C3 from L2 by joining and pruning, counting each candidate in C3, and discarding the candidates below the minimum support, to generate the set L3 of frequent 3-itemsets;
generating the candidate frequent-itemset collection C4 from L3 by joining and pruning, counting each candidate in C4, and discarding the candidates below the minimum support, to generate the set L4 of frequent 4-itemsets;
generating the candidate frequent-itemset collection C5 from L4 by joining and pruning, counting each candidate in C5, and discarding the candidates below the minimum support, to generate the set L5 of frequent 5-itemsets;
generating the candidate frequent-itemset collection C6 from L5 by joining and pruning, counting each candidate in C6, and discarding the candidates below the minimum support, to generate the set L6 of frequent 6-itemsets.
4. The method according to claim 3, characterized in that Step 4 comprises: for each frequent itemset Lx, with x ranging from 1 to 6, finding all of its non-empty subsets; if the ratio of the support sup(Lx) of the frequent itemset Lx to the support sup(a) of a non-empty subset a exceeds the minimum confidence, the association rule a ==> (Lx − a) holds, and otherwise there is no association rule; the association rules are the part-of-speech tagging rules.
5. The method according to claim 4, characterized in that the createTransRule() function is used to create the association rules; the six functions createL1(), createL2(), createL3(), createL4(), createL5() and createL6() are used to create the frequent sets, corresponding respectively to the sets L1, L2, L3, L4, L5 and L6; and the getMinusCollect(String[] a, String[] Lx) function is used to compute the difference set of a and Lx.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810466451.2A CN108664642A (en) | 2018-05-16 | 2018-05-16 | Rules for Part of Speech Tagging automatic obtaining method based on Apriori algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108664642A true CN108664642A (en) | 2018-10-16 |
Family
ID=63779752
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810466451.2A Pending CN108664642A (en) | 2018-05-16 | 2018-05-16 | Rules for Part of Speech Tagging automatic obtaining method based on Apriori algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108664642A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105320756A (en) * | 2015-10-15 | 2016-02-10 | 江苏省邮电规划设计院有限责任公司 | Improved Apriori algorithm based method for mining database association rule |
CN105719155A (en) * | 2015-09-14 | 2016-06-29 | 南京理工大学 | Association rule algorithm based on Apriori improved algorithm |
CN106407296A (en) * | 2016-08-30 | 2017-02-15 | 江苏省邮电规划设计院有限责任公司 | Local scan association rule computer data analysis method based on pre-judging screening |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109767617A (en) * | 2018-12-20 | 2019-05-17 | 北京航空航天大学 | A kind of public security traffic control service exception data analysis method based on Apriori |
CN109739953A (en) * | 2018-12-30 | 2019-05-10 | 广西财经学院 | The text searching method extended based on chi-square analysis-Confidence Framework and consequent |
CN109739953B (en) * | 2018-12-30 | 2021-07-20 | 广西财经学院 | Text retrieval method based on chi-square analysis-confidence framework and back-part expansion |
CN110619073A (en) * | 2019-08-30 | 2019-12-27 | 北京影谱科技股份有限公司 | Method and device for constructing video subtitle network expression dictionary based on Apriori algorithm |
CN110619073B (en) * | 2019-08-30 | 2022-04-22 | 北京影谱科技股份有限公司 | Method and device for constructing video subtitle network expression dictionary based on Apriori algorithm |
CN111309777A (en) * | 2020-01-14 | 2020-06-19 | 哈尔滨工业大学 | Report data mining method for improving association rule based on mutual exclusion expression |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20181016 |