CN108664642A - Method for automatically acquiring part-of-speech tagging rules based on the Apriori algorithm - Google Patents

Method for automatically acquiring part-of-speech tagging rules based on the Apriori algorithm

Info

Publication number
CN108664642A
CN108664642A
Authority
CN
China
Prior art keywords
frequent
candidate
item
support
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810466451.2A
Other languages
Chinese (zh)
Inventor
丁福冬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jurong Ma Run Seedlings Co Ltd
Original Assignee
Jurong Ma Run Seedlings Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jurong Ma Run Seedlings Co Ltd filed Critical Jurong Ma Run Seedlings Co Ltd
Priority to CN201810466451.2A priority Critical patent/CN108664642A/en
Publication of CN108664642A publication Critical patent/CN108664642A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for automatically acquiring part-of-speech tagging rules based on the Apriori algorithm, comprising: Step 1, inputting a transaction database; Step 2, computing the set L1 of frequent 1-itemsets; Step 3, generating the set L6 of frequent 6-itemsets; Step 4, deriving association rules from the frequent 6-itemsets.

Description

Method for automatically acquiring part-of-speech tagging rules based on the Apriori algorithm
Technical field
The present invention relates to a method for automatically acquiring part-of-speech tagging rules based on the Apriori algorithm.
Background technology
Data mining (reference: Jiang Haikun. Research on data mining processes [J]. Fujian Computer, 2007(3): 67-74) is the extraction, or "mining", of knowledge from large volumes of data. Specifically, data mining is the process of extracting implicit, potentially useful and previously unknown knowledge and information from large, random, fuzzy, incomplete and noisy data (reference: ZhaoHui Tang. Data Mining Principles and Applications [M]. Tsinghua University Press, 2007). Part-of-speech tagging is an important step in natural language processing; its task is to assign the correct part of speech to each word in a sentence, and errors made at this stage are amplified in subsequent processing such as syntactic analysis and machine translation (reference: Maihemuti Maimaiti. Research and Implementation of Statistics-based Uyghur Part-of-Speech Tagging [D]. Xinjiang University, 2009). Many part-of-speech tagging methods exist to date, including rule-based methods, statistical methods, and methods combining rules with statistics (reference: Liu S, Chen L, et al. Automatic part-of-speech tagging for Chinese corpus. Computer Processing of Chinese and Oriental Languages, 1995, 9(1): 31-47).
Tagging rules are generally compiled by manual sorting, but this has two drawbacks (reference: Li Xiaoli, Shi Zhongzhi. Acquiring part-of-speech tagging rules with data mining methods [J]. Journal of Computer Research and Development, 2000, 37(2): 1409-1414): 1. In terms of coverage, manual methods can only produce general rules; they cannot produce individualized rules for special cases, and although the coverage of individualized rules is small, they are an important means of improving accuracy. 2. Since the accuracy of manually obtained rules remains to be verified, and since it is difficult to further improve the accuracy of statistical methods, acquiring rules automatically and efficiently is a key problem in part-of-speech tagging.
Summary of the invention
In view of the deficiencies of the prior art, the present invention discloses a method for automatically acquiring part-of-speech tagging rules based on the Apriori algorithm, comprising the following steps:
Step 1: a transaction database is input, and the item set contained in the transaction database is set as I = {A1, A2, A3, A4, A5, A6}. A1 to A6 denote 6 subsets, i.e. the transaction database contains 6 items in total, where A1 denotes the set of previous words, A2 the set of previous-word parts of speech, A3 the set of current words, A4 the set of current-word parts of speech, A5 the set of following words, and A6 the set of following-word parts of speech;
Step 2: the item set I is scanned and the set L1 of frequent 1-itemsets is computed; the data in L1 record the number of occurrences of each word and part of speech.
Step 3: the candidate frequent-itemset set C2 is generated from L1 by joining and pruning; each candidate set in C2 is counted and the candidates whose support is below the minimum support are discarded, thereby producing the set L2 of frequent 2-itemsets; the data in L2 record the number of occurrences of pairwise combinations of words and parts of speech;
The candidate frequent-itemset set C3 is generated from L2 by joining and pruning; each candidate set in C3 is counted and the candidates whose support is below the minimum support are discarded, thereby producing the set L3 of frequent 3-itemsets;
The candidate frequent-itemset set C4 is generated from L3 by joining and pruning; each candidate set in C4 is counted and the candidates whose support is below the minimum support are discarded, thereby producing the set L4 of frequent 4-itemsets;
The candidate frequent-itemset set C5 is generated from L4 by joining and pruning; each candidate set in C5 is counted and the candidates whose support is below the minimum support are discarded, thereby producing the set L5 of frequent 5-itemsets;
The candidate frequent-itemset set C6 is generated from L5 by joining and pruning; each candidate set in C6 is counted and the candidates whose support is below the minimum support are discarded, thereby producing the set L6 of frequent 6-itemsets. (In step 3, the joining and pruning methods are existing techniques; reference: Liu S, Chen L, et al. Automatic part-of-speech tagging for Chinese corpus. Computer Processing of Chinese and Oriental Languages, 1995, 9(1): 31-47.)
Step 4: association rules are derived from the frequent 6-itemsets.
Step 2 includes:
Let Ni denote the number of occurrences of the i-th item Ai of the item set I, with i ranging from 1 to 6. The support of the i-th item, sup(Ai), is computed as follows:
sup(Ai) = Ni / |D|,
where |D| denotes the number of transactions contained in the transaction database. The support of each item is compared with the preset minimum support min_support (usually set to 10), and items whose support is below the minimum support are deleted, yielding the set L1 of frequent 1-itemsets.
Step 4 includes: for each frequent itemset Lx, with x ranging from 1 to 6, find all of its non-empty subsets and compute the confidence of each non-empty subset a. If the ratio of the support sup(Lx) of the frequent itemset Lx to the support sup(a) of the non-empty subset a exceeds the minimum confidence (the minimum confidence is configured by the user as required, e.g. set to 0.8), then the association rule a ==> (Lx - a) holds; otherwise no association rule exists. The association rules are the part-of-speech tagging rules.
Association rules are created with the createTransRule() function; frequent sets are created with the six functions createL1(), createL2(), createL3(), createL4(), createL5() and createL6(), which correspond to the sets L1, L2, L3, L4, L5 and L6 respectively; and the difference set of a and Lx is computed with the getMinusCollect(String[] a, String[] Lx) function.
X => Y means that the occurrence of X also entails the occurrence of Y. For an association rule X => Y, the support takes the form sup(X => Y) = sup(X ∪ Y), i.e. the proportion, among all transactions in the transaction set, of transactions that contain both X and Y; the confidence takes the form conf(X => Y) = sup(X ∪ Y) / sup(X), i.e. the ratio of the number of transactions containing both X and Y to the number of transactions containing X. Support is a measure of the importance of an association rule, while confidence (also called the degree of belief) expresses the accuracy of an association rule and takes values between 0 and 1.
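To make these two formulas concrete, the following minimal Java sketch (illustrative only, not code from the patent; the class, item strings and variable names are hypothetical) computes sup(X => Y) and conf(X => Y) from a small list of transactions:

    import java.util.*;

    public class SupportConfidence {
        // Fraction of transactions that contain every item of the given itemset.
        static double support(List<Set<String>> transactions, Set<String> itemset) {
            long hits = transactions.stream().filter(t -> t.containsAll(itemset)).count();
            return (double) hits / transactions.size();
        }

        public static void main(String[] args) {
            List<Set<String>> db = List.of(
                Set.of("prevPos=adv", "curPos=n"),
                Set.of("prevPos=adv", "curPos=v"),
                Set.of("prevPos=adv", "curPos=n"));

            Set<String> x = Set.of("prevPos=adv");     // antecedent X
            Set<String> xy = new HashSet<>(x);         // X ∪ Y
            xy.add("curPos=n");

            double supXY = support(db, xy);            // sup(X => Y) = sup(X ∪ Y)
            double conf = supXY / support(db, x);      // conf(X => Y) = sup(X ∪ Y) / sup(X)
            System.out.printf("sup=%.2f conf=%.2f%n", supXY, conf);
        }
    }

With the three example transactions above, sup(X => Y) = 2/3 and conf(X => Y) ≈ 0.67.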
For the acquisition of part-of-speech tagging rules the present invention requires neither dimensional and hierarchical analysis nor a divide-and-conquer method; it uses the most basic Apriori algorithm (Agrawal et al. first posed, in 1993, the problem of mining association rules between itemsets in customer transaction databases and designed the Apriori algorithm based on the theory of frequent sets (reference: Yang Guang. Research on Association Rule Mining Algorithms [D]. Dalian Jiaotong University, 2005). The Apriori algorithm is one of the most influential algorithms for mining the frequent itemsets of Boolean association rules. Its core is a recursive algorithm based on the two-stage frequent-itemset idea, and the algorithm is decomposed into two sub-problems: 1. find all itemsets whose support exceeds the minimum support, these itemsets being called frequent itemsets; 2. generate association rules from the frequent itemsets according to the minimum confidence.) to study, from the manually tagged corpus, the influence of parts of speech and of word/part-of-speech pattern sequences on the part of speech. This method is consistent with the way humans judge a part of speech using the words, parts of speech and other information in the corpus context. When the statistical corpus is large, after a minimum support and a minimum confidence are given, the frequent pattern sets exceeding the minimum support are mined first and association rules are then generated from them; if the confidence of a rule exceeds the minimum confidence, a part-of-speech rule is obtained. If the minimum confidence is set sufficiently high, the rules obtained can serve as a supplement to probabilistic methods, thereby better solving the part-of-speech tagging problem.
Advantageous effects: for the acquisition of part-of-speech tagging rules the present invention requires neither dimensional and hierarchical analysis nor a divide-and-conquer method. Experiments show that the automatically acquired tagging rules have good practical value and can improve the accuracy of part-of-speech tagging.
Description of the drawings
The present invention is further illustrated below in combination with the accompanying drawings and specific embodiments, and the above and other advantages of the present invention will become clearer.
Fig. 1 is a flow chart of the present invention.
Detailed description of the embodiments
The present invention will be further described below with reference to the accompanying drawings and embodiments.
As shown in Fig. 1, the invention discloses a method for automatically acquiring part-of-speech tagging rules based on the Apriori algorithm, comprising the following steps:
Step 1: a transaction database is input, and the item set contained in the transaction database is set as I = {A1, A2, A3, A4, A5, A6}. A1 to A6 denote 6 subsets, i.e. the transaction database contains 6 items in total, where A1 denotes the set of previous words, A2 the set of previous-word parts of speech, A3 the set of current words, A4 the set of current-word parts of speech, A5 the set of following words, and A6 the set of following-word parts of speech;
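As a purely illustrative aid (the class and identifiers below are hypothetical and not part of the patent), such 6-item transactions could be assembled from an already tagged sentence as follows:

    import java.util.*;

    public class TransactionBuilder {
        // Each transaction holds: previous word, previous POS, current word,
        // current POS, following word, following POS (items A1..A6 of the item set I).
        static List<String[]> buildTransactions(String[] words, String[] tags) {
            List<String[]> transactions = new ArrayList<>();
            for (int i = 1; i < words.length - 1; i++) {
                transactions.add(new String[] {
                    words[i - 1], tags[i - 1],   // A1, A2
                    words[i],     tags[i],       // A3, A4
                    words[i + 1], tags[i + 1]    // A5, A6
                });
            }
            return transactions;
        }
    }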
Step 2: the item set I is scanned and the set L1 of frequent 1-itemsets is computed; the data in L1 record the number of occurrences of each word and part of speech.
Step 3: the candidate frequent-itemset set C2 is generated from L1 by joining and pruning; each candidate set in C2 is counted and the candidates whose support is below the minimum support are discarded, thereby producing the set L2 of frequent 2-itemsets; the data in L2 record the number of occurrences of pairwise combinations of words and parts of speech;
The candidate frequent-itemset set C3 is generated from L2 by joining and pruning; each candidate set in C3 is counted and the candidates whose support is below the minimum support are discarded, thereby producing the set L3 of frequent 3-itemsets;
The candidate frequent-itemset set C4 is generated from L3 by joining and pruning; each candidate set in C4 is counted and the candidates whose support is below the minimum support are discarded, thereby producing the set L4 of frequent 4-itemsets;
The candidate frequent-itemset set C5 is generated from L4 by joining and pruning; each candidate set in C5 is counted and the candidates whose support is below the minimum support are discarded, thereby producing the set L5 of frequent 5-itemsets;
The candidate frequent-itemset set C6 is generated from L5 by joining and pruning; each candidate set in C6 is counted and the candidates whose support is below the minimum support are discarded, thereby producing the set L6 of frequent 6-itemsets (one level of the join-and-prune procedure is sketched below).
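The joining and pruning used to grow each candidate set Ck from Lk-1 follow the standard Apriori procedure. The Java sketch below shows one such level under simplifying assumptions (itemsets represented as sorted string lists); it is illustrative only and is not the implementation claimed by the patent:

    import java.util.*;

    public class AprioriLevel {
        // Join two frequent (k-1)-itemsets that share their first k-2 items,
        // then prune candidates containing an infrequent (k-1)-subset.
        static Set<List<String>> joinAndPrune(Set<List<String>> prevFrequent, int k) {
            Set<List<String>> candidates = new HashSet<>();
            for (List<String> a : prevFrequent) {
                for (List<String> b : prevFrequent) {
                    if (a.subList(0, k - 2).equals(b.subList(0, k - 2))
                            && a.get(k - 2).compareTo(b.get(k - 2)) < 0) {
                        List<String> c = new ArrayList<>(a);
                        c.add(b.get(k - 2));                   // join step
                        if (allSubsetsFrequent(c, prevFrequent)) {
                            candidates.add(c);                 // survives the prune step
                        }
                    }
                }
            }
            return candidates;
        }

        // Apriori property: every (k-1)-subset of a frequent k-itemset must be frequent.
        static boolean allSubsetsFrequent(List<String> candidate, Set<List<String>> prevFrequent) {
            for (int i = 0; i < candidate.size(); i++) {
                List<String> subset = new ArrayList<>(candidate);
                subset.remove(i);
                if (!prevFrequent.contains(subset)) return false;
            }
            return true;
        }
    }

Counting each candidate's support over the transaction database and discarding those below the minimum support then yields Lk.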
Step 4: association rules are derived from the frequent 6-itemsets.
Step 2 includes:
Let Ni denote the number of occurrences of the i-th item Ai of the item set I, with i ranging from 1 to 6. The support of the i-th item, sup(Ai), is computed as follows:
sup(Ai) = Ni / |D|,
where |D| denotes the number of transactions contained in the transaction database. The support of each item is compared with the preset minimum support min_support (usually set to 10), and items whose support is below the minimum support are deleted, yielding the set L1 of frequent 1-itemsets.
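A minimal sketch of this counting step, assuming absolute occurrence counts are compared directly against min_support as described above (all names are hypothetical):

    import java.util.*;

    public class FrequentOneItemsets {
        // Count how often each item (word or part-of-speech value) occurs across
        // all transactions and keep only those reaching the minimum support.
        static Map<String, Integer> frequentItems(List<String[]> transactions, int minSupport) {
            Map<String, Integer> counts = new HashMap<>();
            for (String[] t : transactions) {
                for (String item : t) {
                    counts.merge(item, 1, Integer::sum);
                }
            }
            counts.values().removeIf(c -> c < minSupport);  // drop infrequent items
            return counts;                                   // L1 with occurrence counts
        }
    }

Dividing each count by the number of transactions |D| gives the fractional support sup(Ai) = Ni/|D| defined above.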
Step 4 includes: for each frequent itemset Lx, with x ranging from 1 to 6, find all of its non-empty subsets and compute the confidence of each non-empty subset a. If the ratio of the support sup(Lx) of the frequent itemset Lx to the support sup(a) of the non-empty subset a exceeds the minimum confidence (the minimum confidence is configured by the user as required, e.g. set to 0.8), then the association rule a ==> (Lx - a) holds; otherwise no association rule exists. The association rules are the part-of-speech tagging rules.
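The rule-derivation step can be sketched as follows; this is an illustration under the assumption that the supports of all frequent itemsets have been recorded in a map, not the patent's own createTransRule() implementation:

    import java.util.*;

    public class RuleGenerator {
        // For a frequent itemset Lx, emit every rule a ==> (Lx - a) whose confidence
        // sup(Lx) / sup(a) reaches the minimum confidence (e.g. 0.8).
        static List<String> generateRules(Set<String> lx,
                                          Map<Set<String>, Double> support,
                                          double minConfidence) {
            List<String> rules = new ArrayList<>();
            for (Set<String> a : nonEmptyProperSubsets(lx)) {
                double confidence = support.get(lx) / support.get(a);
                if (confidence >= minConfidence) {
                    Set<String> consequent = new HashSet<>(lx);
                    consequent.removeAll(a);                  // Lx - a
                    rules.add(a + " ==> " + consequent);
                }
            }
            return rules;
        }

        // Non-empty proper subsets of the itemset (so the consequent Lx - a is non-empty).
        static List<Set<String>> nonEmptyProperSubsets(Set<String> itemset) {
            List<String> items = new ArrayList<>(itemset);
            List<Set<String>> subsets = new ArrayList<>();
            for (int mask = 1; mask < (1 << items.size()) - 1; mask++) {
                Set<String> s = new HashSet<>();
                for (int i = 0; i < items.size(); i++) {
                    if ((mask & (1 << i)) != 0) s.add(items.get(i));
                }
                subsets.add(s);
            }
            return subsets;
        }
    }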
Embodiment
The following modular program framework is designed:
(1) The Main function is responsible for the overall operation of the program, such as calling the initialization, the itemset computation, the association-rule algorithm, and the output of relevant information.
(2) The Apriori() constructor creates the graphical user interface.
(3) The print() function returns the relevant information to be output.
(4) The createTransRule() function creates the association rules.
(5) The six functions createL1(), createL2(), createL3(), createL4(), createL5() and createL6() create the frequent sets.
(6) The removeNotSupportKey() function deletes keys whose values are below the minimum support.
(7) The findKey(Set keyset, String a, String b, String c, String d, String e, String f) function searches the key set keyset for the key whose value is a, b, c, d, e, f.
(8) The contain(Set keyset, String a, String b, String c, String d, String e, String f) function judges whether the key set keyset contains the key whose value is a, b, c, d, e, f.
(9) The getMinusCollect(String[] a, String[] L) function computes the difference set of a and L.
(10) The getSubSet(String[] setN) function obtains the subsets of setN (illustrative sketches of (9) and (10) are given below).
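The patent names these helpers without listing their bodies; under the assumption that they operate on plain string arrays, minimal Java sketches of getMinusCollect and getSubSet might look like:

    import java.util.*;

    public class ArrayHelpers {
        // Difference set of a and l: every element of a that does not occur in l.
        static String[] getMinusCollect(String[] a, String[] l) {
            Set<String> exclude = new HashSet<>(Arrays.asList(l));
            List<String> result = new ArrayList<>();
            for (String s : a) {
                if (!exclude.contains(s)) result.add(s);
            }
            return result.toArray(new String[0]);
        }

        // All subsets of setN, enumerated with a bit mask.
        static List<String[]> getSubSet(String[] setN) {
            List<String[]> subsets = new ArrayList<>();
            for (int mask = 0; mask < (1 << setN.length); mask++) {
                List<String> subset = new ArrayList<>();
                for (int i = 0; i < setN.length; i++) {
                    if ((mask & (1 << i)) != 0) subset.add(setN[i]);
                }
                subsets.add(subset.toArray(new String[0]));
            }
            return subsets;
        }
    }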
The corpus used is the Uyghur-language edition of the Xinjiang Daily, whose subject matter covers politics, economics, sport, health, culture, art, entertainment, and so on. Stem segmentation, affix extraction and partial part-of-speech tagging have already been completed on this corpus.
Following the Apriori method in data mining, patterns of each length are mined separately; a minimum support and a minimum confidence are set for the final patterns, and the part-of-speech tagging rules are mined from them. The mined rules show the influence of words, parts of speech, and combinations of words and parts of speech on the part of speech of the current word.
In this embodiment, the part-of-speech tag set is Tags = {Tagi | i = 1, 2, ..., m}, the word set is Dwords = {Wordi | i = 1, 2, ..., n}, and the itemset is I = Dwords ∪ Tags, where Wordi and Tagi are the i-th word and its corresponding part-of-speech tag.
The tagged text is T = {(Wordi, Tagi) | Wordi ∈ Dwords, Tagi ∈ Tags}, where Tagi is the part-of-speech tag of the word Wordi in the tagged text. Patterns of several lengths are illustrated below:
Pattern 1: represents the number of occurrences of a single word or part of speech; the top three by occurrence count are n, v and adj. Since pattern 1 uses no contextual information, it does not form rules.
Pattern 2: represents the influence of the previous word or the previous part of speech on the current part of speech.
An acquired tagging rule is: if (word1, adv) then (word2, n), where adv denotes an adverb and n denotes a noun; the rule states that if the part of speech of the previous word is an adverb, then the part of speech of the following word is a noun.
Pattern 3: represents the influence of the combination of the two preceding words or parts of speech on the part of speech of the current word.
An acquired rule is: if (part of speech 1, v) then (word 2, " ") then (word 3, n), where v denotes a verb.
Pattern 6: represents the number of occurrences of {"previous word", "part of speech of the previous word", "current word", "part of speech of the current word", "following word", "part of speech of the following word"}.
Comparing patterns of different lengths makes the constraining effect of words within a pattern readily apparent.
The experimental data show that the absolute number of combinations of each pattern keeps increasing as the pattern length grows. Because more contextual constraints are involved, the support of a pattern decreases while its confidence increases, and the possibility of determining the part of speech uniquely also increases.
Since the number of times a word occurs together with its corresponding part of speech is far smaller than the number of times a part of speech occurs on its own, the situations in which a word in the contextual information constrains the corresponding part of speech are more numerous and more complex, which makes disambiguating words with multiple possible parts of speech harder; at the same time, a word, as a contextual factor, has a greater influence on the part of speech, i.e. it constrains the part of speech more precisely. In general, the influence of words on the part of speech within a pattern is somewhat larger, so the support of patterns containing words is smaller.
For experimental comparison, this embodiment first tags the above corpus with the maximum entropy method, obtaining an accuracy of 92.01%. Using the acquired tagging rules to optimize the annotation results on top of the maximum entropy tagging, the accuracy reaches 93.13%, better than the result of tagging with the purely statistics-based maximum entropy method.
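The patent does not spell out how the mined rules are applied on top of the maximum entropy output; one plausible post-processing sketch, given purely as an assumption with hypothetical types and field names, is:

    import java.util.*;

    public class RulePostProcessor {
        // A mined rule: if all context items (previous word/POS, current word,
        // following word/POS) are present, override the current word's tag.
        record TaggingRule(Set<String> context, String correctedTag) {}

        static String[] applyRules(String[] words, String[] tags, List<TaggingRule> rules) {
            String[] corrected = tags.clone();
            for (int i = 1; i < words.length - 1; i++) {
                Set<String> context = Set.of(
                    "prevWord=" + words[i - 1], "prevPos=" + tags[i - 1],
                    "curWord=" + words[i],
                    "nextWord=" + words[i + 1], "nextPos=" + tags[i + 1]);
                for (TaggingRule rule : rules) {
                    if (context.containsAll(rule.context())) {
                        corrected[i] = rule.correctedTag();   // rule overrides the statistical tag
                    }
                }
            }
            return corrected;
        }
    }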
The present invention provides a method for automatically acquiring part-of-speech tagging rules based on the Apriori algorithm. There are many specific ways and approaches to implement this technical solution, and the above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention. Any component not specified in this embodiment can be implemented with the available prior art.

Claims (5)

1. A method for automatically acquiring part-of-speech tagging rules based on the Apriori algorithm, characterized by comprising the following steps:
Step 1: a transaction database is input, and the item set contained in the transaction database is set as I = {A1, A2, A3, A4, A5, A6}. A1 to A6 denote 6 subsets, i.e. the transaction database contains 6 items in total, where A1 denotes the set of previous words, A2 the set of previous-word parts of speech, A3 the set of current words, A4 the set of current-word parts of speech, A5 the set of following words, and A6 the set of following-word parts of speech;
Step 2: the item set I is scanned and the set L1 of frequent 1-itemsets is computed;
Step 3: the set L6 of frequent 6-itemsets is generated;
Step 4: association rules are derived from the frequent 6-itemsets.
2. The method according to claim 1, characterized in that step 2 comprises:
Let Ni denote the number of occurrences of the i-th item Ai of the item set I, with i ranging from 1 to 6; the support of the i-th item, sup(Ai), is computed as follows:
sup(Ai) = Ni / |D|,
where |D| denotes the number of transactions contained in the transaction database; the support of each item is compared with the preset minimum support min_support, and items whose support is below the minimum support are deleted, yielding the set L1 of frequent 1-itemsets.
3. The method according to claim 2, characterized in that step 3 comprises: generating the candidate frequent-itemset set C2 from L1 by joining and pruning, counting each candidate set in C2 and discarding the candidates whose support is below the minimum support, thereby producing the set L2 of frequent 2-itemsets;
generating the candidate frequent-itemset set C3 from L2 by joining and pruning, counting each candidate set in C3 and discarding the candidates whose support is below the minimum support, thereby producing the set L3 of frequent 3-itemsets;
generating the candidate frequent-itemset set C4 from L3 by joining and pruning, counting each candidate set in C4 and discarding the candidates whose support is below the minimum support, thereby producing the set L4 of frequent 4-itemsets;
generating the candidate frequent-itemset set C5 from L4 by joining and pruning, counting each candidate set in C5 and discarding the candidates whose support is below the minimum support, thereby producing the set L5 of frequent 5-itemsets;
generating the candidate frequent-itemset set C6 from L5 by joining and pruning, counting each candidate set in C6 and discarding the candidates whose support is below the minimum support, thereby producing the set L6 of frequent 6-itemsets.
4. The method according to claim 3, characterized in that step 4 comprises: for each frequent itemset Lx, with x ranging from 1 to 6, finding all of its non-empty subsets; if the ratio of the support sup(Lx) of the frequent itemset Lx to the support sup(a) of a non-empty subset a exceeds the minimum confidence, the association rule a ==> (Lx - a) holds, otherwise no association rule exists; the association rules are the part-of-speech tagging rules.
5. The method according to claim 4, characterized in that association rules are created with the createTransRule() function; frequent sets are created with the six functions createL1(), createL2(), createL3(), createL4(), createL5() and createL6(), which correspond to the sets L1, L2, L3, L4, L5 and L6 respectively; and the difference set of a and Lx is computed with the getMinusCollect(String[] a, String[] Lx) function.
CN201810466451.2A 2018-05-16 2018-05-16 Method for automatically acquiring part-of-speech tagging rules based on the Apriori algorithm Pending CN108664642A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810466451.2A CN108664642A (en) 2018-05-16 2018-05-16 Method for automatically acquiring part-of-speech tagging rules based on the Apriori algorithm

Publications (1)

Publication Number Publication Date
CN108664642A true CN108664642A (en) 2018-10-16

Family

ID=63779752

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810466451.2A Pending CN108664642A (en) 2018-05-16 2018-05-16 Method for automatically acquiring part-of-speech tagging rules based on the Apriori algorithm

Country Status (1)

Country Link
CN (1) CN108664642A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105719155A (en) * 2015-09-14 2016-06-29 南京理工大学 Association rule algorithm based on Apriori improved algorithm
CN105320756A (en) * 2015-10-15 2016-02-10 江苏省邮电规划设计院有限责任公司 Improved Apriori algorithm based method for mining database association rule
CN106407296A (en) * 2016-08-30 2017-02-15 江苏省邮电规划设计院有限责任公司 Local scan association rule computer data analysis method based on pre-judging screening

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109767617A (en) * 2018-12-20 2019-05-17 北京航空航天大学 A public security traffic control service anomaly data analysis method based on Apriori
CN109739953A (en) * 2018-12-30 2019-05-10 广西财经学院 Text retrieval method based on chi-square analysis-confidence framework and consequent expansion
CN109739953B (en) * 2018-12-30 2021-07-20 广西财经学院 Text retrieval method based on chi-square analysis-confidence framework and back-part expansion
CN110619073A (en) * 2019-08-30 2019-12-27 北京影谱科技股份有限公司 Method and device for constructing video subtitle network expression dictionary based on Apriori algorithm
CN110619073B (en) * 2019-08-30 2022-04-22 北京影谱科技股份有限公司 Method and device for constructing video subtitle network expression dictionary based on Apriori algorithm
CN111309777A (en) * 2020-01-14 2020-06-19 哈尔滨工业大学 Report data mining method for improving association rule based on mutual exclusion expression

Similar Documents

Publication Publication Date Title
CN106649260B (en) Product characteristic structure tree construction method based on comment text mining
CN107220295B (en) Searching and mediating strategy recommendation method for human-human contradiction mediating case
CN108664642A (en) Method for automatically acquiring part-of-speech tagging rules based on the Apriori algorithm
Mishra et al. MAULIK: an effective stemmer for Hindi language
CN106776562A (en) A kind of keyword extracting method and extraction system
CN107679036A (en) A kind of wrong word monitoring method and system
CN110297880B (en) Corpus product recommendation method, apparatus, device and storage medium
CN103106189B (en) A kind of method and apparatus excavating synonym attribute word
CN105975475A (en) Chinese phrase string-based fine-grained thematic information extraction method
CN103970730A (en) Method for extracting multiple subject terms from single Chinese text
CN109145260A (en) A kind of text information extraction method
Béchet et al. Discovering linguistic patterns using sequence mining
CN109947951A (en) A kind of automatically updated emotion dictionary construction method for financial text analyzing
Sadr et al. Unified topic-based semantic models: a study in computing the semantic relatedness of geographic terms
CN103678656A (en) Unsupervised automatic extraction method of microblog new words based on repeated word strings
CN108363691A (en) A kind of field term identifying system and method for 95598 work order of electric power
CN101957812A (en) Verb semantic information extracting method based on event ontology
CN106503256B (en) A kind of hot information method for digging based on social networks document
CN106776555A (en) A kind of comment text entity recognition method and device based on word model
CN109299248A (en) A kind of business intelligence collection method based on natural language processing
CN108491399A (en) Chinese to English machine translation method based on context iterative analysis
CN104008301A (en) Automatic construction method for hierarchical structure of domain concepts
Ayşe et al. Extraction of semantic word relations in Turkish from dictionary definitions
Mohnot et al. Hybrid approach for Part of Speech Tagger for Hindi language
CN107608959A (en) A kind of English social media short text place name identification method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181016

RJ01 Rejection of invention patent application after publication