CN108319584A - New word discovery method for microblog-style short texts based on an improved FP-Growth algorithm - Google Patents

New word discovery method for microblog-style short texts based on an improved FP-Growth algorithm (Download PDF)

Info

Publication number
CN108319584A
Authority
CN
China
Prior art keywords: word, speech, neologisms, filtering rule, microblogging
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201810058993.6A
Other languages
Chinese (zh)
Inventor
刘磊
贾亚璐
孙孟涛
陈浩
李静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN201810058993.6A
Publication of CN108319584A
Current legal status: Withdrawn


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates

Abstract

The present invention discloses a new word discovery method for microblog-style short texts based on an improved FP-Growth algorithm, comprising: obtaining a text corpus and preprocessing it with jieba word segmentation and part-of-speech tagging; obtaining the frequent item sets with the optimized FP-Growth algorithm and restoring the word order of each frequent item; obtaining repeated strings with an N-grams model and taking their intersection with the frequent item sets; filtering by part of speech to remove parts of speech that rarely form words morphologically; filtering the new word candidates with an improved mutual information measure computed iteratively over a sliding window; filtering once more with a part-of-speech combination rule library; and verifying the validity of the new words obtained by the method.

Description

New word discovery method for microblog-style short texts based on an improved FP-Growth algorithm
Technical field
The invention belongs to the field of text information processing and specifically relates to a new word discovery method for microblog-style short texts based on an improved FP-Growth algorithm.
Background technology
Microblogs are among the most popular social platforms in the world today. Users publish large amounts of text on microblogs every day, which has become one of the main sources of new words on the Internet.
Microblog posts differ from ordinary text in that they are short: each post contains no more than 140 characters, its content is informal, and its form is varied, which makes microblog-style short text comparatively difficult to study. Nevertheless, the knowledge contained in massive amounts of microblog text is of great importance for public opinion monitoring, new word discovery, and related fields.
Current research on new word discovery is mainly based on the recognition of named entities such as person names, place names, and organization names in conventional text; research on new word discovery in microblog short texts is relatively scarce. Compared with conventional text, microblog text is short and irregular, so traditional new word discovery methods perform unsatisfactorily on microblog-style short texts.
The FP-Growth algorithm obtains the frequent item sets in the data by scanning the database twice; it is an efficient frequent-item-set mining algorithm and can be used to acquire new words, but it has defects when applied to microblog-style short texts. The traditional FP-Growth algorithm ignores the influence of part of speech on word formation in new word discovery, so an improved FP-Growth algorithm is proposed here, and new words are discovered by combining it with an N-grams model, improved mutual information, and rules.
Summary of the invention
To address the defects of the FP-Growth algorithm in new word discovery on microblog-style short texts, an improved FP-Growth algorithm is proposed that takes part of speech into account. It not only expresses the association between words through frequent items effectively, but also reduces the recognition difficulty caused by part-of-speech imbalance. The accuracy of the obtained new words is further improved by an ensemble approach that combines the algorithm with an N-grams model, followed by filtering with part of speech, improved mutual information, and a part-of-speech combination rule library.
To achieve the above object, the present invention adopts the following technical scheme:
A new word discovery method for microblog-style short texts based on an improved FP-Growth algorithm includes the following steps:
Step (1): microblog corpus acquisition and preprocessing
The microblog corpus is obtained through the microblog API or a web crawler and stored as HTML files. The text is extracted from the files by regular-expression matching, URLs are deleted, and the text is split into sentences by punctuation marks. The resulting plain text is segmented and part-of-speech tagged with the Python third-party module jieba, yielding the preprocessed corpus, denoted G;
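A minimal preprocessing sketch of this step in Python, assuming the raw HTML has already been fetched and stripped to text; the regular expressions and the helper name are illustrative, not taken from the patent:

```python
import re
import jieba.posseg as pseg  # jieba segmentation with part-of-speech tagging

URL_RE = re.compile(r"https?://\S+")
SENT_SPLIT_RE = re.compile(r"[。！？；：，,!?;:]")  # split on common punctuation
CHINESE_RE = re.compile(r"[\u4e00-\u9fa5]+")

def preprocess_weibo(raw_text: str):
    """Clean one microblog post and return a list of sentences as (word, POS) pairs."""
    text = URL_RE.sub("", raw_text)                      # delete URLs
    corpus_g = []
    for sent in SENT_SPLIT_RE.split(text):
        # keep only Chinese words together with their POS labels
        pairs = [(w, pos) for w, pos in pseg.cut(sent) if CHINESE_RE.fullmatch(w)]
        if pairs:
            corpus_g.append(pairs)
    return corpus_g
```

Running this over every post yields the corpus G used by the following steps, one tagged sentence per entry.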
Step (2): process G with the optimized FP-Growth algorithm to obtain the frequent item set C_fp
Step (2.1): process the microblog corpus G and build the improved FP-Growth model, which combines the two factors of word frequency and part of speech. The part-of-speech relative probability is computed by the following formula:
where f(w | pos(w)=a) denotes the part-of-speech relative probability of word w when its part of speech is a, n_a denotes the total frequency of words with part of speech a in corpus G, N denotes the total word frequency in corpus G, and n_(w | pos(w)=a) denotes the frequency of word w when its part of speech is a.
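The equation referenced as formula (1) does not survive in this text. One plausible reconstruction from the definitions above, combining the word's frequency with the prevalence of its part-of-speech class so that words carrying rare parts of speech are not suppressed, is the following; this is an assumed form, not the patent's verbatim formula:

```latex
% Assumed reconstruction of formula (1); the original equation is not reproduced here.
f(w \mid pos(w) = a)
  \;=\; \frac{n_{(w \mid pos(w) = a)} / N}{n_a / N}
  \;=\; \frac{n_{(w \mid pos(w) = a)}}{n_a}
```

Under this form the two stated factors, the word frequency n_(w | pos(w)=a)/N and the part-of-speech prevalence n_a/N, are combined as a ratio, which matches the later example in which candidates carrying the rare tag /vg would otherwise be filtered out by raw frequency alone.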
When building the frequent item sets, repeated strings satisfying f(w | pos(w)=a) > α_1 are selected as the candidate frequent item set R_fp, where α_1 is the preset minimum support.
Step (2.2): correct the order of the obtained frequent item set R_fp. The words within a frequent item produced by the FP-Growth algorithm are unordered, so the order is restored by comparison with the original corpus, yielding the ordered frequent item set C_fp.
Step (3): obtain the new word candidate set C_grams using an N-grams model
The number of times N consecutive words co-occur in the corpus is counted, and the co-occurrence frequency P(w_1, w_2, w_3, ..., w_n) is obtained with the N-grams model. N-gram repeated strings satisfying α_2 < P(w_1, w_2, w_3, ..., w_n) < β_2 are selected as the new word candidate set C_grams, where α_2 and β_2 are co-occurrence frequency thresholds.
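A minimal counting sketch for this step with N = 2, assuming the preprocessed corpus G is a list of (word, POS) sentences as produced above; keeping the POS tags in the keys is an illustrative choice so that later steps can reuse them, and the threshold values are parameters rather than the patent's fixed numbers:

```python
from collections import Counter

def ngram_candidates(corpus_g, n=2, alpha2=10, beta2=2000):
    """Count n-word co-occurrences and keep those with alpha2 < count < beta2."""
    counts = Counter()
    for sent in corpus_g:                       # sent is a list of (word, pos) pairs
        for i in range(len(sent) - n + 1):
            counts[tuple(sent[i:i + n])] += 1   # key keeps both word and POS
    return {gram for gram, c in counts.items() if alpha2 < c < beta2}
```

The embodiment below reports α_2 = 10 and β_2 = 2000 for bigrams (N = 2).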
Step (4): take the intersection of the frequent item set C_fp and the new word candidate set C_grams to obtain the new word candidate set C1 = {c_1, c_2, ..., c_m}, where c_i = (w_1, w_2, ..., w_n); c_i denotes a candidate new word and w_n denotes one of the original words composing it.
The optimized FP-Growth algorithm takes part of speech into account, so it not only expresses the association between the original words through frequent items effectively, but also reduces the recognition difficulty caused by part-of-speech imbalance. Taking the intersection of the candidate set C_grams obtained with the N-grams algorithm and the frequent item set C_fp obtained with the optimized FP-Growth algorithm is an ensemble approach that improves the accuracy of the obtained new words.
Step (5): in the new word candidate set C1, screen out the words containing a filtering part of speech according to their part-of-speech labels, obtaining the new word candidate set C2
The filtering part-of-speech set includes: distinguishing words (/b), conjunctions (/c), adverbial morphemes (/dg), interjections (/e), numerals (/m), onomatopoeia (/o), prepositions (/p), quantifiers (/q), pronouns (/r), place words (/s), tense morphemes (/tg), auxiliary words (/u), punctuation marks (/w), non-morpheme characters (/x), modal particles (/y), and descriptive words (/z).
The new word candidate set C1 obtained in step (4) is filtered according to the above parts of speech, yielding the new word candidate set C2;
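A minimal sketch of this screening step, assuming each candidate is a tuple of (word, POS) pairs as produced by the intersection in step (4); the tag set follows the list above:

```python
# POS tags whose presence disqualifies a candidate new word
FILTER_POS = {"b", "c", "dg", "e", "m", "o", "p", "q",
              "r", "s", "tg", "u", "w", "x", "y", "z"}

def screen_by_pos(candidates):
    """Drop any candidate containing a word whose POS tag is in FILTER_POS."""
    return {cand for cand in candidates
            if not any(pos in FILTER_POS for _, pos in cand)}

# c2 = screen_by_pos(c1)   # C1 is the intersection from step (4)
```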
Step (6): filter the new word candidate set C2 with the improved mutual information to obtain the new word candidate set C3. Let c_i = (w_1, w_2, ..., w_n), c_i ∈ C2; for each c_i, the improved mutual information is computed over a sliding window of adjacent words using the following formula:
where p(w_i, w_{i+1}) denotes the frequency with which words w_i and w_{i+1} co-occur, p(w_i) denotes the frequency of word w_i, w_{i,i+1} denotes the weight with which word w_i combines with the adjacent word w_{i+1} into a word, n_pos(w_i, w_{i+1}) denotes the frequency of the co-occurring part-of-speech combination of w_i and w_{i+1}, and n_pos(w_i) denotes the frequency with which the part of speech of w_i occurs. Among all frequent items, the words satisfying I(w_i, w_{i+1}) > β_3 are selected as the new word set C = {c_1, c_2, c_3, ..., c_m}, where each new word is composed as c_1 = (w_1, w_2, w_3, ..., w_n) and β_3 is a preset threshold.
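Formula (2) is likewise not reproduced in this text. A plausible reconstruction from the definitions above, in which the standard pointwise mutual information of the adjacent words is weighted by how often their part-of-speech combination co-occurs relative to the first word's part of speech, is the following; it is an assumption, not the patent's verbatim equation:

```latex
% Assumed reconstruction of formula (2); the original equation is not reproduced here.
I(w_i, w_{i+1})
  \;=\; w_{i,i+1}\,\log_2\!\frac{p(w_i, w_{i+1})}{p(w_i)\,p(w_{i+1})},
\qquad
w_{i,i+1} \;=\; \frac{n_{pos(w_i, w_{i+1})}}{n_{pos(w_i)}}
```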
Step (7): filter the candidate new word set C3 with the part-of-speech combination filtering rule library R to obtain the final new word set C4
Let c_i = (w_1, w_2, ..., w_n), c_i ∈ C3; for each c_i and for every adjacent pair (w_i, w_{i+1}), the part-of-speech combination (pos(w_i), pos(w_{i+1})) is examined; if it matches any rule in the part-of-speech combination filtering rule library R, the new word c_i is filtered out, yielding the new word set C4.
The part-of-speech combination rule library R is composed of the following rules (a rule-matching sketch follows the list):
Filtering rule one: /ns/v (ns may also be nr or nz);
Filtering rule two: /ns/ns (ns may also be nr or nz);
Filtering rule three: /n/v or /vn/v;
Filtering rule four: /t/t;
Filtering rule five: /t/nr;
Filtering rule six: /t/f (t may also be vn, n, l, or f);
Filtering rule seven: /v/t;
Filtering rule eight: /t/v;
Filtering rule nine: /ns/j;
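A minimal sketch of applying the rule library R, assuming each candidate is a tuple of (word, POS) pairs; encoding the rules as a set of POS pairs is an illustrative choice, and the wildcard groups expand the parentheticals above:

```python
# Wildcard groups from the parentheticals: "ns may also be nr, nz", "t may also be vn, n, l, f"
NS_LIKE = {"ns", "nr", "nz"}
T_LIKE = {"t", "vn", "n", "l", "f"}

RULES = (
    {(a, "v") for a in NS_LIKE}                    # rule one:   /ns/v
    | {(a, b) for a in NS_LIKE for b in NS_LIKE}   # rule two:   /ns/ns
    | {("n", "v"), ("vn", "v")}                    # rule three: /n/v or /vn/v
    | {("t", "t")}                                 # rule four:  /t/t
    | {("t", "nr")}                                # rule five:  /t/nr
    | {(a, "f") for a in T_LIKE}                   # rule six:   /t/f
    | {("v", "t")}                                 # rule seven: /v/t
    | {("t", "v")}                                 # rule eight: /t/v
    | {("ns", "j")}                                # rule nine:  /ns/j
)

def filter_by_rules(candidates):
    """Remove any candidate whose adjacent POS pair matches a rule in R."""
    kept = set()
    for cand in candidates:
        pos_seq = [pos for _, pos in cand]
        if not any(pair in RULES for pair in zip(pos_seq, pos_seq[1:])):
            kept.add(cand)
    return kept

# c4 = filter_by_rules(c3)
```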
Description of the drawings
Fig. 1 is a flow chart of the new word discovery method for microblog-style short texts based on an improved FP-Growth algorithm.
Specific embodiments
The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawing and examples. The following examples serve to illustrate the present invention but do not limit its scope.
As shown in Fig. 1, the method proposed by the present invention is carried out in the following steps, taking Sina Weibo as an example:
Step (1): microblog corpus acquisition and preprocessing
The microblog corpus is obtained through the microblog API or a web crawler and stored as HTML files. The text published by users is extracted by regular-expression matching and converted from traditional to simplified Chinese, and the text is split into sentences by punctuation marks (period, comma, semicolon, exclamation mark, and colon), yielding one sentence per line. The relatively mature jieba word segmentation system is then used to segment the text and tag parts of speech, and the segmented corpus is filtered so that only Chinese words and their part-of-speech labels are retained, producing the corpus G. G is stored line by line, one microblog text per line, about 500,000 lines in total.
Step (2): process G with the improved FP-Growth algorithm to obtain the frequent item set C_fp
Step (2.1): improve the FP-Growth algorithm according to formula (1). The frequent item set is obtained through the parameter threshold α_1, here set to α_1 = 0.000000000001, so that the algorithm considers not only the frequency of the frequent items but also the influence of part of speech, because part-of-speech imbalance affects the discovery of some new words. For example, in the candidate "anger/vg rancour/v", the part of speech /vg occurs only 60 times in the corpus; if the frequent item set were obtained by frequency alone, "anger/vg rancour/v" would be filtered out.
Step (2.2): correct the order of the frequent item sets obtained by the improved FP-Growth algorithm. Because the frequent items produced by the FP-Growth algorithm carry no word order, the order is corrected by comparing positions against the original corpus, obtaining the frequent item set C_fp = {c_1, c_2, c_3, ..., c_m}, where each c_i = (w_1, w_2, w_3, ..., w_n). Some of the frequent items are listed in Table 1 below:
Table 1: Partial frequent item set C_fp
Step (3): obtain the new word candidate set C_grams using N-grams
The new word candidate set C_grams = {c_1, c_2, c_3, ..., c_m} is obtained with the N-grams algorithm, where each c_i is a candidate new word. Here N is set to 2, so c_i = (w_1, w_2). Every c_i is obtained by filtering with the thresholds α_2 = 10 and β_2 = 2000, which removes some of the noise that would otherwise appear, such as "/uj people/n". Part of the candidate set C_grams is listed in Table 2 below:
Table 2: Partial new word candidate set C_grams
Step (4): take the intersection of the frequent item set C_fp and the new word candidate set C_grams to obtain the new word candidate set C1
Part of the new word candidates obtained by taking the intersection is shown in Table 3 below:
Table 3: Partial new word candidate set C1
Step (5): in the new word candidate set C1, screen out the words containing a filtering part of speech according to their part-of-speech labels, obtaining the new word candidate set C2
The filtering part-of-speech set includes: distinguishing words (/b), conjunctions (/c), adverbial morphemes (/dg), interjections (/e), numerals (/m), onomatopoeia (/o), prepositions (/p), quantifiers (/q), pronouns (/r), place words (/s), tense morphemes (/tg), auxiliary words (/u), punctuation marks (/w), non-morpheme characters (/x), modal particles (/y), and descriptive words (/z). Part of the part-of-speech information is shown in Table 4:
Table 4: Partial part-of-speech information
Part of the new word candidate set C2 is shown in Table 5 below:
Table 5: Partial new word candidate set C2
Step (6): filter the new word candidate set C2 with the improved mutual information to obtain the new word candidate set C3
For c_i = (w_1, w_2), c_i ∈ C2, each c_i is evaluated with the improved mutual information: the mutual information of each candidate in C2 is computed according to formula (2), and the candidate set is filtered with the threshold β_3, yielding the new word set C3. Part of the new word set C3 is given in Table 6 below:
Table 6: Partial new word set C3
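A sketch of this filtering step in Python, following the mutual-information reconstruction given after formula (2) above; since the patent's exact formula (2) is not reproduced in this text, the weight term here is an assumption:

```python
import math
from collections import Counter

def build_stats(corpus_g):
    """Count word, word-bigram, POS, and POS-bigram frequencies from corpus G."""
    word_c, bigram_c = Counter(), Counter()
    pos_c, pos_bigram_c = Counter(), Counter()
    total = 0
    for sent in corpus_g:                        # sent is a list of (word, pos) pairs
        for j, (w, p) in enumerate(sent):
            word_c[w] += 1
            pos_c[p] += 1
            total += 1
            if j + 1 < len(sent):
                w2, p2 = sent[j + 1]
                bigram_c[(w, w2)] += 1
                pos_bigram_c[(p, p2)] += 1
    return word_c, bigram_c, pos_c, pos_bigram_c, total

def improved_mi(w1, p1, w2, p2, stats):
    """PMI of (w1, w2) weighted by the POS co-occurrence ratio (assumed weight form)."""
    word_c, bigram_c, pos_c, pos_bigram_c, total = stats
    if bigram_c[(w1, w2)] == 0:
        return float("-inf")
    pmi = math.log2((bigram_c[(w1, w2)] / total) /
                    ((word_c[w1] / total) * (word_c[w2] / total)))
    weight = pos_bigram_c[(p1, p2)] / pos_c[p1]
    return weight * pmi

def filter_by_mi(candidates, stats, beta3):
    """Keep bigram candidates ((w1, p1), (w2, p2)) whose improved MI exceeds beta3."""
    return {cand for cand in candidates
            if improved_mi(cand[0][0], cand[0][1], cand[1][0], cand[1][1], stats) > beta3}
```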
Step (7): filter the candidate new word set C3 with the part-of-speech combination rule library R to obtain the final new word set C4
Filtering is performed with the following rules:
Filtering rule one: filters out noise words such as "Gansu/ns publish/v"; the structure is /ns/v (ns may also be nr or nz);
Filtering rule two: filters out noise words such as "Jiangxi/ns Ganzhou/ns"; the structure is /ns/ns (ns may also be nr or nz);
Filtering rule three: filters out noise words such as "reason/n is/v"; the structure is /n/v or /vn/v;
Filtering rule four: filters out noise words such as "tomorrow/t New Year's Eve/t"; the structure is /t/t;
Filtering rule five: filters out noise words such as "tomorrow/t beginning of spring/nr"; the structure is /t/nr;
Filtering rule six: filters out noise words such as "Ching Ming Festival/t during/f"; the structure is /t/f (t may also be vn, n, l, or f);
Filtering rule seven: filters out noise words such as "by/v today/t"; the structure is /v/t;
Filtering rule eight: filters out noise words such as "now/t apparently/v"; the structure is /t/v;
Filtering rule nine: filters out noise words such as "Tianjin/ns traffic police/j"; the structure is /ns/j.
Part of the new words obtained after rule-based filtering is shown in Table 7 below:
Table 7: Partial final new word set C4
Step (8): analysis of the new word identification results
The accuracy of the new words identified by the algorithm is computed according to formula (4): p = tp / (tp + fp),
where p denotes the accuracy of the model, tp denotes the number of correctly identified new words, and fp denotes the number of wrongly identified new words.
The computed accuracy of the new words identified by the improved FP-Growth algorithm on microblog-style short text is 69%. This is better than the results of the plain FP-Growth algorithm and of the N-grams model, whose new word discovery accuracy is around 57%. Moreover, compared with character-labeling models, the present invention does not require a large amount of manual annotation in advance.

Claims (3)

1. A new word discovery method for microblog-style short texts based on an improved FP-Growth algorithm, characterized by comprising the following steps:
Step (1): microblog corpus acquisition and preprocessing
The microblog corpus is obtained through the microblog API or a web crawler; the microblog body text is extracted from the files by regular-expression matching, URLs are deleted, and the text is split into sentences by punctuation marks; the resulting plain text is segmented and part-of-speech tagged to obtain the preprocessed corpus, denoted G;
Step (2): process the corpus G with the improved FP-Growth algorithm to obtain the frequent item set C_fp;
Step (3): obtain the new word candidate set C_grams using an N-grams model;
The number of times N consecutive words co-occur in the corpus is counted, and the co-occurrence frequency P(w_1, w_2, w_3, ..., w_n) is obtained with the N-grams model; N-gram repeated strings satisfying α_2 < P(w_1, w_2, w_3, ..., w_n) < β_2 are selected as the new word candidate set C_grams, where α_2 and β_2 are co-occurrence frequency thresholds;
Step (4): take the intersection of the frequent item set C_fp and the new word candidate set C_grams to obtain the new word candidate set C1 = {c_1, c_2, ..., c_m}, where c_i = (w_1, w_2, ..., w_n); c_i denotes a candidate new word and w_j denotes one of the original words composing it;
Step (5): in the new word candidate set C1, screen out the words containing a filtering part of speech according to their part-of-speech labels, obtaining the new word candidate set C2;
Step (6): filter the new word candidate set C2 with the improved mutual information to obtain the new word candidate set C3; let c_i = (w_1, w_2, ..., w_n), c_i ∈ C2; for each c_i, the improved mutual information is computed for each pair of adjacent words w_j according to the following formula:
where p(w_i, w_{i+1}) denotes the frequency with which words w_i and w_{i+1} co-occur, p(w_i) denotes the frequency of word w_i, w_{i,i+1} denotes the weight with which word w_i combines with the adjacent word w_{i+1} into a word, n_pos(w_i, w_{i+1}) denotes the frequency of the co-occurring part-of-speech combination of w_i and w_{i+1}, and n_pos(w_i) denotes the frequency with which the part of speech of w_i occurs; among all frequent items, the words satisfying I(w_i, w_{i+1}) > β_3 are selected as the new word set C = {c_1, c_2, c_3, ..., c_m}, where each new word is composed as c_1 = (w_1, w_2, w_3, ..., w_n) and β_3 is a preset threshold;
Step (7): filter the candidate new word set C3 with the part-of-speech combination filtering rule library R to obtain the final new word set C4;
Let c_i = (w_1, w_2, ..., w_n), c_i ∈ C3; for each c_i and for every adjacent pair (w_i, w_{i+1}), the part-of-speech combination (pos(w_i), pos(w_{i+1})) is examined; if it matches any rule in the part-of-speech combination filtering rule library R, the new word c_i is removed, finally yielding the new word set C4;
The part-of-speech combination filtering rule library R is composed of the following rules:
Filtering rule one: /ns/v (ns may also be nr or nz);
Filtering rule two: /ns/ns (ns may also be nr or nz);
Filtering rule three: /n/v or /vn/v;
Filtering rule four: /t/t;
Filtering rule five: /t/nr;
Filtering rule six: /t/f (t may also be vn, n, l, or f);
Filtering rule seven: /v/t;
Filtering rule eight: /t/v;
Filtering rule nine: /ns/j.
2. The new word discovery method for microblog-style short texts based on an improved FP-Growth algorithm according to claim 1, characterized in that step (2) specifically comprises:
Step (2.1): process the microblog corpus G and build the improved FP-Growth model combining the two factors of word frequency and part of speech; the part-of-speech relative probability is computed by the following formula:
where f(w | pos(w)=a) denotes the part-of-speech relative probability of word w when its part of speech is a, n_a denotes the total frequency of words with part of speech a in corpus G, N denotes the total word frequency in corpus G, and n_(w | pos(w)=a) denotes the frequency of word w when its part of speech is a;
When building the frequent item sets, repeated strings satisfying f(w | pos(w)=a) > α_1 are selected as the candidate frequent item set R_fp, where α_1 is the preset minimum support;
Step (2.2): correct the order of the obtained frequent item set R_fp by comparing it with the original corpus, obtaining the ordered frequent item set C_fp.
3. The new word discovery method for microblog-style short texts based on an improved FP-Growth algorithm according to claim 2, characterized in that the filtering part-of-speech set in step (2) comprises:
b: distinguishing word. Takes the initial consonant of the Chinese character for "other" (别).
c: conjunction. Takes the first letter of the English word "conjunction".
dg: adverbial morpheme. The adverb code d is placed before the morpheme code g.
e: interjection. Takes the first letter of the English word "exclamation".
m: numeral. Takes the third letter of the English word "numeral", since n and u are already in use.
o: onomatopoeia. Takes the first letter of the English word "onomatopoeia".
p: preposition. Takes the first letter of the English word "prepositional".
q: quantifier. Takes the first letter of the English word "quantity".
r: pronoun. Takes the second letter of the English word "pronoun", since p is already used for preposition.
s: place word. Takes the first letter of the English word "space".
tg: tense morpheme. The time-word code t is placed before the morpheme code g.
u: auxiliary word. Takes the first letter of the English word "auxiliary".
w: punctuation mark.
x: non-morpheme character. The letter x commonly represents an unknown quantity or a symbol.
y: modal particle. Takes the initial consonant of the Chinese character for "language" (语).
z: descriptive word. Takes the leading letter of the initial consonant of the Chinese character for "shape" (状).
CN201810058993.6A 2018-01-22 2018-01-22 New word discovery method for microblog-style short texts based on an improved FP-Growth algorithm Withdrawn CN108319584A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810058993.6A CN108319584A (en) 2018-01-22 2018-01-22 New word discovery method for microblog-style short texts based on an improved FP-Growth algorithm


Publications (1)

Publication Number Publication Date
CN108319584A true CN108319584A (en) 2018-07-24

Family

ID=62887532

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810058993.6A Withdrawn CN108319584A (en) 2018-01-22 2018-01-22 New word discovery method for microblog-style short texts based on an improved FP-Growth algorithm

Country Status (1)

Country Link
CN (1) CN108319584A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110874408A (en) * 2018-08-29 2020-03-10 阿里巴巴集团控股有限公司 Model training method, text recognition device and computing equipment
CN110874408B (en) * 2018-08-29 2023-05-26 阿里巴巴集团控股有限公司 Model training method, text recognition device and computing equipment
CN109543021A (en) * 2018-11-29 2019-03-29 北京光年无限科技有限公司 A kind of narration data processing method and system towards intelligent robot
CN109543021B (en) * 2018-11-29 2022-03-18 北京光年无限科技有限公司 Intelligent robot-oriented story data processing method and system
CN110532548A (en) * 2019-08-12 2019-12-03 上海大学 A kind of hyponymy abstracting method based on FP-Growth algorithm
CN111339403A (en) * 2020-02-11 2020-06-26 安徽理工大学 Commodity comment-based new word extraction method
CN111339403B (en) * 2020-02-11 2022-08-02 安徽理工大学 Commodity comment-based new word extraction method

Similar Documents

Publication Publication Date Title
CN108509425B (en) Chinese new word discovery method based on novelty
CN104268160B (en) A kind of OpinionTargetsExtraction Identification method based on domain lexicon and semantic role
CN107180025B (en) Method and device for identifying new words
CN106897559B (en) A kind of symptom and sign class entity recognition method and device towards multi-data source
CN104636466B (en) Entity attribute extraction method and system for open webpage
CN102033879B (en) Method and device for identifying Chinese name
CN108319584A (en) A kind of new word discovery method based on the microblogging class short text for improving FP-Growth algorithms
CN102682120B (en) Method and device for acquiring essential article commented on network
CN105630884B (en) A kind of geographical location discovery method of microblog hot event
CN109543178A (en) A kind of judicial style label system construction method and system
CN105988990A (en) Device and method for resolving zero anaphora in Chinese language, as well as training method
CN107688630B (en) Semantic-based weakly supervised microbo multi-emotion dictionary expansion method
CN108920482B (en) Microblog short text classification method based on lexical chain feature extension and LDA (latent Dirichlet Allocation) model
CN104298714B (en) A kind of mass text automatic marking method based on abnormality processing
CN106294744A (en) Interest recognition methods and system
CN104778256A (en) Rapid incremental clustering method for domain question-answering system consultations
CN110674296B (en) Information abstract extraction method and system based on key words
CN110457711B (en) Subject word-based social media event subject identification method
CN107577668A (en) Social media non-standard word correcting method based on semanteme
CN107357785A (en) Theme feature word abstracting method and system, feeling polarities determination methods and system
CN113033185B (en) Standard text error correction method and device, electronic equipment and storage medium
CN110941720A (en) Knowledge base-based specific personnel information error correction method
CN107943786A (en) A kind of Chinese name entity recognition method and system
CN106980620A (en) A kind of method and device matched to Chinese character string
CN110119443A (en) A kind of sentiment analysis method towards recommendation service

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication
Application publication date: 20180724