CN106547866B - A fine-grained sentiment classification method based on a random emotion-word co-occurrence network - Google Patents

A fine-grained sentiment classification method based on a random emotion-word co-occurrence network

Info

Publication number
CN106547866B
CN106547866B (application CN201610936655.9A)
Authority
CN
China
Prior art keywords
network
word
emotion
classification
Prior art date
Application number
CN201610936655.9A
Other languages
Chinese (zh)
Other versions
CN106547866A (en)
Inventor
马力
刘锋
李培
白琳
宫玉龙
杨琳
Original Assignee
西安邮电大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 西安邮电大学
Priority to CN201610936655.9A
Publication of CN106547866A
Application granted
Publication of CN106547866B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Abstract

A fine-grained sentiment classification method based on a random emotion-word co-occurrence network. Using random network theory and the word co-occurrence phenomenon, texts annotated with an emotion ontology vocabulary dictionary are used to build a word-order-based stochastic network model with affective features, i.e. an emotion-word co-occurrence network model. The model is then reduced, and either the sentimental word longest match method for emotion words is combined with the TC algorithm for unsupervised SWLM-TC classification, or the longest match method is further combined with the HMM machine learning algorithm to build a fine-grained sentiment classification model used for classification prediction. The invention achieves fine-grained sentiment classification of paragraph-level text and improves the precision of the plain TC algorithm, making classification more accurate; by using SWLM-TC to train an HMM model on the sample set before performing emotion classification on the test sample library, it also improves the degree of automation over a plain machine learning algorithm.

Description

A fine-grained sentiment classification method based on a random emotion-word co-occurrence network

Technical field

The invention belongs to the technical field of information retrieval, and in particular relates to a fine-grained sentiment classification method based on a random emotion-word co-occurrence network.

Background technology

In recent years, with the rapid development of the economy and of information technology, the Internet has profoundly influenced the development of society and provided an enormous economic impetus. Internet users generate vast amounts of information, a trend accelerated by the rise of the mobile Internet and the popularity of smart mobile devices, which let information spread at lower cost and higher speed. Different types of information produce different effects: negative speech can adversely affect netizens, and malicious rumors and public incidents can not only affect individual emotions but even cause huge economic losses, so mining sentiment information has become an urgent problem. Regarding the construction of text sentiment corpora, existing corpora include the Pang corpus, the Whissell corpus, the Berardinelli movie review corpus, and product review corpora, while annotated Chinese sentiment resources remain scarce; Tsinghua University has annotated a sentiment corpus of tourism attraction descriptions to assist speech synthesis, but its scale is small. Blogs, forum texts, and online news with its comments are internationally called new text; such new text on the Web provides a data source for sentiment analysis, and its analysis has become a focus of current research. In today's information age, the network has become part of people's lives, and sentiment analysis has become an important reference for understanding the true thoughts of netizens; in the emergency management of public incidents, using new text on the Web to study online public sentiment has become a new research direction.

Research on the sentiment orientation of text is relatively mature and has been comparatively successful for product reviews and movie reviews. Given the complexity of language, differences in individual expression, and the lack of a systematic description of how human emotions form, work on fine-grained sentiment analysis is still scarce. In the course of its evolution, Chinese developed free grammar, a large vocabulary, free word forms, and other traits that distinguish Chinese sentiment analysis from English; the semantic analysis methods commonly used for English are far harder to apply to Chinese, causing many difficulties. Sentiment analysis is inseparable from psychology. Psychological research has found that the relation between vocabulary and human emotion is measurable, and that the semantic orientation of individual words or phrases is important for conveying human emotion. Studies show two main phenomena of the semantic orientation of words and phrases: 1) emotion terms of the same orientation often co-occur; 2) emotion terms of opposite orientation generally do not co-occur. Because of these two phenomena, sentiment analysis can be simplified considerably. Research has shown that word co-occurrence networks built from plain English and from plain Chinese text both exhibit small-world properties, and text segmentation and topic extraction have been studied on this basis; stochastic network models have been applied to text topic analysis and to sentiment orientation classification, but no prior work has been reported that applies random network theory to fine-grained sentiment analysis of text.

The content of the invention

To overcome the shortcomings of the above prior art, the object of the invention is to provide a fine-grained sentiment classification method based on a random emotion-word co-occurrence network, which combines the sentimental word longest match of emotion words, SWLM (Sentimental Word Longest Match), with machine learning algorithms to achieve fine-grained sentiment classification of paragraph-level text.

To achieve these goals, the technical solution adopted by the present invention is:

A fine-grained sentiment classification method based on a random emotion-word co-occurrence network: using random network theory and the word co-occurrence phenomenon, texts annotated with an emotion ontology vocabulary dictionary are used to build a word-order-based stochastic network model with affective features, i.e. an emotion-word co-occurrence network model; the model is then reduced, and either the sentimental word longest match method (SWLM, Sentimental Word Longest Match) is combined with the TC algorithm for unsupervised SWLM-TC classification, or SWLM is further combined with the HMM machine learning algorithm to build a fine-grained sentiment classification model used for classification prediction.

The emotion-word co-occurrence network model is built as follows:

1) Perform sentence splitting on each text to obtain an ordered sequence of sentences S_1 → S_2 → … → S_n;

2) Segment each sentence S_i into words, filter out stop words and meaningless content words, and annotate emotion words with the emotion vocabulary ontology library, obtaining an ordered sequence of emotion words W_1 → W_2 → … → W_n;

3) For each sentence, extract word pairs <w_i, w_j> with a sliding window of WL positions (WL, Word Long, the window length, typically 2). If w_i ∉ W, add a new node w_i to W and set its weight nw_i to an initial value of 1; otherwise increment nw_i by 1. If (w_i, w_j) ∉ E, add a new edge (w_i, w_j) to E and set its weight nw_{i,j} to an initial value of 1; otherwise increment nw_{i,j} by 1;

4) After all texts have been processed, the network model G is complete;

Wherein S denotes a sequence composed of several sentences; w denotes an extracted emotion word, w ∈ Σ, where Σ is the Chinese word set, namely the set of emotion ontology words obtained after removing stop words and meaningless content words and then annotating with the emotion vocabulary ontology library; W is the node set of the network model G, W = {w_i | i ∈ [1, N]}, where N is the number of nodes of G; E is the edge set of G, with M edges: E = {(w_i, w_j) | w_i, w_j ∈ W, and an order co-occurrence relation exists between w_i and w_j}, where (w_i, w_j) denotes a directed edge from node w_i to node w_j; NW is the set of node weights of G, NW = {nw_i | w_i ∈ W}; NE is the set of edge weights of G, where nw_{i,j} is the weight of the edge between nodes w_i and w_j: NE = {nw_{i,j} | (w_i, w_j) ∈ E}.
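The four construction steps above can be sketched as follows. The sentence splitter, the whitespace tokeniser and the tiny emotion lexicon standing in for the emotion vocabulary ontology library are illustrative assumptions; the node and edge weights follow the definitions of NW and NE above.

```python
from collections import defaultdict

# Hypothetical stand-in for the emotion vocabulary ontology library.
EMOTION_LEXICON = {"happy", "joyful", "sad", "gloomy", "angry"}

def build_network(texts, wl=2):
    """Build the directed emotion-word co-occurrence network: (NW, NE)."""
    node_weight = defaultdict(int)   # NW: node weights nw_i
    edge_weight = defaultdict(int)   # NE: edge weights nw_{i,j}
    for text in texts:
        for sentence in text.split("."):              # step 1: sentence splitting
            words = [w for w in sentence.split()      # step 2: tokenise, keep only
                     if w in EMOTION_LEXICON]         #         annotated emotion words
            for i, wi in enumerate(words):            # step 3: slide a WL-wide window
                node_weight[wi] += 1                  # add node / increment nw_i
                for wj in words[i + 1:i + wl]:
                    edge_weight[(wi, wj)] += 1        # ordered edge / increment nw_{i,j}
    return dict(node_weight), dict(edge_weight)       # step 4: G is complete

nw, ne = build_network(["I feel happy . the happy joyful news",
                        "sad gloomy day . sad again"])
print(nw["happy"], ne[("sad", "gloomy")])   # -> 2 1
```

With WL = 2 the inner window reduces to the directly following emotion word, so each ordered pair of adjacent emotion words contributes one directed edge.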

In the emotion vocabulary ontology library, emotion is divided into 7 major classes and 21 subclasses. The emotion classes are: joy {joy (PA), ease (PE)}; goodness {respect (PD), praise (PH), trust (PG), liking (PB), wishing (PK)}; anger {anger (NA)}; sorrow {sadness (NB), disappointment (NJ), remorse (NH), longing (PF)}; fear {panic (NI), fright (NC), shyness (NG)}; dislike {annoyance (NE), loathing (ND), censure (NN), envy (NK), suspicion (NL)}; surprise {amazement (PC)}. Emotion intensity power is divided into five grades, 1, 3, 5, 7 and 9, where 9 denotes the greatest intensity and 1 the least. Parts of speech in the emotion vocabulary ontology fall into 7 classes: noun, verb, adjective (adj), adverb (adv), network word (nw), idiom, and prepositional phrase (prep). The ontology contains 27,466 emotion words in total.

The network model G is divided into 7 sub-networks according to the seven emotions joy, goodness, anger, sorrow, fear, dislike and surprise. During sub-network splitting, if a fracture occurs, the broken network sub-blocks are reconnected through the highest-weight node, yielding the seven sub-networks usable for fine-grained computation, G_x | x ∈ {1, 2, 3, 4, 5, 6, 7}, i.e. G_1, G_2, G_3, G_4, G_5, G_6, G_7.

The sentimental word longest match method performs longest matching through the maximum-weight vocabulary of emotion words, so that texts can be accurately classified under the relevant emotion topics without disambiguation or denoising; weights are then computed through the seven sub-classification models to derive the parameters for machine-learning classification.

For classification, the following definitions are used:

Longest weight matching path length d_max(S): in network G_x | x ∈ {1, 2, 3, 4, 5, 6, 7}, if two emotion words are order-covered, the directly connecting edge is used for matching; if a network interval exists between two emotion words in G_x, the path passing through the maximum-weight nodes is selected for matching, and its length is taken as the length of S, i.e. the sum of the pairwise maximum-weight matching path lengths:

d_max(S) = Σ_i d_max(w_i, w_{i+x})

wherein d_max(w_i, w_{i+x}) is the maximum-weight matching path from the i-th word to the (i+x)-th word in the network;

Emotion weight coefficient SW (Sentimental weight): in network G, the proportion of emotion polarity held by each of the seven sub-networks. Using this coefficient makes classes more distinct and reduces classification problems caused by fuzzy boundaries. Let freq be the recurrence count of a word in the emotion word network and P its polarity intensity; the calculation formulas are:

WC_i = freq × P

W_y = Σ_i WC_i

SW_x = W_x / Σ_{y=1}^{7} W_y

wherein WC_i is the emotion value of each word in a sub-network, W_y is the emotion value of sub-network y, and SW_x is the SW value of sub-network x, i.e. the emotion weight coefficient;

Classification coefficient CC (Classification coefficient): after the maximum matching word path is determined, let Re be the recurrence degree and power the emotion intensity of a word on this path; assuming the path has n words, the calculation formulas are:

CC_i = Re × power

CC = Σ_{i=1}^{n} CC_i

wherein CC_i is the classification coefficient of a single word;

Classification prediction coefficient CPC (Classification prediction coefficient): the prediction mechanism adopted for samples whose class cannot be judged when classifying with a machine learning algorithm. Sort by SW_x: if SW_1 + SW_2 > 80% and SW_1/SW_2 > 1.5, the sample is classed under SW_1; if SW_1 + SW_2 > 80% and SW_1/SW_2 ≤ 1.5, it is classed under both attributes SW_1 and SW_2; if SW_1 + SW_2 < 80%, the article's classification is more complex and it is classed under the corresponding categories according to the classification coefficient.

The SWLM-TC method comprises the following steps:

1) Split the article to be classified into sentences, the resulting sentence sequence being S′_1 → S′_2 → … → S′_n;

2) Segment each ordered sentence, remove meaningless content words and auxiliary words, annotate with the emotion vocabulary ontology dictionary, and select the annotated words in order, i.e. W′_1 → W′_2 → … → W′_n;

3) Search the corresponding sub-network according to the class of each annotated word;

4) Perform path selection for the words in the network: if two words are adjacent, use the directly connecting path; if two words are not adjacent, select the path between them that passes through the maximum-weight nodes, find the maximum-weight path according to the above steps, and obtain d_max(S);

5) Compute the classification coefficient CC on the maximum-weight path d_max(S);

6) Compute the classification coefficient CC under each home sub-network and compare the coefficients. If they are equal, use CC × SW, where SW is the emotion weight coefficient (Sentimental weight), i.e. the weight of each classified emotion among the 7 sub-networks. If they differ, sort by the final classification coefficient CC: if the first weight accounts for 80 percent, the text belongs under the corresponding emotion network; if not more than 80 percent, the text is classed under the top two emotion networks by weight;

7) If the classification cannot be ensured, perform classification prediction on the text to be classified according to the classification prediction coefficient CPC.
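Steps 1) to 7) can be condensed into the following sketch. The sub-network table, the Re/power values and the SW coefficients are toy stand-ins, and only the control flow (per-sub-network CC, the CC × SW tie-break, and the 80-percent rule) mirrors the method.

```python
# word -> (emotion class, recurrence Re, intensity power); illustrative values
SUBNETS = {
    "happy":  ("joy", 3, 7),
    "joyful": ("joy", 2, 5),
    "sad":    ("sorrow", 4, 7),
}
SW = {"joy": 0.6, "sorrow": 0.3}       # assumed emotion weight coefficients

def swlm_tc(text):
    """Return the predicted emotion class(es) of a text."""
    scores = {}
    for word in text.split():                      # steps 1-2: split and mark
        if word in SUBNETS:                        # step 3: sub-network lookup
            cls, re_, power = SUBNETS[word]
            scores[cls] = scores.get(cls, 0) + re_ * power  # steps 4-5: CC_i = Re * power
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        ranked.sort(key=lambda kv: kv[1] * SW.get(kv[0], 0), reverse=True)  # step 6: CC * SW
    total = sum(s for _, s in ranked)
    if ranked[0][1] / total >= 0.8:                # 80-percent rule
        return [ranked[0][0]]
    return [c for c, _ in ranked[:2]]              # top-two fallback

print(swlm_tc("a happy joyful day"))   # -> ['joy']
```

A mixed text such as "happy sad" lands under both of its top classes, since neither sub-network reaches the 80-percent share.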

The method of building the fine-grained sentiment classification model and using it for classification prediction is:

1) Use SWLM-TC to perform fine-grained classification on a portion of the texts in the sample set and compute the emotion weight coefficient SW_x of the emotion to which each text belongs; the remaining texts are used for the classification verification experiment;

2) For all samples classified with SWLM-TC: compute the classification coefficient CC of each text and classify according to step 6 of the SWLM-TC algorithm, then add the sample to the emotion class training set TS_x (Train Set) of the corresponding class x; if the classification coefficient is indeterminable with SWLM-TC, predict according to step 7 of the SWLM-TC algorithm and assign the sample to the corresponding class;

3) After the text emotions of the sample data have been computed with the SWLM-TC algorithm, train an HMM classification model with the texts of each class, then classify with the trained HMM model:

a) For a text under test, classify with the HMM algorithm; if it can be correctly classified, its emotion subclass is thereby decided;

b) For texts without a classification result, perform classification prediction using the classification prediction coefficient CPC.

HMM is a machine learning method. First, emotion computation is performed on the sample set with the SWLM-TC algorithm, classifying it into 7 emotion-subclass text sample libraries; the sample libraries are used to train the HMM model, and the trained HMM model can then be used for text classification verification on the remaining portion of texts in the text library.
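As an illustration of the HMM stage, the sketch below scores a cue sequence against one discrete HMM per emotion class using the standard forward algorithm, assigning the text to the class whose model gives the highest likelihood. The two-state toy parameters are stand-ins for those that would be trained on the SWLM-TC-labelled sample libraries.

```python
def forward_likelihood(obs, pi, A, B):
    """Standard forward algorithm for a discrete HMM; returns P(obs | model)."""
    alpha = [pi[s] * B[s][obs[0]] for s in range(len(pi))]
    for o in obs[1:]:
        alpha = [sum(alpha[s] * A[s][t] for s in range(len(pi))) * B[t][o]
                 for t in range(len(pi))]
    return sum(alpha)

# Two toy 2-state models over a 2-symbol alphabet (0 = negative cue, 1 = positive cue);
# each value is (initial distribution pi, transition matrix A, emission matrix B).
MODELS = {
    "joy":    ([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.2, 0.8], [0.3, 0.7]]),
    "sorrow": ([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.8, 0.2], [0.7, 0.3]]),
}

def classify(obs):
    """Assign the sequence to the class whose HMM scores it highest."""
    return max(MODELS, key=lambda c: forward_likelihood(obs, *MODELS[c]))

print(classify([1, 1, 1]))   # mostly positive cues -> joy
```

A real implementation would estimate the model parameters from the per-class training sets TS_x, for example with Baum-Welch re-estimation, rather than fixing them by hand.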

Compared with the prior art, the beneficial effects of the invention are:

1. The method performs fine-grained classification of the emotion of a text; unlike traditional orientation computation, it yields a finer-grained classification.

2. It improves the precision of the plain TC algorithm, making classification more accurate.

3. By using SWLM-TC to train the HMM model on the sample set and then performing emotion classification on the test sample library, it improves the degree of automation over a plain machine learning algorithm.

Brief description of the drawings

Fig. 1 is the overall flow chart of the algorithm of the invention.

Fig. 2 is the flow chart of the SWLM-TC algorithm of the invention.

Fig. 3 is the flow chart of the SWLM-HMM algorithm of the invention.

Fig. 4 is the line chart of experimental data for the word-frequency-based annotation algorithm TC.

Fig. 5 is the line chart of experimental data for the SWLM-TC heuristic algorithm.

Fig. 6 is the line chart of experimental data for the SWLM-HMM algorithm.

Fig. 7 is a schematic diagram of micro-average data in the experiments of the invention.

Fig. 8 is a schematic diagram of macro-average data in the experiments of the invention.

Fig. 9 is a schematic diagram of classified-data distribution in the experiments of the invention (correctly classified).

Fig. 10 is a schematic diagram of classified-data distribution in the experiments of the invention (wrongly assigned to the class).

Fig. 11 is a schematic diagram of classified-data distribution in the experiments of the invention (belonging to the class but wrongly assigned elsewhere).

Embodiment

Embodiments of the invention are described in detail below with reference to the accompanying drawings and examples.

As shown in Fig. 1, in the fine-grained sentiment classification method based on a random emotion-word co-occurrence network of the invention, first, using random network theory and the word co-occurrence phenomenon, texts annotated with the emotion ontology vocabulary dictionary are used to build a word-order-based stochastic network model with affective features, i.e. the emotion-word co-occurrence network model; the model is then reduced, and either the sentimental word longest match method (SWLM, Sentimental Word Longest Match) is combined with the TC algorithm for unsupervised SWLM-TC classification, or SWLM is further combined with the HMM machine learning algorithm to build a fine-grained sentiment classification model used for classification prediction. The details are as follows:

1 The emotion-word co-occurrence model based on a random network

To facilitate fine-grained research on paragraph-level text and to find the inherent regularities among emotion words, the invention improves the method proposed in [YANG Feng, PENG Qin-ke, XU Tao, Sentiment Classification for Comments Based on Random Network Theory, Acta Automatica Sinica, 2010, Vol. 36, No. 6] to build an emotion-word co-occurrence network model suitable for fine-grained sentiment analysis.

1.1 The emotion vocabulary ontology dictionary

The Chinese emotion vocabulary ontology library is a Chinese ontology resource organized and annotated by the Information Retrieval Laboratory of Dalian University of Technology under the guidance of Professor Lin Hongfei, through the efforts of all laboratory members. The emotion classification system of the Chinese emotion vocabulary ontology is built on the basis of Ekman's internationally influential 6-class emotion classification system. On Ekman's basis, the vocabulary ontology adds the emotion category "goodness" and divides commendatory emotions more finely. Emotion in the final vocabulary ontology is divided into 7 major classes and 21 subclasses: joy {joy (PA), ease (PE)}; goodness {respect (PD), praise (PH), trust (PG), liking (PB), wishing (PK)}; anger {anger (NA)}; sorrow {sadness (NB), disappointment (NJ), remorse (NH), longing (PF)}; fear {panic (NI), fright (NC), shyness (NG)}; dislike {annoyance (NE), loathing (ND), censure (NN), envy (NK), suspicion (NL)}; surprise {amazement (PC)}. Emotion intensity power is divided into five grades, 1, 3, 5, 7 and 9, with 9 the greatest intensity and 1 the least. Parts of speech in the emotion vocabulary ontology fall into 7 classes: noun, verb, adjective (adj), adverb (adv), network word (nw), idiom, and prepositional phrase (prep). The ontology contains 27,466 emotion words in total.

1.2 network model

The small-world network model introduced by Watts and Strogatz [Watts D J, Strogatz S H. Collective dynamics of 'small-world' networks. Nature, 1998, 393(6684): 440-442] and the scale-free model proposed by Barabási and Albert [Barabási A L, Albert R. Emergence of scaling in random networks. Science, 1999, 286(5439): 509-512] launched the study of complex networks. Compared with regular networks and purely random networks, small-world and scale-free networks have a small average path length and a large clustering coefficient. The co-occurrence network constructed from associations between words has the characteristic properties of a small-world network, so the small average path length and large clustering coefficient of the word network can be exploited to classify emotion granularity features rapidly.

1.3 emotion Term co-occurrence network models

In [Shi Jing, Hu Ming, Dai Guo-Zhong. Topic analysis of Chinese text based on small world model. Journal of Chinese Information Processing, 2007, 21(3): 69-75], a stochastic network model is built from plain co-occurrence relations between words; in [YANG Feng, PENG Qin-ke, XU Tao, Sentiment Classification for Comments Based on Random Network Theory, Acta Automatica Sinica, 2010, Vol. 36, No. 6], a stochastic network model is built from order co-occurrence relations between words, and this network model is used for incremental creation of word-order co-occurrence networks for short news comments together with the SCP algorithm, which is of great benefit for short comments lacking a large term network. That algorithm is quite suitable for orientation computation but unsuitable for multi-granularity emotion classification.

To support fine-grained sentiment analysis, the invention builds an order co-occurrence stochastic network model from emotion words. The co-occurrence order of emotion words embodies semantic information, such as preceding and following modification, and the co-occurrence distance of emotion words is strongly related to their semantic relation. The invention builds the model from the close order co-occurrence relations of emotion words, i.e. with a small co-occurrence window length WL (typically 2), while taking the order relation of word co-occurrence into account.

After the emotion-word co-occurrence network is built, the large network model is divided into seven small emotion-word networks according to the seven emotion major classes, and the related operations are then performed.

To describe the construction method of the emotion-word co-occurrence network model, the following mathematical definitions are used:

Σ: the Chinese word set. The word set used by the invention is the set of emotion ontology words obtained after removing stop words and meaningless content words and then annotating with the emotion vocabulary ontology library;

w: an extracted emotion word, w ∈ Σ;

S: a sequence composed of several sentences;

N: the number of nodes of G;

M: the number of edges of G;

W = {w_i | i ∈ [1, N]}: the node set of G;

E = {(w_i, w_j) | w_i, w_j ∈ W, and an order co-occurrence relation exists between w_i and w_j}: the edge set of G, where (w_i, w_j) denotes a directed edge from node w_i to node w_j;

NW = {nw_i | w_i ∈ W}: the node weights of G;

NE = {nw_{i,j} | (w_i, w_j) ∈ E}: the edge weights of G, where nw_{i,j} is the weight of the edge between nodes w_i and w_j.

The method for building the emotion-word co-occurrence network model G is given below:

1) Perform sentence splitting on each text to obtain an ordered sequence of sentences S_1 → S_2 → … → S_n;

2) Segment each sentence S_i into words, filter out stop words and meaningless content words, and annotate emotion words with the emotion vocabulary ontology library, obtaining an ordered sequence of emotion words W_1 → W_2 → … → W_n;

3) For each sentence S_i, extract word pairs <w_i, w_j> with a sliding window of WL positions (typically 2). If w_i ∉ W, add a new node w_i to W and set its weight nw_i to an initial value of 1; otherwise increment nw_i by 1 (the operation on w_j is similar). If (w_i, w_j) ∉ E, add a new edge (w_i, w_j) to E and set its weight nw_{i,j} to an initial value of 1; otherwise increment nw_{i,j} by 1;

4) After all texts have been processed, the network model G is complete;

5) The network model G is divided into 7 sub-networks according to the seven emotions (joy, goodness, anger, sorrow, fear, dislike, surprise). During sub-network splitting, if a fracture occurs, the broken network sub-blocks are connected through the highest-weight node, completing the construction of the seven sub-networks (G_1, G_2, G_3, G_4, G_5, G_6, G_7) usable for fine-grained computation.
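Step 5) can be sketched as follows. The data structures are illustrative; the reconnection rule follows the text: when a sub-network breaks into disconnected blocks, each block is bridged to the highest-weight node.

```python
def split_subnetwork(nodes, edges, node_weight, emotion_of, target):
    """Keep nodes of the `target` emotion; bridge any fragments via the max-weight node."""
    sub_nodes = {n for n in nodes if emotion_of[n] == target}
    sub_edges = {(u, v) for (u, v) in edges if u in sub_nodes and v in sub_nodes}
    adj = {n: set() for n in sub_nodes}          # undirected view for connectivity
    for u, v in sub_edges:
        adj[u].add(v)
        adj[v].add(u)
    comps, seen = [], set()
    for n in sub_nodes:                          # find connected components
        if n in seen:
            continue
        comp, stack = set(), [n]
        while stack:
            m = stack.pop()
            if m not in comp:
                comp.add(m)
                stack.extend(adj[m])
        seen |= comp
        comps.append(comp)
    if len(comps) > 1:                           # fracture: reconnect via the hub
        hub = max(sub_nodes, key=lambda n: node_weight[n])
        for comp in comps:
            if hub not in comp:
                top = max(comp, key=lambda n: node_weight[n])
                sub_edges.add((hub, top))        # bridge from hub to the block's heaviest node
    return sub_nodes, sub_edges

nodes = {"a", "b", "c", "d"}
edges = {("a", "b")}
weights = {"a": 5, "b": 1, "c": 3, "d": 2}
emotion = {n: "joy" for n in nodes}
_, sub_edges = split_subnetwork(nodes, edges, weights, emotion, "joy")
print(sorted(sub_edges))   # fragments {c} and {d} bridged to hub "a"
```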

2 Text-oriented fine-grained emotion feature classification

In earlier research, the sentiment orientation of text was the research focus. As research in this field deepens, the value and uses of fine-grained analysis stand out. Fine-grained classification and orientation classification have different emphases: fine-grained analysis is a multi-class problem, while orientation analysis only needs to compute the orientation of the text. The manually annotated dictionaries also differ: orientation research only needs words annotated with orientation, whereas the fine-grained annotation dictionary, the emotion vocabulary ontology library, annotates the emotion words involved with correlated features such as part-of-speech class, intensity and polarity. The HMM machine learning algorithm is used in combination to perform fine-grained emotion classification.

The sentimental word longest match classification method SWLM performs longest matching through the maximum-weight vocabulary of emotion words, so that texts can be comparatively accurately classified under the relevant emotion topics without disambiguation or denoising; weights are computed through the seven sub-classification models to derive parameters usable by HMM for machine-learning classification.

The invention uses the following definitions:

Definition 1 (longest weight matching path length d_max(S))

In network G_x, if two emotion words are order-covered, the directly connecting edge is used for matching; if a network interval exists between the two emotion words in G_x, the path passing through the maximum-weight nodes is selected for matching, and this is the length of the longest weight matching path S, i.e. the sum of the pairwise maximum-weight matching path lengths:

d_max(S) = Σ_i d_max(w_i, w_{i+x})

wherein d_max(w_i, w_{i+x}) is the maximum-weight matching path from the i-th word to the (i+x)-th word in the network.
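Definition 1 can be illustrated with the following sketch. The direct-edge case uses the connecting edge; otherwise, among the simple paths between the two words, the one passing through the maximum-weight nodes is chosen. Exhaustive DFS is used here for clarity; the patent does not fix a search strategy, so this is one possible reading.

```python
def d_max(adj, node_weight, src, dst):
    """Longest-weight matching path length between two emotion words."""
    if dst in adj.get(src, ()):                  # order co-occurrence: use the direct edge
        return node_weight[src] + node_weight[dst]
    best = 0
    def dfs(node, visited, acc):                 # exhaustive search over simple paths
        nonlocal best
        if node == dst:
            best = max(best, acc)
            return
        for nxt in adj.get(node, ()):
            if nxt not in visited:
                dfs(nxt, visited | {nxt}, acc + node_weight[nxt])
    dfs(src, {src}, node_weight[src])
    return best

adj = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}}
w = {"a": 1, "b": 5, "c": 2, "d": 1}
print(d_max(adj, w, "a", "d"))   # path a-b-d through the heavier node b -> 7
```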

Definition 2 (emotion weight coefficient SW (Sentimental weight))

In the word network G, each of the seven sub-networks holds a proportion of the emotion polarity. Using this coefficient makes classes more distinct and reduces classification problems caused by fuzzy boundaries. Let freq be the recurrence count of a word in the emotion word network and P its polarity intensity. The calculation formulas are:

WC_i = freq × P

W_y = Σ_i WC_i

SW_x = W_x / Σ_{y=1}^{7} W_y

wherein WC_i is the emotion value of each word in a sub-network, W_y is the emotion value of sub-network y, and SW_x is the SW value of sub-network x, i.e. the emotion weight coefficient.
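The reconstructed SW formulas can be sketched directly; the per-word (freq, P) data below are illustrative.

```python
def sw_coefficients(subnets):
    """subnets: {class: [(freq, polarity_intensity), ...]} -> {class: SW_x}."""
    W = {c: sum(f * p for f, p in words)       # W_y = sum of WC_i = freq * P
         for c, words in subnets.items()}
    total = sum(W.values())                    # denominator: sum over all 7 sub-networks
    return {c: W[c] / total for c in W}

sw = sw_coefficients({"joy": [(3, 5), (1, 7)],   # W_joy = 3*5 + 1*7 = 22
                      "sorrow": [(2, 9)]})       # W_sorrow = 2*9 = 18
print(sw["joy"])   # 22 / 40 -> 0.55
```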

2.1 The SWLM-TC unsupervised learning classification method using the TC algorithm

Definition 3 (classification coefficient CC (Classification coefficient))

The classification coefficient is the coefficient defined for SWLM-TC unsupervised classification. After the maximum matching word path is determined, let Re be the recurrence degree of a word on this path and power its emotion intensity; assuming the path has n words, the calculation formulas are:

CC_i = Re × power

CC = Σ_{i=1}^{n} CC_i

wherein CC_i is the classification coefficient of a single word.
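The reconstructed CC formulas amount to a short computation over the words on the maximum-weight path; the (Re, power) pairs below are illustrative.

```python
def classification_coefficient(path_words):
    """path_words: [(Re, power), ...] for the n words on the max-weight path."""
    return sum(re_ * p for re_, p in path_words)   # CC = sum of CC_i = Re * power

print(classification_coefficient([(3, 5), (2, 7)]))   # 3*5 + 2*7 -> 29
```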

Definition 4 (classification prediction coefficient CPC (Classification prediction coefficient))

The classification prediction coefficient is the prediction mechanism adopted, when classifying with a machine learning algorithm, for samples whose class cannot be judged. Sort by SW_x: if SW_1 + SW_2 > 80% and SW_1/SW_2 > 1.5, the sample is classed under SW_1, otherwise under both SW_1 and SW_2; if SW_1 + SW_2 < 80%, the article's classification is more complex and it is classed under the corresponding categories according to the classification coefficient.

Because the emotion words appearing in a paragraph-level text reveal the main emotional thread of the article, and this emotional thread is preserved through the use of the co-occurrence random network, the ordered random co-occurrence network performs well.

With reference to figure 2, the processing steps of the TC algorithm with SWLM-based sentiment word labeling are as follows:

1) split the article to be classified into sentences: S′1 → S′2 → … → S′n;

2) segment each ordered sentence into words, remove meaningless content words and auxiliary words, label words with the sentiment vocabulary ontology dictionary, and keep the labeled words in order: W′1 → W′2 → … → W′n;

3) search the corresponding sub-network according to the class each labeled word belongs to;

4) select paths for the words in the network: (I) if two words are adjacent, use the directly connecting edge; (II) if two words are not adjacent, choose, among the paths connecting them, the path through the word of maximum weight; following these rules, find the maximum-weight path dmax(S);

5) compute the classification coefficient CC on the maximum-weight path dmax(S); for the calculation see Definition 3;

6) compute the classification coefficient CC under each home sub-network and compare the coefficients: (I) if the coefficients are equal, use CC × SW; (II) if they differ, go to step (III); (III) sort the final CC values; if the first weight accounts for 80 percent, the text belongs to the corresponding mood network; if it does not exceed 80 percent, the text is assigned, according to the weight coefficient CC, to the top two classes;

7) if the classification still cannot be determined, perform classification prediction on the text to be classified according to Definition 4.
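The seven steps above can be sketched as a pipeline. Everything below is a simplified stand-in (whitespace tokenisation instead of Chinese word segmentation, with the maximum-weight-path CC computation and the SW lookup passed in as callbacks), not the patent's implementation:

```python
# Skeleton of the SWLM-TC classification steps 1)-7), heavily simplified.

def split_sentences(text):
    """Step 1: split the article into ordered sentences (toy splitter)."""
    return [s for s in text.split(".") if s.strip()]

def label_sentiment_words(sentence, ontology):
    """Step 2: keep only words found in the sentiment vocabulary
    ontology dictionary, preserving their order (toy tokeniser)."""
    return [w for w in sentence.split() if w in ontology]

def swlm_tc(text, ontology, word_to_net, path_cc, sw_of):
    words = [w for s in split_sentences(text)
             for w in label_sentiment_words(s, ontology)]
    # Step 3: group labeled words by the sub-network they belong to.
    by_net = {}
    for w in words:
        by_net.setdefault(word_to_net[w], []).append(w)
    # Steps 4-6: per sub-network, score the maximum-weight path (CC)
    # weighted by the sub-network's SW value, then take the best class.
    scores = {net: path_cc(ws) * sw_of(net) for net, ws in by_net.items()}
    return max(scores, key=scores.get)

ontology = {"happy", "glad", "sad"}
word_to_net = {"happy": "joy", "glad": "joy", "sad": "sorrow"}
# path_cc stubbed as word count, SW stubbed as 1.0 for every sub-network
label = swlm_tc("I am happy and glad. a sad day", ontology, word_to_net,
                path_cc=len, sw_of=lambda n: 1.0)  # -> "joy"
```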

2.2 SWLM-HMM: a classification algorithm based on supervised machine learning

Machine learning plays a large role in text classification, and HMM algorithms perform very well in NLP. Because the HMM algorithm is concise, its computational cost is small and it can be trained on sample sequences of arbitrary length; HMM is therefore used to learn fine-grained sentiment classification, improving the accuracy of SWLM-HMM classification.

When classifying with SWLM-HMM, the corpus cannot be trained directly with the HMM; it is first processed with SWLM and then trained with the HMM algorithm, which improves classification accuracy and speeds up classification.

With reference to figure 3, the method of training the corpus is as follows:

(1) take a part of the texts in the sample library and perform fine-grained classification on the sample set using SWLM-TC, as described in the SWLM-TC classification process, computing the weight coefficient SWx of the sentiment class each sample belongs to;

(2) for all samples classified by SWLM-TC: compute the classification coefficient CC of each text and classify according to step 6 of the SWLM-TC algorithm, then add the sample to the training set TSx (Train Set) of the corresponding sentiment class x; if the classification coefficient cannot be determined, predict with step 7 of the SWLM-TC algorithm and assign the sample to the corresponding class;

(3) after the sampled part of the texts in the sample library has been classified into sub-emotions by the SWLM-TC algorithm, use the resulting training texts of each class to train the corresponding HMM classification model; the classification features are the sentiment words already labeled in each text, arranged in text order into a word chain; during training with the HMM model, the sentiment word string of each text and its classified sub-emotion are passed to the HMM model as parameters, and all samples are input to the HMM algorithm for training;

a) for the remaining texts in the sample library, classify with the HMM algorithm; if a text can be correctly classified, perform the corresponding categorization and calculation;

b) for texts without a classification result, perform classification prediction using Definition 4.
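The training flow above can be sketched end to end. For self-containment this stand-in uses an add-one-smoothed bigram Markov chain per sentiment class instead of a full HMM library, trained on the SWLM-TC-labelled sentiment word chains; a new word string is assigned to the class whose model scores it highest. All names and data are illustrative:

```python
import math
from collections import Counter

class ClassSequenceModel:
    """Per-class first-order Markov chain over sentiment word chains,
    standing in for the per-class HMM trained in step (3)."""
    def __init__(self):
        self.bigrams = Counter()
        self.unigrams = Counter()
        self.vocab = set()

    def fit(self, word_chains):
        for chain in word_chains:
            for a, b in zip(chain, chain[1:]):
                self.bigrams[(a, b)] += 1
                self.unigrams[a] += 1
            self.vocab.update(chain)
        return self

    def log_score(self, chain):
        """Add-one-smoothed log-probability of a sentiment word chain."""
        v = max(len(self.vocab), 1)
        return sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.unigrams[a] + v))
            for a, b in zip(chain, chain[1:]))

def predict_class(chain, models):
    return max(models, key=lambda c: models[c].log_score(chain))

# Hypothetical training chains produced by the SWLM-TC labelling stage.
models = {
    "joy":    ClassSequenceModel().fit([["happy", "glad", "happy"]]),
    "sorrow": ClassSequenceModel().fit([["sad", "cry", "sad"]]),
}
label = predict_class(["happy", "glad"], models)  # -> "joy"
```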

3 Fine-grained sentiment classification experiments

3.1 Classification data

The experimental data of the present invention are collected microblog data together with the NLP&CC2014 data of the CCF Conference on Natural Language Processing and Chinese Computing evaluation. 7000 microblog posts were crawled; 4000 of them were selected and merged with 2000 posts on similar topics from the NLP&CC2014 data set, giving about 6000 labeled samples in the final corpus. Only microblogs containing emotion were selected; emotionless microblog posts were rejected. The data are composed as follows:

1) TrainDataNet: uses all 6000 microblog posts;

2) TrainDataHMM: uses 5000 of the 6000 microblog posts, sampled so that microblogs of all 7 sentiment classes are included;

3) TrainDataTest: uses the other 1000 posts, excluding TrainDataHMM.

Table 1 Data distribution

Precision and recall are the two most commonly used metrics in information retrieval and statistical classification, used to evaluate the quality of results.

The experiments of the present invention use the collected data and the Chinese NLP&CC2014 sentiment-orientation evaluation data set; after systematic testing, the experimental results are as follows.

3.2 Classification results

A. Raw-data experiments

1) SWLM-TC sentiment classification experiment

SWLM-TC is used to verify the validity of the algorithm; the validity of the classification is measured by precision, recall and F1 value.

The results for the seven classes are shown in Table 2:

Table 2 SWLM-TC algorithm classification results

Sentiment class | Correctly classified | Wrongly assigned to this class | Belongs to this class but misclassified
Joy | 641 | 378 | 242
Like | 654 | 341 | 227
Anger | 621 | 412 | 225
Sorrow | 610 | 384 | 239
Fear | 609 | 351 | 245
Disgust | 619 | 362 | 222
Surprise | 627 | 359 | 219
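From the three counts reported per class in Table 2, precision, recall and F1 follow directly; for example, the Joy row (641 correct, 378 wrongly assigned, 242 missed) reproduces the 62.90% / 72.59% / 67.40% reported later in Table 5. A sketch (the function name is illustrative):

```python
def prf1(tp, fp, fn):
    """Precision, recall and F1 from per-class counts:
    tp = correctly classified, fp = wrongly assigned to this class,
    fn = belongs to this class but misclassified."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)

p, r, f1 = prf1(641, 378, 242)  # the Joy row of Table 2
print(f"P={p:.2%} R={r:.2%} F1={f1:.2%}")  # P=62.90% R=72.59% F1=67.40%
```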

2) SWLM-HMM sentiment classification experiment

The experimental results of the SWLM-HMM algorithm are shown in Table 3.

Table 3 SWLM-HMM algorithm classification results

3) TC algorithm sentiment classification experiment

The verification test results of the TC algorithm are shown in Table 4 below.

Table 4 TC algorithm classification results

Sentiment class | Correctly classified | Wrongly assigned to this class | Belongs to this class but misclassified
Joy | 531 | 468 | 352
Like | 547 | 478 | 334
Anger | 511 | 426 | 335
Sorrow | 521 | 437 | 328
Fear | 534 | 456 | 320
Disgust | 508 | 434 | 333
Surprise | 519 | 471 | 327

B. Precision, recall and F1 values

The above experimental data are processed further; the computed precision, recall and F1 values are as follows:

1) SWLM-TC sentiment classification experiment

The P, R and F1 values of the SWLM-TC algorithm are shown in Table 5 below.

Table 5 P, R and F1 values of SWLM-TC

Sentiment class | Precision | Recall | F1
Joy | 62.90% | 72.59% | 67.40%
Like | 65.73% | 74.23% | 69.72%
Anger | 60.12% | 73.40% | 66.10%
Sorrow | 61.37% | 71.85% | 66.20%
Fear | 63.44% | 71.31% | 67.14%
Disgust | 63.10% | 73.60% | 67.95%
Surprise | 63.59% | 74.11% | 68.45%

2) SWLM-HMM sentiment classification experiment

The P, R and F1 values of the SWLM-HMM algorithm are shown in Table 6 below.

Table 6 P, R and F1 values of SWLM-HMM

3) TC algorithm sentiment classification experiment

The P, R and F1 values of the TC algorithm are shown in Table 7 below.

Table 7 P, R and F1 values of the TC algorithm

Sentiment class | Precision | Recall | F1
Joy | 53.15% | 60.14% | 56.43%
Like | 53.37% | 62.09% | 57.40%
Anger | 54.54% | 60.40% | 57.32%
Sorrow | 54.38% | 61.37% | 57.66%
Fear | 53.94% | 62.53% | 57.92%
Disgust | 53.93% | 60.40% | 56.98%
Surprise | 52.42% | 61.35% | 56.54%

C. Macro-average and micro-average

The macro-averaged and micro-averaged P, R and F1 values of each algorithm are shown in Table 8 below.

Table 8 Macro-average and micro-average of each algorithm
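As a sketch of how macro- and micro-averages are obtained (standard definitions; the input triples reuse the SWLM-TC counts from Table 2, and the function name is illustrative):

```python
def macro_micro(per_class):
    """`per_class`: list of (tp, fp, fn) triples, one per sentiment class.
    Macro-average: mean of the per-class precision/recall.
    Micro-average: precision/recall computed from the pooled counts."""
    ps = [tp / (tp + fp) for tp, fp, _ in per_class]
    rs = [tp / (tp + fn) for tp, _, fn in per_class]
    macro = (sum(ps) / len(ps), sum(rs) / len(rs))
    tp = sum(t for t, _, _ in per_class)
    fp = sum(f for _, f, _ in per_class)
    fn = sum(f for _, _, f in per_class)
    micro = (tp / (tp + fp), tp / (tp + fn))
    return macro, micro

# SWLM-TC counts from Table 2 (Joy, Like, Anger, Sorrow, Fear, Disgust, Surprise):
table2 = [(641, 378, 242), (654, 341, 227), (621, 412, 225), (610, 384, 239),
          (609, 351, 245), (619, 362, 222), (627, 359, 219)]
macro, micro = macro_micro(table2)
```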

The comparison line charts of the word-frequency-based labeling algorithm TC, the heuristic SWLM-TC algorithm and the SWLM-HMM algorithm over the experimental data are shown in Figures 4, 5 and 6.

Figures 4, 5 and 6 show results obtained with the same sentiment dictionary, where the TC algorithm represents the most basic sentiment-analysis baseline. Comparing the TC, SWLM-TC and SWLM-HMM algorithms gives the following results:

1) In precision, SWLM-HMM > SWLM-TC > TC. The precision range of the TC algorithm over the 7 classes is 52.42%-54.54%, that of the SWLM-TC algorithm is 60.12%-65.73%, and that of the SWLM-HMM algorithm is 69.60%-73.21%. The TC algorithm therefore lags several percentage points behind the SWLM-TC and SWLM-HMM algorithms, and SWLM-TC lags behind SWLM-HMM.

2) In recall, SWLM-HMM > SWLM-TC > TC. The recall range of the TC algorithm over the 7 classes is 60.14%-62.09%, that of the SWLM-TC algorithm is 71.31%-74.23%, and that of the SWLM-HMM algorithm is 79.39%-83.88%. The TC algorithm lags behind the SWLM-TC and SWLM-HMM algorithms, and SWLM-TC lags behind SWLM-HMM.

3) In F1 value, SWLM-HMM > SWLM-TC > TC. Across the evaluation criteria, SWLM-TC and SWLM-HMM outperform the TC algorithm: the F1 range of the TC algorithm over the 7 classes is 56.43%-57.66%, that of the SWLM-TC algorithm is 66.10%-69.72%, and that of the SWLM-HMM algorithm is 75.26%-77.90%. The TC algorithm lags behind SWLM-TC and SWLM-HMM, and SWLM-TC lags behind SWLM-HMM.

The above comparisons show that in precision, recall and F1 value alike, the SWLM-TC and SWLM-HMM algorithms outperform the TC algorithm, demonstrating that the traditional TC algorithm alone performs comparatively poorly in fine-grained computation.

The micro-averaged and macro-averaged data are shown in Figures 7 and 8.

The macro-average and micro-average line charts show that the SWLM-HMM and SWLM-TC algorithms outperform the TC algorithm and that, between the two, SWLM-HMM outperforms SWLM-TC. On the same data set, the performance gaps between the three algorithms are on the order of 10%.

From the data-set distribution, bar charts of the above tables are drawn; the class distributions of the three algorithms are shown in Figures 9, 10 and 11:

Figures 9, 10 and 11 show that the TC algorithm correctly classifies the fewest texts, wrongly assigns comparatively many texts to each class, and has the largest number of texts that belong to a class but were misclassified; in the corresponding SWLM-TC and SWLM-HMM figures, the TC algorithm is worst on every count.

3.3 Experimental conclusions

From the raw experimental data and the computed precision, recall and F1 values, the following observations and conclusions can be drawn:

1) The SWLM-TC algorithm is 7.7-13.31 percentage points more accurate than the traditional TC algorithm, and the SWLM-HMM algorithm is 9.48-13.09 percentage points more accurate than SWLM-TC. This shows that the SWLM-X algorithms are more accurate than the traditional algorithm on these data: before classification, SWLM-TC and SWLM-HMM both pass through the sentiment word random co-occurrence network stage of the computational method proposed by the present invention, so the role of the co-occurrence word network is demonstrated, validating the motivation of the proposed algorithms based on the sentiment word co-occurrence network. Comparing SWLM-TC and SWLM-HMM, the precision of SWLM-TC is lower because SWLM-HMM first trains on a training set produced by SWLM-TC and then classifies with the trained model, and in a later stage it additionally performs sentiment prediction on texts whose emotion is ambiguous; these two strategies let SWLM-HMM naturally surpass the SWLM-TC and TC algorithms in precision.

The sentiment word co-occurrence network completes the missing emotion of a text and highlights the sentiment words of larger weight; these two points give the SWLM-X algorithms their advantage in fine-grained sentiment computation.

2) In recall, the SWLM-TC algorithm is 11.17-14.09 percentage points higher than the traditional TC algorithm, and the SWLM-HMM algorithm is 8.08-12.57 percentage points higher than SWLM-TC. This shows that, when classifying a given class, SWLM-TC and SWLM-HMM identify members of that class correctly more often than the traditional algorithm, because the proposed algorithm and framework demarcate the specific sentiment words belonging to that class more efficiently and more accurately. The proposed algorithm is also better than the traditional algorithm at highlighting important sentiment words, and the present invention exploits the sentiment polarity recorded in the sentiment vocabulary ontology library: multiplying the co-occurrence count by the sentiment polarity makes strongly polarized sentiment words stand out, reducing interference from other sentiment words and making classification more accurate.

3) In the F1 comparison, the SWLM-TC algorithm is 9.67-13.29 percentage points higher than the traditional algorithm, and the SWLM-HMM algorithm is 9.16-11.8 percentage points higher than SWLM-TC. In overall evaluation, the proposed SWLM-TC and SWLM-HMM algorithms outperform the traditional algorithm; in this respect, the proposed algorithms exceed the traditional fine-grained sentiment classification algorithm in comprehensive performance.

4) The experimental data show that the proposed computational framework and algorithms perform well in classification, but differ greatly from coarse sentiment-orientation analysis: fine-grained computation requires not only that the algorithm can classify at fine granularity in all respects, but also that it resists interference. For the sentiment words appearing in an article, a pure demarcation approach is easily disturbed and causes unnecessary multi-class confusion. Because the sentiment orientation within one article is stable, sentiment polarity and co-occurrence are important properties that ordinary classification algorithms do not handle; the proposed algorithm therefore has an advantage on such texts. For ordinary texts in which words of strong sentiment polarity co-occur only weakly, the proposed algorithm differs from demarcation-based algorithms in that it applies a prediction mechanism to texts with ambiguous sentiment, handling such texts much better than approaches without a prediction mechanism. The present invention adopts the SWLM algorithm in the fine-grained computational framework, absorbs relevant knowledge of complex networks in its processing mechanism, and performs relatively well at completing the sentiment words of texts to be classified; with these mechanisms, the fine-grained computational framework for fine-grained computation has been verified experimentally.

Claims (3)

1. A fine-grained sentiment classification method based on a sentiment word random co-occurrence network, which uses random network theory and the phenomenon of word co-occurrence: through labeling with the sentiment ontology vocabulary dictionary, a word-order-based random network model with sentiment features, i.e. the sentiment word co-occurrence network model, is constructed; model reduction is performed on this basis; the sentiment word longest match method (SWLM, Sentimental Word Longest Match) is combined with the TC algorithm to perform SWLM-TC unsupervised learning classification, or the sentiment word longest match method is further combined with the HMM machine learning algorithm to establish a fine-grained sentiment classification model and use it for classification prediction, wherein the sentiment word co-occurrence network model is built as follows:
1) perform sentence splitting on each text to obtain an ordered sequence of sentences S1 → S2 → … → Sn;
2) segment each sentence Si into words, filter out stop words and meaningless content words, and label sentiment words using the sentiment vocabulary ontology library, obtaining an ordered sequence of sentiment words W1 → W2 → … → Wn;
3) for each sentence, extract word pairs <wi, wj> from the sentence using a sliding window of width WL; if wi ∉ W, add a new node wi to W and set its weight nwi to the initial value 1, otherwise increase nwi by 1; if (wi, wj) ∉ E, add a new edge (wi, wj) to E and set its weight nwi,wj to the initial value 1, otherwise increase nwi,wj by 1;
4) after all texts have been processed, the network model G is complete;
wherein S denotes a sequence composed of multiple sentences; w denotes an extracted sentiment word, w ∈ Σ, where Σ is the Chinese word set, namely the set of sentiment ontology words remaining after removing stop words and meaningless content words and labeling with the sentiment vocabulary ontology library; W is the node set of the network model G, W = {wi | i ∈ [1, N]}, with N the number of nodes of G; E is the edge set of the network model G, with M the number of edges of G, E = {(wi, wj) | wi, wj ∈ W and an ordered co-occurrence relation exists between wi and wj}, where (wi, wj) denotes the directed edge from node wi to node wj; NW is the set of node weights in G, NW = {nwi | wi ∈ W}; NE is the set of edge weights in G, where nwi,wj denotes the weight of the edge between nodes wi and wj, NE = {nwi,wj | (wi, wj) ∈ E};
the network model G is divided into 7 sub-networks according to the seven moods joy, like, anger, sorrow, fear, disgust and surprise; if a fracture occurs during sub-network splitting, the broken network sub-block is connected through its highest-weight node; this constructs the seven sub-networks Gx | x = {1, 2, 3, 4, 5, 6, 7} usable for fine-grained computation, namely G1, G2, G3, G4, G5, G6, G7;
characterized in that, when classifying, the following definitions apply:
the longest weight matching path length dmax(S): in network Gx | x = {1, 2, 3, 4, 5, 6, 7}, if two sentiment words cover each other in order, they are matched through the directly connecting edge; if there is a network gap between the two sentiment words in Gx, then when selecting the path, the path through the node of maximum weight is chosen; this gives the length of S, computed as follows:
d_max(S) = Σ_{i=1}^{n-1} d_max(w_i, w_{i+x})
where d_max(w_i, w_{i+x}) is the maximum-weight matching path from the i-th word to the (i+x)-th word in the network;
the emotion weight coefficient SW (Sentimental weight): in network G, each of the seven sub-networks accounts for a proportion of the sentiment polarity; using this coefficient makes the classification more distinct and reduces classification errors caused by fuzzy boundaries; let freq be the number of occurrences of a word in the sentiment word network and P its polarity intensity; the formulas are as follows:
WC_i = freq × P
W_y = Σ_{i=1}^{n} WC_i
SW_x = W_{yi} / Σ_{i=1}^{7} W_{yi}
where WC_i is the sentiment value of each word in a sub-network, W_y is the sentiment value of the sub-network, and SW_x is the SW value of sub-network x, i.e. the emotion weight coefficient;
the classification coefficient CC (Classification coefficient): after the maximum matching word path has been determined, let Re be the reproduction degree of a word on this path and power its sentiment intensity; assuming there are n words, the formulas are as follows:
CC_i = Re × power
CC = Σ_{i=1}^{n} CC_i
where CC_i is the classification coefficient of a single word;
the classification prediction coefficient CPC (Classification prediction coefficient): the prediction mechanism adopted, when classifying with a machine learning algorithm, for samples whose class cannot be decided; the SW_x values are sorted; if SW1 + SW2 > 80% and SW1/SW2 > 1.5, the sample is assigned to class SW1; if SW1 + SW2 > 80% and SW1/SW2 ≤ 1.5, it is assigned to both classes SW1 and SW2; if SW1 + SW2 < 80%, the sentiment of the article is more complex and the sample is assigned to the corresponding class according to the classification coefficient.
2. The fine-grained sentiment classification method based on a sentiment word random co-occurrence network according to claim 1, characterized in that, in the sentiment vocabulary ontology library, emotion is divided into 7 major classes and 21 groups; the sentiment classes are joy { happy (PA), at ease (PE) }, like { respect (PD), praise (PH), believe (PG), like (PB), wish (PK) }, anger { angry (NA) }, sorrow { sad (NB), disappointed (NJ), remorseful (NH), missing (PE) }, fear { flurried (NI), frightened (NC), shy (NG) }, disgust { annoyed (NE), abhorring (ND), censuring (NN), envious (NK), suspicious (NL) } and surprise { surprised (PC) }; the sentiment intensity power is divided into five grades 1, 3, 5, 7 and 9, where 9 denotes maximum intensity and 1 minimum intensity; the parts of speech in the sentiment vocabulary ontology are divided into 7 classes, namely noun, verb, adjective (adj), adverb (adv), network vocabulary (nw), idiom and prepositional phrase (prep); altogether 27466 sentiment words are contained.
3. The fine-grained sentiment classification method based on a sentiment word random co-occurrence network according to claim 1, characterized in that the sentiment word longest match method performs longest matching through the maximum-weight vocabulary of sentiment words, so that accurate classification to the related sentiment topic is achieved without word-sense disambiguation while suppressing noise, and weight calculation is performed through the seven sub-classification models, yielding the parameters with which machine learning classification can be carried out.
CN201610936655.9A 2016-10-24 2016-10-24 A kind of fine granularity sensibility classification method based on the random co-occurrence network of emotion word CN106547866B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610936655.9A CN106547866B (en) 2016-10-24 2016-10-24 A kind of fine granularity sensibility classification method based on the random co-occurrence network of emotion word


Publications (2)

Publication Number Publication Date
CN106547866A CN106547866A (en) 2017-03-29
CN106547866B true CN106547866B (en) 2017-12-26

Family

ID=58392940


Country Status (1)

Country Link
CN (1) CN106547866B (en)




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant