CN106547866B - A kind of fine granularity sensibility classification method based on the random co-occurrence network of emotion word - Google Patents
- Publication number: CN106547866B (also published as CN106547866A)
- Application number: CN201610936655.9A
- Authority: CN (China)
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications

- G06F16/374 — Information retrieval of unstructured textual data; creation of semantic tools, e.g. ontology or thesauri; thesaurus
- G06F16/355 — Information retrieval of unstructured textual data; clustering; classification; class or cluster creation or modification
- G06F40/284 — Handling natural language data; natural language analysis; lexical analysis, e.g. tokenisation or collocates
Abstract

A fine-grained sentiment classification method based on a random co-occurrence network of emotion words. Using random network theory and the word co-occurrence phenomenon, and through marking with the emotional ontology vocabulary dictionary, a word-order-based random network model built with affective features is formed, i.e. an emotion word co-occurrence network model. Model reduction is carried out on this basis; the emotion word longest matching method is combined with the TC algorithm for SWLM-TC unsupervised-learning classification, or the emotion word longest matching method is further combined with the HMM machine-learning algorithm to establish a fine-grained sentiment classification model with which classification prediction is realized. The invention achieves fine-grained emotion classification of paragraph-level text and improves the precision of the plain TC algorithm, making classification more accurate; using SWLM-TC to train the HMM model on the sample set and then performing mood classification on the library of samples under test improves the automation of the plain machine-learning algorithm.
Description
Technical field
The invention belongs to the technical field of information retrieval, and more particularly to a fine-grained sentiment classification method based on a random co-occurrence network of emotion words.
Background technology
In recent years, with the rapid development of the economy and information technology, the Internet has profoundly influenced the pattern of social development and generated a huge driving force for the economy. Internet users generate a vast sea of information, and as the mobile Internet takes hold, the popularization of all kinds of intelligent mobile devices lets information spread across the Internet at lower cost and higher speed. Different types of information produce different effects: negative speech can affect netizens negatively, and the spread of malicious rumors and public incidents can not only influence individuals' emotions but even cause huge economic losses. Mining emotional information has therefore become an urgent problem. Regarding the construction of text emotion corpora, existing corpora include the Pang corpus, the Whissell corpus, the Berardinelli film review corpus, and product review corpora, while resources for Chinese emotion corpus annotation are fewer: Tsinghua University has annotated emotion material for some tourism attraction descriptions, intended to assist speech synthesis, but its scale is small. Blogs, forum texts and online news with commentary are internationally called new text; such new text on the network provides a data source for sentiment analysis, and its analysis has become a focus of current research. In today's information age the network has become a part of people's lives, mood parsing has become an important reference for understanding netizens' true thoughts, and in the contingency management of public incidents, studying network public sentiment through new text has become a new direction.
Orientation research on text is at present relatively deep and comparatively successful for product reviews and film reviews. Given the complexity of language, differences in individual expression, and the lack of a systematic description of how human emotion forms, fine-grained sentiment analysis remains rare. Chinese has evolved with free grammar, a large vocabulary, free word forms and other traits that set it apart from English sentiment analysis; the semantic analyses commonly used for English are much harder to apply to Chinese, causing many difficulties. Sentiment analysis is inseparable from psychology. Psychological research has found that the relation between vocabulary and human emotion can be measured, and that the semantic tendency of individual words or phrases is important for conveying human emotion. Research shows the semantic tendency of words and phrases exhibits two main phenomena: 1) emotion terms of the same tendency often occur together; 2) emotion terms of opposite tendencies generally occur at different times. These two phenomena allow sentiment analysis to be simplified considerably. Research has shown that word co-occurrence networks built from plain English and Chinese text both satisfy the small-world property, and text segmentation and topic extraction have been studied on the basis of such networks; random network models have been used for text topic analysis and for sentiment orientation classification, but no published work so far applies random network theory to fine-grained sentiment analysis of text.
Summary of the invention

To overcome the shortcomings of the above prior art, an object of the present invention is to provide a fine-grained sentiment classification method based on a random co-occurrence network of emotion words, which combines emotion word longest matching SWLM (Sentimental Word Longest Match) with machine-learning algorithms to achieve fine-grained emotion classification of paragraph-level text.
To achieve these objects, the technical solution adopted by the present invention is as follows:
A fine-grained sentiment classification method based on a random co-occurrence network of emotion words uses random network theory and the word co-occurrence phenomenon: through marking with the emotional ontology vocabulary dictionary, a word-order-based random network model built with affective features is formed, i.e. the emotion word co-occurrence network model. Model reduction is carried out on this basis, and the emotion word longest matching method (SWLM, Sentimental Word Longest Match) is combined with the TC algorithm for SWLM-TC unsupervised-learning classification, or the emotion word longest matching method is further combined with the HMM machine-learning algorithm to establish a fine-grained sentiment classification model and perform classification prediction with the model.
The building process of the emotion word co-occurrence network model is as follows:

1) Split each text into an ordered group of sentences S1→S2→…→Sn;
2) Segment each sentence Si, filter out stop words and meaningless notional words, and mark emotion words with the emotion vocabulary ontology, obtaining an ordered group of emotion words W1→W2→…→Wn;
3) For each sentence, extract word pairs <wi, wj> from the sentence with a sliding window of width WL (window length; typically 2). If wi ∉ W, add a new node wi to W and set its weight nwi to an initial value of 1; otherwise increment nwi by 1. If (wi, wj) ∉ E, add a new edge (wi, wj) to E and set its weight nwi,wj to an initial value of 1; otherwise increment nwi,wj by 1;
4) Once all texts have been processed, the network model G is complete.

Here S denotes a sequence composed of multiple sentences; w denotes an extracted emotion word, w ∈ Σ, where Σ is the Chinese word set, i.e. the emotional ontology word set obtained after removing stop words and meaningless notional words and then marking with the emotion vocabulary ontology; W is the node set of the network model G, W = {wi | i ∈ [1, N]}, where N is the number of nodes of G; E is the edge set of the network model G, with M edges: E = {(wi, wj) | wi, wj ∈ W, and a sequential co-occurrence relation exists between wi and wj}, where (wi, wj) denotes a directed edge from node wi to node wj; NW is the set of node weights in G, NW = {nwi | wi ∈ W}; NE is the set of edge weights in G, where nwi,wj is the weight of the edge between wi and wj: NE = {nwi,wj | (wi, wj) ∈ E}.
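As a concrete illustration of steps 1)-4), the following is a minimal Python sketch of the construction, assuming pre-segmented sentences and a dictionary lookup standing in for the emotion vocabulary ontology (both are hypothetical stand-ins, not the patent's actual tooling):

```python
from collections import defaultdict

def build_cooccurrence_network(texts, emotion_lexicon, wl=2):
    """Build the directed emotion-word co-occurrence network (steps 1-4).

    texts: iterable of texts, each a list of segmented sentences,
        each sentence a list of words (pre-segmented stand-in for the
        patent's sentence splitting and word segmentation).
    emotion_lexicon: dict mapping emotion words to ontology entries;
        a stand-in for the emotion vocabulary ontology lookup.
    wl: sliding-window width WL (the patent typically uses 2).
    Returns node weights NW and directed edge weights NE.
    """
    nw = defaultdict(int)   # NW: node weights
    ne = defaultdict(int)   # NE: edge weights, keyed by (wi, wj)
    for sentences in texts:
        for sentence in sentences:
            # step 2: keep only marked emotion words, preserving order
            words = [w for w in sentence if w in emotion_lexicon]
            # step 3: slide a WL-wide window over the emotion words
            for i in range(len(words) - wl + 1):
                wi, wj = words[i], words[i + wl - 1]
                nw[wi] += 1           # first occurrence initialises to 1
                nw[wj] += 1
                ne[(wi, wj)] += 1     # directed edge wi -> wj
    return nw, ne
```

With WL = 2 the extracted pairs are simply adjacent emotion words, which matches the close sequential co-occurrence the model relies on.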
In the emotion vocabulary ontology, emotion is divided into 7 major classes and 21 subclasses. The classes are: joy {happy (PA), relieved (PE)}; goodness {respect (PD), praise (PH), belief (PG), liking (PB), wish (PK)}; anger {angry (NA)}; sorrow {sad (NB), disappointed (NJ), remorseful (NH), longing (PE)}; fear {panicked (NI), frightened (NC), shy (NG)}; disgust {annoyed (NE), loathing (ND), blaming (NN), jealous (NK), suspicious (NL)}; and surprise {amazed (PC)}. Emotion intensity (power) takes five levels, 1, 3, 5, 7 and 9, with 9 the strongest and 1 the weakest. The parts of speech in the emotion vocabulary ontology fall into 7 classes: noun, verb, adjective (adj), adverb (adv), network word (nw), idiom, and prepositional phrase (prep). The ontology contains 27,466 emotion words in total.
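For concreteness, one ontology entry can be pictured as a small record; the sketch below shows the hypothetical shape assumed by the construction code above (the real ontology's file layout differs):

```python
# Hypothetical shape of one ontology entry, with the fields the patent
# describes: part of speech, subclass code, major class, intensity
# (1/3/5/7/9) and polarity. Real DUTIR entries differ in layout.
emotion_lexicon = {
    "快乐": {"pos": "noun", "subclass": "PA", "class": "joy",
             "power": 5, "polarity": 1},
}
```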
The network model G is divided into 7 sub-networks according to the seven moods: joy, goodness, anger, sorrow, fear, disgust and surprise. If a sub-network fractures during splitting, the broken sub-blocks are reconnected through the node with the highest weight. This builds the seven sub-networks Gx | x = {1,2,3,4,5,6,7}, i.e. G1, G2, G3, G4, G5, G6, G7, usable for fine-grained calculation.
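A minimal sketch of this splitting and fracture-repair step, assuming the edge and node weights built earlier, a `class` field in each lexicon entry, and the networkx library (all assumptions, not named in the patent):

```python
import networkx as nx

def split_subnetworks(ne, nw, emotion_lexicon):
    """Split G into one sub-network per major emotion class, repairing
    fractures through the highest-weight node as described above."""
    subnets = {}
    for cls in ("joy", "goodness", "anger", "sorrow",
                "fear", "disgust", "surprise"):
        g = nx.DiGraph()
        for (wi, wj), w in ne.items():
            if (emotion_lexicon[wi]["class"] == cls
                    and emotion_lexicon[wj]["class"] == cls):
                g.add_edge(wi, wj, weight=w)
        # fracture repair: reconnect every broken sub-block through the
        # highest-weight node of the sub-network
        comps = list(nx.weakly_connected_components(g))
        if len(comps) > 1:
            hub = max(g.nodes, key=nw.get)
            for comp in comps:
                if hub not in comp:
                    g.add_edge(hub, max(comp, key=nw.get), weight=1)
        subnets[cls] = g
    return subnets
```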
The emotion word longest matching method performs longest matching through the maximum-weight vocabulary of emotion words, so that texts can be accurately classified under the relevant emotion topics without word-sense disambiguation or denoising; weight calculation is carried out through the seven sub-classification models, producing parameters with which machine-learning classification can be performed.
When classifying, the following definitions are used:

Longest weight matching path length dmax(S): in network Gx | x = {1,2,3,4,5,6,7}, if two emotion words are covered sequentially, they are matched through the edge that directly joins them; if two emotion words are separated in network Gx, the path passing through the maximum-weight nodes is chosen. The resulting length is taken as the length of S:

d_max(S) = Σ_{i=1}^{n-1} d_max(w_i, w_{i+x})

where dmax(wi, wi+x) is the maximum-weight matching path from the i-th word to the (i+x)-th word in the network.

Emotion weight coefficient SW (Sentimental weight): the proportion of emotion polarity held by each of the seven sub-networks within network G. Using this coefficient makes the classification more pronounced and reduces classification problems caused by fuzzy boundaries. With freq the number of recurrences of a word in the emotion word network and P its polarity intensity, the calculation formulas are:

WC_i = freq × P
W_y = Σ_{i=1}^{n} WC_i
SW_x = W_{yx} / Σ_{i=1}^{7} W_{yi}

where WCi is the emotion value of each word in a sub-network, Wy is the emotion value of the sub-network, and SWx is the SW value, i.e. the emotion weight coefficient, of sub-network x.

Classification coefficient CC (Classification coefficient): after the maximum matching word path is determined, with Re the reproduction degree of a word on this path and power its emotion intensity, and assuming n words, the calculation formulas are:

CC_i = Re × power
CC = Σ_{i=1}^{n} CC_i

where CCi is the classification coefficient of a single word.

Classification prediction coefficient CPC (Classification prediction coefficient): the prediction mechanism adopted for samples whose class cannot be determined when classifying with a machine-learning algorithm. The SWx values are sorted in descending order: if SW1 + SW2 > 80% and SW1/SW2 > 1.5, the sample is assigned under SW1; if SW1 + SW2 > 80% and SW1/SW2 <= 1.5, the sample is in this case assigned under both the SW1 and SW2 attributes; if SW1 + SW2 < 80%, the article's classification is considered complex and the sample is assigned under the corresponding class according to the classification coefficient.
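The CPC rule reduces to a small decision function; a sketch under the assumption that the SW values are held as fractions summing to 1 (names hypothetical):

```python
def cpc_assign(sw, cc):
    """Sketch of the CPC rule above. sw: dict emotion class -> SW value
    (fractions summing to 1); cc: dict emotion class -> classification
    coefficient CC (both assumed computed beforehand)."""
    ranked = sorted(sw, key=sw.get, reverse=True)
    c1, c2 = ranked[0], ranked[1]
    if sw[c1] + sw[c2] > 0.8:
        if sw[c2] == 0 or sw[c1] / sw[c2] > 1.5:
            return [c1]          # one class clearly dominates
        return [c1, c2]          # otherwise keep the top two attributes
    # complex emotion: fall back to the classification coefficient
    return [max(cc, key=cc.get)]
```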
The SWLM-TC method comprises the following steps:

1) Split the article to be classified into sentences, giving the ordered sentence sequence S'1→S'2→…→S'n;
2) Segment each sentence in order, remove meaningless notional words and auxiliary words, mark the words against the emotion vocabulary ontology dictionary, and keep the marked words in sequence, i.e. W'1→W'2→…→W'n;
3) Search the corresponding sub-network according to the class of each marked word;
4) Perform path selection over the words in the network: if two words are adjacent, use the directly joining path; if two words are not adjacent, choose the maximum-weight path among the paths connecting them, finding the maximum-weight path according to the above steps and obtaining dmax(S);
5) Calculate the classification coefficient CC on the maximum-weight path dmax(S);
6) Calculate the classification coefficient CC under each home sub-network and compare: if the coefficients are equal, use CC × SW, where SW is the emotion weight coefficient (Sentimental weight), i.e. the weight of the emotion class within the 7 sub-networks; if they differ, sort the final classification coefficients CC: if the first weight accounts for 80 percent, the text belongs under the corresponding mood network, and if not, the text is assigned to the two mood networks ranked first and second by weight;
7) If a classification still cannot be assured, perform classification prediction on the text according to the classification prediction coefficient CPC.
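Step 4's path selection can be sketched as a greedy walk through maximum-weight neighbours; this is a simplified reading of the matching rule, with an assumed hop bound, not the patent's exact procedure:

```python
def dmax_pair(adj, node_w, wi, wj, max_hops=6):
    """Greedy sketch of d_max(wi, wj): direct edge if the words co-occur,
    otherwise walk through maximum-weight neighbours. adj: node ->
    successors; node_w: node -> weight; max_hops is an assumption."""
    if wj in adj.get(wi, ()):
        return node_w[wi] + node_w[wj]      # sequentially covered
    total, cur = node_w[wi], wi
    for _ in range(max_hops):
        nxt = max(adj.get(cur, ()), key=node_w.get, default=None)
        if nxt is None:
            return 0                        # no path found
        total += node_w[nxt]
        if nxt == wj:
            return total
        cur = nxt
    return 0

def dmax_sentence(adj, node_w, words):
    # d_max(S): sum of pairwise maximum-weight matching paths over the
    # ordered emotion words of the text
    return sum(dmax_pair(adj, node_w, a, b)
               for a, b in zip(words, words[1:]))
```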
The method for establishing the fine-grained sentiment classification model and performing classification prediction with it is:

1) Use SWLM-TC to perform fine-grained classification on part of the texts among all samples, and calculate the weight coefficient SWx of the emotion to which each text in the sample set belongs; the remaining texts serve as the classification verification experiment;
2) For all samples classified with SWLM-TC: calculate the classification coefficient CC of each text and classify it according to step 6 of the SWLM-TC algorithm, then add the sample under the corresponding emotion classification set TSx (Train Set) of x; if the classification coefficient is indeterminable with SWLM-TC, predict the class using step 7 of the SWLM-TC algorithm and assign the sample to the corresponding class;
3) After the sample data's text emotion has been calculated with the SWLM-TC algorithm, train an HMM classification model with the texts of each class, and then classify with the trained HMM models:
a) For a text under test, classify it with the HMM algorithm; if it can be classified correctly, that determines the text's sub-emotion class;
b) For texts without classification results, perform classification prediction using the classification prediction coefficient CPC.

HMM is a machine-learning method. Emotion calculation is first performed on the sample set with the SWLM-TC algorithm, dividing it into 7 sub-emotion text sample libraries; HMM models are trained with these sample libraries, and the trained HMM models can then be used to run classification test verification on the remaining part of the texts in the text library.
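As an illustration of this final stage, the sketch below trains one sequence model per emotion class on the SWLM-TC-labelled emotion-word strings and classifies a new text by highest likelihood; a first-order Markov chain is used here as a simplified stand-in for the patent's HMM:

```python
import math
from collections import defaultdict

class ClassSequenceModel:
    """Bigram (first-order Markov) model over emotion-word strings; a
    simplified stand-in for the per-class HMM the patent trains."""
    def __init__(self):
        self.trans = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)

    def fit(self, word_strings):
        for words in word_strings:
            for a, b in zip(words, words[1:]):
                self.trans[a][b] += 1
                self.totals[a] += 1

    def log_likelihood(self, words, vocab_size=27466):
        # add-one smoothing keeps unseen transitions from zeroing scores;
        # vocab_size defaults to the ontology's word count
        return sum(math.log((self.trans[a][b] + 1)
                            / (self.totals[a] + vocab_size))
                   for a, b in zip(words, words[1:]))

def classify_hmm(models, words):
    """models: dict emotion class -> fitted ClassSequenceModel."""
    return max(models, key=lambda c: models[c].log_likelihood(words))
```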
Compared with the prior art, the beneficial effects of the invention are:

1. The method can perform fine-grained classification of text emotion; unlike traditional orientation calculation, it provides finer-grained classes.
2. It improves the precision of the plain TC algorithm, making classification more accurate.
3. Using SWLM-TC to train the HMM model on the sample set and then performing mood classification on the library of samples under test improves the automation of the plain machine-learning algorithm.
Brief description of the drawings

Fig. 1 is the overall flow chart of the algorithm of the invention.
Fig. 2 is the flow chart of the SWLM-TC algorithm of the invention.
Fig. 3 is the flow chart of the SWLM-HMM algorithm of the invention.
Fig. 4 is the line chart of experimental data for the word-frequency-based labeling algorithm TC.
Fig. 5 is the line chart of experimental data for the SWLM-TC heuristic algorithm.
Fig. 6 is the line chart of experimental data for the SWLM-HMM algorithm.
Fig. 7 is a schematic diagram of the micro-averaged data in the experiments of the invention.
Fig. 8 is a schematic diagram of the macro-averaged data in the experiments of the invention.
Fig. 9 is a schematic diagram of the classified data distribution in the experiments of the invention (correctly classified).
Fig. 10 is a schematic diagram of the classified data distribution in the experiments of the invention (wrongly assigned to the class).
Fig. 11 is a schematic diagram of the classified data distribution in the experiments of the invention (belonging to the class but wrongly divided).
Embodiments

Embodiments of the present invention are described in detail below with reference to the accompanying drawings and examples.

As shown in Fig. 1, the fine-grained sentiment classification method based on a random co-occurrence network of emotion words of the present invention first uses random network theory and the word co-occurrence phenomenon: through marking with the emotional ontology vocabulary dictionary, a word-order-based random network model built with affective features is formed, i.e. the emotion word co-occurrence network model. Model reduction is carried out on this basis; the emotion word longest matching method (SWLM, Sentimental Word Longest Match) is combined with the TC algorithm for SWLM-TC unsupervised-learning classification, or the emotion word longest matching method is further combined with the HMM machine-learning algorithm to establish a fine-grained sentiment classification model and perform classification prediction with the model. The details are as follows:
1 The emotion word co-occurrence model based on random networks

To facilitate fine-grained study of paragraph-level text and discover the inherent laws among emotion words, the present invention improves on the method proposed in [YANG Feng, PENG Qin-ke, XU Tao, Sentiment Classification for Comments Based on Random Network Theory, Acta Automatica Sinica, 2010.6, Vol. 36, No. 6] to build an emotion word co-occurrence network model suited to fine-grained sentiment analysis.
1.1 The emotion vocabulary ontology dictionary

The Chinese emotion vocabulary ontology is a Chinese ontological resource arranged and annotated by the Information Retrieval Laboratory of Dalian University of Technology, under the guidance of Professor Lin Hongfei, through the effort of all laboratory members. Its emotion classification system is built on the basis of Ekman's influential six-class emotion classification system. On Ekman's basis, the vocabulary ontology adds the emotion category "goodness" and divides commendatory emotion more finely. Emotion in the final vocabulary ontology is divided into 7 major classes and 21 subclasses: joy {happy (PA), relieved (PE)}, goodness {respect (PD), praise (PH), belief (PG), liking (PB), wish (PK)}, anger {angry (NA)}, sorrow {sad (NB), disappointed (NJ), remorseful (NH), longing (PE)}, fear {panicked (NI), frightened (NC), shy (NG)}, disgust {annoyed (NE), loathing (ND), blaming (NN), jealous (NK), suspicious (NL)}, and surprise {amazed (PC)}. Emotion intensity (power) takes five levels, 1, 3, 5, 7 and 9, with 9 the strongest and 1 the weakest. The parts of speech in the emotion vocabulary ontology fall into 7 classes: noun, verb, adjective (adj), adverb (adv), network word (nw), idiom, and prepositional phrase (prep). The ontology contains 27,466 emotion words in total.
1.2 The network model

The small-world network model introduced by Watts and Strogatz [Watts D J, Strogatz S H. Collective dynamics of 'small-world' networks. Nature, 1998, 393(6684): 440-442] and the scale-free model proposed by Barabási and Albert [Barabási A L, Albert R. Emergence of scaling in random networks. Science, 1999, 286(5439): 509-512] opened the pioneering work on complex networks. Compared with regular networks and random networks, small-world and scale-free networks have a small average path length and a large clustering coefficient. The co-occurrence network constructed from the associations between words has the characteristic properties of a small-world network. The small average length and large clustering coefficient of the constructed word network can be exploited to classify emotion granularity features rapidly.
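The two small-world statistics this passage relies on can be checked directly on the built network; a brief sketch using the networkx library (an assumed tooling choice, not named in the patent):

```python
import networkx as nx

def smallworld_stats(ne):
    """Check small-world properties of the co-occurrence network.
    ne: the directed edge weights built earlier (dict keyed by word pairs)."""
    g = nx.Graph(list(ne))                  # undirected structural view
    giant = g.subgraph(max(nx.connected_components(g), key=len))
    return (nx.average_shortest_path_length(giant),  # small if small-world
            nx.average_clustering(giant))            # large if small-world
```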
1.3 Emotion word co-occurrence network models

In [Shi Jing, Hu Ming, Dai Guo-Zhong. Topic analysis of Chinese text based on small world model. Journal of Chinese Information Processing, 2007, 21(3): 69-75], a random network model is established according to the ordinary co-occurrence relations between words. In [YANG Feng, PENG Qin-ke, XU Tao, Sentiment Classification for Comments Based on Random Network Theory, Acta Automatica Sinica, 2010.6, Vol. 36, No. 6], a random network model is established according to the sequential co-occurrence relations between words; that network model is used to incrementally create a word-order co-occurrence network for short news comments in cooperation with the SCP algorithm, which brings great benefit to short comments lacking a large term network. That algorithm is well suited to orientation calculation but not to multi-granularity emotion classification.

To enable fine-grained sentiment analysis, the present invention uses emotion words to build an ordered co-occurrence random network model. The co-occurrence order of emotion words embodies semantic information about them; for example, front and rear modification and the co-occurrence distance of emotion words bear a strong relation to the words' semantic relations. The present invention establishes the model according to the close sequential co-occurrence relations of emotion words, i.e. with a small co-occurrence-region window length WL (typically 2), while taking the order relation of word co-occurrence into account.

After the emotion word co-occurrence network is constructed, this large network model is divided into seven small emotion word networks according to the seven emotion major classes, and the related operations are then carried out.
To describe the construction method of the emotion word co-occurrence network model, the relevant mathematical definitions are used:

Σ: the Chinese word set; the word set used by the present invention is the emotional ontology word set obtained after removing stop words and meaningless notional words and then marking with the emotion vocabulary ontology;
w: an extracted emotion word, w ∈ Σ;
S: a sequence composed of multiple sentences;
N: the number of nodes of G;
M: the number of edges of G;
W = {wi | i ∈ [1, N]}: the node set of G;
E = {(wi, wj) | wi, wj ∈ W, and a sequential co-occurrence relation exists between wi and wj}: the edge set of G, where (wi, wj) denotes a directed edge from node wi to node wj;
NW = {nwi | wi ∈ W}: the node weights of G;
NE = {nwi,wj | (wi, wj) ∈ E}: the edge weights of G, where nwi,wj is the weight of the edge between nodes wi and wj.

The method for establishing the emotion word co-occurrence network model G is given below:

1) Split each text into an ordered group of sentences S1→S2→…→Sn;
2) Segment each sentence Si, filter out stop words and meaningless notional words, and mark emotion words with the emotion vocabulary ontology, obtaining an ordered group of emotion words W1→W2→…→Wn;
3) For each sentence Si, extract word pairs <wi, wj> from the sentence with a sliding window of width WL (typically 2). If wi ∉ W, add a new node wi to W and set its weight nwi to an initial value of 1; otherwise increment nwi by 1 (the operation on wj is similar). If (wi, wj) ∉ E, add a new edge (wi, wj) to E and set its weight nwi,wj to an initial value of 1; otherwise increment nwi,wj by 1.
4) After all texts have been processed, the network model G is complete.
5) Divide the network model G into 7 sub-networks according to the seven moods (joy, goodness, anger, sorrow, fear, disgust, surprise). If a sub-network fractures during splitting, reconnect the broken sub-blocks through the highest-weight node. The seven sub-networks (G1, G2, G3, G4, G5, G6, G7) usable for fine-grained calculation are then complete.
2 Text-oriented emotion fine-granularity feature classification

Previous research took the sentiment orientation of text as its focus. As research in this field deepens, the value and uses of fine granularity come to the fore. Fine-grained classification and orientation classification differ in emphasis: fine granularity is a multi-class problem, whereas orientation only needs to calculate the tendency of a text. The manually annotated dictionaries used also differ: orientation research only needs the tendency of words marked, while the fine-grained annotation dictionary, the emotion vocabulary ontology, marks correlated features of the emotion words involved, such as part-of-speech class, intensity and polarity. Fine-grained emotion classification is carried out in conjunction with the HMM machine-learning algorithm.

The emotion word longest matching classification method SWLM performs longest matching through the maximum-weight vocabulary of emotion words, so that texts can be classified fairly accurately under the relevant emotion topics without disambiguation or denoising; weights are calculated through the seven sub-classification models, yielding parameters usable for HMM machine-learning classification.
The present invention uses the following definitions:

Definition 1 (longest weight matching path length dmax(S))

In network Gx, if two emotion words are covered sequentially, they are matched using the edge that directly joins them; if two emotion words are separated in network Gx, the path through the maximum-weight nodes is selected, giving the length of the longest weight matching path S. The calculation formula is:

d_max(S) = Σ_{i=1}^{n-1} d_max(w_i, w_{i+x})

where dmax(wi, wi+x) is the maximum-weight matching path from the i-th word to the (i+x)-th word in the network.

Definition 2 (emotion weight coefficient SW (Sentimental weight))

In the word network G, SW is the proportion of emotion polarity held by each of the seven sub-networks. Using this coefficient makes the classification more pronounced and reduces classification problems caused by fuzzy boundaries. Let the number of recurrences of a word in the emotion word network be freq and its polarity intensity be P. The calculation formulas are:

WC_i = freq × P
W_y = Σ_{i=1}^{n} WC_i
SW_x = W_{yx} / Σ_{i=1}^{7} W_{yi}

where WCi is the emotion value of each word in a sub-network, Wy is the emotion value of the sub-network, and SWx is the SW value of sub-network x, i.e. the emotion weight coefficient.
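Definition 2 translates into a few lines; a sketch assuming the sub-networks built earlier and using the ontology's intensity field as the polarity intensity P (an assumption):

```python
def emotion_weight_coefficients(subnets, emotion_lexicon, freq):
    """Definition 2 sketch. subnets: the seven sub-networks; freq maps a
    word to its recurrence count in the emotion word network; the
    'power' field stands in for the polarity intensity P (assumed)."""
    w_y = {cls: sum(freq.get(w, 0) * emotion_lexicon[w]["power"]
                    for w in g.nodes)        # W_y = sum of WC_i
           for cls, g in subnets.items()}
    total = sum(w_y.values()) or 1
    return {cls: v / total for cls, v in w_y.items()}   # SW_x
```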
2.1 The SWLM-TC unsupervised-learning classification method using the TC algorithm

Definition 3 (classification coefficient CC (Classification coefficient))

The classification coefficient is the coefficient defined for SWLM-TC unsupervised classification. After the maximum matching word path is determined, with Re the reproduction degree of a word on this path and power its emotion intensity, and assuming n words, the calculation formulas are:

CC_i = Re × power
CC = Σ_{i=1}^{n} CC_i

where CCi is the classification coefficient of a single word.

Definition 4 (classification prediction coefficient CPC (Classification prediction coefficient))

The classification prediction coefficient is the prediction mechanism adopted when a sample's class cannot be determined while classifying with a machine-learning algorithm. The SWx values are sorted: if SW1 + SW2 > 80% and SW1/SW2 > 1.5, the sample is assigned under SW1, otherwise under both SW1 and SW2; if SW1 + SW2 < 80%, the article's classification is considered complex and the sample is assigned under the corresponding class according to the classification coefficient.

Because the emotion words appearing in a paragraph-level text express the main emotional thread of the article, and the use of the co-occurrence random network preserves this emotional thread, the ordered random co-occurrence network performs well.
With reference to Fig. 2, the processing steps of the TC algorithm with emotion word marking based on SWLM are as follows:

1) Split the article to be classified into sentences, S'1→S'2→…→S'n;
2) Segment each sentence in order, remove meaningless notional words and auxiliary words, mark with the emotion vocabulary ontology dictionary, and keep the marked words in sequence, i.e. W'1→W'2→…→W'n;
3) Search the corresponding sub-network according to the class of each marked word;
4) Perform path selection over the words in the network: (I) if two words are adjacent, use the directly joining path; (II) if two words are not adjacent, select the maximum-weight path among the paths connecting them, finding the maximum-weight path according to the above steps and obtaining dmax(S);
5) Calculate the classification coefficient CC on the maximum-weight path dmax(S); the calculation follows Definition 3;
6) Calculate the classification coefficient CC under each home sub-network and compare: (I) if the coefficients are equal, use CC × SW; (II) if the coefficients differ, go to step (III); (III) sort the final CC values: if the first weight accounts for 80 percent, the text belongs under the corresponding mood network, and if not, the text is assigned, by classification coefficient CC, under the top two classes.
7) If a classification still cannot be assured, perform classification prediction on the text according to Definition 4.
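Putting the steps together, a condensed end-to-end sketch of SWLM-TC, reusing the dmax_sentence and cpc_assign sketches given earlier and approximating the reproduction degree Re by the accumulated path weight (all data shapes assumed):

```python
def swlm_tc_classify(sentences, subnets, nw, sw, emotion_lexicon):
    """End-to-end sketch of the SWLM-TC steps above (helper names refer
    to the earlier sketches; this is an approximation, not the patent's
    exact step 6 arithmetic)."""
    cc = {}
    for cls, g in subnets.items():
        # steps 2-3: the text's marked words that fall in sub-network cls
        words = [w for s in sentences for w in s
                 if w in emotion_lexicon
                 and emotion_lexicon[w]["class"] == cls and w in g]
        # steps 4-5: the maximum-weight path length stands in for the
        # reproduction degree Re; CC_i = Re * power, summed over words
        re_ = dmax_sentence(g.adj, nw, words)
        cc[cls] = sum(re_ * emotion_lexicon[w]["power"] for w in words)
    ranked = sorted(cc, key=cc.get, reverse=True)
    total = sum(cc.values()) or 1
    if cc[ranked[0]] / total >= 0.8:      # step 6: dominant class wins
        return [ranked[0]]
    if cc[ranked[0]] > cc[ranked[1]]:     # otherwise keep the top two
        return ranked[:2]
    return cpc_assign(sw, cc)             # step 7: CPC prediction
```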
2.2 The SWLM-HMM classification algorithm based on supervised machine learning

Machine learning plays a large role in text classification, and the HMM algorithm performs very well in NLP. Given the terseness of the HMM algorithm, its small computational load, and its ability to train on sample sequences of arbitrary length, HMM is applied to learning fine-grained emotion classification so as to improve the accuracy of SWLM-HMM classification.

When classifying with SWLM-HMM, the corpus cannot be trained directly with HMM; it is first processed with SWLM and then trained with the HMM algorithm, which improves classification accuracy and speeds up classification.
With reference to Fig. 3, the method for training the corpus is as follows:

(1) Using part of the texts in the sample library, perform fine-grained classification on the sample set with SWLM-TC, the SWLM-TC classification process being as described above, and calculate the weight coefficient SWx of the emotion to which each sample belongs;
(2) For all samples classified with SWLM-TC: calculate the classification coefficient CC of each text and classify according to step 6 of the SWLM-TC algorithm, then add the sample under the corresponding emotion classification set TSx (Train Set); if the classification coefficient is indeterminable with SWLM-TC, predict with step 7 of the SWLM-TC algorithm and assign the sample to the corresponding class;
(3) After the sampled part of the texts in the sample library has been classified into sub-emotions by the SWLM-TC algorithm, use the resulting training texts of each class to train the HMM classification models. The classification feature is each text's marked emotion words, formed into a word chain in text order; during HMM training, each text's emotion word string and its classified sub-emotion are passed to the HMM model as parameters, and all samples are input to the HMM algorithm for training.
a) For the remaining texts in the sample library, classify with the HMM algorithm; if a text can be classified correctly, carry out the corresponding categorization calculation.
b) For texts without classification results, perform classification prediction using Definition 4.
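A sketch of this training pipeline, with ClassSequenceModel from the earlier sketch standing in for the HMM and swlm_tc_classify supplying the labels (all names hypothetical):

```python
from collections import defaultdict

def train_swlm_hmm(sample_texts, subnets, nw, sw, emotion_lexicon):
    """Sketch of the Fig. 3 pipeline: label samples with SWLM-TC, pool
    them into the train sets TS_x, then fit one sequence model per
    class as a stand-in for the per-class HMM training."""
    train_sets = defaultdict(list)                   # TS_x
    for sentences in sample_texts:
        # the classification feature: the text's marked emotion words,
        # chained in text order
        word_string = [w for s in sentences for w in s
                       if w in emotion_lexicon]
        for cls in swlm_tc_classify(sentences, subnets, nw, sw,
                                    emotion_lexicon):
            train_sets[cls].append(word_string)
    models = {}
    for cls, strings in train_sets.items():
        models[cls] = ClassSequenceModel()
        models[cls].fit(strings)
    return models
```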
3 Emotion fine-granularity feature classification experiments

3.1 Classification data

The experimental data of the present invention uses collected microblog data together with the NLP&CC2014 data from the CCF Natural Language Processing and Chinese Computing evaluation. 7,000 microblog posts were crawled; 4,000 of them were chosen and merged with 2,000 posts on similar topics from the NLP&CC2014 data set, giving about 6,000 annotated samples in the final corpus. Only microblogs containing emotion were selected; emotionless microblog posts and data were rejected. The data comprises:

1) TrainDataNet: the 6,000 microblog posts;
2) TrainDataHMM: 5,000 of the 6,000 microblog posts, sampled so as to include microblogs from all 7 emotion sets;
3) TrainDataTest: the remaining 1,000 posts outside TrainDataHMM.

Table 1. Data distribution

Accuracy and recall are the two most commonly used metrics in the fields of information retrieval and statistical classification, used to evaluate the quality of results.

The experiments of the present invention use the collected data and the Chinese NLP&CC2014 orientation evaluation data set; after systematic testing, the experimental results are as follows.
3.2 Classification results

A. Raw data experiments

1) SWLM-TC emotion classification experiment

The validity of the algorithm is verified with SWLM-TC; accuracy, recall and F-value measure the validity of the classification. The seven-class results are shown in Table 2:

Table 2. SWLM-TC algorithm classification results

| Emotion class | Correctly classified | Wrongly assigned to the class | Belonging to the class but wrongly divided |
| --- | --- | --- | --- |
| Joy | 641 | 378 | 242 |
| Goodness | 654 | 341 | 227 |
| Anger | 621 | 412 | 225 |
| Sorrow | 610 | 384 | 239 |
| Fear | 609 | 351 | 245 |
| Disgust | 619 | 362 | 222 |
| Surprise | 627 | 359 | 219 |
2) SWLM-HMM emotion classification experiment

The SWLM-HMM algorithm experimental results are shown in Table 3.

Table 3. SWLM-HMM algorithm classification results

3) TC algorithm emotion classification experiment

The verification test results of the TC algorithm are shown in Table 4 below.

Table 4. TC algorithm classification results
| Emotion class | Correctly classified | Wrongly assigned to the class | Belonging to the class but wrongly divided |
| --- | --- | --- | --- |
| Joy | 531 | 468 | 352 |
| Goodness | 547 | 478 | 334 |
| Anger | 511 | 426 | 335 |
| Sorrow | 521 | 437 | 328 |
| Fear | 534 | 456 | 320 |
| Disgust | 508 | 434 | 333 |
| Surprise | 519 | 471 | 327 |
B. Accuracy, recall and F1 values

Calculating on the above experimental data, the accuracy, recall and F1 values are as follows:

1) SWLM-TC emotion classification experiment

The P, R and F1 values of the SWLM-TC algorithm are shown in Table 5 below.

Table 5. P, R and F1 values of SWLM-TC
| Emotion class | Accuracy | Recall | F1 |
| --- | --- | --- | --- |
| Joy | 62.90% | 72.59% | 67.40% |
| Goodness | 65.73% | 74.23% | 69.72% |
| Anger | 60.12% | 73.40% | 66.10% |
| Sorrow | 61.37% | 71.85% | 66.20% |
| Fear | 63.44% | 71.31% | 67.14% |
| Disgust | 63.10% | 73.60% | 67.95% |
| Surprise | 63.59% | 74.11% | 68.45% |
2) SWLM-HMM emotion classification experiment

The P, R and F1 values of the SWLM-HMM algorithm are shown in Table 6 below.

Table 6. P, R and F1 values of SWLM-HMM

3) TC algorithm emotion classification experiment

The P, R and F1 values of the TC algorithm are shown in Table 7 below.

Table 7. P, R and F1 values of the TC algorithm
| Emotion class | Accuracy | Recall | F1 |
| --- | --- | --- | --- |
| Joy | 53.15% | 60.14% | 56.43% |
| Goodness | 53.37% | 62.09% | 57.40% |
| Anger | 54.54% | 60.40% | 57.32% |
| Sorrow | 54.38% | 61.37% | 57.66% |
| Fear | 53.94% | 62.53% | 57.92% |
| Disgust | 53.93% | 60.40% | 56.98% |
| Surprise | 52.42% | 61.35% | 56.54% |
C. Macro-average and micro-average

The macro- and micro-averaged P, R and F1 values of each algorithm are shown in Table 8 below.

Table 8. Macro- and micro-averages of the algorithms
The comparison line charts of the experimental data for the word-frequency-based labeling algorithm TC, the SWLM-TC heuristic algorithm and the SWLM-HMM algorithm are shown in Fig. 4, Fig. 5 and Fig. 6.

Fig. 4, Fig. 5 and Fig. 6 show results analyzed with the same sentiment dictionary, the TC algorithm representing the baseline trend. Comparing the most basic TC sentiment-analysis algorithm with the SWLM-TC and SWLM-HMM algorithms gives the following:

1) In accuracy, SWLM-HMM > SWLM-TC > TC. The accuracy range of the TC algorithm over the 7 granularities is 52.42%-54.54%, that of SWLM-TC is 60.12%-65.73%, and that of SWLM-HMM is 69.60%-73.21%. It can thus be seen that TC lags behind SWLM-TC and SWLM-HMM by whole percentage points, and SWLM-TC lags behind SWLM-HMM.
2) In recall, SWLM-HMM > SWLM-TC > TC. The recall range of the TC algorithm over the 7 granularities is 60.14%-62.09%, that of SWLM-TC is 71.31%-74.23%, and that of SWLM-HMM is 79.39%-83.88%; TC lags behind SWLM-TC and SWLM-HMM by whole percentage points, and SWLM-TC lags behind SWLM-HMM.
3) In F1 value, SWLM-HMM > SWLM-TC > TC. On all of these evaluation criteria, SWLM-TC and SWLM-HMM perform better than the TC algorithm: the F1 range of TC over the 7 granularities is 56.43%-57.66%, that of SWLM-TC is 66.10%-69.72%, and that of SWLM-HMM is 75.26%-77.90%; TC lags behind SWLM-TC and SWLM-HMM by whole percentage points, and SWLM-TC lags behind SWLM-HMM.

From the above comparison, whether in accuracy, recall or F1 value, the results of the SWLM-TC and SWLM-HMM algorithms are better than those of the TC algorithm, which demonstrates that the traditional TC algorithm alone performs comparatively poorly in fine-grained calculation.
The micro-averaged and macro-averaged data are shown in Fig. 7 and Fig. 8.

From the macro-average and micro-average line charts it can be seen that the SWLM-HMM and SWLM-TC algorithms outperform the TC algorithm, and that of the two, SWLM-HMM performs better than SWLM-TC. The performance gaps between the three algorithms on the same data set are on the order of 10%.
From the data set distributions, the bar charts of the three algorithms' classifications are drawn as shown in Fig. 9, Fig. 10 and Fig. 11.

It can be seen from Fig. 9, Fig. 10 and Fig. 11 that the TC algorithm has the fewest correctly classified texts, a comparatively high number of texts wrongly assigned to each class, and the highest number of texts belonging to a class but wrongly divided; in the corresponding data, TC trails SWLM-TC and SWLM-HMM throughout.
3.3 Experimental conclusions

From the raw experimental data and the calculation of accuracy, recall and F1 values, the following observations and conclusions can be drawn:

1) The SWLM-TC algorithm is 7.7-13.31 percentage points more accurate than the traditional TC algorithm, and the SWLM-HMM algorithm is 9.48-13.09 percentage points more accurate than the SWLM-TC algorithm. The SWLM-X algorithms are thus more accurate on this data than the traditional algorithm because, before classifying, they pass through the computational stage proposed by the present invention: both SWLM-TC and SWLM-HMM go through the emotion word random co-occurrence network, so the role the co-occurrence word network plays in the present invention is made manifest, confirming the original intent of the proposed algorithms based on the emotion word co-occurrence network. In the comparison of SWLM-TC and SWLM-HMM, the accuracy of SWLM-TC falls below that of SWLM-HMM because SWLM-HMM, before classifying, trains on the training set with the SWLM-TC algorithm and classifies later with the trained model, and additionally performs emotion prediction on texts whose emotion is still ambiguous at the later stage; these two strategies let SWLM-HMM naturally surpass SWLM-TC and TC in accuracy.

The emotion word co-occurrence network completes texts lacking emotion and highlights the emotion words of greater weight appearing in a text; on these two points the SWLM-X algorithms show their advantage in fine-grained emotion calculation.
2) In recall, the SWLM-TC algorithm is 11.17-14.09 percentage points higher than the traditional TC algorithm, and the SWLM-HMM algorithm is 8.08-12.57 percentage points higher than the SWLM-TC algorithm, showing that when SWLM-TC and SWLM-HMM classify a given class, their ability to classify correctly exceeds the traditional algorithm's. This is because the algorithm and framework proposed by the present invention, when classifying a class, demarcate the specific emotion words belonging to that class more efficiently and accurately than the traditional algorithm. The proposed algorithm is better at highlighting important emotion words, and the present invention employs the emotion polarity from the emotion vocabulary ontology, a parameter the text exploits well: multiplying the co-occurrence count by the emotion polarity makes strongly polarized emotion words more prominent, reducing interference from other emotion words and making classification more accurate.
3) In the comparison of F1 values, the SWLM-TC algorithm is 9.67-13.29 percentage points higher than the traditional algorithm, and the SWLM-HMM algorithm is 9.16-11.8 percentage points higher than the SWLM-TC algorithm. In overall evaluation, the SWLM-TC and SWLM-HMM algorithms proposed by the present invention are better than the traditional algorithm; in this respect, the proposed algorithms exceed fine-grained emotion classification with traditional algorithms in comprehensive performance.
4) The experimental data as a whole shows that the classification performance of the proposed computational framework and algorithms is good, and that fine-grained analysis differs greatly from emotion orientation analysis: fine-grained calculation not only requires the algorithm to classify at fine granularity in every respect, it also demands higher noise resistance. For the emotion words appearing in an article, a purely calibration-based approach is easily disturbed and causes unnecessary over-classification. Since the emotional orientation within an article is stable, polarity and co-occurrence are important properties, which ordinary classification algorithms do not handle; when processing such texts, the algorithms proposed by the present invention are therefore more advantageous. For ordinary texts in which strongly polarized words co-occur only weakly, the present algorithm, unlike calibration-based algorithms, employs a prediction mechanism for texts whose emotion is fuzzy, so its handling of such texts is much better than that of methods without a prediction mechanism. The present invention employs the SWLM algorithm in its fine-grained computing framework and absorbs relevant knowledge of complex networks in its processing mechanism; completing the emotion words of texts to be classified gives relatively good performance, and with the above mechanisms in place, the fine-grained computing framework's calculation has been studied and verified experimentally.
Claims (3)
1. A fine-grained sentiment classification method based on a random co-occurrence network of emotion words, which uses random network theory and the word co-occurrence phenomenon: through marking with the emotional ontology vocabulary dictionary, a word-order-based random network model built with affective features is formed, i.e. an emotion word co-occurrence network model; model reduction is carried out on this basis, and the emotion word longest matching method (SWLM, Sentimental Word Longest Match) is combined with the TC algorithm for SWLM-TC unsupervised-learning classification, or the emotion word longest matching method is further combined with the HMM machine-learning algorithm to establish a fine-grained sentiment classification model and realize classification prediction with the model, wherein the building process of the emotion word co-occurrence network model is as follows:

1) perform sentence splitting on each text to obtain an ordered group of sentences S1→S2→…→Sn;
2) segment each sentence Si, filter out stop words and meaningless notional words, and carry out emotion word marking with the emotion vocabulary ontology, obtaining an ordered group of emotion words W1→W2→…→Wn;
3) for each sentence, extract word pairs <wi, wj> from the sentence with a WL-width sliding window; if wi ∉ W, add a new node wi to W and set its weight nwi to an initial value of 1, otherwise increment nwi by 1; if (wi, wj) ∉ E, add a new edge (wi, wj) to E and set its weight nwi,wj to an initial value of 1, otherwise increment nwi,wj by 1;
4) after all texts have been processed, the network model G is complete;

wherein S denotes a sequence composed of multiple sentences, w denotes an extracted emotion word, w ∈ Σ, and Σ is the Chinese word set, namely the emotional ontology word set obtained after removing stop words and meaningless notional words and then marking with the emotion vocabulary ontology; W is the node set of the network model G, W = {wi | i ∈ [1, N]}, N being the number of nodes of G; E is the edge set of the network model G, the number of edges of G being M, E = {(wi, wj) | wi, wj ∈ W, and a sequential co-occurrence relation exists between wi and wj}, where (wi, wj) denotes a directed edge from node wi to node wj; NW is the set of node weights in the network model G, NW = {nwi | wi ∈ W}; NE is the set of edge weights in the network model G, nwi,wj denoting the weight of the edge between nodes wi and wj, NE = {nwi,wj | (wi, wj) ∈ E};

the network model G is divided into 7 sub-networks according to the seven moods joy, goodness, anger, sorrow, fear, disgust and surprise; if a sub-network fractures during splitting, the broken sub-blocks are connected through the highest-weight node, building the seven sub-networks Gx | x = {1,2,3,4,5,6,7}, i.e. G1, G2, G3, G4, G5, G6, G7, usable for fine-grained calculation;

characterized in that, when classifying, the following definitions are used:

longest weight matching path length dmax(S): in network Gx | x = {1,2,3,4,5,6,7}, if two emotion words are covered sequentially, they are matched using the directly joining edge; if two emotion words are separated in network Gx, the path through the maximum-weight nodes is selected, the result being taken as the length of S, calculated as:
d_max(S) = Σ_{i=1}^{n-1} d_max(w_i, w_{i+x})
where dmax(wi, wi+x) is the maximum-weight matching path from the i-th word to the (i+x)-th word in the network;

emotion weight coefficient SW (Sentimental weight): the proportion of emotion polarity held by each of the seven sub-networks in network G; using this coefficient makes classification more pronounced and reduces classification problems caused by fuzzy boundaries; with freq the number of recurrences of a word in the emotion word network and P its polarity intensity, the calculation formulas are:

WC_i = freq × P
W_y = Σ_{i=1}^{n} WC_i
SW_x = W_{yx} / Σ_{i=1}^{7} W_{yi}
where WCi is the emotion value of each word in a sub-network, Wy is the emotion value of the sub-network, and SWx is the SW value, i.e. the emotion weight coefficient, of sub-network x;

classification coefficient CC (Classification coefficient): after the maximum matching word path is determined, with Re the reproduction degree of a word on this path and power its emotion intensity, and assuming n words, the calculation formulas are:

CC_i = Re × power
CC = Σ_{i=1}^{n} CC_i
Wherein CCiIt is the classification factor of single word;
Classification prediction coefficient CPC (Classification prediction coefficient): the prediction mechanism adopted, when classifying with the machine learning algorithm, for samples whose class cannot be decided. The SW_x values are sorted in descending order; if SW_1 + SW_2 > 80% and SW_1/SW_2 > 1.5, the sample is assigned under SW_1; if SW_1 + SW_2 > 80% and SW_1/SW_2 <= 1.5, the sample is assigned under both the SW_1 and SW_2 attributes; if SW_1 + SW_2 < 80%, the classification of the article is more complex, and the sample is assigned to the corresponding class according to the classification factor.
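A minimal sketch of this decision rule, assuming sw maps each mood label to its SW value and cc_by_class optionally supplies per-class classification factors for the complex case (both parameter names are illustrative):

```python
def predict_classes(sw, cc_by_class=None):
    """Return the predicted mood label(s) for an undecidable sample."""
    ranked = sorted(sw.items(), key=lambda kv: kv[1], reverse=True)
    (m1, sw1), (m2, sw2) = ranked[0], ranked[1]
    if sw1 + sw2 > 0.8:
        if sw2 == 0 or sw1 / sw2 > 1.5:
            return [m1]                    # one clearly dominant class
        return [m1, m2]                    # ambiguous: keep both attributes
    # SW_1 + SW_2 < 80%: fall back on the classification factor per class
    if cc_by_class:
        return [max(cc_by_class, key=cc_by_class.get)]
    return [m1]
```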
2. The fine-grained sentiment classification method based on the random co-occurrence network of emotion words according to claim 1, characterized in that, in the emotion vocabulary ontology library, emotions are divided into 7 major classes and 21 groups; the emotion classes are joy {happy (PA), at ease (PE)}, like {respect (PD), praise (PH), trust (PG), fondness (PB), wish (PK)}, anger {angry (NA)}, sorrow {sad (NB), disappointed (NJ), remorseful (NH), longing (PF)}, fear {flustered (NI), frightened (NC), shy (NG)}, dislike {vexed (NE), abhorring (ND), censuring (NN), envious (NK), suspicious (NL)} and surprise {amazed (PC)}; the emotion intensity power is divided into the five grades 1, 3, 5, 7 and 9, where 9 denotes the strongest intensity and 1 the weakest; the parts of speech in the emotion vocabulary ontology are divided into 7 classes, namely noun, verb, adjective (adj), adverb (adv), network vocabulary (nw), idiom and prepositional phrase (prep); the library contains 27,466 emotion words in total.
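For reference, an illustrative Python fragment of the ontology layout described in this claim; the English group names above map onto the 21 group codes, and the constants below are taken directly from the claim:

```python
ONTOLOGY_CLASSES = {  # 7 major classes over 21 groups
    "joy":      ["PA", "PE"],
    "like":     ["PD", "PH", "PG", "PB", "PK"],
    "anger":    ["NA"],
    "sorrow":   ["NB", "NJ", "NH", "PF"],
    "fear":     ["NI", "NC", "NG"],
    "dislike":  ["NE", "ND", "NN", "NK", "NL"],
    "surprise": ["PC"],
}
INTENSITY_GRADES = (1, 3, 5, 7, 9)   # 9 strongest, 1 weakest
POS_CLASSES = ("noun", "verb", "adj", "adv", "nw", "idiom", "prep")

assert sum(len(g) for g in ONTOLOGY_CLASSES.values()) == 21
```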
3. The fine-grained sentiment classification method based on the random co-occurrence network of emotion words according to claim 1, characterized in that the longest matching process for emotion words performs longest matching over the maximum-weight vocabulary of the emotion words, so that noise is suppressed without word-sense disambiguation and the text can be accurately classified under the related emotion topic; weight calculation is then carried out through the seven sub-classification models to derive the parameters with which machine learning classification can be performed.
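Tying the definitions together, a minimal sketch of deriving per-class machine learning parameters from the seven sub-classification models; it reuses dmax_sentence and the SW values from the sketches above, and the particular combination shown is an assumption rather than the patent's exact parameterisation:

```python
def classification_features(subgraphs, sw, words):
    """subgraphs: mood -> adjacency dict; sw: mood -> SW value;
    words: the emotion words of the text after longest matching.
    Returns a per-mood feature: the SW-weighted d_max(S) of the text."""
    return {mood: sw.get(mood, 0.0) * dmax_sentence(g, words)
            for mood, g in subgraphs.items()}
```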
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610936655.9A CN106547866B (en) | 2016-10-24 | 2016-10-24 | A kind of fine granularity sensibility classification method based on the random co-occurrence network of emotion word |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106547866A CN106547866A (en) | 2017-03-29 |
CN106547866B (en) | 2017-12-26
Family
ID=58392940
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610936655.9A Active CN106547866B (en) | 2016-10-24 | 2016-10-24 | A kind of fine granularity sensibility classification method based on the random co-occurrence network of emotion word |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106547866B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109992667B (en) * | 2019-03-26 | 2021-06-08 | 新华三大数据技术有限公司 | Text classification method and device |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107766331A (en) * | 2017-11-10 | 2018-03-06 | 云南大学 | The method that automatic Calibration is carried out to word emotion value |
CN111239812A (en) * | 2019-05-17 | 2020-06-05 | 北京市地震局 | Social media big data and machine learning-based seismic intensity rapid evaluation method |
CN112101033B (en) * | 2020-09-01 | 2021-06-15 | 广州威尔森信息科技有限公司 | Emotion analysis method and device for automobile public praise |
CN112883145A (en) * | 2020-12-24 | 2021-06-01 | 浙江万里学院 | Emotion multi-tendency classification method for Chinese comments |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8417713B1 (en) * | 2007-12-05 | 2013-04-09 | Google Inc. | Sentiment detection as a ranking signal for reviewable entities |
CN104899231A (en) * | 2014-03-07 | 2015-09-09 | 上海市玻森数据科技有限公司 | Sentiment analysis engine based on fine-granularity attributive classification |
Also Published As
Publication number | Publication date |
---|---|
CN106547866A (en) | 2017-03-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106547866B (en) | A kind of fine granularity sensibility classification method based on the random co-occurrence network of emotion word | |
CN106919673B (en) | Text mood analysis system based on deep learning | |
CN104137102B (en) | Non- true type inquiry response system and method | |
CN108363753A (en) | Comment text sentiment classification model is trained and sensibility classification method, device and equipment | |
CN102789498B (en) | Method and system for carrying out sentiment classification on Chinese comment text on basis of ensemble learning | |
CN105022805B (en) | A kind of sentiment analysis method based on SO-PMI information on commodity comment | |
CN105005553B (en) | Short text Sentiment orientation analysis method based on sentiment dictionary | |
CN105528437B (en) | A kind of question answering system construction method extracted based on structured text knowledge | |
CN107066446A (en) | A kind of Recognition with Recurrent Neural Network text emotion analysis method of embedded logic rules | |
CN107609132B (en) | Semantic ontology base based Chinese text sentiment analysis method | |
CN108536870B (en) | Text emotion classification method fusing emotional features and semantic features | |
CN107403017A (en) | A kind of method that real-time news of intellectual analysis influences on financial market | |
CN106227756A (en) | A kind of stock index forecasting method based on emotional semantic classification and system | |
CN106598950A (en) | Method for recognizing named entity based on mixing stacking model | |
Yan et al. | An improved single-pass algorithm for chinese microblog topic detection and tracking | |
US20160170993A1 (en) | System and method for ranking news feeds | |
Wahid et al. | Cricket sentiment analysis from Bangla text using recurrent neural network with long short term memory model | |
Rustamov | A hybrid system for subjectivity analysis | |
Nair et al. | Comparative study of twitter sentiment on covid-19 tweets | |
CN108763402A (en) | Class center vector Text Categorization Method based on dependence, part of speech and semantic dictionary | |
CN109670169A (en) | A kind of deep learning sensibility classification method based on feature extraction | |
Kim et al. | CNN based sentence classification with semantic features using word clustering | |
CN107967337A (en) | A kind of cross-cutting sentiment analysis method semantic based on feeling polarities enhancing | |
CN107122471A (en) | A kind of method that hotel's characteristic comment is extracted | |
Sedighi et al. | Opinion spam detection with attention-based neural networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||