CN106446147A - Emotion analysis method based on structuring features - Google Patents

Publication number
CN106446147A
Authority
CN
China
Prior art keywords
text
dictionary
influence
score value
emotion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610839375.6A
Other languages
Chinese (zh)
Inventor
苏育挺
王慧晶
张静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN201610839375.6A
Publication of CN106446147A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a sentiment (emotion) analysis method based on structured features. The method comprises: collecting Twitter text data and building a Twitter text database; collecting existing sentiment polarity value dictionaries; manually building related auxiliary dictionaries; preprocessing the Twitter text database; defining a sentiment score influence factor, extracting language features from the preprocessed information, and updating the value of the sentiment score influence factor each time a language feature is extracted; and calculating the sentiment polarity value of each piece of Twitter text data from the sentiment polarity value dictionaries and the sentiment score influence factor. The method avoids the need, found in supervised methods, for a large amount of labeled data to train a classifier, which makes such methods difficult to analyze and generalize, and it reduces the CPU processing, memory, and training-time overhead.

Description

A sentiment analysis method based on structured features
Technical field
The present invention relates to a sentiment analysis method, and more particularly to an unsupervised sentiment analysis method based on structured features.
Background
With the emergence and popularity of social media, more and more users share their views, or simply express their emotions and moods, on social media platforms. Among these platforms, Twitter has become one of the most popular websites. According to statistics from 2016, it has more than 645,000,000 registered users, who send more than 190,000,000 tweets per day on average. The Twitter API gives access to a large amount of rich data that can be examined and mined, which is a good opportunity for sentiment analysis. Such analysis helps infer public opinion on all kinds of matters, and its conclusions support better-informed predictions and choices. Sentiment analysis of Twitter text data has therefore become a current research focus.
Sentiment analysis of Twitter text data mainly involves natural language processing, opinion mining, and sentiment classification. There are currently two main approaches. The first is the dictionary-based unsupervised approach, which relies on sentiment dictionaries that carry sentiment polarity information, such as LIWC [1], ANEW [2], AFINN [3], VADER [4], and SentiWordNet [5]. The second is the supervised approach, which uses machine learning algorithms to extract features from a large amount of labeled data and train a classifier, such as an SVM (Support Vector Machine), naive Bayes, or a decision tree. The most commonly used features are the presence or frequency of n-grams (1, 2, 3, or more consecutive independent text units in the text). This approach needs a large amount of labeled data in the training stage, so its CPU processing, memory, and training-time costs are high. In addition, for a considerable portion of the data, the decision score predicted by a supervised classifier lies very close to the decision boundary, which means the classifier is very uncertain about which class the text belongs to; the label assigned to such data is then either wrong or right only by chance [6]. The present invention therefore uses the dictionary-based unsupervised approach to perform sentiment analysis.
The main challenges currently facing sentiment analysis of Twitter text data come from the characteristics of Twitter text itself. For example, a tweet is limited to 140 characters, so the information it provides is limited. Besides irregular language structure and grammar, a tweet may contain many abbreviations, emoticons, hashtags, slang terms, URLs, and so on, which makes sentiment extraction and opinion mining difficult. Conventional natural language processing (NLP) techniques such as tokenization, normalization, and part-of-speech tagging work well on properly written standard text but are no longer suitable for Twitter data.
Summary of the invention
The technical problem to be solved by the present invention is to provide a sentiment analysis method based on structured features that avoids the need, found in supervised methods, for a large amount of labeled data to train a classifier.
The technical solution adopted by the present invention is a sentiment analysis method based on structured features, comprising the following steps:
1) collecting Twitter text data and building a Twitter text database;
2) collecting existing sentiment polarity value dictionaries, giving preference to manually generated sentiment dictionaries;
3) manually building related auxiliary dictionaries, including a standard word dictionary, a negation word dictionary, a strengthening modifier dictionary, a weakening modifier dictionary, and an Internet slang dictionary;
4) preprocessing the Twitter text database, including:
(1) first tokenizing the data in the Twitter text database;
(2) performing normalization;
(3) performing part-of-speech tagging (POS tagging) on the text;
5) defining a sentiment score influence factor and extracting language features from the information obtained by the preprocessing of step 4), the language features including word-level, phrase-level, and sentence-level language features, and updating the value of the sentiment score influence factor each time a language feature is extracted;
6) calculating a sentiment polarity value for each piece of Twitter text data using the sentiment polarity value dictionaries obtained in step 2) and the sentiment score influence factor obtained in step 5).
The sentiment polarity value dictionaries of step 2) include three manually generated sentiment dictionaries, AFINN, SentiStrength, and VADER, and one automatically generated sentiment dictionary, Opinion Observer.
The tokenization of sub-step (1) of step 4) divides the Twitter text data into the smallest meaningful independent text units and labels the type of each independent text unit.
The normalization of sub-step (2) of step 4) uses a standard English dictionary to convert independent text units that use repeated letters to their canonical form, identifies emoticons, emoji, and emotexts in the Twitter text data, and determines and labels their sentiment polarity.
The part-of-speech tagging of sub-step (3) of step 4) labels the part-of-speech category of each independent text unit.
The definition of the sentiment score influence factor in step 5) introduces, for each independent text unit t, a sentiment score influence factor $IF_t$, where $IF_t \ge 0$ and the initial value is 1, to reflect the degree to which the language features strengthen or weaken the sentiment intensity of the independent text unit. The sentiment score influence factor is updated as follows:
$$IF_t^{new} = IF_t^{old} \cdot 2^{\log_{10}\left(\sum_{p \in P} p\right)} \quad (1)$$
where $IF_t^{new}$ is the sentiment score influence factor of independent text unit t after the update, $IF_t^{old}$ is the sentiment score influence factor of independent text unit t before the update, p is a feature, and P is the set of all features that can affect the sentiment score influence factor.
The extraction of word-level language features in step 5) includes:
If all letters of an independent text unit are capitalized, the all-caps flag $s_t^{AllCaps} = 1$; otherwise $s_t^{AllCaps} = 0$. The sentiment score influence factor $IF_t$ is updated:
$$IF_t^{new} = IF_t^{old} \cdot 2^{\log_{10}\left(1 + s_t^{AllCaps}\right)} \quad (2)$$
where $IF_t^{new}$ and $IF_t^{old}$ are the sentiment score influence factor of independent text unit t after and before the update;
If an independent text unit uses repeated letters, an elongation factor $s_t^{ExLen}$ is assigned to that independent text unit, where $t_{orig}$ denotes the original independent text unit and $t_{norm}$ the independent text unit after normalization. The sentiment score influence factor $IF_t$ is updated:
$$IF_t^{new} = IF_t^{old} \cdot 2^{\log_{10}\left(s_t^{ExLen}\right)} \quad (3)$$
The extraction of phrase-level language features in step 5) includes:
Using the negation word dictionary built manually in step 3), the start of a phrase with negated content is determined; periods, question marks, exclamation marks, and non-standard independent text units are treated as the end markers of a phrase with negated content. The sentiment score influence factor $IF_t$ is updated:
$$IF_t^{new} = IF_t^{old} \cdot (-1) \quad (4)$$
where t is an independent text unit within the scope of the negation.
Using the strengthening modifier dictionary and the weakening modifier dictionary built manually in step 3), all modifiers $M_t^{DM}$ in the Twitter text data are found, and the modifier scaling factor is calculated as:
$$s_t^{DM} = \sum_{m \in M_t^{DM}} \left(1 + s_m^{AllCaps} + s_m^{ExLen}\right) \quad (5)$$
where m is a modifier and $M^{DM}$ is the modifier set. If all letters of a modifier m are capitalized, the modifier all-caps flag $s_m^{AllCaps}$ is set, and otherwise it is not set; if a modifier m uses repeated letters, the repeated-letter flag $s_m^{ExLen}$ is set, and otherwise it is not set.
The sentiment score influence factor $IF_t$ is then updated:
$$IF_t^{new} = IF_t^{old} \cdot 2^{\log_{10}\left(s_t^{DM}\right)} \quad (6)$$
The extraction of sentence-level language features in step 5) includes:
Using the sentence structure "X but Y", Twitter text data that uses an adversative conjunction (such as but or yet) is identified, the part X before the conjunction and the part Y after the conjunction are labeled, and the sentiment score influence factor $IF_t$ is updated:
$$IF_t^{new} = IF_t^{old} \cdot 2^{\log_{10}\left(sgn_t^{CONJ}\right)} \quad (7)$$
where $sgn_t^{CONJ}$ takes one value if the independent text unit t is in X and another value if it is in Y.
Using the following three sentence structures, "If X, Y", "If X () then Y", and "Y, if X.", Twitter text data that uses a conditional sentence is identified, where X is the conditional clause and Y is the result clause, and the sentiment score influence factor $IF_t$ is updated:
$$IF_t^{new} = 0 \quad (8)$$
where the independent text unit t is in X.
The calculation of the sentiment polarity value in step 6) includes:
(1) calculating the basic sentiment polarity value of each independent text unit t. Let L be the set of sentiment polarity value dictionaries used and $L_t = \{ l \in L \mid t \in l \}$ the subset of sentiment dictionaries that contain the independent text unit t. The basic sentiment polarity value $s_t$ of each independent text unit t is obtained by:
$$s_t = \begin{cases} \dfrac{\sum_{l \in L_t} score(l, t)}{|L_t|}, & L_t \neq \emptyset \\ 0, & L_t = \emptyset \end{cases} \quad (9)$$
where score(l, t) is the basic sentiment polarity value of independent text unit t given in dictionary l, and $|L_t|$ is the number of sentiment polarity value dictionaries that contain the independent text unit t;
(2) for each independent text unit t, updating its basic sentiment polarity value $s_t$ using the sentiment score influence factor $IF_t$ obtained in step 5);
(3) calculating an overall sentiment score $S_T$ for each piece of Twitter text data T.
The sentiment analysis method based on structured features of the present invention avoids the need, found in supervised methods, for a large amount of labeled data to train a classifier, which makes such methods difficult to analyze and generalize, and it reduces the CPU processing, memory, and training-time overhead. Specifically, the beneficial effects of the present invention are:
1. It avoids supervised methods based on machine learning, so sentiment analysis does not depend on a large amount of labeled data for training a classifier;
2. It uses a fine-grained, sentiment-aware preprocessor that handles informal social media text effectively, improving the efficiency of subsequent processing and the classification accuracy;
3. It proposes a structured way of extracting features, so that the sentiment score influence factor defined here can be updated easily, which in turn improves the calculation of the sentiment score.
Brief description of the drawings
Fig. 1 is a flow chart of the sentiment analysis method based on structured features of the present invention.
Detailed description
The sentiment analysis method based on structured features of the present invention is described in detail below with reference to an embodiment and the accompanying drawing.
As shown in Fig. 1, the sentiment analysis method based on structured features of the present invention comprises the following steps:
1) Collect Twitter text data and build a Twitter text database;
2) Collect existing sentiment polarity value dictionaries, giving preference to manually generated sentiment dictionaries. The sentiment polarity value dictionaries include three manually generated sentiment dictionaries, AFINN, SentiStrength, and VADER, and one automatically generated sentiment dictionary, Opinion Observer. Table 1 gives an overview of the sentiment polarity value dictionaries and their characteristics.
Table 1 Overview of sentiment dictionaries
3) Manually build related auxiliary dictionaries, including a standard word dictionary, a negation word dictionary, a strengthening modifier dictionary, a weakening modifier dictionary, and an Internet slang dictionary. Table 2 gives a summary of the dictionaries used.
Table 2 Overview of auxiliary dictionaries
4) Preprocess the Twitter text database, including:
(1) First, tokenize the data in the Twitter text database. Tokenization divides the Twitter text data into the smallest meaningful independent text units and labels the type of each independent text unit, such as word, hashtag, emoticon, or URL. Regular expressions are used to match the different types of independent text units and to attach the corresponding labels.
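As a minimal sketch of sub-step (1), the following Python snippet matches token types with one combined regular expression. The patterns and the type names (`url`, `hashtag`, `emoticon`, `word`, `punct`) are illustrative assumptions; the patent does not list the expressions it uses.

```python
import re

# Hypothetical patterns, tried in this order; not taken from the patent.
TOKEN_PATTERNS = [
    ("url", r"https?://\S+"),
    ("hashtag", r"#\w+"),
    ("emoticon", r"[:;=][-o^]?[)(DPp\]\[]"),
    ("word", r"[A-Za-z]+(?:'[A-Za-z]+)?"),
    ("punct", r"[.!?,;:]"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_PATTERNS))


def tokenize(text):
    """Return (token, type) pairs for every pattern match, in text order."""
    return [(m.group(), m.lastgroup) for m in TOKEN_RE.finditer(text)]


print(tokenize("I LOVE #python :) http://t.co/x"))
```

Because the alternatives are tried in the listed order, a URL is labeled as a whole before its letters can match as separate words.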
(2) Perform normalization. Normalization uses a standard English dictionary to convert independent text units that use repeated letters to their canonical form, identifies emoticons, emoji, and emotexts in the Twitter text data, and determines and labels their sentiment polarity. Specifically:
a. Letter elongation. Letter elongation means using repeated letters to strengthen the expressiveness of a word. Words are first looked up in $D_{en}$ and $D_{slang}$ through an index based on phonetic encoding. If the normalizer meets a word that is in neither dictionary, it determines the matching candidates by phonetic encoding, then measures similarity by computing the Levenshtein distance between the input and each candidate, and returns the best match.
b. Emoticons. An emoticon is a graphical representation of a facial expression made up of punctuation marks or letters, such as :-), :), or :o). Positive and negative emoticons are normalized to [EMOTICON+] and [EMOTICON-] respectively.
c. Emoji. Since 2010, more and more emoji have been added to the Unicode standard (Unicode 8.0). As with emoticons, all emoji are normalized to predefined independent text units such as [EMOJI+], [EMOJI0], and [EMOJI-].
d. Emotext. Finally, emotexts such as haha, hehe, and xixi are normalized. They are found with regular expressions that match at least k repeated letters (currently k = 2), and each emotext is then normalized to its core form; for example, hhahahah becomes haha.
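The normalization of elongated words and emotexts might look like the sketch below. It is a simplification under stated assumptions: the phonetic-encoding lookup is replaced by a scan over a small stand-in dictionary, and `normalize_elongated`, `normalize_emotext`, and the emotext pattern are hypothetical names and choices, not the patent's implementation.

```python
import re


def levenshtein(a, b):
    """Dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def normalize_elongated(token, dictionary):
    """Return the token if it is a known word, else the closest dictionary word."""
    word = token.lower()
    if word in dictionary:
        return word
    # Assumption: every dictionary word is a candidate (no phonetic index here).
    return min(sorted(dictionary), key=lambda w: levenshtein(word, w))


# Assumed emotext pattern: at least two repetitions of "ha", "he", or "xi".
EMOTEXT_RE = re.compile(r"h*((?:ha){2,}|(?:he){2,}|(?:xi){2,})h*", re.IGNORECASE)


def normalize_emotext(token):
    """Reduce a laughter-like token to its two-syllable core form."""
    m = EMOTEXT_RE.fullmatch(token)
    if m is None:
        return token
    return m.group(1)[:2].lower() * 2


print(normalize_elongated("coooool", {"cool", "cold", "good"}))
print(normalize_emotext("hhahahah"))
```

Scanning the whole dictionary is only practical for a toy word list; an index such as the phonetic one described above narrows the candidates before any distance is computed.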
(3) Perform part-of-speech tagging (POS tagging) on the text. Part-of-speech tagging labels the part-of-speech category of each independent text unit, such as noun, adjective, or verb.
5) Define a sentiment score influence factor and extract language features from the information obtained by the preprocessing of step 4). The language features include word-level, phrase-level, and sentence-level language features, and the value of the sentiment score influence factor is updated each time a language feature is extracted. Defining the sentiment score influence factor means introducing, for each independent text unit t, a sentiment score influence factor $IF_t$, where $IF_t \ge 0$ and the initial value is 1, to reflect the degree to which the language features strengthen or weaken the sentiment intensity of the independent text unit. The sentiment score influence factor is updated as follows:
$$IF_t^{new} = IF_t^{old} \cdot 2^{\log_{10}\left(\sum_{p \in P} p\right)} \quad (1)$$
where $IF_t^{new}$ is the sentiment score influence factor of independent text unit t after the update, $IF_t^{old}$ is the sentiment score influence factor of independent text unit t before the update, p is a feature, and P is the set of all features that can affect the sentiment score influence factor.
The extraction of word-level language features includes:
If all letters of an independent text unit are capitalized, the all-caps flag $s_t^{AllCaps} = 1$; otherwise $s_t^{AllCaps} = 0$. The sentiment score influence factor $IF_t$ is updated:
$$IF_t^{new} = IF_t^{old} \cdot 2^{\log_{10}\left(1 + s_t^{AllCaps}\right)} \quad (2)$$
where $IF_t^{new}$ and $IF_t^{old}$ are the sentiment score influence factor of independent text unit t after and before the update.
If an independent text unit uses repeated letters, an elongation factor $s_t^{ExLen}$ is assigned to that independent text unit, where $t_{orig}$ denotes the original independent text unit and $t_{norm}$ the independent text unit after normalization. The sentiment score influence factor $IF_t$ is updated:
$$IF_t^{new} = IF_t^{old} \cdot 2^{\log_{10}\left(s_t^{ExLen}\right)} \quad (3)$$
where $IF_t^{new}$ and $IF_t^{old}$ are the sentiment score influence factor of independent text unit t after and before the update.
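To make the all-caps update concrete, the sketch below applies the rule stated in claim 7, IF_new = IF_old · 2^(log10(1 + s_AllCaps)). The function names are hypothetical, and the elongation factor is left out because its defining formula is not available here.

```python
import math


def all_caps_flag(token):
    """1 if every cased character of the token is upper case, else 0."""
    return int(token.isupper())


def apply_all_caps(if_old, token):
    """IF_new = IF_old * 2 ** log10(1 + s_AllCaps)."""
    return if_old * 2 ** math.log10(1 + all_caps_flag(token))


print(apply_all_caps(1.0, "good"))  # flag 0: factor unchanged
print(apply_all_caps(1.0, "GOOD"))  # flag 1: factor grows by 2 ** log10(2)
```

A flag of 0 gives 2^0 = 1 and leaves the factor unchanged, while a flag of 1 multiplies it by 2^(log10 2), roughly 1.23, so capitalization strengthens a token's sentiment moderately instead of doubling it.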
The extraction of phrase-level language features includes:
Using the negation word dictionary built manually in step 3), the start of a phrase with negated content is determined; periods, question marks, exclamation marks, and non-standard independent text units are treated as the end markers of a phrase with negated content. The sentiment score influence factor $IF_t$ is updated:
$$IF_t^{new} = IF_t^{old} \cdot (-1) \quad (4)$$
where t is an independent text unit within the scope of the negation, and $IF_t^{new}$ and $IF_t^{old}$ are the sentiment score influence factor of independent text unit t after and before the update.
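The negation scope can be sketched as a single pass over the tokens. The `NEGATIONS` set below is a tiny hypothetical stand-in for the manually built negation word dictionary, and only punctuation is treated as an end marker; the sign flip follows claim 8, IF_new = IF_old · (−1).

```python
# Hypothetical stand-ins for the negation word dictionary and the end markers.
NEGATIONS = {"not", "no", "never", "cannot"}
TERMINATORS = {".", "?", "!"}


def apply_negation(tokens, factors):
    """Flip the sign of the factor of every token inside a negation scope."""
    out = list(factors)
    in_scope = False
    for i, tok in enumerate(tokens):
        if tok in TERMINATORS:
            in_scope = False          # an end marker closes the scope
        elif in_scope:
            out[i] *= -1              # token lies inside the negated phrase
        elif tok.lower() in NEGATIONS:
            in_scope = True           # scope starts after the negation word
    return out


tokens = ["i", "do", "not", "like", "it", ".", "great"]
print(apply_negation(tokens, [1.0] * len(tokens)))
```

In this sketch the negation word itself keeps its factor, and a second negation word inside an open scope is treated like any other token; both are design choices the text leaves open.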
Using the strengthening modifier dictionary and the weakening modifier dictionary built manually in step 3), all modifiers $M_t^{DM}$ in the Twitter text data are found, and the modifier scaling factor is calculated as:
$$s_t^{DM} = \sum_{m \in M_t^{DM}} \left(1 + s_m^{AllCaps} + s_m^{ExLen}\right) \quad (5)$$
where m is a modifier and $M^{DM}$ is the modifier set. If all letters of a modifier m are capitalized, the modifier all-caps flag $s_m^{AllCaps}$ is set, and otherwise it is not set; if a modifier m uses repeated letters, the repeated-letter flag $s_m^{ExLen}$ is set, and otherwise it is not set.
The sentiment score influence factor $IF_t$ is then updated:
$$IF_t^{new} = IF_t^{old} \cdot 2^{\log_{10}\left(s_t^{DM}\right)} \quad (6)$$
where $IF_t^{new}$ and $IF_t^{old}$ are the sentiment score influence factor of independent text unit t after and before the update.
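The modifier update of claim 8 can be sketched as follows. The dictionary keys `all_caps` and `elongated`, their 0/1 values, and the decision to leave the factor unchanged when no modifier applies are assumptions made for illustration; how modifiers are attached to a given token is taken as already decided.

```python
import math


def modifier_factor(modifiers):
    """s_DM: sum over modifiers of (1 + all-caps flag + repeated-letter flag)."""
    return sum(1 + m["all_caps"] + m["elongated"] for m in modifiers)


def apply_modifiers(if_old, modifiers):
    """IF_new = IF_old * 2 ** log10(s_DM); unchanged without modifiers (assumption)."""
    if not modifiers:
        return if_old
    return if_old * 2 ** math.log10(modifier_factor(modifiers))


plain = {"all_caps": 0, "elongated": 0}
shouted = {"all_caps": 1, "elongated": 0}
print(modifier_factor([plain]), modifier_factor([shouted]))
print(apply_modifiers(1.0, [shouted]))
```

Under the formula as written, a single unmarked modifier gives s_DM = 1 and therefore leaves the factor unchanged; the factor only moves once a modifier is capitalized, elongated, or accompanied by further modifiers.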
The extraction of sentence-level language features includes:
Using the sentence structure "X but Y", Twitter text data that uses an adversative conjunction (such as but or yet) is identified, the part X before the conjunction and the part Y after the conjunction are labeled, and the sentiment score influence factor $IF_t$ is updated:
$$IF_t^{new} = IF_t^{old} \cdot 2^{\log_{10}\left(sgn_t^{CONJ}\right)} \quad (7)$$
where $sgn_t^{CONJ}$ takes one value if the independent text unit t is in X and another value if it is in Y, and $IF_t^{new}$ and $IF_t^{old}$ are the sentiment score influence factor of independent text unit t after and before the update.
Using the following three sentence structures, "If X, Y", "If X () then Y", and "Y, if X.", Twitter text data that uses a conditional sentence is identified, where X is the conditional clause and Y is the result clause, and the sentiment score influence factor $IF_t$ is updated:
$$IF_t^{new} = 0 \quad (8)$$
where the independent text unit t is in X.
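The conditional-sentence rule sets the factor of every token in the conditional clause X to zero. A rough sketch, assuming the tweet is already tokenized and that X is delimited by "if", a comma, "then", or the sentence end (the helper names are hypothetical):

```python
def condition_span(tokens):
    """Return (start, end) indices of clause X, or None if no pattern applies."""
    lowered = [t.lower() for t in tokens]
    if "if" not in lowered:
        return None
    start = lowered.index("if")
    if start == 0:
        # "If X, Y" or "If X then Y": X ends at the first comma or "then".
        for end in range(1, len(lowered)):
            if lowered[end] in {",", "then"}:
                return (1, end)
        return None
    # "Y, if X.": X runs from after "if" to the closing punctuation, if any.
    end = len(lowered) - 1 if lowered[-1] in {".", "!", "?"} else len(lowered)
    return (start + 1, end)


def apply_condition(tokens, factors):
    """Set the influence factor of every token in clause X to zero."""
    out = list(factors)
    span = condition_span(tokens)
    if span is not None:
        for i in range(*span):
            out[i] = 0.0
    return out


print(apply_condition(["if", "it", "works", ",", "great"], [1.0] * 5))
print(apply_condition(["great", ",", "if", "it", "works", "."], [1.0] * 6))
```

Zeroing the factors in X means that a hypothetical clause such as "if it works" contributes nothing to the tweet's score, and only the result clause Y is counted.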
6) Calculate a sentiment polarity value for each piece of Twitter text data using the sentiment polarity value dictionaries obtained in step 2) and the sentiment score influence factor obtained in step 5). Calculating the sentiment polarity value includes:
(1) Calculate the basic sentiment polarity value of each independent text unit t. Let L be the set of sentiment polarity value dictionaries used and $L_t = \{ l \in L \mid t \in l \}$ the subset of sentiment dictionaries that contain the independent text unit t. The basic sentiment polarity value $s_t$ of each independent text unit t is obtained by:
$$s_t = \begin{cases} \dfrac{\sum_{l \in L_t} score(l, t)}{|L_t|}, & L_t \neq \emptyset \\ 0, & L_t = \emptyset \end{cases} \quad (9)$$
where score(l, t) is the basic sentiment polarity value of independent text unit t given in dictionary l, and $|L_t|$ is the number of sentiment polarity value dictionaries that contain the independent text unit t.
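The averaging in sub-step (1) is simple to sketch: a token's basic polarity is the mean of its scores over the dictionaries that contain it, and 0 if none does. The toy lexicons and their values below are made up for illustration and are assumed to be on a common scale already.

```python
def base_polarity(token, lexicons):
    """Mean score of the token over the lexicons containing it; 0.0 if none does."""
    scores = [lex[token] for lex in lexicons if token in lex]
    return sum(scores) / len(scores) if scores else 0.0


# Made-up lexicons standing in for the real sentiment dictionaries.
lexicons = [
    {"good": 3.0, "awful": -3.0},
    {"good": 1.0, "bad": -2.0},
    {},
]
print(base_polarity("good", lexicons))
print(base_polarity("table", lexicons))
```

Dividing by the number of dictionaries that actually contain the token, not by the total number of dictionaries, keeps a word listed in only one lexicon from being diluted toward zero.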
(2) For each independent text unit t, update its basic sentiment polarity value $s_t$ using the sentiment score influence factor $IF_t$ obtained in step 5).
(3) Calculate an overall sentiment score $S_T$ for each piece of Twitter text data T.
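The formulas for sub-steps (2) and (3) are not available in this text. Purely as an assumption for illustration, the sketch below multiplies each basic polarity value by its influence factor and sums the products to obtain a tweet-level score; the patent's actual combination may differ.

```python
def tweet_score(base_scores, factors):
    """Assumed form: sum over tokens of basic polarity times influence factor."""
    if len(base_scores) != len(factors):
        raise ValueError("one influence factor per token is required")
    return sum(s * f for s, f in zip(base_scores, factors))


# Three tokens: a plain positive word, a negated negative word, a zeroed word.
print(tweet_score([2.0, -1.0, 0.5], [1.0, -1.0, 0.0]))
```

With this assumed form, a negated negative word contributes positively and a token inside a conditional clause contributes nothing, which matches the intent of the factor updates described in step 5).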
Those skilled in the art will appreciate that the accompanying drawing is a schematic diagram of a preferred embodiment, and that the serial numbers of the embodiments of the present invention are for description only and do not indicate the relative merit of the embodiments.
The above is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
The references cited in the background section are as follows:
[1] Pennebaker J W, Francis M E, Booth R J. Linguistic inquiry and word count: LIWC 2001 [J]. Mahwah: Lawrence Erlbaum Associates, 2001, 71: 2001.
[2] Bradley M M, Lang P J. Affective norms for English words (ANEW): Instruction manual and affective ratings [R]. Technical report C-1, The Center for Research in Psychophysiology, University of Florida, 1999.
[3] Nielsen F Å. A new ANEW: Evaluation of a word list for sentiment analysis in microblogs [J]. arXiv preprint arXiv:1103.2903, 2011.
[4] Hutto C J, Gilbert E. VADER: A parsimonious rule-based model for sentiment analysis of social media text [C]// Eighth International AAAI Conference on Weblogs and Social Media. 2014.
[5] Baccianella S, Esuli A, Sebastiani F. SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining [C]// LREC. 2010, 10: 2200-2204.
[6] Chikersal P, Poria S, Cambria E. SeNTU: sentiment analysis of tweets by combining a rule-based classifier with supervised learning [J]. SemEval-2015, 2015: 647.

Claims (10)

1. A sentiment analysis method based on structured features, characterized by comprising the following steps:
1) collecting Twitter text data and building a Twitter text database;
2) collecting existing sentiment polarity value dictionaries, giving preference to manually generated sentiment dictionaries;
3) manually building related auxiliary dictionaries, including a standard word dictionary, a negation word dictionary, a strengthening modifier dictionary, a weakening modifier dictionary, and an Internet slang dictionary;
4) preprocessing the Twitter text database, including:
(1) first tokenizing the data in the Twitter text database;
(2) performing normalization;
(3) performing part-of-speech tagging (POS tagging) on the text;
5) defining a sentiment score influence factor and extracting language features from the information obtained by the preprocessing of step 4), the language features including word-level, phrase-level, and sentence-level language features, and updating the value of the sentiment score influence factor each time a language feature is extracted;
6) calculating a sentiment polarity value for each piece of Twitter text data using the sentiment polarity value dictionaries obtained in step 2) and the sentiment score influence factor obtained in step 5).
2. The sentiment analysis method based on structured features according to claim 1, characterized in that the sentiment polarity value dictionaries of step 2) include three manually generated sentiment dictionaries, AFINN, SentiStrength, and VADER, and one automatically generated sentiment dictionary, Opinion Observer.
3. The sentiment analysis method based on structured features according to claim 1, characterized in that the tokenization of sub-step (1) of step 4) divides the Twitter text data into the smallest meaningful independent text units and labels the type of each independent text unit.
4. The sentiment analysis method based on structured features according to claim 1, characterized in that the normalization of sub-step (2) of step 4) uses a standard English dictionary to convert independent text units that use repeated letters to their canonical form, identifies emoticons, emoji, and emotexts in the Twitter text data, and determines and labels their sentiment polarity.
5. The sentiment analysis method based on structured features according to claim 1, characterized in that the part-of-speech tagging of sub-step (3) of step 4) labels the part-of-speech category of each independent text unit.
6. The sentiment analysis method based on structured features according to claim 1, characterized in that the definition of the sentiment score influence factor in step 5) introduces, for each independent text unit t, a sentiment score influence factor $IF_t$, where $IF_t \ge 0$ and the initial value is 1, to reflect the degree to which the language features strengthen or weaken the sentiment intensity of the independent text unit, the sentiment score influence factor being updated as follows:
$$IF_t^{new} = IF_t^{old} \cdot 2^{\log_{10}\left(\sum_{p \in P} p\right)} \quad (1)$$
where $IF_t^{new}$ is the sentiment score influence factor of independent text unit t after the update, $IF_t^{old}$ is the sentiment score influence factor of independent text unit t before the update, p is a feature, and P is the set of all features that can affect the sentiment score influence factor.
7. The sentiment analysis method based on structured features according to claim 1, characterized in that the extraction of word-level language features in step 5) includes:
if all letters of an independent text unit are capitalized, the all-caps flag $s_t^{AllCaps} = 1$, otherwise $s_t^{AllCaps} = 0$, and the sentiment score influence factor $IF_t$ is updated:
$$IF_t^{new} = IF_t^{old} \cdot 2^{\log_{10}\left(1 + s_t^{AllCaps}\right)} \quad (2)$$
where $IF_t^{new}$ is the sentiment score influence factor of independent text unit t after the update and $IF_t^{old}$ is the sentiment score influence factor of independent text unit t before the update;
if an independent text unit uses repeated letters, an elongation factor $s_t^{ExLen}$ is assigned to that independent text unit, where $t_{orig}$ denotes the original independent text unit and $t_{norm}$ the independent text unit after normalization, and the sentiment score influence factor $IF_t$ is updated:
$$IF_t^{new} = IF_t^{old} \cdot 2^{\log_{10}\left(s_t^{ExLen}\right)} \quad (3)$$
8. The sentiment analysis method based on structured features according to claim 1, characterized in that extracting the phrase-level language features in step 5) comprises:
Using the negation-word dictionary built manually in step 3), determine the beginning of a phrase containing negated content; define periods, question marks, exclamation marks, and non-standard independent text units as the end markers of a phrase containing negated content; and update the emotion score influence factor $IF_t$ by:
$$IF_t^{new} = IF_t^{old} \cdot (-1) \quad (4)$$
where t is an independent text unit within the scope of the negation;
Using the strengthening-modifier dictionary and the weakening-modifier dictionary built manually in step 3), find all modifiers $M_t^{DM}$ in the Twitter text data, and compute the stretch factor of the modifiers by:
$$s_t^{DM} = \sum_{m \in M_t^{DM}} \left(1 + s_m^{AllCaps} + s_m^{ExLen}\right) \quad (5)$$
where m denotes a modifier and $M_t^{DM}$ denotes the set of modifiers; if all letters of modifier m are capitalized, the modifier all-caps flag $s_m^{AllCaps}$ is set, and otherwise it is not set; if modifier m uses repeated letters, the repeated-letter flag $s_m^{ExLen}$ is set, and otherwise it is not set;
and update the emotion score influence factor $IF_t$ by:
$$IF_t^{new} = IF_t^{old} \cdot 2^{\log_{10}(s_t^{DM})} \quad (6).$$
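The phrase-level updates in formulas (4) to (6) can be sketched as below. Several things here are assumptions made only for illustration: the tiny `NEGATION_WORDS` set stands in for the manually built negation dictionary, the scope is taken to start after the negation word, only punctuation is modeled as an end marker (non-standard units are omitted), the two modifier flags are treated as 0 or 1, and a unit with no modifiers keeps its factor unchanged.

```python
import math

NEGATION_WORDS = {"not", "no", "never"}  # hypothetical stand-in dictionary
SCOPE_END = {".", "?", "!"}              # end markers modeled in this sketch


def apply_negation(tokens, factors):
    """Formula (4): multiply IF by -1 for every token inside a negation scope."""
    out = list(factors)
    in_scope = False
    for i, tok in enumerate(tokens):
        if tok in SCOPE_END:
            in_scope = False          # an end marker closes the scope
        elif in_scope:
            out[i] *= -1              # token lies inside the negated phrase
        elif tok.lower() in NEGATION_WORDS:
            in_scope = True           # scope opens after the negation word
    return out


def modifier_stretch(modifier_flags):
    """Formula (5): sum of (1 + all-caps flag + repeated-letter flag)."""
    return sum(1 + all_caps + ex_len for all_caps, ex_len in modifier_flags)


def update_if_modifier(if_old, s_dm):
    """Formula (6): IF_new = IF_old * 2 ** log10(s_dm)."""
    if s_dm <= 0:
        return if_old                 # assumption: no modifiers, no change
    return if_old * 2 ** math.log10(s_dm)
```

For example, `apply_negation(["i", "do", "not", "like", "it", "."], [1.0] * 6)` flips only the factors of "like" and "it".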
9. The sentiment analysis method based on structured features according to claim 1, characterized in that extracting the sentence-level language features in step 5) comprises:
Using the sentence structure "X but Y", identify the Twitter text data that use an adversative conjunction (such as but or yet), mark the part X before the conjunction and the part Y after the conjunction, and update the emotion score influence factor $IF_t$ by:
$$IF_t^{new} = IF_t^{old} \cdot 2^{\log_{10}(sgn_t^{CONJ})} \quad (7)$$
where the value of $sgn_t^{CONJ}$ depends on whether independent text unit t lies in X or in Y;
Using the following three sentence structures: "If X, Y", "If X () then Y", and "Y, if X.", identify the Twitter text data that use conditional sentences, where X is the conditional clause and Y is the result clause, and update the emotion score influence factor $IF_t$ by:
$$IF_t^{new} = 0 \quad (8)$$
where independent text unit t lies in X.
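The sentence-level rules in formulas (7) and (8) can be sketched as follows. This is an illustrative reading only: the function names and the `ADVERSATIVES` set are hypothetical, the values of the conjunction term for X and Y are not given in the claim text and so are supplied by the caller, and the word "if" itself is assumed to lie outside the clause X.

```python
import math

ADVERSATIVES = {"but", "yet"}  # examples named in the claim


def split_on_adversative(tokens):
    """Return (X, Y) around the first adversative conjunction, or None."""
    for i, tok in enumerate(tokens):
        if tok.lower() in ADVERSATIVES:
            return tokens[:i], tokens[i + 1:]
    return None


def update_if_conjunction(if_old, sgn_conj):
    """Formula (7): IF_new = IF_old * 2 ** log10(sgn_conj), sgn_conj > 0."""
    return if_old * 2 ** math.log10(sgn_conj)


def conditional_clause_indices(tokens):
    """Indices of the tokens in the conditional clause X."""
    lowered = [t.lower() for t in tokens]
    if "if" not in lowered:
        return []
    start = lowered.index("if")
    if start == 0:
        # "If X, Y" or "If X then Y": X ends at the first comma or "then".
        end = len(tokens)
        for i in range(1, len(tokens)):
            if lowered[i] in {",", "then"}:
                end = i
                break
        return list(range(1, end))
    # "Y, if X": X runs from just after "if" to the end of the sentence.
    return list(range(start + 1, len(tokens)))


def zero_conditional(tokens, factors):
    """Formula (8): set IF to 0 for every token in the conditional clause."""
    out = list(factors)
    for i in conditional_clause_indices(tokens):
        out[i] = 0.0
    return out
```

Setting the factor to zero means that sentiment words inside a hypothetical condition contribute nothing to the final score, while the result clause Y is scored as usual.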
10. The sentiment analysis method based on structured features according to claim 1, characterized in that calculating the emotion polarity value in step 6) comprises:
(1) Compute the basic emotion polarity value of each independent text unit t. Let L be the set of emotion polarity dictionaries used, and let $L_t = \{ l \in L \mid t \in l \}$ denote the subset of dictionaries that contain independent text unit t; the basic emotion polarity value $s_t$ of each independent text unit t is obtained by:
$$s_t = \begin{cases} \dfrac{\sum_{l \in L_t} \mathrm{score}(l, t)}{|L_t|}, & L_t \neq \emptyset \\ 0, & L_t = \emptyset \end{cases} \quad (9)$$
where $\mathrm{score}(l, t)$ is the basic emotion polarity value given to independent text unit t in dictionary l, and $|L_t|$ is the number of emotion polarity dictionaries that contain independent text unit t;
(2) For each independent text unit t, update its basic emotion polarity value $s_t$ using the emotion score influence factor $IF_t$ obtained in step 5):
$$s_t^{new} = s_t^{old} \cdot IF_t \quad (10);$$
(3) Compute the overall emotion score $S_T$ for each item of Twitter text data T:
$$S_T = \sum_{t \in T} s_t \quad (11).$$
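The scoring in formulas (9) to (11) can be sketched as follows. The names are hypothetical and each polarity dictionary is modeled as a plain Python `dict` mapping a token to its score, which is an assumption about the data layout rather than something the claim specifies.

```python
def base_polarity(token, lexicons):
    """Formula (9): mean score over the dictionaries containing the token, else 0."""
    scores = [lex[token] for lex in lexicons if token in lex]
    return sum(scores) / len(scores) if scores else 0.0


def tweet_score(tokens, lexicons, factors):
    """Formulas (10) and (11): weight each base polarity by IF_t, then sum."""
    return sum(base_polarity(tok, lexicons) * f for tok, f in zip(tokens, factors))
```

For instance, with two toy dictionaries that score "good" as 2.0 and 1.0, the base polarity of "good" is 1.5, and a negated occurrence with an influence factor of -1.0 contributes -1.5 to the overall score.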
CN201610839375.6A 2016-09-20 2016-09-20 Emotion analysis method based on structuring features Pending CN106446147A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610839375.6A CN106446147A (en) 2016-09-20 2016-09-20 Emotion analysis method based on structuring features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610839375.6A CN106446147A (en) 2016-09-20 2016-09-20 Emotion analysis method based on structuring features

Publications (1)

Publication Number Publication Date
CN106446147A true CN106446147A (en) 2017-02-22

Family

ID=58166213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610839375.6A Pending CN106446147A (en) 2016-09-20 2016-09-20 Emotion analysis method based on structuring features

Country Status (1)

Country Link
CN (1) CN106446147A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012226747A (en) * 2011-04-21 2012-11-15 Palo Alto Research Center Inc Incorporation of glossary knowledge in svm learning for improvement in feeling classification
CN102663046A (en) * 2012-03-29 2012-09-12 中国科学院自动化研究所 Sentiment analysis method oriented to micro-blog short text
CN104008091A (en) * 2014-05-26 2014-08-27 上海大学 Sentiment value based web text sentiment analysis method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang Zhitao et al.: "Sentiment analysis of Chinese microblogs based on a dictionary and rule set", Computer Engineering and Applications *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106980650A (en) * 2017-03-01 2017-07-25 平顶山学院 A kind of emotion enhancing word insertion learning method towards Twitter opinion classifications
CN108681532A (en) * 2018-04-08 2018-10-19 天津大学 A kind of sentiment analysis method towards Chinese microblogging
CN108681532B (en) * 2018-04-08 2022-03-25 天津大学 Sentiment analysis method for Chinese microblog
CN109697657A (en) * 2018-12-27 2019-04-30 厦门快商通信息技术有限公司 A kind of dining recommending method, server and storage medium
CN111046136A (en) * 2019-11-13 2020-04-21 天津大学 Method for calculating multi-dimensional emotion intensity value by fusing emoticons and short text
CN111046137A (en) * 2019-11-13 2020-04-21 天津大学 Multidimensional emotion tendency analysis method
CN111143564A (en) * 2019-12-27 2020-05-12 北京百度网讯科技有限公司 Unsupervised multi-target chapter-level emotion classification model training method and unsupervised multi-target chapter-level emotion classification model training device
CN111312394A (en) * 2020-01-15 2020-06-19 东北电力大学 Psychological health condition evaluation system based on combined emotion and processing method thereof
CN111312394B (en) * 2020-01-15 2023-09-29 东北电力大学 Psychological health assessment system based on combined emotion and processing method thereof
CN117521813A (en) * 2023-11-20 2024-02-06 中诚华隆计算机技术有限公司 Scenario generation method, device, equipment and chip based on knowledge graph
CN117521813B (en) * 2023-11-20 2024-05-28 中诚华隆计算机技术有限公司 Scenario generation method, device, equipment and chip based on knowledge graph

Similar Documents

Publication Publication Date Title
CN108363743B (en) Intelligent problem generation method and device and computer readable storage medium
CN108763326B (en) Emotion analysis model construction method of convolutional neural network based on feature diversification
CN112001187B (en) Emotion classification system based on Chinese syntax and graph convolution neural network
Saha et al. Proposed approach for sarcasm detection in twitter
CN106446147A (en) Emotion analysis method based on structuring features
CN109710770A (en) A kind of file classification method and device based on transfer learning
CN112001185A (en) Emotion classification method combining Chinese syntax and graph convolution neural network
CN107305539A (en) A kind of text tendency analysis method based on Word2Vec network sentiment new word discoveries
CN107025284A (en) The recognition methods of network comment text emotion tendency and convolutional neural networks model
CN106484664A (en) Similarity calculating method between a kind of short text
CN108733653A (en) A kind of sentiment analysis method of the Skip-gram models based on fusion part of speech and semantic information
CN107590134A (en) Text sentiment classification method, storage medium and computer
CN110287319B (en) Student evaluation text analysis method based on emotion analysis technology
CN108388554B (en) Text emotion recognition system based on collaborative filtering attention mechanism
CN112001186A (en) Emotion classification method using graph convolution neural network and Chinese syntax
CN105975454A (en) Chinese word segmentation method and device of webpage text
CN106598940A (en) Text similarity solution algorithm based on global optimization of keyword quality
CN101520802A (en) Question-answer pair quality evaluation method and system
CN103646088A (en) Product comment fine-grained emotional element extraction method based on CRFs and SVM
CN107122349A (en) A kind of feature word of text extracting method based on word2vec LDA models
CN106599032A (en) Text event extraction method in combination of sparse coding and structural perceptron
CN109002473A (en) A kind of sentiment analysis method based on term vector and part of speech
CN105740382A (en) Aspect classification method for short comment texts
CN110134934A (en) Text emotion analysis method and device
Bansal et al. Code-switching patterns can be an effective route to improve performance of downstream NLP applications: A case study of humour, sarcasm and hate speech detection

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170222
