CN103544246A - Method and system for constructing multi-emotion dictionary for internet - Google Patents

Publication number
CN103544246A
Authority
CN
China
Prior art keywords
word
sentiment dictionary
score
internet
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310470531.2A
Other languages
Chinese (zh)
Inventor
刘奕群
马少平
张敏
金奕江
张阔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Beijing Sogou Technology Development Co Ltd
Original Assignee
Tsinghua University
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University and Beijing Sogou Technology Development Co Ltd
Priority to CN201310470531.2A
Publication of CN103544246A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus

Abstract

Provided are a method and system for constructing a multi-emotion sentiment dictionary for the internet. The method comprises: obtaining an internet text corpus from the internet; preprocessing the obtained text corpus to obtain candidate words for the sentiment dictionary; extracting new words from the obtained text corpus to obtain further candidate words for the sentiment dictionary; building an undirected graph model from the obtained candidate words; and using the undirected graph model and a label propagation algorithm to iteratively compute multiple emotion scores for each node of the undirected graph so as to build the sentiment dictionary. With the method and system, different seed words can be used to construct sentiment dictionaries for different emotions, so the results of emotion recognition are richer.

Description

Method and system for constructing a multi-emotion sentiment dictionary for the internet
Technical field
The present invention relates to the field of intelligent processing of network information, and in particular to a method and system for constructing a sentiment dictionary from the emotions and moods expressed in internet text.
Background
With the development of the internet, social media have emerged in large numbers. Social media use the internet as a medium to give users a platform for sharing opinions and experiences; they gather a large amount of user-generated content that directly reflects people's moods, viewpoints and preferences. Text in social media includes blogs, microblogs, forum discussions and product reviews. It is the carrier through which users express personal emotion, and it strongly influences public opinion, brand reputation and product evaluation. Text sentiment analysis for these media has therefore become a hot topic in recent years. Text sentiment analysis is a computer technique that identifies the emotional tendency expressed by a piece of text. In theory, the emotions people express in text are complex: besides approval (praise) and opposition (criticism), a text may express moods such as happiness, anger, grief, fear and surprise. Current research in computational linguistics, however, generally divides emotional tendency into commendatory and derogatory, sometimes adding neutral or mixed. This degree of simplification meets people's needs to a certain extent and has broad application prospects.
Identifying the user sentiment embodied in text has therefore become a key technology in the network information field and plays an important role in business, politics and social events. For example, in product reviews on e-commerce websites, automatically identifying whether consumers praise or criticise a product, or even each attribute of a product, can help other consumers make purchases that suit them and can help manufacturers find the strengths and weaknesses of a product so as to improve it. On film review websites, viewers evaluate factors such as a film's plot, actors and photography; identifying the commendatory or derogatory tendency of these evaluations automatically gives a comprehensive picture of audience response to a film. In business, the word of mouth formed by a user group's evaluation of a brand or commodity is one kind of user information that merchants value: evaluations passed from user to user affect a merchant's reputation, and merchants can extend a product's influence and guide consumer behaviour through marketing in internet media. By capturing hot topics related to a given industry on microblogs and analysing their sentiment trend, stock movements can be predicted. In many political events, netizens use the internet as a platform for transmitting and publishing information, so the leanings of voters and of different camps during elections in many countries are reflected in microblogs; researchers therefore use the relevant microblogs for advance prediction or post hoc analysis, probing the influence of online public opinion on elections.
Social media text differs markedly from traditional media text in that its language is non-standard and its wording is free. Traditional natural language processing methods usually perform grammatical analysis on text and depend on linguistic knowledge. For social media text, whose wording may be non-standard or ungrammatical, the accuracy of such traditional analysis drops sharply. In addition, some new words coined by users do not exist in traditional dictionaries (so-called "unregistered words"), or their meanings have changed greatly, which severely limits the application of traditional methods.
The recognition result of text sentiment analysis is usually a category such as commendatory or derogatory, so text sentiment analysis can be carried out as a classification task using machine learning. On the product review or film review websites mentioned above, users usually attach a rating to their review; this rating can serve as a score for the sentiment of the review text, that is, as a label on the review text, so these reviews and ratings can be used as a training corpus for supervised machine learning. Such methods use words (tuples) as features and combine them with a classifier (such as a naive Bayes model, a maximum entropy model or a support vector machine model) to complete supervised training and testing. Without a sufficient training corpus, supervised learning methods cannot be put to use. For internet text as voluminous as microblogs, manual annotation can cover only a very small number of microblog texts, which restricts the applicable fields and scale. By analogy with using review site ratings as classification labels, one can assume that emoticons in microblog text (such as the smiley ":-)" or the crying face ":-(") represent its emotional tendency and train with the presence of such symbols as classification labels. However, emoticons are often noisy as classification labels and are limited by the variants and kinds of symbols. Sentiment classification based on supervised learning is therefore severely restricted, and unsupervised methods based on a sentiment dictionary still play a very important role.
A sentiment dictionary is a dictionary containing emotion words and their emotional tendencies. These emotion words are mainly adjectives that express a clear emotional tendency, for example "good" and "bad", or "happy" and "sad". In practice, manually constructed sentiment dictionaries are limited by cost and scale and are hard to extend. A sentiment dictionary can instead be built automatically from a text corpus using features of the text. Such automatic approaches usually start from a small subset of emotion words (or rules), then use the connections between words to expand the set gradually and compute the emotional tendency of more words. Automatic construction of a sentiment dictionary mainly faces the following problems:
Choosing candidate emotion words: most emotion words are adjectives, so usually only adjectives are taken as candidate emotion words. For slightly more complex cases, rules can be used to extract richer emotion words or emotion phrases.
Measuring word relations: to spread from a small set of emotion seed words (seed words for short) to a large set of words, the relations between words should reflect the emotional connections between them. These connections generally include: co-occurrence relations, since commendatory words tend to occur together with commendatory words and derogatory words with derogatory words, so co-occurrence within a sentence can establish a connection between words; relations established through conjunctions in a sentence ("and", "with", "but"), which are far fewer than co-occurrence relations but of higher quality; and, at a deeper level, semantic relations, such as the synonym and antonym relations of WordNet.
Propagating emotional tendency: the words and the connections between them form a graph, and a suitable computation is needed to propagate the emotional tendency scores of the seed words to more words. For example, in a graph built from synonym and antonym relations, words of the same polarity can be clustered according to the types of the edges; alternatively, pointwise mutual information (PMI) can be used to compute the relation between a new word and existing words. In graph-based models, the propagation can also be carried out by graph propagation or label propagation.
These problems show that, although sentiment analysis with a sentiment dictionary avoids the bottleneck of a training corpus, the construction of the sentiment dictionary itself is very important. If the dictionary is small, many emotion words are missed and the emotional tendency of a text cannot be identified; short texts in particular are even less likely to contain a word from the dictionary. If the dictionary is of low quality, the sentiment analysis results will also be wrong.
Summary of the invention
In view of the above, it is necessary to provide a method and system for constructing a multi-emotion sentiment dictionary for the internet, which uses the co-occurrence relations of the basic units (such as words and symbols) that express emotion in internet text, combined with new word discovery, to construct a sentiment dictionary automatically by iterative propagation.
A method for constructing a multi-emotion sentiment dictionary for the internet comprises: an obtaining step of obtaining an internet text corpus from the internet; a data preprocessing step of preprocessing the obtained text corpus to obtain candidate words for the sentiment dictionary; a new word extraction step of extracting new words from the obtained text corpus to obtain candidate words for the sentiment dictionary; a graph model construction step of building an undirected graph model from the obtained candidate words; and an iterative computation step of using the undirected graph model and a label propagation algorithm to iteratively compute multiple emotion scores for each node of the undirected graph so as to build the sentiment dictionary.
A system for constructing a multi-emotion sentiment dictionary for the internet comprises: an acquisition module for obtaining an internet text corpus from the internet; a data preprocessing module for preprocessing the obtained text corpus to obtain candidate words for the sentiment dictionary; a new word extraction module for extracting new words from the obtained text corpus to obtain candidate words for the sentiment dictionary; a graph model construction module for building an undirected graph model from the obtained candidate words; and an iterative computation module for using the undirected graph model and a label propagation algorithm to iteratively compute multiple emotion scores for each node of the undirected graph so as to build the sentiment dictionary.
Compared with the prior art, the present invention addresses the shortcomings of the sentiment dictionaries used in existing sentiment analysis algorithms for internet text and proposes a method for building a sentiment dictionary for identifying multiple emotions in internet text. Unlike traditional methods, the method builds the dictionary from informal text elements distinctive of internet text, such as emotion marks, internet neologisms, emoticons and misspelled words, and is not limited to traditional emotion words of a single language or field. By using different seed words, sentiment dictionaries for different moods (such as happiness, anger, grief, fear and surprise) can be constructed, making the results of emotion recognition richer.
Brief description of the drawings
Fig. 1 is a diagram of the application environment of the internet multi-emotion sentiment dictionary construction system of the present invention.
Fig. 2 is a module diagram of a preferred embodiment of the internet multi-emotion sentiment dictionary construction system of the present invention.
Fig. 3 is a flowchart of a preferred embodiment of the internet multi-emotion sentiment dictionary construction method of the present invention.
Fig. 4 is a schematic diagram of typical high-frequency tuples.
Fig. 5 is a schematic diagram of the undirected graph model.
Fig. 6 is a schematic diagram of the co-occurrence matrix.
Fig. 7 is a schematic diagram of the commendatory and derogatory scores of words.
Fig. 8 is a schematic diagram of the mood scores of words.
Fig. 8 is the mood score schematic diagram of word.
Reference numerals of main elements
Computing device: 1
Internet multi-emotion sentiment dictionary construction system: 10
Memory: 20
Processor: 30
Display device: 40
Input device: 50
Acquisition module: 100
Data preprocessing module: 101
New word extraction module: 102
Graph model construction module: 103
Iterative computation module: 104
The following embodiments further describe the present invention with reference to the above drawings.
Embodiments
Fig. 1 shows the application environment of a preferred embodiment of the internet multi-emotion sentiment dictionary construction system 10 of the present invention (hereinafter "system 10"). System 10 runs on a computing device 1. The computing device 1 further comprises a memory 20, a processor 30, a display device 40 and an input device 50 connected by a data bus. The computing device 1 may be a computer, a mobile phone, a PDA (Personal Digital Assistant) or the like.
The memory 20 stores the program code and other data of system 10. The display device 40 may be the LCD screen of a computer, the touch screen of a mobile phone, or the like. The input device 50, for example a keyboard or a mouse, is used to enter data set by the user.
Fig. 2 is a functional block diagram of a preferred embodiment of system 10. In one embodiment, system 10 mainly comprises an acquisition module 100, a data preprocessing module 101, a new word extraction module 102, a graph model construction module 103 and an iterative computation module 104. Modules 100-104 are program segments of computer instructions that each perform a specific function and are better suited than whole programs to describing how the software executes in the computing device 1. The computer instructions of modules 100-104 are stored in the memory 20 and executed by the processor 30 of the computing device 1. The specific functions of modules 100-104 are described below with reference to Fig. 3.
Fig. 3 is a flowchart of a preferred embodiment of the internet multi-emotion sentiment dictionary construction method of the present invention. Depending on requirements, the order of the steps in the flowchart may be changed and some steps may be omitted.
Step S10: the acquisition module 100 obtains an internet text corpus from the internet.
Step S11: the data preprocessing module 101 preprocesses the obtained text corpus to obtain candidate words for the sentiment dictionary.
The data preprocessing module 101 needs to segment the obtained text corpus into words. For a text corpus delimited by spaces (such as English), the text can be split directly on spaces; for a text corpus in a language such as Chinese or Japanese that does not use spaces as delimiters, a candidate word set can be obtained by extracting n-tuples (n-grams). From this candidate word set, a certain proportion of high-frequency tuples (usually stop words) and low-frequency tuples (usually personal names, non-words and the like) are removed, and only the remaining mid-frequency tuples are taken as candidate words for the sentiment dictionary. Note that if a suitable word segmentation tool is used for a given language and combined with n-tuples to generate the candidate word set, n-tuples that are not words can be removed, which improves the precision of the sentiment dictionary. This processing does not affect the validity of the overall method.
The internet text corpus, as the content source for building the sentiment dictionary, must be suitably cleaned before word relations can be extracted. The data cleaning and related preprocessing steps comprise:
Step 1.1: remove special words from the text corpus. Special words include website links, user name marks, special characters and the like.
Step 1.2: segment the text corpus into words, then generate n-tuples (n<4) from the segmentation result so as to supplement words that were segmented incorrectly. In this way, three classes of tuple sets are extracted from the text corpus: unigrams, bigrams and trigrams. For example, a Chinese text corpus is segmented with the Institute of Computing Technology, Chinese Lexical Analysis System (ICTCLAS) tool.
Step 1.3: taking the linguistic characteristics of words into account, remove from each of the three tuple sets the high-frequency tuples (high-frequency words) ranked within a preset top number (for example, the top 50) and the low-frequency tuples (low-frequency words) that occur in the text corpus fewer than a preset number of times (for example, 3 times). High-frequency tuples are usually stop words; they have a high chance of co-occurring with all kinds of words and therefore do not express emotional characteristics clearly. Low-frequency tuples are usually non-words, user names and the like; they carry no linguistic meaning and therefore need to be removed. The remaining mid-frequency tuples are taken as one part of the candidate words.
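Steps 1.1 to 1.3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the invention: the regular expressions for links and user name marks, the function names, and the whitespace `segment` default (standing in for a real segmenter such as ICTCLAS) are all hypothetical. Only the limits n<4, top 50 and 3 occurrences come from the text above.

```python
import re
from collections import Counter

# Hypothetical patterns for "special words": website links and @user marks.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def clean(text):
    """Step 1.1: strip website links and user name marks."""
    text = URL_RE.sub(" ", text)
    return MENTION_RE.sub(" ", text)


def ngrams(tokens, max_n=3):
    """Step 1.2: yield (n, tuple) pairs for n = 1..max_n, i.e. n < 4."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield n, tuple(tokens[i:i + n])


def candidate_tuples(sentences, segment=str.split, top_k=50, min_count=3):
    """Step 1.3: keep mid-frequency tuples from each of the three tuple classes.

    `segment` is an assumed stand-in for a word segmentation tool.
    """
    counts = {1: Counter(), 2: Counter(), 3: Counter()}
    for sentence in sentences:
        for n, gram in ngrams(segment(clean(sentence))):
            counts[n][gram] += 1
    kept = set()
    for counter in counts.values():
        # The top_k most frequent tuples of this class are treated as stop words.
        high = {gram for gram, _ in counter.most_common(top_k)}
        kept |= {g for g, c in counter.items() if c >= min_count and g not in high}
    return kept
```

The thresholds are parameters because the text presents 50 and 3 only as example values; a different corpus would likely need different settings.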
In one example, the microblog corpus used comes from Tencent Weibo: 69,715 microblog posts (152,716 sentences) were taken, user names, website links and the like were removed, the text was segmented, and the high-frequency and low-frequency tuples were counted and removed. Typical high-frequency tuples are shown in Fig. 4.
Step S12: the new word extraction module 102 extracts new words from the obtained text corpus to obtain candidate words for the sentiment dictionary.
Besides using n-tuples as candidate words, context entropy and mutual information are also used to discover new words as candidate words for the sentiment dictionary. Because tuples are built on the word segmentation result, segmentation boundary errors may cause boundary errors in the generated candidate words (a generated candidate "word" may not be an actual word). If, on the other hand, tuples were generated with single characters as the unit, a large amount of non-word noise would be introduced. Therefore, besides using tuples generated from the segmentation result as candidate words, some new words also need to be discovered and used as candidate words for the sentiment dictionary. The present invention integrates two new word discovery methods into sentiment dictionary construction for sentiment analysis: a context entropy method and a mutual information method.
(1) The principle of the context entropy new word discovery method is as follows:
The context entropy of a tuple (word) determines whether its boundary is extended to form a new word.
Taking the left context entropy of a word $w$ as an example, the left context entropy $LCE(w)$ is computed as:
$$LCE(w) = -\frac{1}{N(w)} \sum_{i=1}^{s} C(a_i, w) \ln \frac{C(a_i, w)}{N(w)}$$
where $N(w)$ is the number of occurrences of the word $w$ in the text corpus and $C(a_i, w)$ is the number of co-occurrences of $w$ with another word $a_i$ in the text corpus. When computing $LCE(w)$, $a_i$ is a single candidate word appearing to the left of $w$, the new word after extension is $\overline{a_i w}$, and $s$ is the number of candidate words $a_i$, i.e. the number of different words that occur to the left of $w$. A low $LCE(w)$ indicates that the text to the left of $w$ (its preceding context) is relatively uniform, so the left boundary needs to be extended further. Substituting $\overline{a_i w}$ for $w$ in the formula above gives the left context entropy of the new word, $LCE(\overline{a_i w})$, and the entropy increment
$$\Delta LCE(\overline{a_i w}) = LCE(\overline{a_i w}) - LCE(w)$$
If this increment is large, the left side of the old word $w$ is unlikely to be a boundary and the left side of the new word $\overline{a_i w}$ is more likely to be one. The word $\overline{a_i w}$ can then replace $w$ as the new candidate word. Similarly, the right context entropy $RCE(w)$ of a word and its entropy increment $\Delta RCE(\overline{w b_i})$ are computed as:
$$RCE(w) = -\frac{1}{N(w)} \sum_{i=1}^{s} C(w, b_i) \ln \frac{C(w, b_i)}{N(w)}$$
$$\Delta RCE(\overline{w b_i}) = RCE(\overline{w b_i}) - RCE(w)$$
Here $b_i$ is a candidate word for extension on the right of $w$, the new word after extension is $\overline{w b_i}$, $N(w)$ is the number of occurrences of the word $w$ in the text corpus, and $C(w, b_i)$ is the number of co-occurrences of $w$ with another word $b_i$ in the text corpus. When computing $RCE(w)$, $s$ is the number of candidate words $b_i$, i.e. the number of different words that occur to the right of $w$.
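The left context entropy and its increment can be illustrated with a short Python sketch. It is a hedged reading of the formulas above rather than the patented implementation: the function names are hypothetical, sentences are assumed to be lists of tokens, and an occurrence of w at the start of a sentence is assumed to count toward N(w) while contributing no left neighbour.

```python
import math
from collections import Counter


def left_context_entropy(sentences, w):
    """LCE(w) for a word w given as a tuple of tokens."""
    n = len(w)
    left = Counter()  # C(a_i, w): how often a_i appears directly left of w
    total = 0         # N(w): number of occurrences of w
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == w:
                total += 1
                if i > 0:
                    left[tokens[i - 1]] += 1
    if total == 0:
        return 0.0
    return -sum(c * math.log(c / total) for c in left.values()) / total


def delta_lce(sentences, a, w):
    """Entropy increment when w is extended on the left by the token a."""
    return left_context_entropy(sentences, (a,) + w) - left_context_entropy(sentences, w)
```

With a corpus such as `[["c", "a", "x"], ["d", "a", "x"]]`, the word `("x",)` always follows `"a"`, so its left context entropy is zero, while the extended word `("a", "x")` has two different left neighbours and a positive entropy. The right context entropy would mirror this with the token that follows w.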
(2) The principle of the mutual information new word discovery method is as follows:
Mutual information determines whether a tuple should be kept as a new word.
Denote the preceding (left) word of the word $w$ by $a_i$ and the following (right) word by $b_i$. The mutual information on the two sides, the left mutual information $LPMI(\overline{a_i w})$ and the right mutual information $RPMI(\overline{w b_i})$, is defined as:
$$LPMI(\overline{a_i w}) = \frac{C(a_i, w)}{N(a_i)\,N(w)}$$
$$RPMI(\overline{w b_i}) = \frac{C(w, b_i)}{N(w)\,N(b_i)}$$
where $N(w)$, $N(a_i)$ and $N(b_i)$ are the numbers of occurrences of the words $w$, $a_i$ and $b_i$ respectively, and $C(a_i, w)$ and $C(w, b_i)$ are the one-sided co-occurrence counts of $w$ with $a_i$ and $b_i$ respectively.
When the mutual information on one side exceeds a set threshold (tuned to the text corpus), the word is extended outward on that side; once it falls below the threshold, extension stops and the current word is taken as a new word.
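The threshold rule can be sketched for the left side as follows. This is an illustrative reading only: the function names, the greedy choice of the best-scoring left neighbour and the `max_len` cap are assumptions that the text does not specify, and the threshold must be tuned to the corpus as stated above.

```python
def count_occurrences(sentences, w):
    """Number of times the token tuple w occurs in the tokenised sentences."""
    n = len(w)
    return sum(
        1
        for tokens in sentences
        for i in range(len(tokens) - n + 1)
        if tuple(tokens[i:i + n]) == w
    )


def lpmi(sentences, a, w):
    """Left mutual information C(a, w) / (N(a) * N(w))."""
    n_a = count_occurrences(sentences, (a,))
    n_w = count_occurrences(sentences, w)
    if n_a == 0 or n_w == 0:
        return 0.0
    return count_occurrences(sentences, (a,) + w) / (n_a * n_w)


def grow_left(sentences, w, threshold, max_len=4):
    """Extend w leftwards while the best left neighbour exceeds the threshold.

    max_len is an assumed safety cap, not part of the described method.
    """
    while len(w) < max_len:
        n = len(w)
        lefts = {
            tokens[i - 1]
            for tokens in sentences
            for i in range(1, len(tokens) - n + 1)
            if tuple(tokens[i:i + n]) == w
        }
        best = max(lefts, key=lambda a: lpmi(sentences, a, w), default=None)
        if best is None or lpmi(sentences, best, w) <= threshold:
            break
        w = (best,) + w
    return w
```

For instance, in a toy corpus where "Cook" is always preceded by "Tim", a low threshold merges the two tokens into one candidate word and a high threshold leaves "Cook" unchanged. Extension on the right would use RPMI in the same way.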
For example, applying the new word discovery algorithms on top of the segmentation described above yields new words that include personal names not covered by the segmentation dictionary (such as "Tim Cook", "Liu Zhiwei" and other names), new terms (such as "Tencent Weibo", "Sina Weibo" and "hunger marketing") and set expressions (such as "free shipping" and "suffering God's punishment").
Step S13: the graph model construction module 103 builds an undirected graph model from the obtained candidate words of the sentiment dictionary.
After the candidate words of the sentiment dictionary are obtained, the graph model construction module 103 counts the number of times each pair of candidate words occurs together in the sentences of the text corpus and uses it as the relation between the two candidate words (the mutual information value). With each candidate word as a node and the relation (mutual information value) as the edge weight, an undirected graph model G is constructed. In a large text corpus, words that occur together are more likely to have similar emotion, so the two nodes of an edge with a higher weight in the undirected graph tend to have similar emotional tendency.
The constructed undirected graph model is written G=(V, E). G represents the connections between candidate words, where V is the set of candidate words and E is the set of edges. Each node $v$ in G corresponds to a candidate word ($v \in V$), an edge $(v_i, v_j)$ corresponds to the co-occurrence relation of two candidate words $v_i$ and $v_j$ ($(v_i, v_j) \in E$), and the weight $w_{ij}$ of the edge $(v_i, v_j)$ is the number of times the two nodes $v_i$ and $v_j$ co-occur in the text corpus.
The co-occurrence relations between the nodes in V are represented by a co-occurrence matrix W (the adjacency matrix of G). W is symmetric; its element $w_{ij}$ is the weight of the edge $(v_i, v_j)$, i.e. the number of times the two nodes $v_i$ and $v_j$ co-occur in the text corpus, and its diagonal element $w_{ii}$ is the number of times $v_i$ occurs in the text corpus. This co-occurrence matrix is used in the subsequent step S14 for the iterative computation of the sentiment dictionary.
For example, Fig. 5 shows the undirected graph model constructed from a text corpus of three sentences, "I/study/science", "science/very/profound" and "I/like/study", where the slashes separate the words after segmentation. All words are retained as candidate words of the sentiment dictionary, each candidate word corresponds to one node, a single line denotes a weight of 1 and a double line denotes a weight of 2. The co-occurrence matrix obtained from the undirected graph model of Fig. 5 is shown in Fig. 6: for instance, the nodes "I" and "study" co-occur twice in the text corpus, so the element $w_{12}$ of W is 2, and the diagonal element $w_{33}$ is the number of times the node "study" occurs in the text corpus, which is 2.
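The three-sentence example can be reproduced with a small Python sketch of the co-occurrence matrix. The function name and the node ordering (order of first appearance) are assumptions made for illustration, so the row and column indices need not match those of Fig. 6. Off-diagonal entries count the sentences in which two words appear together, and diagonal entries count word occurrences, following the description above.

```python
from itertools import combinations


def cooccurrence_matrix(sentences):
    """Return (vocab, W) for tokenised sentences; W is a symmetric list of lists."""
    vocab = []
    for tokens in sentences:
        for token in tokens:
            if token not in vocab:
                vocab.append(token)
    index = {token: i for i, token in enumerate(vocab)}
    W = [[0] * len(vocab) for _ in vocab]
    for tokens in sentences:
        # Diagonal: total number of occurrences of each word in the corpus.
        for token in tokens:
            W[index[token]][index[token]] += 1
        # Off-diagonal: one co-occurrence per sentence containing both words.
        for i, j in combinations(sorted({index[t] for t in tokens}), 2):
            W[i][j] += 1
            W[j][i] += 1
    return vocab, W


corpus = [
    ["I", "study", "science"],
    ["science", "very", "profound"],
    ["I", "like", "study"],
]
vocab, W = cooccurrence_matrix(corpus)
i, study = vocab.index("I"), vocab.index("study")
print(W[i][study], W[study][study])  # co-occurrences of "I" and "study"; occurrences of "study"
```

Both printed values are 2 for this corpus, in line with the double-line edge between "I" and "study" described for Fig. 5.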
Step S14: the iterative computation module 104 uses the undirected graph model and a label propagation algorithm to iteratively compute multiple emotion scores for each node of the undirected graph so as to build the sentiment dictionary.
The iterative computation module 104 selects a small number of emotion seed words (seed words) among the nodes of the undirected graph model and assigns them emotion scores (the emotion scores comprise mood scores, a commendatory score, a derogatory score and an absolute score; the mood scores comprise a happiness score, an anger score, a grief score, a fear score and a surprise score). The label propagation algorithm then propagates these emotion scores, under the effect of the edge weights, to all connected nodes, so that each node obtains the corresponding multiple emotion scores. After the iteration converges (the scores are stable), every connected node has been assigned multiple emotion scores; the emotion scores of a node represent the emotional tendency of its candidate word, and the candidate words corresponding to these nodes together with their multiple emotion scores form the sentiment dictionary.
In this step, the seed words may be a number of definite emotion words, chosen appropriately for each language, or language-independent emotion marks such as the smiley ":-)". This ensures that the label propagation algorithm is valid for different languages. The seed words are selected, according to word sets, from the candidate words obtained in steps S11 and S12. In this embodiment, the word sets comprise a commendatory and derogatory word set and a mood word set. The commendatory and derogatory word set uses the 728 commendatory words and 933 derogatory words compiled from the "Student's Dictionary of Commendatory and Derogatory Words" by Zhang Wei et al. The mood word set uses the five mood word sets compiled by Ge Xu, Xinfan Meng, Houfeng Wang et al., covering happiness, anger, grief, fear and surprise, with 91, 112, 89, 103 and 92 seed words respectively (see Xu G, Meng X, Wang H. Build Chinese emotion lexicons using a graph-based algorithm and multiple resources. Proceedings of the 23rd International Conference on Computational Linguistics, Stroudsburg, PA, USA: Association for Computational Linguistics, 2010. 1209–1217.). Although these seed words are standard words, iterative propagation also assigns emotion scores to the other candidate words in the undirected graph (such as new words and icons).
The iterative computation module 104 selects seed words of different emotions (commendatory, derogatory, or moods such as happiness, anger and grief) and iterates for each separately, so that the different emotion scores of each node are computed separately. Using seed words of a given emotion yields the sentiment dictionary for that emotion; for example, iterating with mood seed words yields a sentiment dictionary made up of the mood scores of the candidate words corresponding to the nodes.
Using label propagation, the emotion scores of the selected seed words are propagated to all connected nodes of the undirected graph. The iteration is described by:
$$x^{(k+1)} = W \cdot x^{(k)} + b$$
where $x^{(k)}$ is the vector of node emotion scores after the $k$-th iteration. By this formula, the result $x^{(k+1)}$ of a new round of iteration is obtained by applying the co-occurrence matrix W and the bias vector $b$ to the previous round's vector. After each round of iteration the result is normalised, and the iteration finally converges. In the present invention $b$ is set to the seed vector $x^{(0)}$ to strengthen the effect of the seeds. Once the seeds are selected, the dimensions of $x^{(0)}$ corresponding to seed words have the value 1 and the other dimensions have the value 0.
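The iteration can be sketched in plain Python as below. Several details are assumptions, since the text does not fix them: the normalisation divides by the largest score, convergence is tested with a hypothetical tolerance `tol` and iteration cap `max_iter`, and the helper `build_dictionary` simply runs one propagation per emotion seed set, as the description of separate iterations per emotion suggests.

```python
def propagate(W, seeds, max_iter=100, tol=1e-9):
    """Iterate x(k+1) = W x(k) + b with b = x(0), normalising after each round.

    W is a square list of lists; seeds is a set of node indices.
    Normalising by the maximum score is an assumed choice.
    """
    n = len(W)
    x0 = [1.0 if i in seeds else 0.0 for i in range(n)]
    x = x0[:]
    for _ in range(max_iter):
        y = [sum(W[i][j] * x[j] for j in range(n)) + x0[i] for i in range(n)]
        top = max(y, default=0.0)
        if top == 0:
            return y  # no seeds: nothing to propagate
        y = [v / top for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


def build_dictionary(vocab, W, seed_sets):
    """Map each word to {emotion: score}, with one propagation run per emotion."""
    index = {word: i for i, word in enumerate(vocab)}
    scores = {
        emotion: propagate(W, {index[w] for w in words if w in index})
        for emotion, words in seed_sets.items()
    }
    return {
        word: {emotion: s[i] for emotion, s in scores.items()}
        for word, i in index.items()
    }
```

For example, with `vocab = ["good", "nice", "bad"]`, a single edge of weight 1 between "good" and "nice", and seed sets `{"commendatory": ["good"], "derogatory": ["bad"]}`, the word "nice" receives a positive commendatory score and a derogatory score of zero, because no path connects it to the derogatory seed.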
As shown in Figure 7, the 728 commendatory words and 933 derogatory words compiled from the above-mentioned dictionary by Zhang Wei et al. are used as the seed word sets, and iterative propagation is carried out to compute the commendatory and derogatory emotion score of each node (candidate word); Figure 7 shows example scores for some of these words. As shown in Figure 8, the five mood word sets compiled by Ge Xu, Xinfan Meng, Houfeng Wang et al. are used as the seed word sets to compute five mood scores for each node. These scores demonstrate the flexibility of the present invention in recognizing multiple emotions and their degrees.
The method and system of the present invention for constructing a multi-emotion dictionary for the Internet address the shortcomings of the sentiment dictionaries used in existing sentiment analysis algorithms for Internet text, and propose a method for building the sentiment dictionary used to recognize multiple emotions in Internet text. The method uses the co-occurrence relationships among the basic units (such as words and symbols) that express emotion in Internet text and, combined with new word discovery, constructs the sentiment dictionary automatically through iterative propagation. Compared with traditional methods, the present invention builds the dictionary from emotion tokens, Internet neologisms, emoticons, misspelled words and other informal elements characteristic of Internet text, and is not limited to the traditional emotion words of a single language or domain. Using different seed words yields sentiment dictionaries for different moods (such as happy, angry, sad, frightened and surprised), which makes the emotion recognition results richer and in turn allows the emotion expressed by a whole passage of text to be identified. Experimental results further show that the present invention does not require an excessively large corpus and is not subject to time restrictions. The invention therefore applies to a wide range of domains and languages and recognizes diverse emotion types.
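The co-occurrence relationship that the method relies on can be sketched as follows. The claims define the off-diagonal weight as the number of times two candidate words co-occur in sentences of the corpus and the diagonal entry as the number of times a candidate word occurs in the corpus. The function name `build_cooccurrence`, the toy sentences, and the decision to count each co-occurring pair once per sentence are assumptions for illustration only.

```python
from itertools import combinations

import numpy as np


def build_cooccurrence(sentences, candidates):
    """Build a symmetric co-occurrence matrix W over the candidate words.

    sentences  -- iterable of token lists (already segmented)
    candidates -- list of candidate words; position i is node v_i
    """
    index = {word: i for i, word in enumerate(candidates)}
    W = np.zeros((len(candidates), len(candidates)))
    for tokens in sentences:
        present = [t for t in tokens if t in index]
        # Diagonal: total number of occurrences of each candidate word.
        for t in present:
            W[index[t], index[t]] += 1
        # Off-diagonal: one count per sentence in which both words appear
        # (assumption: repeated words in a sentence are not counted twice).
        for a, b in combinations(sorted(set(present)), 2):
            i, j = index[a], index[b]
            W[i, j] += 1
            W[j, i] += 1
    return W


# Hypothetical toy corpus mixing ordinary words and an emotion token.
demo_sentences = [
    ["good", "happy", ":-)"],
    ["bad", "sad"],
    ["good", ":-)"],
    ["good", "good", "meh"],
]
demo_candidates = ["good", "happy", ":-)", "bad", "sad"]
W_demo = build_cooccurrence(demo_sentences, demo_candidates)
```

Here "meh" is ignored because it is not a candidate word, and "good" and ":-)" share an edge of weight 2 because they appear together in two sentences. A matrix built this way can serve as W in the iteration formula.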
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solution of the present invention may be modified or replaced with equivalents without departing from its spirit and scope.

Claims (22)

1. A method for constructing a multi-emotion dictionary for the Internet, characterized in that the method comprises:
An obtaining step: obtaining an Internet text corpus from the Internet;
A data preprocessing step: preprocessing the obtained text corpus to obtain candidate words for the sentiment dictionary;
A new word extraction step: extracting new words from the obtained text corpus to obtain candidate words for the sentiment dictionary;
A graph model construction step: building an undirected graph model from the obtained candidate words of the sentiment dictionary;
An iterative computation step: using the undirected graph model and a label propagation algorithm to iteratively compute the multiple emotion scores of each node in the undirected graph, so as to build the sentiment dictionary.
2. The method for constructing a multi-emotion dictionary for the Internet as claimed in claim 1, characterized in that the data preprocessing step comprises:
A removal step: removing special words from the text corpus;
A word segmentation and extraction step: segmenting the text corpus into words, generating n-tuples from the segmentation result, and extracting from the text corpus three classes of tuple sets, namely 1-tuples, 2-tuples and 3-tuples, where n<4;
A filtering step: in each of the three classes of tuple sets, removing the high-frequency tuples whose occurrence counts rank within a preset number of top positions and the low-frequency tuples whose occurrence counts in the text corpus are below a preset number, and taking the remaining mid-frequency tuples as part of the candidate words of the sentiment dictionary.
3. The method for constructing a multi-emotion dictionary for the Internet as claimed in claim 1, characterized in that the methods used in the new word extraction step to extract new words from the obtained text corpus comprise: a context-entropy new word discovery method and a mutual-information new word discovery method.
4. The method for constructing a multi-emotion dictionary for the Internet as claimed in claim 3, characterized in that the graph model construction step comprises:
A calculation step: calculating the number of times the candidate words of the sentiment dictionary appear together in sentences of the text corpus, as the relationship between any two candidate words;
An undirected graph model building step: building the undirected graph model with each candidate word as a node and the relationships as edge weights.
5. The method for constructing a multi-emotion dictionary for the Internet as claimed in claim 4, characterized in that, in the undirected graph model building step, the constructed undirected graph model is represented as G=(V, E), where G represents the connections between candidate words, V denotes the set of candidate words, and E denotes the set of edges;
Each node v in G corresponds to a candidate word, where v ∈ V; an edge $(v_i, v_j)$ corresponds to the co-occurrence relationship between the two candidate words $v_i$ and $v_j$, where $(v_i, v_j)$ ∈ E;
The co-occurrence relationships between the nodes in V are represented by a co-occurrence matrix W, which is the adjacency matrix of G and is symmetric; the element $w_{ij}$ of W is the weight of the edge $(v_i, v_j)$, namely the number of times the two nodes $v_i$ and $v_j$ co-occur in the text corpus; the diagonal element $w_{ii}$ of W is the number of times $v_i$ occurs in the text corpus.
6. The method for constructing a multi-emotion dictionary for the Internet as claimed in claim 5, characterized in that the iterative computation step comprises:
A selection step: selecting seed words among the nodes of the undirected graph model and assigning them emotion scores;
A propagation step: using the label propagation algorithm, propagating the emotion scores from the chosen seed words, under the effect of the edge weights, to all connected nodes in the undirected graph, so that each node obtains the corresponding multiple emotion scores;
A sentiment dictionary building step: after the iteration converges, every connected node has been assigned multiple emotion scores, the emotion scores of each node represent the emotional tendency of the candidate word corresponding to that node, and the candidate words corresponding to these nodes together with their multiple emotion scores form the sentiment dictionary.
7. The method for constructing a multi-emotion dictionary for the Internet as claimed in claim 2, characterized in that the high-frequency tuples are stop words, which have a high chance of co-occurring with words of every class; the low-frequency tuples are non-words and user names.
8. The method for constructing a multi-emotion dictionary for the Internet as claimed in claim 1, characterized in that the emotion scores comprise a mood score, a commendatory score, a derogatory score and an absolute score,
The mood score comprises a happiness score, an anger score, a sadness score, a fear score and a surprise score.
9. The method for constructing a multi-emotion dictionary for the Internet as claimed in claim 6, characterized in that the seed words are chosen, according to a word set, from the obtained candidate words of the sentiment dictionary, and comprise words with a definite emotional meaning and language-independent emotion tokens.
10. The method for constructing a multi-emotion dictionary for the Internet as claimed in claim 6, characterized in that the iterative process in the propagation step is described by the following formula:
$$x^{(k+1)} = W \cdot x^{(k)} + b$$
where $x^{(k)}$ denotes the emotion score vector of the nodes after the k-th iteration; according to this formula, the result of the next iteration, $x^{(k+1)}$, is obtained by applying the co-occurrence matrix W to the vector of the previous round and adding the bias vector b; after each round of iteration the result is normalized, and the iterative process eventually converges.
11. The method for constructing a multi-emotion dictionary for the Internet as claimed in claim 10, characterized in that b is set to the seed vector $x^{(0)}$ to strengthen the effect of the seeds; once the seeds are selected, the components of $x^{(0)}$ corresponding to seed words are 1 and all other components are 0.
12. A system for constructing a multi-emotion dictionary for the Internet, characterized in that the system comprises:
An acquisition module, for obtaining an Internet text corpus from the Internet;
A data preprocessing module, for preprocessing the obtained text corpus to obtain candidate words for the sentiment dictionary;
A new word extraction module, for extracting new words from the obtained text corpus to obtain candidate words for the sentiment dictionary;
A graph model construction module, for building an undirected graph model from the obtained candidate words of the sentiment dictionary;
An iterative computation module, for using the undirected graph model and a label propagation algorithm to iteratively compute the multiple emotion scores of each node in the undirected graph, so as to build the sentiment dictionary.
13. The system for constructing a multi-emotion dictionary for the Internet as claimed in claim 12, characterized in that the processing performed by the data preprocessing module comprises:
Removing special words from the text corpus;
Segmenting the text corpus into words, generating n-tuples from the segmentation result, and extracting from the text corpus three classes of tuple sets, namely 1-tuples, 2-tuples and 3-tuples, where n<4;
In each of the three classes of tuple sets, removing the high-frequency tuples whose occurrence counts rank within a preset number of top positions and the low-frequency tuples whose occurrence counts in the text corpus are below a preset number, and taking the remaining mid-frequency tuples as part of the candidate words of the sentiment dictionary.
14. The system for constructing a multi-emotion dictionary for the Internet as claimed in claim 12, characterized in that the methods used in the new word extraction module to extract new words from the obtained text corpus comprise: a context-entropy new word discovery method and a mutual-information new word discovery method.
15. The system for constructing a multi-emotion dictionary for the Internet as claimed in claim 14, characterized in that the construction performed by the graph model construction module comprises:
Calculating the number of times the candidate words of the sentiment dictionary appear together in sentences of the text corpus, as the relationship between any two candidate words;
Building the undirected graph model with each candidate word as a node and the relationships as edge weights.
16. The system for constructing a multi-emotion dictionary for the Internet as claimed in claim 15, characterized in that, when the undirected graph model is built, the constructed undirected graph model is represented as G=(V, E), where G represents the connections between candidate words, V denotes the set of candidate words, and E denotes the set of edges;
Each node v in G corresponds to a candidate word, where v ∈ V; an edge $(v_i, v_j)$ corresponds to the co-occurrence relationship between the two candidate words $v_i$ and $v_j$, where $(v_i, v_j)$ ∈ E;
The co-occurrence relationships between the nodes in V are represented by a co-occurrence matrix W, which is the adjacency matrix of G and is symmetric; the element $w_{ij}$ of W is the weight of the edge $(v_i, v_j)$, namely the number of times the two nodes $v_i$ and $v_j$ co-occur in the text corpus; the diagonal element $w_{ii}$ of W is the number of times $v_i$ occurs in the text corpus.
17. The system for constructing a multi-emotion dictionary for the Internet as claimed in claim 16, characterized in that the computation performed by the iterative computation module comprises:
Selecting seed words among the nodes of the undirected graph model and assigning them emotion scores;
Using the label propagation algorithm, propagating the emotion scores from the chosen seed words, under the effect of the edge weights, to all connected nodes in the undirected graph, so that each node obtains the corresponding multiple emotion scores;
After the iteration converges, every connected node has been assigned multiple emotion scores, the emotion scores of each node represent the emotional tendency of the candidate word corresponding to that node, and the candidate words corresponding to these nodes together with their multiple emotion scores form the sentiment dictionary.
18. The system for constructing a multi-emotion dictionary for the Internet as claimed in claim 13, characterized in that the high-frequency tuples are stop words, which have a high chance of co-occurring with words of every class; the low-frequency tuples are non-words and user names.
19. The system for constructing a multi-emotion dictionary for the Internet as claimed in claim 12, characterized in that the emotion scores comprise a mood score, a commendatory score, a derogatory score and an absolute score, and the mood score comprises a happiness score, an anger score, a sadness score, a fear score and a surprise score.
20. The system for constructing a multi-emotion dictionary for the Internet as claimed in claim 17, characterized in that the seed words are chosen, according to a word set, from the obtained candidate words of the sentiment dictionary, and comprise words with a definite emotional meaning and language-independent emotion tokens.
21. The system for constructing a multi-emotion dictionary for the Internet as claimed in claim 17, characterized in that the iterative process of the label propagation algorithm is described by the following formula:
$$x^{(k+1)} = W \cdot x^{(k)} + b$$
where $x^{(k)}$ denotes the emotion score vector of the nodes after the k-th iteration; according to this formula, the result of the next iteration, $x^{(k+1)}$, is obtained by applying the co-occurrence matrix W to the vector of the previous round and adding the bias vector b; after each round of iteration the result is normalized, and the iterative process eventually converges.
22. The system for constructing a multi-emotion dictionary for the Internet as claimed in claim 21, characterized in that b is set to the seed vector $x^{(0)}$ to strengthen the effect of the seeds; once the seeds are selected, the components of $x^{(0)}$ corresponding to seed words are 1 and all other components are 0.
CN201310470531.2A 2013-10-10 2013-10-10 Method and system for constructing multi-emotion dictionary for internet Pending CN103544246A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310470531.2A CN103544246A (en) 2013-10-10 2013-10-10 Method and system for constructing multi-emotion dictionary for internet

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310470531.2A CN103544246A (en) 2013-10-10 2013-10-10 Method and system for constructing multi-emotion dictionary for internet

Publications (1)

Publication Number Publication Date
CN103544246A true CN103544246A (en) 2014-01-29

Family

ID=49967698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310470531.2A Pending CN103544246A (en) 2013-10-10 2013-10-10 Method and system for constructing multi-emotion dictionary for internet

Country Status (1)

Country Link
CN (1) CN103544246A (en)

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104090864A (en) * 2014-06-09 2014-10-08 合肥工业大学 Emotion dictionary building and emotion calculation method
CN104809108A (en) * 2015-05-20 2015-07-29 成都布林特信息技术有限公司 Information monitoring and analyzing system
CN104933442A (en) * 2015-06-16 2015-09-23 陕西师范大学 Method for propagating image label based on minimal cost path
CN105005553A (en) * 2015-06-19 2015-10-28 四川大学 Emotional thesaurus based short text emotional tendency analysis method
CN105138510A (en) * 2015-08-10 2015-12-09 昆明理工大学 Microblog-based neologism emotional tendency judgment method
CN105956197A (en) * 2016-06-15 2016-09-21 杭州量知数据科技有限公司 Social media graph representation model-based social risk event extraction method
CN106681985A (en) * 2016-12-13 2017-05-17 成都数联铭品科技有限公司 Establishment system of multi-field dictionaries based on theme automatic matching
CN106682128A (en) * 2016-12-13 2017-05-17 成都数联铭品科技有限公司 Method for automatic establishment of multi-field dictionaries
CN106681986A (en) * 2016-12-13 2017-05-17 成都数联铭品科技有限公司 Multi-dimensional sentiment analysis system
CN106874275A (en) * 2015-12-10 2017-06-20 北京新媒传信科技有限公司 Build the method and device of sentiment dictionary
CN107291686A (en) * 2016-04-13 2017-10-24 北京大学 The discrimination method of emotion identification and the identification system of emotion identification
CN107609389A (en) * 2017-08-24 2018-01-19 南京理工大学 A kind of verification method and system of image content-based correlation
CN108280063A (en) * 2018-01-19 2018-07-13 中国科学院软件研究所 Semantic analysis based on semi-supervised learning and system
CN108363699A (en) * 2018-03-21 2018-08-03 浙江大学城市学院 A kind of netizen's school work mood analysis method based on Baidu's mhkc
CN108388608A (en) * 2018-02-06 2018-08-10 金蝶软件(中国)有限公司 Emotion feedback method, device, computer equipment and storage medium based on text perception
CN108509492A (en) * 2018-02-12 2018-09-07 郑长敬 Big data processing based on real estate industry and system
CN108563688A (en) * 2018-03-15 2018-09-21 西安影视数据评估中心有限公司 A kind of movie and television play principle thread recognition methods
CN108563635A (en) * 2018-04-04 2018-09-21 北京理工大学 A kind of sentiment dictionary fast construction method based on emotion wheel model
CN108572961A (en) * 2017-03-08 2018-09-25 北京嘀嘀无限科技发展有限公司 A kind of the vectorization method and device of text
CN108647191A (en) * 2018-05-17 2018-10-12 南京大学 It is a kind of based on have supervision emotion text and term vector sentiment dictionary construction method
CN108694165A (en) * 2017-04-10 2018-10-23 南京理工大学 Cross-cutting antithesis sentiment analysis method towards product review
CN108804412A (en) * 2018-04-13 2018-11-13 中国科学院自动化研究所 Multi-layer sentiment analysis method based on Social Media
CN110069780A (en) * 2019-04-19 2019-07-30 中译语通科技股份有限公司 A kind of emotion word recognition method and system based on specific area text
CN110377916A (en) * 2018-08-17 2019-10-25 腾讯科技(深圳)有限公司 Word prediction technique, device, computer equipment and storage medium
CN110399595A (en) * 2019-07-31 2019-11-01 腾讯科技(成都)有限公司 A kind of method and relevant apparatus of text information mark
CN110489553A (en) * 2019-07-26 2019-11-22 湖南大学 A kind of sensibility classification method based on Multi-source Information Fusion
CN110795558A (en) * 2019-09-03 2020-02-14 腾讯科技(深圳)有限公司 Label acquisition method and device, storage medium and electronic device
CN110858099A (en) * 2018-08-20 2020-03-03 北京搜狗科技发展有限公司 Candidate word generation method and device
CN111221962A (en) * 2019-11-18 2020-06-02 重庆邮电大学 Text emotion analysis method based on new word expansion and complex sentence pattern expansion
CN111325018A (en) * 2020-01-21 2020-06-23 上海恒企教育培训有限公司 Domain dictionary construction method based on web retrieval and new word discovery
CN112905736A (en) * 2021-01-27 2021-06-04 郑州轻工业大学 Unsupervised text emotion analysis method based on quantum theory
CN116522901A (en) * 2023-06-29 2023-08-01 金锐同创(北京)科技股份有限公司 Method, device, equipment and medium for analyzing attention information of IT community

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040186704A1 (en) * 2002-12-11 2004-09-23 Jiping Sun Fuzzy based natural speech concept system
CN102163191A (en) * 2011-05-11 2011-08-24 北京航空航天大学 Short text emotion recognition method based on HowNet
CN102236650A (en) * 2010-04-20 2011-11-09 日电(中国)有限公司 Method and device for correcting and/or expanding sentiment dictionary
CN102663139A (en) * 2012-05-07 2012-09-12 苏州大学 Method and system for constructing emotional dictionary
CN102890707A (en) * 2012-08-28 2013-01-23 华南理工大学 System for mining emotional tendencies of brief network comments based on conditional random field

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ANQI CUI等: "Emotion Tokens:Bridging the Gap among Multilingual Twitter Sentiment Analysis", 《INFORMATION RETRIEVAL TECHNOLOGY》 *
JIN HU HUANG等: "Chinese Word Segmentation based on Contextual Entropy", 《PACIFIC ASIA CONFERENCE ON LANGUAGE》 *
陈晓东: "基于情感词典的中文微博情感倾向分析研究", 《中国优秀硕士学位论文全文数据库》 *

Cited By (52)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104090864A (en) * 2014-06-09 2014-10-08 合肥工业大学 Emotion dictionary building and emotion calculation method
CN104090864B (en) * 2014-06-09 2018-02-06 合肥工业大学 A kind of sentiment dictionary is established and affection computation method
CN104809108A (en) * 2015-05-20 2015-07-29 成都布林特信息技术有限公司 Information monitoring and analyzing system
CN104809108B (en) * 2015-05-20 2018-10-09 元力云网络有限公司 Information monitoring analysis system
CN104933442A (en) * 2015-06-16 2015-09-23 陕西师范大学 Method for propagating image label based on minimal cost path
CN105005553A (en) * 2015-06-19 2015-10-28 四川大学 Emotional thesaurus based short text emotional tendency analysis method
CN105005553B (en) * 2015-06-19 2017-11-21 四川大学 Short text Sentiment orientation analysis method based on sentiment dictionary
CN105138510A (en) * 2015-08-10 2015-12-09 昆明理工大学 Microblog-based neologism emotional tendency judgment method
CN105138510B (en) * 2015-08-10 2018-05-25 昆明理工大学 A kind of neologisms Sentiment orientation determination method based on microblogging
CN106874275A (en) * 2015-12-10 2017-06-20 北京新媒传信科技有限公司 Build the method and device of sentiment dictionary
CN106874275B (en) * 2015-12-10 2020-02-07 北京新媒传信科技有限公司 Method and device for constructing emotion dictionary
CN107291686B (en) * 2016-04-13 2020-10-16 北京大学 Method and system for identifying emotion identification
CN107291686A (en) * 2016-04-13 2017-10-24 北京大学 The discrimination method of emotion identification and the identification system of emotion identification
CN105956197A (en) * 2016-06-15 2016-09-21 杭州量知数据科技有限公司 Social media graph representation model-based social risk event extraction method
CN106681986A (en) * 2016-12-13 2017-05-17 成都数联铭品科技有限公司 Multi-dimensional sentiment analysis system
CN106682128A (en) * 2016-12-13 2017-05-17 成都数联铭品科技有限公司 Method for automatic establishment of multi-field dictionaries
CN106681985A (en) * 2016-12-13 2017-05-17 成都数联铭品科技有限公司 Establishment system of multi-field dictionaries based on theme automatic matching
CN108572961A (en) * 2017-03-08 2018-09-25 北京嘀嘀无限科技发展有限公司 A kind of the vectorization method and device of text
CN108694165A (en) * 2017-04-10 2018-10-23 南京理工大学 Cross-cutting antithesis sentiment analysis method towards product review
CN108694165B (en) * 2017-04-10 2021-11-09 南京理工大学 Cross-domain dual emotion analysis method for product comments
CN107609389A (en) * 2017-08-24 2018-01-19 南京理工大学 A kind of verification method and system of image content-based correlation
CN107609389B (en) * 2017-08-24 2020-10-30 南京理工大学 Verification method and system based on image content correlation
CN108280063A (en) * 2018-01-19 2018-07-13 中国科学院软件研究所 Semantic analysis based on semi-supervised learning and system
CN108388608A (en) * 2018-02-06 2018-08-10 金蝶软件(中国)有限公司 Emotion feedback method, device, computer equipment and storage medium based on text perception
CN108388608B (en) * 2018-02-06 2020-08-04 金蝶软件(中国)有限公司 Emotion feedback method and device based on text perception, computer equipment and storage medium
CN108509492A (en) * 2018-02-12 2018-09-07 郑长敬 Big data processing based on real estate industry and system
CN108563688B (en) * 2018-03-15 2021-06-04 西安影视数据评估中心有限公司 Emotion recognition method for movie and television script characters
CN108563688A (en) * 2018-03-15 2018-09-21 西安影视数据评估中心有限公司 A kind of movie and television play principle thread recognition methods
CN108363699A (en) * 2018-03-21 2018-08-03 浙江大学城市学院 A kind of netizen's school work mood analysis method based on Baidu's mhkc
CN108563635A (en) * 2018-04-04 2018-09-21 北京理工大学 A kind of sentiment dictionary fast construction method based on emotion wheel model
CN108804412A (en) * 2018-04-13 2018-11-13 中国科学院自动化研究所 Multi-layer sentiment analysis method based on Social Media
CN108647191A (en) * 2018-05-17 2018-10-12 南京大学 It is a kind of based on have supervision emotion text and term vector sentiment dictionary construction method
CN110377916B (en) * 2018-08-17 2022-12-16 腾讯科技(深圳)有限公司 Word prediction method, word prediction device, computer equipment and storage medium
CN110377916A (en) * 2018-08-17 2019-10-25 腾讯科技(深圳)有限公司 Word prediction technique, device, computer equipment and storage medium
CN110858099A (en) * 2018-08-20 2020-03-03 北京搜狗科技发展有限公司 Candidate word generation method and device
CN110858099B (en) * 2018-08-20 2024-04-12 北京搜狗科技发展有限公司 Candidate word generation method and device
CN110069780A (en) * 2019-04-19 2019-07-30 中译语通科技股份有限公司 A kind of emotion word recognition method and system based on specific area text
CN110069780B (en) * 2019-04-19 2021-11-19 中译语通科技股份有限公司 Specific field text-based emotion word recognition method
CN110489553B (en) * 2019-07-26 2022-07-05 湖南大学 Multi-source information fusion-based emotion classification method
CN110489553A (en) * 2019-07-26 2019-11-22 湖南大学 A kind of sensibility classification method based on Multi-source Information Fusion
CN110399595B (en) * 2019-07-31 2024-04-05 腾讯科技(成都)有限公司 Text information labeling method and related device
CN110399595A (en) * 2019-07-31 2019-11-01 腾讯科技(成都)有限公司 A kind of method and relevant apparatus of text information mark
CN110795558A (en) * 2019-09-03 2020-02-14 腾讯科技(深圳)有限公司 Label acquisition method and device, storage medium and electronic device
CN110795558B (en) * 2019-09-03 2023-09-29 腾讯科技(深圳)有限公司 Label acquisition method and device, storage medium and electronic device
CN111221962B (en) * 2019-11-18 2023-05-26 重庆邮电大学 Text emotion analysis method based on new word expansion and complex sentence pattern expansion
CN111221962A (en) * 2019-11-18 2020-06-02 重庆邮电大学 Text emotion analysis method based on new word expansion and complex sentence pattern expansion
CN111325018A (en) * 2020-01-21 2020-06-23 上海恒企教育培训有限公司 Domain dictionary construction method based on web retrieval and new word discovery
CN111325018B (en) * 2020-01-21 2023-08-11 上海恒企教育培训有限公司 Domain dictionary construction method based on web retrieval and new word discovery
CN112905736A (en) * 2021-01-27 2021-06-04 郑州轻工业大学 Unsupervised text emotion analysis method based on quantum theory
CN112905736B (en) * 2021-01-27 2023-09-19 郑州轻工业大学 Quantum theory-based unsupervised text emotion analysis method
CN116522901B (en) * 2023-06-29 2023-09-15 金锐同创(北京)科技股份有限公司 Method, device, equipment and medium for analyzing attention information of IT community
CN116522901A (en) * 2023-06-29 2023-08-01 金锐同创(北京)科技股份有限公司 Method, device, equipment and medium for analyzing attention information of IT community

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140129