CN103544246A - Method and system for constructing multi-emotion dictionary for internet - Google Patents
- Publication number: CN103544246A (application number CN201310470531.2A)
- Authority: CN (China)
- Prior art keywords: word, sentiment dictionary, score, internet, text
- Legal status: Pending (Google notes that the listed legal status is an assumption, not a legal conclusion; it has not performed a legal analysis and makes no representation as to its accuracy.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/374—Thesaurus
Abstract
Provided are a method and system for constructing a multi-emotion dictionary for the Internet. The method includes: obtaining an Internet text corpus from the Internet; preprocessing the obtained text corpus to obtain candidate words for the emotion dictionary; extracting new words from the obtained text corpus to obtain further candidate words; building an undirected graph model from the obtained candidate words; and iteratively computing multiple emotion scores for each node of the undirected graph using the graph model and a label propagation algorithm, thereby constructing the emotion dictionary. With this method and system, different seed words can be used to construct emotion dictionaries for different emotions, so the results of emotion recognition are richer.
Description
Technical field
The present invention relates to the field of intelligent processing of network information, and in particular to a method and system for constructing a sentiment dictionary from the emotions expressed in Internet text.
Background art
With the development of the Internet, social media have emerged in large numbers. Social media use the Internet as a medium to give users platforms for sharing opinions and experiences; they collect large volumes of user-generated content that directly reflects people's moods, opinions, and preferences. Social media text, including blogs, microblogs, forum discussions, and product reviews, is the vehicle through which users express personal emotions, and it strongly influences public opinion, brand reputation, product evaluation, and so on. Text sentiment analysis for these media has therefore become a hot topic in recent years. Text sentiment analysis is a computational technique that identifies the emotional orientation expressed by a piece of text. In principle, the emotions people express in text are very complex: beyond approval (praise) and disapproval (criticism), a text may express moods such as happiness, anger, sadness, fear, and surprise. Current research in computational linguistics, however, usually divides emotional orientation into positive and negative, sometimes adding neutral or mixed classes. This simplification meets people's needs to a certain extent and has broad application prospects.
Identifying the user sentiment embodied in text has therefore become a key technology in the network information field and plays an important role in business, politics, and social events. For example, in product reviews on e-commerce websites, automatically identifying whether a consumer praises or criticizes a product, or even individual attributes of a product, can help other consumers make purchases that suit them, and can help manufacturers find the strengths and weaknesses of their products so as to improve them. On film-review websites, viewers evaluate factors such as a film's plot, actors, and cinematography; automatically identifying the polarity of these evaluations gives a comprehensive picture of the audience's reaction to a film. In business, the word-of-mouth reputation that groups of users build around a brand or product is one of the kinds of user information merchants watch closely: it affects a merchant's reputation, and merchants can expand a product's influence and steer consumer behavior through marketing in Internet media. Capturing hot topics related to an industry on microblogs and analyzing their sentiment trends can help predict stock trends. In many political events, netizens use the Internet as a platform for spreading information and publishing opinions; in elections in many countries, for instance, voter inclinations and the positions of different camps are reflected on microblogs, so researchers have used relevant microblogs for advance forecasting or after-the-fact analysis to study the influence of online public opinion on elections.
The most striking difference between social media text and traditional media text is that its language is non-standard and its word usage is free. Traditional natural language processing methods usually parse text syntactically and rely on linguistic knowledge. For social media text, however, whose expressions may be non-standard and ungrammatical, the accuracy of traditional analysis drops sharply. In addition, new words coined by users are absent from traditional dictionaries (i.e., they are "out-of-vocabulary words"), or the meanings of existing words change greatly, which severely limits the applicability of traditional methods.
The output of text sentiment analysis is usually a category such as positive or negative, so text sentiment analysis can be performed with machine learning as a classification task. On the product-review and film-review websites mentioned above, users usually attach a rating to each review; the rating can serve as a label for the sentiment of the review text, so these reviews and ratings can be used as a training corpus for supervised machine learning. These methods all use words (n-grams) as features and combine them with a classifier (such as a naive Bayes, maximum entropy, or support vector machine model) for supervised training and testing. Without a sufficient training corpus, however, supervised learning has little room to work. For Internet text as voluminous as microblogs, manual annotation can cover only a tiny fraction of posts, which restricts the domains and scale to which it applies. By analogy with using review ratings as class labels, one can assume for microblogs that emoticons in the text (such as the smiley ":-)" or the crying face ":-(") indicate its emotional orientation, and train using the presence of these symbols as class labels. Emoticons used as class labels are often noisy, though, and are limited by symbol variants and by the small number of symbol types. Sentiment classification based on supervised learning is therefore severely restricted, and unsupervised methods based on sentiment dictionaries still play a very important role.
A sentiment dictionary is a dictionary that contains emotion words and their emotional orientations. Emotion words are mostly adjectives that express a clear emotional orientation, for example "good" and "bad", or "happy" and "sad". In practice, manually constructed sentiment dictionaries are limited by cost and scale and are hard to extend. Instead, features of a text corpus can be used to build a sentiment dictionary automatically. Such automatic approaches usually start from a small set of emotion words (or rules) and then use the relationships between words to expand the set gradually, computing the emotional orientation of more and more words. Automatic construction of a sentiment dictionary mainly faces the following problems:
Selection of candidate emotion words: most emotion words are adjectives, so usually only adjectives are taken as candidate emotion words. In somewhat more complex settings, rules can be used to extract a richer set of emotion words or emotion phrases.
Measuring word relationships: to spread from a small set of emotion seed words (seed words for short) to a large vocabulary, the relationships between words should reflect their emotional connections. These connections generally include co-occurrence relations, since positive words tend to co-occur with positive words and negative words with negative words, so co-occurrence within a sentence can link words; relations established by conjunctions within a sentence (such as "and", "with", and "but"), which are far fewer than co-occurrences but of higher quality; and, at a deeper level, semantic relations such as the synonym and antonym relations in WordNet.
Propagation of emotional orientation: words and the connections between them form a graph, and a suitable computation is needed to propagate the emotion scores of the seed words to more of the vocabulary. For example, in a graph built from synonym and antonym relations, words of the same polarity can be clustered according to the edge types; pointwise mutual information (PMI) can be used to compute the relationship between a new word and existing words. Graph-based models can also use approaches such as graph propagation or label propagation.
These problems show that although sentiment analysis based on a sentiment dictionary avoids the training-corpus bottleneck, the construction of the dictionary itself is crucial. If the dictionary is too small, many emotion words are missed and the emotional orientation of a text cannot be recognized; short texts in particular are less likely to contain any dictionary emotion word. If the dictionary is of low quality, the sentiment analysis results will also be wrong.
Summary of the invention
In view of the above, it is desirable to provide an Internet multi-emotion dictionary construction method and system that use the co-occurrence relations of the basic units (such as words and symbols) through which Internet text expresses emotion, combined with new-word discovery, to construct sentiment dictionaries automatically through iterative propagation.
A multi-emotion dictionary construction method comprises: an obtaining step of obtaining an Internet text corpus from the Internet; a data preprocessing step of preprocessing the obtained text corpus to obtain candidate words for the sentiment dictionary; a new-word extraction step of extracting new words from the obtained text corpus to obtain candidate words for the sentiment dictionary; a graph model construction step of building an undirected graph model from the obtained candidate words; and an iterative computation step of iteratively computing multiple emotion scores for each node in the undirected graph using the graph model and a label propagation algorithm, so as to construct the sentiment dictionary.
A multi-emotion dictionary construction system comprises: an acquisition module for obtaining an Internet text corpus from the Internet; a data preprocessing module for preprocessing the obtained text corpus to obtain candidate words for the sentiment dictionary; a new-word extraction module for extracting new words from the obtained text corpus to obtain candidate words for the sentiment dictionary; a graph model construction module for building an undirected graph model from the obtained candidate words; and an iterative computation module for iteratively computing multiple emotion scores for each node in the undirected graph using the graph model and a label propagation algorithm, so as to construct the sentiment dictionary.
Compared with the prior art, the present invention addresses the shortcomings of the sentiment dictionaries used by existing sentiment analysis algorithms for Internet text, and proposes a method for building sentiment dictionaries that recognize multiple emotions in Internet text. Unlike traditional methods, it builds dictionaries from informal elements characteristic of Internet text, such as emotion markers, Internet neologisms, emoticons, and misspelled words, and is not limited to traditional emotion words of a single language or domain. Using different seed words, sentiment dictionaries for different moods (such as happiness, anger, sadness, fear, and surprise) can be constructed, making emotion recognition results richer.
Brief description of the drawings
Fig. 1 is a diagram of the application environment of the Internet multi-emotion dictionary construction system of the present invention.
Fig. 2 is a module diagram of a preferred embodiment of the Internet multi-emotion dictionary construction system of the present invention.
Fig. 3 is a flowchart of a preferred embodiment of the Internet multi-emotion dictionary construction method of the present invention.
Fig. 4 is a schematic diagram of typical high-frequency n-grams.
Fig. 5 is a schematic diagram of the undirected graph model.
Fig. 6 is a schematic diagram of the co-occurrence matrix.
Fig. 7 is a schematic diagram of the polarity (positive/negative) scores of words.
Fig. 8 is a schematic diagram of the mood scores of words.
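Description of main reference numerals
Computing device | 1
Internet multi-emotion dictionary construction system | 10
Memory | 20
Processor | 30
Display device | 40
Input device | 50
Acquisition module | 100
Data preprocessing module | 101
New-word extraction module | 102
Graph model construction module | 103
Iterative computation module | 104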
The following embodiments further illustrate the present invention with reference to the above drawings.
Embodiment
Fig. 1 shows the application environment of a preferred embodiment of the Internet multi-emotion dictionary construction system 10 of the present invention (hereinafter system 10). System 10 runs on a computing device 1, which also includes a memory 20, a processor 30, a display device 40, and an input device 50 connected by a data bus. The computing device 1 may be a computer, a mobile phone, a PDA (Personal Digital Assistant), or the like.
The memory 20 stores the program code and other data of system 10. The display device 40 may be a computer's LCD screen, a mobile phone's touch screen, or the like. The input device 50, such as a keyboard or mouse, is used to enter user-set data.
Fig. 2 is a functional block diagram of a preferred embodiment of system 10. In one embodiment, system 10 mainly comprises an acquisition module 100, a data preprocessing module 101, a new-word extraction module 102, a graph model construction module 103, and an iterative computation module 104. Modules 100-104 are program segments of computer instructions that each perform a specific function; they describe how the software executes on the computing device 1 better than a whole program would. The computer instructions of modules 100-104 are stored in the memory 20 and executed by the processor 30 of the computing device 1. The specific functions of modules 100-104 are described below with reference to Fig. 3.
Fig. 3 is a flowchart of a preferred embodiment of the Internet multi-emotion dictionary construction method of the present invention. Depending on requirements, the order of the steps in the flowchart may change, and some steps may be omitted.
Step S10: the acquisition module 100 obtains an Internet text corpus from the Internet.
Step S11: the data preprocessing module 101 preprocesses the obtained text corpus to obtain candidate words for the sentiment dictionary.
As the content source for building the sentiment dictionary, the Internet text corpus must be suitably cleaned before word relationships can be extracted. The preprocessing steps, including data cleaning, are as follows:
Step 1.1: remove special tokens from the text corpus, such as website links, user-name mentions, and special characters.
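The text names the kinds of special tokens to remove but not the rules for matching them. The following minimal Python sketch assumes simple regular-expression patterns (`URL_RE`, `MENTION_RE`, `SPECIAL_RE`, and `remove_special_tokens` are hypothetical names, not part of the patent) and deliberately keeps the characters used in emoticons such as ":-)", since emoticons may later serve as seed words.

```python
import re

# Hypothetical patterns: the patent lists the token types, not the rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # website links
MENTION_RE = re.compile(r"@\w+")                 # user-name mentions such as "@alice"
# Anything that is not a word character (\w also matches CJK characters in
# Python 3), whitespace, or a character used in emoticons like ":-)" or ":-(".
SPECIAL_RE = re.compile(r"[^\w\s:;()\-]")


def remove_special_tokens(text: str) -> str:
    """Strip links, mentions and special characters, then collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = SPECIAL_RE.sub(" ", text)
    return " ".join(text.split())


print(remove_special_tokens("@alice 今天很开心 :-) http://t.cn/abc #话题#"))
```

Because `\w` also matches Chinese characters, a mention written directly before Chinese text with no separator would swallow that text; a production cleaner would need platform-specific rules for mentions.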
Step 1.2: segment the text corpus into words, then generate n-grams (n < 4) from the segmentation result to compensate for incorrectly segmented words. This extracts three sets of tuples from the corpus: unigrams, bigrams, and trigrams. Chinese text, for example, can be segmented with the ICTCLAS tool (Institute of Computing Technology, Chinese Lexical Analysis System).
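A minimal sketch of the n-gram generation in step 1.2, assuming the corpus has already been segmented (by ICTCLAS or any other segmenter) into one token list per sentence. The function name `ngram_counts` is hypothetical, and joining tokens without spaces is an assumption that suits Chinese text.

```python
from collections import Counter
from typing import Dict, Iterable, List


def ngram_counts(sentences: Iterable[List[str]], max_n: int = 3) -> Dict[int, Counter]:
    """Count n-grams for n = 1..max_n (the text uses n < 4).

    Each n-gram is stored as the concatenation of its tokens, so a word the
    segmenter split by mistake (e.g. ["微", "博"]) is recovered as "微博".
    """
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for tokens in sentences:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[n]["".join(tokens[i:i + n])] += 1
    return counts


grams = ngram_counts([["我", "喜欢", "微", "博"], ["微", "博", "很", "好"]])
print(grams[2]["微博"])  # 2
```

The three `Counter` objects correspond to the unigram, bigram, and trigram tuple sets that step 1.3 filters independently.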
Step 1.3: considering the linguistic characteristics of words, remove from each of the three tuple sets the highest-ranked high-frequency tuples (i.e., high-frequency words) up to a preset rank (e.g., the top 50), and the low-frequency tuples (i.e., low-frequency words) that occur in the text corpus fewer times than a preset count (e.g., 3). High-frequency tuples are usually stop words; they co-occur frequently with all kinds of words and therefore say little about emotion. Low-frequency tuples are usually non-words, user names, and the like, which carry no linguistic meaning and must therefore be removed. The remaining mid-frequency tuples are kept as part of the candidate words.
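The frequency filter of step 1.3 can be sketched as follows, applied separately to each tuple set. The defaults mirror the examples in the text (top 50, fewer than 3 occurrences); the function name `filter_by_frequency` is hypothetical.

```python
from collections import Counter


def filter_by_frequency(counts: Counter, top_k: int = 50, min_count: int = 3) -> Counter:
    """Keep the mid-frequency tuples of one n-gram set.

    Drops the top_k most frequent tuples (likely stop words) and tuples seen
    fewer than min_count times (likely non-words or user names). Ties at the
    top_k cut-off follow Counter.most_common's ordering.
    """
    high = {t for t, _ in counts.most_common(top_k)}
    return Counter({t: c for t, c in counts.items() if t not in high and c >= min_count})


unigrams = Counter({"的": 100, "了": 90, "开心": 10, "微博": 5, "abc": 2})
print(sorted(filter_by_frequency(unigrams, top_k=2)))
```

Here "的" and "了" are removed as high-frequency tuples and "abc" as a low-frequency one, leaving "开心" and "微博" as candidates.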
The microblog corpus used for Fig. 4 comes from Tencent Weibo and comprises 69,715 microblog posts (152,716 sentences). After user names, website links, etc. were removed and the text was segmented, the high-frequency and low-frequency tuples were counted and removed. Typical high-frequency tuples are shown in Fig. 4.
Step S12: the new-word extraction module 102 extracts new words from the obtained text corpus to obtain candidate words for the sentiment dictionary.
Besides using n-grams as candidate words, the context-entropy and mutual-information methods are also used to discover new words as candidate words for the sentiment dictionary. Because the tuples are generated from the segmentation result, segmentation boundary errors can produce candidates with wrong boundaries (the generated candidate "word" is not an actual word). Generating tuples character by character instead would introduce a large amount of non-word noise. Therefore, in addition to the tuples generated from the segmentation result, some new words must be identified and used as candidate words. The present invention integrates two new-word discovery methods into the construction of the sentiment dictionary: the context-entropy method and the mutual-information method.
(1) Principle of the context-entropy new-word discovery method:
The context entropy of a tuple (word) determines whether its boundary should be extended to form a new word.
Taking the left context entropy of a word $w$ as an example, the left context entropy $LCE(w)$ is computed as:
$$LCE(w) = -\sum_{i=1}^{s} \frac{C(a_i, w)}{N(w)} \log \frac{C(a_i, w)}{N(w)}$$
where $N(w)$ is the number of occurrences of $w$ in the text corpus and $C(a_i, w)$ is the number of co-occurrences of $w$ with another word $a_i$ in the corpus. When computing $LCE(w)$, $a_i$ is a single candidate word appearing immediately to the left of $w$, the new word after expansion is $a_i w$, and $s$ is the number of candidate words $a_i$, i.e., the number of distinct words appearing to the left of $w$. A low $LCE(w)$ indicates that the text to the left of $w$ (its preceding context) is fairly uniform, so the left boundary should be extended further. Substituting $a_i w$ for $w$ in the formula for $LCE(w)$ gives the left context entropy $LCE(a_i w)$ of the new word, and the entropy increment is:
$$\Delta LCE(a_i w) = LCE(a_i w) - LCE(w)$$
If this increment is large, the left side of the old word $w$ is unlikely to be a word boundary, whereas the left side of the new word $a_i w$ is more likely to be one; in that case $a_i w$ replaces $w$ as the new candidate word. Similarly, the right context entropy $RCE(w)$ of a word and its increment are computed as:
$$RCE(w) = -\sum_{i=1}^{s} \frac{C(w, b_i)}{N(w)} \log \frac{C(w, b_i)}{N(w)}, \qquad \Delta RCE(w b_i) = RCE(w b_i) - RCE(w)$$
Here $b_i$ is a candidate word for extension on the right, the new word after expansion is $w b_i$, $N(w)$ is the number of occurrences of $w$ in the corpus, $C(w, b_i)$ is the number of co-occurrences of $w$ with another word $b_i$ in the corpus, and $s$ is the number of candidate words $b_i$, i.e., the number of distinct words appearing to the right of $w$.
(2) Principle of the mutual-information new-word discovery method:
Mutual information determines whether a tuple should be extended to form a new word.
Let the preceding (left) word of $w$ be $a_i$ and the following (right) word be $b_i$. The mutual information on each side, namely the left mutual information between $a_i$ and $w$ and the right mutual information between $w$ and $b_i$, is defined in terms of the following counts: $N(w)$, $N(a_i)$, and $N(b_i)$ are the numbers of occurrences of $w$, $a_i$, and $b_i$ respectively, and $C(a_i, w)$ and $C(w, b_i)$ are the one-sided co-occurrence counts of $w$ with $a_i$ and with $b_i$ respectively.
When the mutual information on one side exceeds a set threshold (tuned to the text corpus), the word is extended outward on that side; extension stops once the mutual information falls below the threshold, and the current word is taken as a new word.
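The threshold-driven growth can be sketched as below. Since the text does not spell out the mutual-information formula, this sketch assumes standard pointwise mutual information, $\log \frac{C \cdot T}{N(\text{left})\,N(\text{right})}$ with $T$ the total token count; the length cap `max_len` and all function names are hypothetical additions.

```python
import math
from typing import Iterator, List, Tuple

Corpus = List[List[str]]
Word = Tuple[str, ...]


def occurrences(corpus: Corpus, seq: Word) -> Iterator[Tuple[List[str], int]]:
    """Yield (sentence, start index) for every occurrence of the token sequence."""
    k = len(seq)
    for toks in corpus:
        for i in range(len(toks) - k + 1):
            if tuple(toks[i:i + k]) == seq:
                yield toks, i


def seq_count(corpus: Corpus, seq: Word) -> int:
    return sum(1 for _ in occurrences(corpus, seq))


def pmi(corpus: Corpus, left: Word, right: Word, total: int) -> float:
    """ASSUMED form: log(C(left, right) * T / (N(left) * N(right)))."""
    c = seq_count(corpus, left + right)
    if c == 0:
        return float("-inf")
    return math.log(c * total / (seq_count(corpus, left) * seq_count(corpus, right)))


def side_pmi(corpus: Corpus, w: Word, token: str, side: str, total: int) -> float:
    """Mutual information between w and a neighbouring token on the given side."""
    return pmi(corpus, (token,), w, total) if side == "left" else pmi(corpus, w, (token,), total)


def grow_word(corpus: Corpus, w: Word, threshold: float, max_len: int = 4) -> Word:
    """Extend w leftwards, then rightwards, while the best neighbour's PMI > threshold."""
    total = sum(len(toks) for toks in corpus)   # T: total number of tokens
    for side in ("left", "right"):
        while len(w) < max_len:
            neighbours = set()
            for toks, i in occurrences(corpus, w):
                j = i - 1 if side == "left" else i + len(w)
                if 0 <= j < len(toks):
                    neighbours.add(toks[j])
            if not neighbours:
                break
            scores = {a: side_pmi(corpus, w, a, side, total) for a in sorted(neighbours)}
            best = max(scores, key=scores.get)
            if scores[best] <= threshold:
                break   # mutual information fell below the threshold: stop growing
            w = (best,) + w if side == "left" else w + (best,)
    return w


corpus = [["我", "喜欢", "饥饿", "营销"], ["我", "讨厌", "饥饿", "营销"],
          ["我", "喜欢", "你"], ["我", "讨厌", "他"],
          ["我", "喜欢", "他"], ["我", "讨厌", "你"]]
print(grow_word(corpus, ("营销",), threshold=1.5))  # ('饥饿', '营销')
```

Here "饥饿" and "营销" always appear together (PMI = log 10), so they merge into "饥饿营销" ("hunger marketing"), whereas the varied left neighbours "喜欢" and "讨厌" score only log(20/6) and stop the growth. Plain PMI is known to favor rare tokens, which is one reason the threshold must be tuned to the corpus.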
For example, new words found by running the new-word discovery algorithm on top of the above segmentation include person names missing from the segmentation dictionary (such as "Tim Cook" and its Chinese transliteration, or "Liu Zhiwei"), new terms (such as "Tencent Weibo", "Sina Weibo", and "hunger marketing"), and colloquial expressions (such as "free shipping" and "struck by divine retribution").
Step S13: the graph model construction module 103 builds an undirected graph model from the obtained candidate words of the sentiment dictionary.
After the candidate words of the sentiment dictionary are obtained, the graph model construction module 103 counts how many times each pair of candidate words occurs together in the same sentence of the text corpus and uses this count as the relationship between the two candidate words (the mutual information value). An undirected graph model G is then constructed with each candidate word as a node and this relationship value as the edge weight. In a large text corpus, words that often occur together are more likely to carry similar emotions, so the two endpoints of a high-weight edge in the graph tend to have similar emotional orientations.
The constructed undirected graph model is denoted $G = (V, E)$ and represents the connections between candidate words, where $V$ is the set of candidate words and $E$ is the set of edges. Each node $v \in V$ corresponds to a candidate word, and an edge $(v_i, v_j) \in E$ corresponds to the co-occurrence relation between candidate words $v_i$ and $v_j$; the weight $w_{ij}$ of edge $(v_i, v_j)$ is the number of times $v_i$ and $v_j$ co-occur in the text corpus.
The co-occurrence relations between the nodes in $V$ are represented by a co-occurrence matrix $W$ (the adjacency matrix of $G$). $W$ is symmetric: its element $w_{ij}$ is the weight of edge $(v_i, v_j)$, i.e., the number of times nodes $v_i$ and $v_j$ co-occur in the text corpus, and each diagonal element $w_{ii}$ is the number of times $v_i$ occurs in the corpus. This co-occurrence matrix is used in the subsequent step S14 to compute the sentiment dictionary iteratively.
For example, Fig. 5 shows the undirected graph model built from the three sentences "I/study/science", "science/very/profound", and "I/like/study", where slashes separate the segmented words. All words are kept as candidate words, each candidate word corresponds to a node, a single line denotes weight 1, and a double line denotes weight 2. Fig. 6 shows the co-occurrence matrix of this graph: nodes "I" and "study" co-occur twice in the corpus, so the element $w_{12}$ of $W$ is 2, and the diagonal element $w_{33}$, the number of times node "science" occurs in the corpus, is 2.
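A pure-Python sketch of the co-occurrence matrix described in step S13, reproducing the Fig. 5 example. It assumes that nodes are numbered in order of first appearance and that a pair counts at most once per sentence; `cooccurrence_matrix` is a hypothetical name.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def cooccurrence_matrix(sentences: List[List[str]]) -> Tuple[List[str], List[List[int]]]:
    """Build the symmetric co-occurrence matrix W of the undirected graph G.

    Nodes are candidate words in order of first appearance. W[i][j] (i != j)
    counts the sentences in which words i and j both occur; W[i][i] counts
    the occurrences of word i.
    """
    index: Dict[str, int] = {}
    for toks in sentences:
        for t in toks:
            index.setdefault(t, len(index))
    n = len(index)
    W = [[0] * n for _ in range(n)]
    for toks in sentences:
        for t in toks:
            W[index[t]][index[t]] += 1           # diagonal: occurrence count
        for a, b in combinations(sorted(set(toks)), 2):
            i, j = index[a], index[b]
            W[i][j] += 1                         # one co-occurrence per sentence
            W[j][i] += 1                         # keep W symmetric
    return list(index), W


words, W = cooccurrence_matrix([["I", "study", "science"],
                                ["science", "very", "profound"],
                                ["I", "like", "study"]])
print(words)          # ['I', 'study', 'science', 'very', 'profound', 'like']
print(W[0][1], W[2][2])  # 2 2
```

With this ordering, the 1-based entries $w_{12}$ ("I" with "study") and $w_{33}$ ("science") both equal 2, matching the figure description. For a large vocabulary, a sparse matrix would replace the dense list of lists.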
Step S14: the iterative computation module 104 iteratively computes multiple emotion scores for each node in the undirected graph using the graph model and a label propagation algorithm, so as to construct the sentiment dictionary.
In this step, the seed words can be definite emotion words chosen for the language concerned, or language-independent emotion markers such as the smiley ":-)". This ensures that label propagation remains effective for different languages. Seed words are selected, according to word sets, from the candidate words obtained in steps S11 and S12. In this embodiment, the word sets comprise a positive/negative word set and a mood word set. The positive/negative word set consists of the 728 positive words and 933 negative words compiled in the Students' Dictionary of Commendatory and Derogatory Words by Zhang Wei et al. The mood word set uses the five mood word sets compiled by Ge Xu, Xinfan Meng, Houfeng Wang et al., covering happiness, anger, sadness, fear, and surprise, with 91, 112, 89, 103, and 92 seed words respectively (see Xu G, Meng X, Wang H. Build Chinese emotion lexicons using a graph-based algorithm and multiple resources. Proceedings of the 23rd International Conference on Computational Linguistics. Stroudsburg, PA, USA: Association for Computational Linguistics, 2010: 1209–1217). Although these seed words are standard words, iterative propagation can also assign emotion scores to other candidate words in the graph, such as new words and emoticons.
The iterative computation module 104 runs the iteration separately with the seed words of each emotion (for example, positive or negative polarity, or moods such as happiness, anger, and sadness), computing each node's score for each emotion. Using the seed words of a given emotion yields the sentiment dictionary for that emotion; for example, iterating with mood seed words yields a sentiment dictionary made up of the mood scores of the candidate words corresponding to the nodes.
Using label propagation, the emotion scores of the selected seed words are propagated to all connected nodes in the undirected graph. The iteration is:
$$x^{(k+1)} = W x^{(k)} + b$$
where $x^{(k)}$ is the emotion score vector of the nodes after the $k$-th iteration. By this formula, the result of a new round, $x^{(k+1)}$, is obtained by applying the co-occurrence matrix $W$ to the previous round's vector and adding the bias vector $b$. After each round the result is normalized, and the iteration eventually converges. In the present invention, $b$ is set to the seed vector $x^{(0)}$ to strengthen the effect of the seeds: once the seeds are selected, the components of $x^{(0)}$ corresponding to seed words are 1 and all other components are 0.
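The update rule and $b = x^{(0)}$ come from the text, but the normalization does not; this pure-Python sketch assumes L2 normalization, an iteration cap, and a convergence tolerance (`propagate`, `build_dictionaries`, `max_iter`, and `tol` are hypothetical names). Running it once per seed set follows the description of iterating separately for each emotion.

```python
import math
from typing import Dict, List, Sequence


def propagate(W: List[List[float]], words: List[str], seeds: Sequence[str],
              max_iter: int = 100, tol: float = 1e-9) -> Dict[str, float]:
    """Label propagation x(k+1) = normalise(W x(k) + b), with b = x(0)."""
    seed_set = set(seeds)
    # x(0): 1 for seed words that are candidate words, 0 elsewhere.
    b = [1.0 if w in seed_set else 0.0 for w in words]
    x = b[:]
    n = len(words)
    for _ in range(max_iter):
        y = [sum(W[i][j] * x[j] for j in range(n)) + b[i] for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y)) or 1.0   # ASSUMPTION: L2 norm
        y = [v / norm for v in y]
        converged = all(abs(p - q) < tol for p, q in zip(x, y))
        x = y
        if converged:
            break
    return dict(zip(words, x))


def build_dictionaries(W: List[List[float]], words: List[str],
                       seed_sets: Dict[str, Sequence[str]]) -> Dict[str, Dict[str, float]]:
    """Run propagation once per emotion, e.g. {"happiness": [...], "anger": [...]}."""
    return {emotion: propagate(W, words, seeds) for emotion, seeds in seed_sets.items()}


words = ["A", "B", "C", "D"]
W = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
dicts = build_dictionaries(W, words, {"pos": ["A"], "neg": ["C"]})
print(dicts["pos"])
```

In the toy graph, "A" and "B" form one component and "C" and "D" another, so positive scores reach only "B" from seed "A", while negative scores reach only "D" from seed "C". Seeds that are not candidate words are simply ignored.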
As shown in Fig. 7, using the 728 positive words and 933 negative words from the above Students' Dictionary of Commendatory and Derogatory Words by Zhang Wei et al. as seed word sets, iterative propagation computes the polarity score of each node (candidate word); Fig. 7 shows example polarity scores for some words. As shown in Fig. 8, using the five mood word sets compiled by Ge Xu, Xinfan Meng, Houfeng Wang et al. as seed word sets, five mood scores are computed for each node; these scores demonstrate the flexibility of the present invention in recognizing multiple emotions.
The Internet multi-emotion dictionary construction method and system of the present invention address the shortcomings of the sentiment dictionaries used by existing sentiment analysis algorithms for Internet text, and propose a method for building sentiment dictionaries that recognize multiple emotions in Internet text. The method uses the co-occurrence relations of the basic units (such as words and symbols) through which Internet text expresses emotion, combined with new-word discovery, to construct sentiment dictionaries automatically through iterative propagation. Unlike traditional methods, the invention builds dictionaries from informal elements characteristic of Internet text, such as emotion markers, Internet neologisms, emoticons, and misspelled words, and is not limited to traditional emotion words of a single language or domain. Using different seed words, sentiment dictionaries for different moods (such as happiness, anger, sadness, fear, and surprise) can be constructed, making emotion recognition results richer and in turn allowing the emotion expressed by an entire text to be identified. Experimental results also show that the invention does not require a very large corpus and is not subject to time restrictions, so it applies to a wide range of domains and languages and recognizes diverse emotion types.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present invention may be modified or equivalently replaced without departing from the spirit and scope of the technical solutions of the present invention.
Claims (22)
1. A method for constructing a multi-emotion sentiment dictionary for the internet, characterized in that the method comprises:
Obtaining step: obtaining an internet text corpus from the internet;
Data preprocessing step: preprocessing the obtained text corpus to obtain candidate words for the sentiment dictionary;
New-word extraction step: extracting new words from the obtained text corpus to obtain candidate words for the sentiment dictionary;
Graph model construction step: constructing an undirected graph model from the obtained candidate words of the sentiment dictionary;
Iterative computation step: using the undirected graph model and a label propagation algorithm to iteratively compute multiple emotion scores for each node in the undirected graph, thereby constructing the sentiment dictionary.
2. The method as claimed in claim 1, characterized in that the data preprocessing step comprises:
Removal step: removing special tokens from the text corpus;
Segmentation and extraction step: performing word segmentation on the text corpus, generating n-grams from the segmentation result, and extracting three sets of n-grams from the text corpus, namely unigrams, bigrams and trigrams (n < 4);
Filtering step: from each of the three n-gram sets, removing the high-frequency n-grams ranked within a preset number of top positions and the low-frequency n-grams whose occurrence count in the text corpus is below a preset threshold, and taking the remaining mid-frequency n-grams as part of the candidate words of the sentiment dictionary.
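The preprocessing of claim 2 can be sketched as below. This is a minimal illustration, not the patent's implementation: it assumes the corpus has already been word-segmented into token lists, and the function name `ngram_candidates` and its parameters `top_k` (the preset number of top-ranked n-grams to drop) and `min_count` (the preset occurrence threshold) are hypothetical.

```python
from collections import Counter


def ngram_candidates(tokenized_sentences, top_k=2, min_count=2, sep=" "):
    """Hypothetical sketch of claim 2: keep mid-frequency 1-, 2- and 3-grams.

    tokenized_sentences: list of token lists (word segmentation assumed done).
    top_k: number of top-ranked n-grams dropped per set (assumed stop words).
    min_count: n-grams seen fewer times than this are dropped (assumed noise).
    """
    candidates = set()
    for n in (1, 2, 3):  # unigrams, bigrams and trigrams, i.e. n < 4
        counts = Counter()
        for tokens in tokenized_sentences:
            for i in range(len(tokens) - n + 1):
                counts[sep.join(tokens[i:i + n])] += 1
        # High-frequency n-grams: the top_k entries of the frequency ranking.
        high = {gram for gram, _ in counts.most_common(top_k)}
        for gram, count in counts.items():
            # Keep only the mid-frequency band between the two cut-offs.
            if gram not in high and count >= min_count:
                candidates.add(gram)
    return candidates


corpus = [
    ["the", "movie", "is", "great"],
    ["the", "food", "is", "great"],
    ["the", "show", "is", "bad"],
]
print(ngram_candidates(corpus))  # {'great'}
```

Here the two most frequent unigrams ("the", "is") are treated as stop words and every n-gram seen only once is discarded, leaving "great". Raising `top_k` strips more function words; raising `min_count` strips more one-off strings such as user names.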
3. The method as claimed in claim 1, characterized in that the methods used in the new-word extraction step to extract new words from the obtained text corpus comprise a context-entropy-based new word discovery method and a mutual-information-based new word discovery method.
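Claim 3 names the two criteria but not their formulas. The sketch below uses one common formulation, which is an assumption rather than the patent's stated method: a character string is accepted as a new word when its pointwise mutual information (taking the weakest of all ways to split it in two) and its left and right context entropies both clear a threshold. The function names and all thresholds are hypothetical, and substring probabilities are roughly approximated by dividing counts by the text length.

```python
import math
from collections import Counter, defaultdict


def entropy(counter):
    """Shannon entropy (natural log) of a neighbour-character distribution."""
    total = sum(counter.values())
    return -sum(c / total * math.log(c / total) for c in counter.values())


def discover_new_words(text, max_len=4, min_count=2, min_pmi=1.0, min_entropy=0.5):
    """Hypothetical sketch: mutual-information + context-entropy new-word discovery."""
    n = len(text)
    counts = Counter()
    left, right = defaultdict(Counter), defaultdict(Counter)
    for i in range(n):
        for length in range(1, max_len + 1):
            if i + length > n:
                break
            s = text[i:i + length]
            counts[s] += 1
            if i > 0:
                left[s][text[i - 1]] += 1  # character just before s
            if i + length < n:
                right[s][text[i + length]] += 1  # character just after s

    def prob(s):
        return counts[s] / n  # rough estimate shared by all substring lengths

    words = {}
    for s, c in counts.items():
        if len(s) < 2 or c < min_count or not left[s] or not right[s]:
            continue
        # Cohesion: mutual information of the weakest prefix/suffix split.
        pmi = min(math.log(prob(s) / (prob(s[:k]) * prob(s[k:])))
                  for k in range(1, len(s)))
        # Freedom: s must appear with varied neighbours on both sides.
        ent = min(entropy(left[s]), entropy(right[s]))
        if pmi >= min_pmi and ent >= min_entropy:
            words[s] = (pmi, ent)
    return words


print(sorted(discover_new_words("axybcxydexyfgxyh")))  # ['xy']
```

In the toy string, "xy" always appears together and is surrounded by four different characters on each side, so it passes both tests. On real Chinese internet text the same code would run over raw character strings, and the thresholds would need tuning on the corpus.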
4. The method as claimed in claim 3, characterized in that the graph model construction step comprises:
Counting step: counting the number of times each pair of candidate words of the sentiment dictionary co-occurs within sentences of the text corpus, as the relationship between those two candidate words;
Undirected graph construction step: constructing the undirected graph model with each candidate word as a node and the relationship as the edge weight.
5. The method as claimed in claim 4, characterized in that, in the undirected graph construction step, the constructed undirected graph model is represented as $G = (V, E)$, where $G$ represents the connections between candidate words, $V$ is the set of candidate words and $E$ is the set of edges;
each node $v \in V$ of $G$ corresponds to one candidate word, and each edge $(v_i, v_j) \in E$ corresponds to the co-occurrence relationship between the two candidate words $v_i$ and $v_j$;
the co-occurrence relationships among the nodes in $V$ are represented by a co-occurrence matrix $W$, which is the adjacency matrix of $G$ and is symmetric; the element $w_{ij}$ of $W$ is the weight of edge $(v_i, v_j)$, namely the number of times the two nodes $v_i$ and $v_j$ co-occur in the text corpus, and the diagonal element $w_{ii}$ is the number of times $v_i$ occurs in the text corpus.
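The graph of claims 4 and 5 can be held directly as its co-occurrence matrix W. The sketch below is a minimal, hypothetical version in plain Python: it assumes sentences are already tokenized, counts each pair of distinct candidate words at most once per sentence for the off-diagonal weights, and uses total occurrences for the diagonal; the patent does not say how a word repeated inside one sentence should be counted.

```python
def build_cooccurrence(sentences, vocab):
    """Hypothetical sketch of claims 4-5: symmetric co-occurrence matrix W.

    W[i][j] (i != j): number of sentences in which vocab[i] and vocab[j]
    both appear; W[i][i]: total occurrences of vocab[i] in the corpus.
    """
    index = {word: i for i, word in enumerate(vocab)}
    size = len(vocab)
    W = [[0] * size for _ in range(size)]
    for tokens in sentences:
        hits = [index[t] for t in tokens if t in index]
        for i in hits:
            W[i][i] += 1  # diagonal: occurrence count
        present = sorted(set(hits))
        for a, i in enumerate(present):
            for j in present[a + 1:]:
                W[i][j] += 1  # edge weight, kept symmetric
                W[j][i] += 1
    return W


vocab = ["happy", "great", ":)", "sad"]
sentences = [["so", "happy", ":)"], ["great", ":)", ":)"], ["sad", "day"]]
for row in build_cooccurrence(sentences, vocab):
    print(row)
# [1, 0, 1, 0]
# [0, 1, 1, 0]
# [1, 1, 3, 0]
# [0, 0, 0, 1]
```

Because "sad" shares no sentence with the other candidates it forms its own connected component, which matters later: propagated scores only reach nodes connected to a seed.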
6. The method as claimed in claim 5, characterized in that the iterative computation step comprises:
Selection step: selecting seed words among the nodes of the undirected graph model and assigning them emotion scores;
Propagation step: using the label propagation algorithm, propagating the emotion scores from the selected seed words, weighted by the edge weights, to all nodes connected to them in the undirected graph, so that each node obtains its multiple emotion scores;
Dictionary construction step: after the iteration converges, every connected node has been assigned multiple emotion scores; the emotion scores of each node represent the emotional tendency of the corresponding candidate word, and these candidate words together with their multiple emotion scores form the sentiment dictionary.
7. The method as claimed in claim 2, characterized in that the high-frequency n-grams are stop words, which have a high chance of co-occurring with all kinds of words, and the low-frequency n-grams are non-words and user names.
8. The method as claimed in claim 1, characterized in that the emotion scores comprise mood scores, a commendatory score, a derogatory score and an absolute score,
the mood scores comprising a happiness score, an anger score, a sadness score, a fear score and a surprise score.
9. The method as claimed in claim 6, characterized in that the seed words are selected from the candidate words of the sentiment dictionary according to a word list, and comprise words with a definite emotion and language-independent emotion tokens.
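One way to lay out the seed vectors $x^{(0)}$ (claims 9 and 11) for the mood scores of claim 8 is one 0/1 vector per emotion over the candidate words. The sketch below assumes that layout; the `EMOTIONS` tuple, the seed lists and the function name are hypothetical, and the commendatory, derogatory and absolute scores of claim 8 are left out for brevity. An emotion token such as ":)" can be a seed regardless of the language of the surrounding text.

```python
EMOTIONS = ("happy", "angry", "sad", "fear", "surprise")  # mood scores of claim 8


def seed_vectors(vocab, seed_lists):
    """Hypothetical sketch: one 0/1 seed vector per emotion over the candidate words.

    Seeds that are not among the candidate words are ignored, since claim 9
    selects seeds from the candidate words.
    """
    index = {word: i for i, word in enumerate(vocab)}
    vectors = {}
    for emotion in EMOTIONS:
        x0 = [0.0] * len(vocab)
        for word in seed_lists.get(emotion, ()):
            if word in index:
                x0[index[word]] = 1.0  # seed dimension set to 1, others stay 0
        vectors[emotion] = x0
    return vectors


vocab = ["glad", ":)", "furious", "!!!", "rain"]
seeds = {"happy": ["glad", ":)", "joyful"], "angry": ["furious"]}
print(seed_vectors(vocab, seeds)["happy"])  # [1.0, 1.0, 0.0, 0.0, 0.0]
```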
10. The method as claimed in claim 6, characterized in that the iterative process in the propagation step is described by the following formula:
$$x^{(k+1)} = W x^{(k)} + b$$
where $x^{(k)}$ is the emotion score vector of the nodes after the $k$-th iteration. According to this formula, the result $x^{(k+1)}$ of each new iteration is obtained by applying the co-occurrence matrix $W$ to the previous vector and adding the bias vector $b$; after each iteration the result is normalized, and the iterative process eventually converges.
11. The method as claimed in claim 10, characterized in that $b$ is taken to be the seed vector $x^{(0)}$ so as to strengthen the effect of the seeds; after the seeds are selected, the entries of $x^{(0)}$ corresponding to seed words are 1 and all other entries are 0.
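The update of claims 10 and 11 can be sketched as follows for a single emotion. The patent specifies neither the normalization nor the stopping rule; this sketch assumes L1 normalization (scores sum to 1) and stops once no score changes by more than `tol`. The function name and parameters are hypothetical. Running it once per emotion, each time with that emotion's seed vector, would yield the multiple emotion scores of claim 6.

```python
def propagate(W, x0, max_iter=200, tol=1e-12):
    """Hypothetical sketch of claims 10-11 for one emotion.

    Iterates x(k+1) = W x(k) + b with b = x(0), L1-normalizing each round.
    """
    if not any(x0):
        raise ValueError("at least one seed word is required")
    n = len(W)
    b = list(x0)  # claim 11: the bias vector is the seed vector itself
    x = list(x0)
    for _ in range(max_iter):
        y = [sum(W[i][j] * x[j] for j in range(n)) + b[i] for i in range(n)]
        total = sum(y)
        y = [v / total for v in y]  # assumed normalization: scores sum to 1
        delta = max(abs(u - v) for u, v in zip(x, y))
        x = y
        if delta < tol:  # assumed convergence test
            break
    return x


vocab = ["happy", "great", ":)", "sad"]
W = [[1, 0, 1, 0],
     [0, 1, 1, 0],
     [1, 1, 3, 0],
     [0, 0, 0, 1]]
scores = propagate(W, [1.0, 0.0, 0.0, 0.0])  # seed word: "happy"
# Claim 6: only nodes connected to a seed receive a non-zero score.
dictionary = {word: s for word, s in zip(vocab, scores) if s > 0}
print(dictionary)
```

Because the node "sad" lies in a component with no seed, its score stays exactly 0 and it is left out of the dictionary. With raw co-occurrence counts in W, high-degree nodes such as ":)" can dominate the fixed point; whether the patent rescales W is not stated.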
12. A system for constructing a multi-emotion sentiment dictionary for the internet, characterized in that the system comprises:
an acquisition module, configured to obtain an internet text corpus from the internet;
a data preprocessing module, configured to preprocess the obtained text corpus to obtain candidate words for the sentiment dictionary;
a new-word extraction module, configured to extract new words from the obtained text corpus to obtain candidate words for the sentiment dictionary;
a graph model construction module, configured to construct an undirected graph model from the obtained candidate words of the sentiment dictionary;
an iterative computation module, configured to use the undirected graph model and a label propagation algorithm to iteratively compute multiple emotion scores for each node in the undirected graph, thereby constructing the sentiment dictionary.
13. The system as claimed in claim 12, characterized in that the processing performed by the data preprocessing module comprises:
removing special tokens from the text corpus;
performing word segmentation on the text corpus, generating n-grams from the segmentation result, and extracting three sets of n-grams from the text corpus, namely unigrams, bigrams and trigrams (n < 4);
from each of the three n-gram sets, removing the high-frequency n-grams ranked within a preset number of top positions and the low-frequency n-grams whose occurrence count in the text corpus is below a preset threshold, and taking the remaining mid-frequency n-grams as part of the candidate words of the sentiment dictionary.
14. The system as claimed in claim 12, characterized in that the methods used by the new-word extraction module to extract new words from the obtained text corpus comprise a context-entropy-based new word discovery method and a mutual-information-based new word discovery method.
15. The system as claimed in claim 14, characterized in that the construction process performed by the graph model construction module comprises:
counting the number of times each pair of candidate words of the sentiment dictionary co-occurs within sentences of the text corpus, as the relationship between those two candidate words;
constructing the undirected graph model with each candidate word as a node and the relationship as the edge weight.
16. The system as claimed in claim 15, characterized in that, when the undirected graph model is constructed, it is represented as $G = (V, E)$, where $G$ represents the connections between candidate words, $V$ is the set of candidate words and $E$ is the set of edges;
each node $v \in V$ of $G$ corresponds to one candidate word, and each edge $(v_i, v_j) \in E$ corresponds to the co-occurrence relationship between the two candidate words $v_i$ and $v_j$;
the co-occurrence relationships among the nodes in $V$ are represented by a co-occurrence matrix $W$, which is the adjacency matrix of $G$ and is symmetric; the element $w_{ij}$ of $W$ is the weight of edge $(v_i, v_j)$, namely the number of times the two nodes $v_i$ and $v_j$ co-occur in the text corpus, and the diagonal element $w_{ii}$ is the number of times $v_i$ occurs in the text corpus.
17. The system as claimed in claim 16, characterized in that the computation process of the iterative computation module comprises:
selecting seed words among the nodes of the undirected graph model and assigning them emotion scores;
using the label propagation algorithm, propagating the emotion scores from the selected seed words, weighted by the edge weights, to all nodes connected to them in the undirected graph, so that each node obtains its multiple emotion scores;
after the iteration converges, every connected node has been assigned multiple emotion scores; the emotion scores of each node represent the emotional tendency of the corresponding candidate word, and these candidate words together with their multiple emotion scores form the sentiment dictionary.
18. The system as claimed in claim 13, characterized in that the high-frequency n-grams are stop words, which have a high chance of co-occurring with all kinds of words, and the low-frequency n-grams are non-words and user names.
19. The system as claimed in claim 12, characterized in that the emotion scores comprise mood scores, a commendatory score, a derogatory score and an absolute score, the mood scores comprising a happiness score, an anger score, a sadness score, a fear score and a surprise score.
20. The system as claimed in claim 17, characterized in that the seed words are selected from the candidate words of the sentiment dictionary according to a word list, and comprise words with a definite emotion and language-independent emotion tokens.
21. The system as claimed in claim 17, characterized in that the iterative process of the label propagation algorithm is described by the following formula:
$$x^{(k+1)} = W x^{(k)} + b$$
where $x^{(k)}$ is the emotion score vector of the nodes after the $k$-th iteration. According to this formula, the result $x^{(k+1)}$ of each new iteration is obtained by applying the co-occurrence matrix $W$ to the previous vector and adding the bias vector $b$; after each iteration the result is normalized, and the iterative process eventually converges.
22. The system as claimed in claim 21, characterized in that $b$ is taken to be the seed vector $x^{(0)}$ so as to strengthen the effect of the seeds; after the seeds are selected, the entries of $x^{(0)}$ corresponding to seed words are 1 and all other entries are 0.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310470531.2A CN103544246A (en) | 2013-10-10 | 2013-10-10 | Method and system for constructing multi-emotion dictionary for internet |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103544246A | 2014-01-29 |
Family ID: 49967698
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310470531.2A (Pending, published as CN103544246A) | Method and system for constructing multi-emotion dictionary for internet | 2013-10-10 | 2013-10-10 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103544246A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040186704A1 (en) * | 2002-12-11 | 2004-09-23 | Jiping Sun | Fuzzy based natural speech concept system |
CN102163191A (en) * | 2011-05-11 | 2011-08-24 | 北京航空航天大学 | Short text emotion recognition method based on HowNet |
CN102236650A (en) * | 2010-04-20 | 2011-11-09 | 日电(中国)有限公司 | Method and device for correcting and/or expanding sentiment dictionary |
CN102663139A (en) * | 2012-05-07 | 2012-09-12 | 苏州大学 | Method and system for constructing emotional dictionary |
CN102890707A (en) * | 2012-08-28 | 2013-01-23 | 华南理工大学 | System for mining emotional tendencies of brief network comments based on conditional random field |
Non-Patent Citations (3)
Title |
---|
Anqi Cui et al.: "Emotion Tokens: Bridging the Gap among Multilingual Twitter Sentiment Analysis", Information Retrieval Technology * |
Jin Hu Huang et al.: "Chinese Word Segmentation based on Contextual Entropy", Pacific Asia Conference on Language * |
Chen Xiaodong: "Research on Sentiment Orientation Analysis of Chinese Microblogs Based on a Sentiment Dictionary", China Master's Theses Full-text Database * |
Cited By (52)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104090864A (en) * | 2014-06-09 | 2014-10-08 | 合肥工业大学 | Emotion dictionary building and emotion calculation method |
CN104090864B (en) * | 2014-06-09 | 2018-02-06 | 合肥工业大学 | A kind of sentiment dictionary is established and affection computation method |
CN104809108A (en) * | 2015-05-20 | 2015-07-29 | 成都布林特信息技术有限公司 | Information monitoring and analyzing system |
CN104809108B (en) * | 2015-05-20 | 2018-10-09 | 元力云网络有限公司 | Information monitoring analysis system |
CN104933442A (en) * | 2015-06-16 | 2015-09-23 | 陕西师范大学 | Method for propagating image label based on minimal cost path |
CN105005553A (en) * | 2015-06-19 | 2015-10-28 | 四川大学 | Emotional thesaurus based short text emotional tendency analysis method |
CN105005553B (en) * | 2015-06-19 | 2017-11-21 | 四川大学 | Short text Sentiment orientation analysis method based on sentiment dictionary |
CN105138510A (en) * | 2015-08-10 | 2015-12-09 | 昆明理工大学 | Microblog-based neologism emotional tendency judgment method |
CN105138510B (en) * | 2015-08-10 | 2018-05-25 | 昆明理工大学 | A kind of neologisms Sentiment orientation determination method based on microblogging |
CN106874275A (en) * | 2015-12-10 | 2017-06-20 | 北京新媒传信科技有限公司 | Build the method and device of sentiment dictionary |
CN106874275B (en) * | 2015-12-10 | 2020-02-07 | 北京新媒传信科技有限公司 | Method and device for constructing emotion dictionary |
CN107291686B (en) * | 2016-04-13 | 2020-10-16 | 北京大学 | Method and system for identifying emotion identification |
CN107291686A (en) * | 2016-04-13 | 2017-10-24 | 北京大学 | The discrimination method of emotion identification and the identification system of emotion identification |
CN105956197A (en) * | 2016-06-15 | 2016-09-21 | 杭州量知数据科技有限公司 | Social media graph representation model-based social risk event extraction method |
CN106681986A (en) * | 2016-12-13 | 2017-05-17 | 成都数联铭品科技有限公司 | Multi-dimensional sentiment analysis system |
CN106682128A (en) * | 2016-12-13 | 2017-05-17 | 成都数联铭品科技有限公司 | Method for automatic establishment of multi-field dictionaries |
CN106681985A (en) * | 2016-12-13 | 2017-05-17 | 成都数联铭品科技有限公司 | Establishment system of multi-field dictionaries based on theme automatic matching |
CN108572961A (en) * | 2017-03-08 | 2018-09-25 | 北京嘀嘀无限科技发展有限公司 | A kind of the vectorization method and device of text |
CN108694165A (en) * | 2017-04-10 | 2018-10-23 | 南京理工大学 | Cross-cutting antithesis sentiment analysis method towards product review |
CN108694165B (en) * | 2017-04-10 | 2021-11-09 | 南京理工大学 | Cross-domain dual emotion analysis method for product comments |
CN107609389A (en) * | 2017-08-24 | 2018-01-19 | 南京理工大学 | A kind of verification method and system of image content-based correlation |
CN107609389B (en) * | 2017-08-24 | 2020-10-30 | 南京理工大学 | Verification method and system based on image content correlation |
CN108280063A (en) * | 2018-01-19 | 2018-07-13 | 中国科学院软件研究所 | Semantic analysis based on semi-supervised learning and system |
CN108388608A (en) * | 2018-02-06 | 2018-08-10 | 金蝶软件(中国)有限公司 | Emotion feedback method, device, computer equipment and storage medium based on text perception |
CN108388608B (en) * | 2018-02-06 | 2020-08-04 | 金蝶软件(中国)有限公司 | Emotion feedback method and device based on text perception, computer equipment and storage medium |
CN108509492A (en) * | 2018-02-12 | 2018-09-07 | 郑长敬 | Big data processing based on real estate industry and system |
CN108563688B (en) * | 2018-03-15 | 2021-06-04 | 西安影视数据评估中心有限公司 | Emotion recognition method for movie and television script characters |
CN108563688A (en) * | 2018-03-15 | 2018-09-21 | 西安影视数据评估中心有限公司 | A kind of movie and television play principle thread recognition methods |
CN108363699A (en) * | 2018-03-21 | 2018-08-03 | 浙江大学城市学院 | A kind of netizen's school work mood analysis method based on Baidu's mhkc |
CN108563635A (en) * | 2018-04-04 | 2018-09-21 | 北京理工大学 | A kind of sentiment dictionary fast construction method based on emotion wheel model |
CN108804412A (en) * | 2018-04-13 | 2018-11-13 | 中国科学院自动化研究所 | Multi-layer sentiment analysis method based on Social Media |
CN108647191A (en) * | 2018-05-17 | 2018-10-12 | 南京大学 | It is a kind of based on have supervision emotion text and term vector sentiment dictionary construction method |
CN110377916B (en) * | 2018-08-17 | 2022-12-16 | 腾讯科技(深圳)有限公司 | Word prediction method, word prediction device, computer equipment and storage medium |
CN110377916A (en) * | 2018-08-17 | 2019-10-25 | 腾讯科技(深圳)有限公司 | Word prediction technique, device, computer equipment and storage medium |
CN110858099A (en) * | 2018-08-20 | 2020-03-03 | 北京搜狗科技发展有限公司 | Candidate word generation method and device |
CN110858099B (en) * | 2018-08-20 | 2024-04-12 | 北京搜狗科技发展有限公司 | Candidate word generation method and device |
CN110069780A (en) * | 2019-04-19 | 2019-07-30 | 中译语通科技股份有限公司 | A kind of emotion word recognition method and system based on specific area text |
CN110069780B (en) * | 2019-04-19 | 2021-11-19 | 中译语通科技股份有限公司 | Specific field text-based emotion word recognition method |
CN110489553B (en) * | 2019-07-26 | 2022-07-05 | 湖南大学 | Multi-source information fusion-based emotion classification method |
CN110489553A (en) * | 2019-07-26 | 2019-11-22 | 湖南大学 | A kind of sensibility classification method based on Multi-source Information Fusion |
CN110399595B (en) * | 2019-07-31 | 2024-04-05 | 腾讯科技(成都)有限公司 | Text information labeling method and related device |
CN110399595A (en) * | 2019-07-31 | 2019-11-01 | 腾讯科技(成都)有限公司 | A kind of method and relevant apparatus of text information mark |
CN110795558A (en) * | 2019-09-03 | 2020-02-14 | 腾讯科技(深圳)有限公司 | Label acquisition method and device, storage medium and electronic device |
CN110795558B (en) * | 2019-09-03 | 2023-09-29 | 腾讯科技(深圳)有限公司 | Label acquisition method and device, storage medium and electronic device |
CN111221962B (en) * | 2019-11-18 | 2023-05-26 | 重庆邮电大学 | Text emotion analysis method based on new word expansion and complex sentence pattern expansion |
CN111221962A (en) * | 2019-11-18 | 2020-06-02 | 重庆邮电大学 | Text emotion analysis method based on new word expansion and complex sentence pattern expansion |
CN111325018A (en) * | 2020-01-21 | 2020-06-23 | 上海恒企教育培训有限公司 | Domain dictionary construction method based on web retrieval and new word discovery |
CN111325018B (en) * | 2020-01-21 | 2023-08-11 | 上海恒企教育培训有限公司 | Domain dictionary construction method based on web retrieval and new word discovery |
CN112905736A (en) * | 2021-01-27 | 2021-06-04 | 郑州轻工业大学 | Unsupervised text emotion analysis method based on quantum theory |
CN112905736B (en) * | 2021-01-27 | 2023-09-19 | 郑州轻工业大学 | Quantum theory-based unsupervised text emotion analysis method |
CN116522901B (en) * | 2023-06-29 | 2023-09-15 | 金锐同创(北京)科技股份有限公司 | Method, device, equipment and medium for analyzing attention information of IT community |
CN116522901A (en) * | 2023-06-29 | 2023-08-01 | 金锐同创(北京)科技股份有限公司 | Method, device, equipment and medium for analyzing attention information of IT community |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103544246A (en) | 2014-01-29 | Method and system for constructing multi-emotion dictionary for internet |
Boudad et al. | Sentiment analysis in Arabic: A review of the literature | |
Li et al. | Sentiment analysis of danmaku videos based on naïve bayes and sentiment dictionary | |
Alessia et al. | Approaches, tools and applications for sentiment analysis implementation | |
Ren et al. | Semi-automatic creation of youth slang corpus and its application to affective computing | |
Seifollahi et al. | Word sense disambiguation application in sentiment analysis of news headlines: an applied approach to FOREX market prediction | |
CN108038725A (en) | A kind of electric business Customer Satisfaction for Product analysis method based on machine learning | |
CN110765769B (en) | Clause feature-based entity attribute dependency emotion analysis method | |
CN108108468A (en) | A kind of short text sentiment analysis method and apparatus based on concept and text emotion | |
Zhao et al. | Sentiment analysis on the online reviews based on hidden Markov model | |
Hu et al. | Text sentiment analysis: A review | |
Dubey et al. | Extended opinion lexicon and ML-based sentiment analysis of tweets: a novel approach towards accurate classifier | |
Lertpiya et al. | A preliminary study on fundamental Thai NLP tasks for user-generated web content | |
Nguyen et al. | Building a chatbot system to analyze opinions of english comments | |
Mehta et al. | Enhancement of SentiWordNet using contextual valence shifters | |
Soni et al. | Comparative analysis of rotten tomatoes movie reviews using sentiment analysis | |
Alshahrani et al. | Word mover's distance for affect detection | |
Walha et al. | ETL design toward social network opinion analysis | |
Le | A hybrid method for text-based sentiment analysis | |
Priego Sánchez et al. | Idiom polarity identification using contextual information | |
Elyasir et al. | Opinion mining framework in the education domain | |
Jiang et al. | Transfer learning based recurrent neural network algorithm for linguistic analysis | |
Chen et al. | A cross-lingual hybrid neural network with interaction enhancement for grading short-answer texts | |
Dai et al. | Unlock big data emotions: Weighted word embeddings for sentiment classification | |
Fadili | Optimized Sentiments analysis Approach, Based On Aspects, Attention and Subjectivity notions For Textual Business Intelligence |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20140129 |