CN108763214A - Automatic sentiment dictionary construction method for product reviews - Google Patents
Automatic sentiment dictionary construction method for product reviews
- Publication number
- CN108763214A (application CN201810539447.4A)
- Authority
- CN
- China
- Prior art keywords
- emotion
- word
- evaluation object
- comment
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0282—Rating or review of business operators or products
Abstract
The invention discloses a method for automatically constructing a sentiment dictionary for product reviews, comprising text preprocessing, semantic relation mining, and emotion word clustering. Text preprocessing processes the product reviews and extracts the emotion words and evaluation objects contained in the reviews of a given product category. Semantic relation mining uncovers the semantic relations between emotion words and evaluation objects and represents them in the form of a matrix. Emotion word clustering performs unsupervised clustering of the emotion words according to their mutual distances in the emotion matrix space, so that the emotion words are reasonably divided into k classes. The invention targets the characteristics of product review text and constructs a domain sentiment dictionary that divides emotion words into multiple classes rather than the traditional two classes of commendatory and derogatory. In the product review domain, this domain sentiment dictionary has a clear advantage over existing general-purpose sentiment dictionaries in tasks such as sentiment classification.
Description
Technical field
The present invention relates to a method for automatically constructing a sentiment dictionary for the product review domain, and belongs to the field of computer information processing technology.
Background technology
With the development of shopping websites, a large number of reviews about all kinds of products have appeared online, and people can consult them anytime and anywhere. Identifying the sentiment orientation of these reviews is important for both merchants and consumers, and a good sentiment dictionary is the basis for analyzing the sentiment orientation of text. It is well known that sentiment analysis of text needs to take into account the domain the text belongs to. Existing sentiment dictionaries, however, are all general-purpose, and there is no sentiment dictionary aimed at the specific domain of product reviews, so analyzing product review text with a traditional sentiment dictionary is inappropriate. Methods for automatically constructing sentiment dictionaries, especially for specific domains, have therefore attracted growing attention and research from experts and scholars.
Existing sentiment dictionary construction methods, for both Chinese and English, fall into two broad classes: corpus-based and knowledge-base-based. In corpus-based construction, the most common approach is to select seed words and determine the sentiment polarity of an emotion word of unknown polarity by computing its relation to the seed words, namely the PMI value. For Chinese, however, the usable common-sense knowledge bases are extremely limited, so little research builds Chinese sentiment dictionaries from knowledge bases. When building a sentiment dictionary for the product review domain, evaluation objects need special consideration. An evaluation object is a feature of the product being reviewed; for a mobile phone, for example, the evaluation objects can be features such as the screen or the battery.
On the other hand, existing sentiment dictionaries generally contain only a set of emotion words, divided into two classes: commendatory and derogatory. Some scholars instead divide emotion into six classes: happiness, sadness, fear, surprise, anger, and envy. In short, existing sentiment classification fixes the emotion categories into which emotion words can be divided on the basis of human empirical knowledge.
Many emotion words show different sentiment orientations in different domains, so accurately identifying emotion words together with their evaluation objects, or the topics of the domain, is particularly important, especially for product reviews. Building a domain sentiment dictionary quickly by relying on domain experts or crowdsourcing is very difficult. Shi et al. extract key information from domain text by combining an association rule algorithm with supervised machine learning. Zhang et al. use pointwise mutual information (PMI) together with an association rule algorithm to extract the evaluation objects of products. Considering the ordering of evaluation objects, Qiu et al. propose a two-way propagation algorithm based on computing the relations between emotion words and product features. Mishne selects evaluation objects using the part of speech and the frequency of words.
PMI is a common measure of the degree of association between two words. Turney and Littman use PMI and LSA to compute the degree of association between two words; this way of using PMI to compute the relation between a word and seed words is commonly known as SO-PMI. Yang et al. build on SO-PMI by incorporating user behavior habits and propose a new sentiment dictionary construction method. Islam and Inkpen improve PMI and propose SOC-PMI. Sentiment classification is a basic task of sentiment analysis, and the quality of its results directly reflects the performance of the sentiment dictionary used. Pang treats sentiment classification as a text classification task and tests three classifiers: naive Bayes, support vector machine, and maximum entropy. Li and Hao extend evaluation objects using spectral clustering. Yang et al. use Word2vec to compute the cosine similarity between words and seed words.
Existing sentiment dictionary construction methods mostly target general dictionaries, which are not well suited to analyzing text from a specific domain such as product reviews. Building a sentiment dictionary suited to a specific domain is therefore particularly important.
Summary of the invention
Object of the invention: the object of the invention is to overcome the shortcomings of the prior art by providing a sentiment dictionary construction method for the product review domain. The invention clusters emotion words into a suitable number of classes. Unlike existing general-purpose sentiment dictionaries, the classes are derived from the relations between emotion words and evaluation objects and are not limited to fixed sentiment polarities such as commendatory and derogatory.
Technical solution: a method for automatically constructing a sentiment dictionary for product reviews comprises the following steps in order:
(1) Preprocess the original product review text, typically by Chinese word segmentation, stop-word filtering, and similar means, and determine the emotion words and evaluation objects contained in the text of the given domain. Emotion words and evaluation objects are determined by part of speech: the nouns in the review text are chosen as evaluation objects, and the adjectives, adverbs, and verbs are taken as emotion words.
(2) Mine the relations between the emotion words and evaluation objects obtained in step (1), and generate an emotion matrix representing these relations.
(3) Screen the evaluation objects obtained in step (1), keeping only the critical evaluation objects.
(4) As in step (2), consider the relations between the emotion words and the critical evaluation objects, and generate an emotion matrix representing these relations.
(5) Mine the correlation between the critical evaluation objects screened in step (3) and the original evaluation objects obtained in step (1), and generate a correlation matrix representing it.
(6) Using the two emotion matrices obtained in steps (2) and (4) and the correlation matrix obtained in step (5), generate a new emotion matrix representing the relations between the emotion words and the critical evaluation objects.
(7) Cluster the emotion words: divide them into several classes according to the distances between them in the emotion matrix of step (6), obtaining the domain sentiment dictionary.
(8) Apply the sentiment dictionary to a sentiment classification task and determine an optimal k value for the domain at hand, for example by cross-validation; the emotion words are divided into k classes.
Further, in step (2), the relation between an emotion word and an evaluation object directly reflects the degree to which the emotion word modifies the evaluation object:
(2.1) The relation is quantified by the co-occurrence of the emotion word and the evaluation object; PMI is used here to compute it.
(2.2) The relation is represented by a matrix (the emotion matrix): each row corresponds to an emotion word, each column to an evaluation object, and each cell holds the PMI value of the corresponding emotion word and evaluation object.
Further, in step (3), the screening of evaluation objects borrows the idea of tf-idf, as follows:
(3.1) The reviews of the same product category are merged into one document. The tf-idf value of a word is computed from the number of times it occurs in the different documents, that is, the review collections of the different product categories, and from its inverse document frequency.
(3.2) The tf-idf values of all words are computed, the tf-idf values of the evaluation objects are ranked, and a threshold is set; only evaluation objects that reach the threshold are retained as final evaluation objects.
Further, the emotion matrix in step (4) is built like the one in step (2). The only difference is that the emotion matrix of step (4) contains the relations between the emotion words and the evaluation objects that remain after screening, rather than the relations between the emotion words and all evaluation objects.
Further, in step (5), the relations between all evaluation objects and the evaluation objects remaining after screening are mined, and a correlation matrix between the critical evaluation objects and all evaluation objects is generated, as follows:
(5.1) The critical evaluation objects are the evaluation objects obtained by the screening of step (3); the full set of evaluation objects consists of all nouns contained in the original product reviews.
(5.2) The degree of association between a critical evaluation object and any evaluation object can be understood as a kind of synonymy within the corpus. It is expressed as a value in [0, 1], where 0 means unrelated and 1 means the highest degree of association; the closer the value is to 1, the higher the degree of association.
Further, in step (6), the final emotion matrix is generated from the two emotion matrices and the correlation matrix constructed above, on the basis of EPMI, an improved PMI algorithm proposed here, as follows:
(6.1) EPMI algorithm:
\[ \mathrm{EPMI}(e_i, m_j) = \mathrm{PMI}(e_i, m_j) + \sum_{k=1}^{p} U_{jk}\, \mathrm{PMI}(e_i, m_k) \]
That is, when computing the relation between emotion word e_i and evaluation object m_j, one must consider not only the relation between the two but also the evaluation objects related to m_j; the degree of this correlation is denoted U_jk in the formula.
(6.2) The relations between emotion words and evaluation objects are mined with the new EPMI algorithm, and a new emotion matrix is built. The new emotion matrix can be obtained directly, via the EPMI algorithm, from the two emotion matrices and the correlation matrix obtained earlier.
Further, in step (7), the emotion words are clustered. In the emotion matrix each emotion word is represented as a vector, so the emotion words can be clustered without supervision according to the distances between these vectors.
Further, in step (8), because the clustering is unsupervised, the number of clusters k is not known in advance, and the optimal k differs across product reviews and text analysis tasks. A k value with relatively stable performance can be chosen by cross-validation.
Advantageous effects: compared with the prior art, the invention provides a method for automatically building a sentiment dictionary for the product review domain; within a specific domain, a domain sentiment dictionary often performs better than a general-purpose one. The invention provides a way to build a sentiment dictionary from a review corpus. In addition, unlike traditional sentiment dictionaries, the dictionary built here divides emotion words into a non-fixed number k of classes rather than a few fixed classes such as commendatory and derogatory, and this higher-dimensional domain sentiment dictionary can perform better in tasks such as sentiment classification.
Description of the drawings
Fig. 1 is a flow chart of the text preprocessing module;
Fig. 2 is a schematic diagram of semantic relation mining;
Fig. 3 is a flow chart of the selection of the k value.
Detailed description
The invention is further illustrated below with reference to specific embodiments. It should be understood that these embodiments merely illustrate the invention and do not limit its scope; after reading the invention, modifications of various equivalent forms made by those skilled in the art fall within the scope defined by the claims appended to this application.
First, for ease of understanding the invention, the following explanations are given:
1. Text preprocessing:
The text preprocessing module preprocesses the original review text, as shown in Fig. 1. Chinese text requires word segmentation, for which many open-source tools are available, such as the jieba segmenter and the IK segmenter. To achieve the best segmentation, users often also need to add a custom dictionary for recognizing less common domain terms. English words are separated by spaces, so no segmentation step is needed, but English vocabulary involves more complex inflection such as tense, so English words must be reduced to their stems. Many open-source tools also support stemming of English words, for example the widely used natural language toolkit NLTK.
Beyond this basic preprocessing of the review text, the most important output of this module is a preliminary set of emotion words and evaluation objects, which are extracted by part-of-speech analysis. Emotion words express the user's view of something and carry a strong subjective coloring; adjectives, adverbs, and verbs are chosen as emotion words. An evaluation object, as the name suggests, is the object an emotion word modifies, which can be the product or a feature of the product; nouns are chosen as evaluation objects.
Of course, both English and Chinese text contain a large number of words that carry no real meaning, such as "是" and similar function words in Chinese, and "is" and "the" in English. Stop-word processing is needed to filter out these words and reduce the negative effect of these meaningless high-frequency words on the text analysis.
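The part-of-speech selection and stop-word filtering described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the reviews have already been segmented and tagged by some external tool, and the tag prefixes (`n`, `a`, `d`, `v`) and the function name `extract_candidates` are hypothetical choices made for the example.

```python
# Hypothetical tag prefixes: n = noun, a = adjective, d = adverb, v = verb.
OBJECT_TAGS = ("n",)
EMOTION_TAGS = ("a", "d", "v")


def extract_candidates(tagged_reviews, stop_words):
    """Split POS-tagged reviews into emotion words and evaluation objects."""
    emotion_words, evaluation_objects = set(), set()
    for review in tagged_reviews:
        for word, tag in review:
            if word in stop_words:
                continue  # drop meaningless high-frequency words
            if tag.startswith(OBJECT_TAGS):
                evaluation_objects.add(word)  # nouns become evaluation objects
            elif tag.startswith(EMOTION_TAGS):
                emotion_words.add(word)  # adjectives, adverbs, verbs
    return emotion_words, evaluation_objects


# Toy input: each review is a list of (word, tag) pairs.
reviews = [
    [("the", "x"), ("screen", "n"), ("is", "v"), ("very", "d"), ("bright", "a")],
    [("battery", "n"), ("drains", "v"), ("quickly", "d")],
]
emotion_words, evaluation_objects = extract_candidates(reviews, {"the", "is"})
```

In practice the tagged input would come from a segmenter or tagger such as those named in the text; the sketch deliberately leaves that step out so that it stays self-contained.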
2. Semantic relation mining
The semantic relation mining module is the core module of the invention. Only by fully mining the relations between emotion words and evaluation objects can a good foundation be laid for the emotion word clustering of the next step. The final emotion matrix is obtained in three main stages, as shown in Fig. 2.
The relation between an emotion word and an evaluation object is one of modifier and modified, and in review text this relation shows up as co-occurrence: the more frequently the two co-occur, the closer their relation is taken to be. PMI computes the relation between two words precisely from their co-occurrence, so the initial emotion matrix is first computed with PMI.
At this point, however, a problem of excessive dimensionality arises. In review text, the number of evaluation objects is often far larger than the number of emotion words, so when each emotion word is represented as a vector in the emotion matrix, the dimension of that vector is very high, which greatly affects the accuracy and performance of clustering the emotion words by vector distance. The evaluation objects are therefore screened using the idea of tf-idf. An evaluation object is the product, or a feature of the product, that the user comments on, and these product features are domain-dependent. In mobile phone reviews, for example, users pay attention to features such as "mobile phone", "screen", and "battery", whereas in hotel reviews they pay attention to features such as "hotel", "toilet", and "air-conditioning". Features such as "mobile phone", "screen", and "battery" appear frequently in mobile phone reviews and rarely in hotel reviews; features such as "hotel", "toilet", and "air-conditioning" appear frequently in hotel reviews and rarely in mobile phone reviews. The idea of tf-idf can therefore be used to screen out the real evaluation objects.
tf-idf is usually applied to documents, whereas the objects handled here are individual reviews. The reviews of the same product category are therefore merged into one document, which serves as the review document of that category; the number of documents equals the number of product categories for which reviews were collected. From these documents, the tf-idf value of each word in the review text can be computed. Nouns with higher tf-idf values are usually the evaluation objects of interest.
A threshold α is set; only nouns whose tf-idf value reaches the threshold are confirmed as evaluation objects, which are also called critical evaluation objects. As with the initial emotion matrix, the relations between emotion words and critical evaluation objects are considered, that is, the PMI value of each pair is computed, and a new emotion matrix is built. In this matrix each emotion word is again represented as a vector, but because the evaluation objects have been screened with tf-idf, the dimension of the vector is far smaller than in the original emotion matrix.
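The screening described above can be sketched as below. The text does not give the exact tf-idf variant, so this sketch assumes the common form tf = count / document length and idf = log(number of documents / document frequency); the function name `screen_objects` and the toy data are hypothetical.

```python
import math
from collections import Counter


def screen_objects(category_docs, nouns, alpha):
    """Keep, per product category, the nouns whose tf-idf reaches alpha.

    category_docs maps a category name to the token list obtained by
    merging all reviews of that category into one document.
    """
    n_docs = len(category_docs)
    doc_freq = Counter()
    for tokens in category_docs.values():
        doc_freq.update(set(tokens) & nouns)  # count each noun once per document
    selected = {}
    for category, tokens in category_docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        selected[category] = {
            w for w in nouns
            if counts[w]
            and (counts[w] / total) * math.log(n_docs / doc_freq[w]) >= alpha
        }
    return selected


docs = {
    "phone": ["screen", "battery", "screen", "price"],
    "hotel": ["toilet", "price", "air-conditioning", "toilet"],
}
nouns = {"screen", "battery", "price", "toilet", "air-conditioning"}
critical = screen_objects(docs, nouns, alpha=0.1)
```

In this toy example "price" occurs in both documents, so its idf is zero and it is screened out of both categories, which mirrors the intent of keeping only domain-specific features.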
The PMI value considers only the co-occurrence of an emotion word and an evaluation object, which loses some semantics. When computing the relation between an emotion word and an evaluation object, not only the direct relation of the two should be considered but also the relations with other evaluation objects related to that evaluation object. A correlation between the critical evaluation objects and all evaluation objects is therefore computed, giving a correlation matrix. The correlation between a critical evaluation object and an evaluation object is computed as follows:
\[ u_{ij} = \mathrm{normal}\bigl( \lvert \{ d \in D : (m'_i, m_j) \in d \} \rvert \bigr), \qquad m'_i \in M',\; m_j \in M \]
where M is the set of all product features, M' is the set of critical evaluation objects after screening, D is the set of reviews, and normal() is a simple normalization function that maps the value of each dimension of a vector to a number in the interval [0, 1]. In addition, (m'_i, m_j) ∈ d denotes that critical evaluation object m'_i and evaluation object m_j occur in the same review d; the more often two evaluation objects occur in the same review, the higher their degree of association is taken to be.
By the above computation, a correlation matrix is obtained. Using the two emotion matrices obtained earlier together with the correlation matrix, the final emotion matrix is obtained by formula (5) according to the EPMI algorithm.
3. Emotion word clustering
After the final emotion matrix is obtained, the emotion words can be clustered. Each emotion word is represented as a vector in the matrix, and clustering is performed according to the distances between these vectors. Because the clustering is unsupervised, cross-validation is used to choose a suitable k, taking sentiment classification as the example task. The selection procedure is shown in Fig. 3.
The review data set is divided into m parts; m-1 parts are used as the training set and the remaining part as the test set. The classification accuracy of each k value is computed on the test set, and each k value is tested with different combinations of training and test sets. In total m rounds of experiments are carried out, giving m accuracy values for each k value; the k value with the highest average accuracy is chosen as the final k value.
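The m-fold selection of k can be sketched as follows. The callback `accuracy_fn(train, test, k)` is a hypothetical placeholder for "build the dictionary with k classes on the training part and measure sentiment classification accuracy on the test part"; the text does not specify how that step is implemented, and the way the folds are cut here is likewise an assumption.

```python
def choose_k(reviews, candidate_ks, m, accuracy_fn):
    """Return the k with the highest mean accuracy over m folds."""
    folds = [reviews[i::m] for i in range(m)]  # m roughly equal parts
    best_k, best_mean = None, -1.0
    for k in candidate_ks:
        scores = []
        for i in range(m):
            test = folds[i]  # one part held out as the test set
            train = [r for j, fold in enumerate(folds) if j != i for r in fold]
            scores.append(accuracy_fn(train, test, k))
        mean = sum(scores) / m
        if mean > best_mean:
            best_k, best_mean = k, mean
    return best_k


# Toy stand-in for the real evaluation: accuracy peaks at k = 4.
def toy_accuracy(train, test, k):
    return 1.0 - abs(k - 4) / 10


best = choose_k(list(range(20)), [2, 3, 4, 5], m=5, accuracy_fn=toy_accuracy)
```

Ties are resolved in favor of the first candidate tried, which is an arbitrary choice of this sketch.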
The method for automatically constructing a sentiment dictionary for product reviews comprises the following steps in order:
(1) Preprocess the original product review text, typically by Chinese word segmentation, stop-word filtering, and similar means, and determine the emotion words and evaluation objects contained in the text of the given domain.
(2) Mine the relations between the emotion words and evaluation objects obtained in step (1), and generate an emotion matrix representing these relations.
The relation between an emotion word and an evaluation object directly reflects the degree to which the emotion word modifies the evaluation object:
(2.1) The relation is quantified by the co-occurrence of the emotion word and the evaluation object; PMI is used here to compute it.
The relation between an emotion word and an evaluation object is computed with pointwise mutual information (PMI), defined as follows:
\[ \mathrm{PMI}(word_1, word_2) = \log \frac{p(word_1, word_2)}{p(word_1)\, p(word_2)} \]
\[ p(word_1, word_2) = \frac{count(word_1, word_2)}{N}, \qquad p(word) = \frac{count(word)}{N} \]
where p(word_1, word_2) is the probability that word_1 and word_2 co-occur in the same window of the product review text, N is the number of words contained in the product reviews under consideration, count(word_1, word_2) is the number of times word_1 and word_2 co-occur in the same window of the product reviews, and count(word) is the number of times word occurs in the product review text.
(2.2) The relation is represented by a matrix (the emotion matrix): each row corresponds to an emotion word, each column to an evaluation object, and each cell holds the PMI value of the corresponding emotion word and evaluation object.
The emotion matrix between emotion words and evaluation objects is defined as matrix A:
\[ A = \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1p} \\ w_{21} & w_{22} & \cdots & w_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{np} \end{pmatrix} \]
The emotion matrix A has n rows and p columns. The n rows represent the n emotion words e_1 to e_n, and the p columns represent the p evaluation objects m_1 to m_p, where p is far larger than n. w_ij denotes the PMI value between emotion word e_i and evaluation object m_j, that is, w_ij = PMI(e_i, m_j).
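As a rough illustration of how the PMI entries of matrix A could be computed from tokenized reviews, consider the sketch below. Several points are assumptions rather than statements of the patent: N is taken to be the total number of tokens, the co-occurrence window is a fixed number of positions on either side of the emotion word, the natural logarithm is used, and pairs that never co-occur are simply left out of the result. The function name `pmi_matrix` is hypothetical.

```python
import math
from collections import Counter


def pmi_matrix(reviews, emotion_words, evaluation_objects, window=5):
    """Return {(emotion_word, evaluation_object): PMI} as a sparse matrix."""
    word_count = Counter()
    pair_count = Counter()
    for tokens in reviews:
        word_count.update(tokens)
        for i, w1 in enumerate(tokens):
            if w1 not in emotion_words:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] in evaluation_objects:
                    pair_count[(w1, tokens[j])] += 1  # co-occurrence in window
    n = sum(word_count.values())  # assumed meaning of N: total token count
    matrix = {}
    for (e, m), c in pair_count.items():
        p_joint = c / n
        p_e, p_m = word_count[e] / n, word_count[m] / n
        matrix[(e, m)] = math.log(p_joint / (p_e * p_m))
    return matrix


toy_reviews = [["screen", "bright"], ["battery", "weak"], ["screen", "bright"]]
A = pmi_matrix(toy_reviews, {"bright", "weak"}, {"screen", "battery"})
```

A dense n × p matrix can be recovered by filling missing pairs with a default value; which default is appropriate is not stated in the text.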
(3) Screen the evaluation objects obtained in step (1), keeping only the critical evaluation objects.
The screening of evaluation objects borrows the idea of tf-idf, as follows:
(3.1) The reviews of the same product category are merged into one document. The tf-idf value of a word is computed from the number of times it occurs in the different documents, that is, the review collections of the different product categories, and from its inverse document frequency.
(3.2) The tf-idf values of all words are computed, the tf-idf values of the evaluation objects are ranked, and a threshold is set; only the t evaluation objects that reach the threshold are retained as final evaluation objects.
(4) As in step (2), consider the relations between the emotion words and the critical evaluation objects, and generate an emotion matrix representing these relations. The only difference from step (2) is that the emotion matrix of step (4) contains the relations between the emotion words and the evaluation objects that remain after screening, rather than the relations between the emotion words and all evaluation objects.
The resulting emotion matrix B has n rows and t columns. The n rows again represent the n emotion words, and the t columns represent the t critical evaluation objects.
(5) Mine the correlation between the critical evaluation objects screened in step (3) and the original evaluation objects obtained in step (1), and generate a correlation matrix representing it.
In step (5), the relations between all evaluation objects and the evaluation objects remaining after screening are mined, and a correlation matrix between the critical evaluation objects and all evaluation objects is generated, as follows:
(5.1) The critical evaluation objects are the evaluation objects obtained by the screening of step (3); the full set of evaluation objects consists of all nouns contained in the original product reviews.
(5.2) The degree of association between a critical evaluation object and any evaluation object can be understood as a kind of synonymy within the corpus. It is expressed as a value in [0, 1], where 0 means unrelated and 1 means the highest degree of association.
The resulting correlation matrix C is as follows:
\[ C = \begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1p} \\ u_{21} & u_{22} & \cdots & u_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ u_{t1} & u_{t2} & \cdots & u_{tp} \end{pmatrix} \]
The correlation matrix C has t rows and p columns. The t rows represent the t critical evaluation objects after screening, and the p columns represent all p evaluation objects. u_ij denotes the correlation between critical evaluation object m'_i and evaluation object m_j, which can be computed from the number of reviews in the product review text in which the two appear together.
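One possible way to build such a correlation matrix is sketched below. It rests on assumptions the text does not settle: normalization divides each row by its maximum count so that values fall in [0, 1], and the entry of a critical evaluation object with itself is set to 0 so that the direct relation is not counted twice. The function name `correlation_matrix` is hypothetical.

```python
def correlation_matrix(reviews, critical_objects, all_objects):
    """Rows: critical objects (t); columns: all objects (p); values in [0, 1]."""
    review_sets = [set(tokens) for tokens in reviews]
    matrix = []
    for key in critical_objects:
        # Number of reviews in which both objects appear together.
        row = [
            0 if obj == key
            else sum(1 for s in review_sets if key in s and obj in s)
            for obj in all_objects
        ]
        peak = max(row, default=0)
        # Assumed normalization: scale each row by its largest count.
        matrix.append([c / peak if peak else 0.0 for c in row])
    return matrix


toy_reviews = [
    ["screen", "display"],
    ["screen", "display", "battery"],
    ["battery", "price"],
]
C = correlation_matrix(
    toy_reviews, ["screen"], ["screen", "battery", "display", "price"]
)
```

Here "display" co-occurs with "screen" in two reviews and "battery" in one, so their normalized correlations are 1.0 and 0.5 respectively.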
(6) Using the two emotion matrices obtained in steps (2) and (4) and the correlation matrix obtained in step (5), generate a new emotion matrix representing the relations between the emotion words and the critical evaluation objects.
The final emotion matrix is generated by the EPMI algorithm from the two emotion matrices and the correlation matrix constructed above.
EPMI algorithm:
\[ \mathrm{EPMI}(e_i, m_j) = \mathrm{PMI}(e_i, m_j) + \sum_{k=1}^{p} U_{jk}\, \mathrm{PMI}(e_i, m_k) \]
That is, when computing the relation between emotion word e_i and evaluation object m_j, one must consider not only the relation between the two but also the evaluation objects related to m_j; the degree of this correlation is denoted U_jk in the formula.
The emotion matrix D is computed as follows:
\[ D_{n \times t} = B_{n \times t} + A_{n \times p}\, C_{t \times p}^{\mathsf{T}} \tag{5} \]
Like matrices A and B, matrix D represents the relations between emotion words and evaluation objects. The difference is that A and B compute these relations with the traditional PMI algorithm, whereas D uses EPMI, the improved PMI algorithm proposed here.
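Formula (5) is a plain matrix sum and product, so it can be sketched without any numerical library. The function name `epmi_matrix` and the list-of-lists representation are assumptions of this example; A is n × p, B is n × t, and C is t × p as defined above.

```python
def epmi_matrix(A, B, C):
    """Compute D = B + A * C^T for A (n x p), B (n x t), C (t x p)."""
    n, t, p = len(A), len(C), len(C[0])
    return [
        [
            # Direct PMI plus PMI with related objects, weighted by correlation.
            B[i][j] + sum(A[i][k] * C[j][k] for k in range(p))
            for j in range(t)
        ]
        for i in range(n)
    ]


A = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]]  # 2 emotion words x 3 evaluation objects
B = [[1.0], [0.0]]                      # 2 emotion words x 1 critical object
C = [[0.0, 0.5, 1.0]]                   # 1 critical object x 3 evaluation objects
D = epmi_matrix(A, B, C)
```

Each row of D is the vector that represents one emotion word in the later clustering step.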
(7) Cluster the emotion words: divide them into several classes according to the distances between them in the emotion matrix of step (6), obtaining the domain sentiment dictionary.
In the emotion matrix D, each emotion word is represented as a vector by one row of the matrix. According to the distances between these vectors in the matrix space, the emotion words can be grouped into several classes with a clustering method such as k-means. The final result is a domain sentiment dictionary in which the emotion words are divided into a certain number of classes.
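The clustering step can be illustrated with a bare-bones k-means over the rows of the emotion matrix. This is a teaching sketch, not a production implementation: it uses squared Euclidean distance, a fixed number of iterations, and a naive initialization from the first k rows, none of which is prescribed by the text.

```python
def kmeans(vectors, k, iterations=20):
    """Plain k-means; returns one cluster label per input vector."""
    centroids = [list(v) for v in vectors[:k]]  # naive init: first k rows
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, v in enumerate(vectors):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


words = ["cheap", "vivid", "pricey", "bright"]
rows = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9]]  # toy rows of matrix D
labels = kmeans(rows, k=2)
dictionary = {}
for word, label in zip(words, labels):
    dictionary.setdefault(label, []).append(word)
```

The resulting `dictionary` maps each class index to its emotion words, which is one simple way to store the domain sentiment dictionary.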
(8) Apply the sentiment dictionary to a sentiment classification task and determine an optimal k value for the domain at hand, for example by cross-validation; the emotion words are divided into k classes.
Claims (8)
1. A method for automatically constructing a sentiment dictionary for product reviews, characterized in that it comprises the following steps in order:
(1) preprocessing the original product review text and determining the emotion words and evaluation objects contained in the text of the given domain;
(2) mining the relations between the emotion words and evaluation objects obtained in step (1), and generating an emotion matrix representing these relations;
(3) screening the evaluation objects obtained in step (1), keeping the critical evaluation objects;
(4) considering the relations between the emotion words and the critical evaluation objects, and generating an emotion matrix representing these relations;
(5) mining the correlation between the critical evaluation objects screened in step (3) and the original evaluation objects obtained in step (1), and generating a correlation matrix representing it;
(6) using the two emotion matrices obtained in steps (2) and (4) and the correlation matrix obtained in step (5), generating a new emotion matrix representing the relations between the emotion words and the critical evaluation objects;
(7) clustering the emotion words by dividing them into several classes according to the distances between them in the emotion matrix of step (6), obtaining the domain sentiment dictionary;
(8) applying the sentiment dictionary to a sentiment classification task and determining an optimal k value for the domain at hand, for example by cross-validation, the emotion words being divided into k classes.
2. The method for automatically constructing a sentiment dictionary for commodity comments as described in claim 1, characterized in that, in step (1), emotion words and evaluation objects are determined by part of speech: the nouns in the comment text are chosen as evaluation objects, while the adjectives, adverbs and verbs in the comment text are taken as emotion words.
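The part-of-speech rule in claim 2 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: it assumes the comments have already been segmented and tagged by some tagger, and the single-letter tag prefixes (`n`, `a`, `d`, `v`) and the function name `split_by_pos` are hypothetical choices, since the claim does not name a tag set.

```python
# Assumed coarse POS tag prefixes; the claim does not specify a tag set.
OBJECT_TAGS = ("n",)            # nouns -> evaluation objects
EMOTION_TAGS = ("a", "d", "v")  # adjectives, adverbs, verbs -> emotion words


def split_by_pos(tagged_tokens):
    """Split (word, tag) pairs into emotion words and evaluation objects."""
    emotion_words, evaluation_objects = set(), set()
    for word, tag in tagged_tokens:
        if tag.startswith(OBJECT_TAGS):
            evaluation_objects.add(word)
        elif tag.startswith(EMOTION_TAGS):
            emotion_words.add(word)
        # Every other part of speech is ignored.
    return emotion_words, evaluation_objects


# Hypothetical pre-tagged comment.
tagged = [("screen", "n"), ("very", "d"), ("clear", "a"),
          ("the", "u"), ("battery", "n"), ("lasts", "v")]
emotion_words, evaluation_objects = split_by_pos(tagged)
```

Matching on tag prefixes keeps the sketch independent of any particular tagger; a real pipeline would map the tagger's own labels onto these four groups.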
3. The method for automatically constructing a sentiment dictionary for commodity comments as described in claim 1, characterized in that, in step (2), the relationship between an emotion word and an evaluation object directly reflects the degree to which the emotion word modifies the evaluation object:
(2.1) the relationship is quantified by the co-occurrence of the emotion word and the evaluation object, and PMI is used to calculate the relationship between them;
In the PMI calculation formula, p(word1, word2) is the probability that the two words word1 and word2 co-occur in the same window in the commodity comment text; N is the number of words contained in the commodity comments under consideration; count(word1, word2) is the number of times word1 and word2 co-occur in the same window in the commodity comments; and count(word) is the number of times the word occurs in the commodity comment text;
(2.2) this relationship is represented by a matrix (the emotion matrix): the rows of the matrix correspond to all emotion words, the columns correspond to the evaluation objects, and each cell holds the PMI value of the corresponding emotion word and evaluation object;
The emotion matrix between emotion words and evaluation objects is defined as matrix A. Matrix A consists of n rows and p columns, where the n rows represent the n emotion words e_1 to e_n and the p columns represent the p evaluation objects m_1 to m_p; w_ij denotes the PMI value between emotion word e_i and evaluation object m_j, that is, w_ij = PMI(e_i, m_j).
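The construction of matrix A can be sketched as below. The sketch assumes the standard definition PMI(w1, w2) = log2(p(w1, w2) / (p(w1) p(w2))), with p(w1, w2) = count(w1, w2) / N and p(w) = count(w) / N; the logarithm base, taking N as the total token count, the treatment of each window as a token list, and scoring unseen pairs as 0 are all assumptions, and `build_pmi_matrix` is a hypothetical name.

```python
import math
from collections import Counter


def build_pmi_matrix(windows, emotion_words, evaluation_objects):
    """Return A as a list of rows; A[i][j] = PMI(emotion_words[i], evaluation_objects[j])."""
    word_count = Counter()
    pair_count = Counter()
    total = 0  # N: total number of tokens (assumed interpretation)
    for window in windows:
        word_count.update(window)
        total += len(window)
        present = set(window)
        for e in emotion_words:
            if e in present:
                for m in evaluation_objects:
                    if m in present:
                        pair_count[(e, m)] += 1
    matrix = []
    for e in emotion_words:
        row = []
        for m in evaluation_objects:
            joint = pair_count[(e, m)]
            if joint == 0:
                row.append(0.0)  # assumption: unseen pairs get 0 rather than -infinity
            else:
                p_joint = joint / total
                p_e = word_count[e] / total
                p_m = word_count[m] / total
                row.append(math.log2(p_joint / (p_e * p_m)))
        matrix.append(row)
    return matrix


# Hypothetical windows, each already reduced to its tokens.
windows = [["screen", "clear"], ["screen", "clear"],
           ["battery", "weak"], ["screen", "weak"]]
emotion_words = ["clear", "weak"]
evaluation_objects = ["screen", "battery"]
A = build_pmi_matrix(windows, emotion_words, evaluation_objects)
```

In this toy input, "weak" and "battery" co-occur once and "battery" appears nowhere else, so their PMI is the largest entry in A.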
4. The method for automatically constructing a sentiment dictionary for commodity comments as described in claim 1, characterized in that, in step (3), the screening of evaluation objects draws on the idea of tf-idf and specifically comprises:
(3.1) merging the comments on products of the same class into one document, and calculating the tf-idf value of each word from the number of times the word occurs in the different documents and its inverse document frequency;
(3.2) calculating the tf-idf values of all words, sorting the evaluation objects by tf-idf value, and setting a threshold; only the t evaluation objects that reach this threshold are selected as the final evaluation objects.
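A minimal Python sketch of this screening step follows. It assumes one merged token list per product class, scores each evaluation object by its tf-idf in the target class document (term frequency normalised by document length, natural-log idf), and keeps the objects at or above the threshold. The exact tf-idf variant and the threshold value are not given in the claim, so both are assumptions, as is the name `screen_objects`.

```python
import math
from collections import Counter


def screen_objects(category_docs, target, evaluation_objects, threshold):
    """Return the evaluation objects whose tf-idf in the target document reaches the threshold."""
    counts = Counter(category_docs[target])
    doc_len = len(category_docs[target])
    n_docs = len(category_docs)
    scored = []
    for obj in evaluation_objects:
        tf = counts[obj] / doc_len
        # Document frequency: number of class documents that contain the object.
        df = sum(1 for tokens in category_docs.values() if obj in tokens)
        idf = math.log(n_docs / df) if df else 0.0
        scored.append((tf * idf, obj))
    scored.sort(reverse=True)  # highest tf-idf first
    return [obj for score, obj in scored if score >= threshold]


# Hypothetical merged documents, one per product class.
category_docs = {
    "phone": ["screen", "battery", "price", "screen", "battery", "screen"],
    "book": ["price", "paper", "story"],
    "shoe": ["price", "sole", "size"],
}
key_objects = screen_objects(category_docs, "phone",
                             ["screen", "battery", "price"], threshold=0.1)
```

Here "price" occurs in every class document, so its idf is 0 and it is screened out, while "screen" and "battery" are kept as domain-specific objects.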
5. The method for automatically constructing a sentiment dictionary for commodity comments as described in claim 1, characterized in that, in step (4), the emotion matrix is constructed in the same way as in step (2); the only difference is that the emotion matrix in step (4) contains the relationship between the emotion words and the evaluation objects retained after screening, rather than the relationship between the emotion words and all evaluation objects;
The constructed emotion matrix B has n rows and t columns; the n rows represent the n emotion words, and the t columns represent the t key evaluation objects.
6. The method for automatically constructing a sentiment dictionary for commodity comments as described in claim 1, characterized in that, in step (5), the relationship between all evaluation objects and the evaluation objects retained after screening is mined, and a correlation matrix C between the key evaluation objects and all evaluation objects is generated;
The correlation matrix C consists of t rows and p columns, where the t rows represent the t key evaluation objects retained after screening and the p columns represent all p evaluation objects; u_ij denotes the correlation between evaluation object m_i and evaluation object m_j, calculated from the number of times m_i and m_j appear together in the same comment in the commodity comment text.
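The correlation matrix C can be sketched in Python as a table of comment-level co-occurrence counts. The claim only says that the correlation is calculated from this count, so using the raw count without normalisation is an assumption, as is setting the entry for an object paired with itself to 0. `build_correlation_matrix` is a hypothetical name.

```python
def build_correlation_matrix(comments, key_objects, all_objects):
    """Return C (t rows, p columns); C[i][j] counts the comments containing both objects."""
    comment_sets = [set(tokens) for tokens in comments]
    matrix = []
    for key in key_objects:
        row = []
        for other in all_objects:
            if other == key:
                row.append(0)  # assumption: an object is not counted as related to itself
            else:
                row.append(sum(1 for s in comment_sets if key in s and other in s))
        matrix.append(row)
    return matrix


# Hypothetical tokenised comments.
comments = [["screen", "battery", "clear"], ["screen", "price"],
            ["battery", "screen"], ["price"]]
key_objects = ["screen", "battery"]
all_objects = ["screen", "battery", "price"]
C = build_correlation_matrix(comments, key_objects, all_objects)
```

In practice the counts would probably need scaling to a range comparable with the PMI values before they are combined with the emotion matrices; the claim leaves that choice open.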
7. The method for automatically constructing a sentiment dictionary for commodity comments as described in claim 1, characterized in that, in step (6), the final emotion matrix is generated by the EPMI algorithm from the two emotion matrices and the correlation matrix constructed above;
EPMI algorithm:
When calculating the relationship between emotion word e_i and evaluation object m_j, it is necessary to consider not only the relationship between the two, but also the evaluation objects that are related to m_j; this degree of correlation is expressed by u_jk in the formula;
The new emotion matrix D is calculated as follows:
D[n][t] = B[n][t] + A[n][p] * (C[t][p])^T (5).
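Read element by element, formula (5) gives D[i][j] = B[i][j] + sum over k of A[i][k] * C[j][k]: the direct score of emotion word e_i for key object m_j, plus its scores for every object m_k weighted by how strongly m_k is related to m_j. A minimal pure-Python sketch of that computation follows; the function name `combine` and the numbers are illustrative only.

```python
def combine(A, B, C):
    """Compute D = B + A * C^T for A (n x p), B (n x t) and C (t x p)."""
    n, t, p = len(A), len(C), len(C[0])
    D = []
    for i in range(n):
        row = []
        for j in range(t):
            # Contribution of all objects m_k related to key object m_j.
            spread = sum(A[i][k] * C[j][k] for k in range(p))
            row.append(B[i][j] + spread)
        D.append(row)
    return D


# Hypothetical values: 2 emotion words, 3 evaluation objects, 2 key objects.
A = [[1.0, 2.0, 0.5],
     [0.0, 1.0, 3.0]]
B = [[1.0, 2.0],
     [0.0, 1.0]]
C = [[0, 2, 1],
     [2, 0, 0]]
D = combine(A, B, C)
```

With NumPy arrays the same result would be `B + A @ C.T`; the explicit loops are used here only to keep the sketch dependency-free.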
8. The method for automatically constructing a sentiment dictionary for commodity comments as described in claim 1, characterized in that, in step (7), the emotion words are clustered: in the emotion matrix D, each row represents one emotion word as a vector; according to the distances between these vectors in the matrix space, the emotion words are grouped into several classes using the k-means clustering method, finally yielding a domain sentiment dictionary that divides the emotion words into a fixed number of classes.
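The clustering step can be sketched with a small k-means written against the Python standard library. Squared Euclidean distance, random initial centroids drawn from the rows of D, and the fixed seed are assumptions, since the claim names only k-means; the function names and the toy matrix are hypothetical.

```python
import random


def squared_distance(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def nearest(vector, centroids):
    """Index of the centroid closest to the vector."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(vector, centroids[c]))


def kmeans(vectors, k, max_iterations=100, seed=0):
    """Cluster row vectors into k groups; return one cluster label per row."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(max_iterations):
        new_labels = [nearest(v, centroids) for v in vectors]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical rows of D, one per emotion word.
words = ["clear", "sharp", "weak", "slow"]
D = [[5.0, 0.2], [4.8, 0.1], [0.1, 3.9], [0.3, 4.2]]
labels = kmeans(D, k=2)
sentiment_dictionary = dict(zip(words, labels))
```

The resulting dictionary maps each emotion word to a class index. Choosing k itself, as in step (8), would mean repeating this for several values of k and comparing downstream classification accuracy.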
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810539447.4A CN108763214B (en) | 2018-05-30 | 2018-05-30 | Automatic construction method of emotion dictionary for commodity comments |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810539447.4A CN108763214B (en) | 2018-05-30 | 2018-05-30 | Automatic construction method of emotion dictionary for commodity comments |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108763214A (en) | 2018-11-06 |
CN108763214B (en) | 2021-09-24 |
Family
ID=64004195
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810539447.4A Active CN108763214B (en) | 2018-05-30 | 2018-05-30 | Automatic construction method of emotion dictionary for commodity comments |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108763214B (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109800418A (en) * | 2018-12-17 | 2019-05-24 | 北京百度网讯科技有限公司 | Text handling method, device and storage medium |
CN109933793A (en) * | 2019-03-15 | 2019-06-25 | 腾讯科技(深圳)有限公司 | Text polarity identification method, apparatus, equipment and readable storage medium storing program for executing |
CN110147552A (en) * | 2019-05-22 | 2019-08-20 | 南京邮电大学 | Educational resource quality evaluation method for digging and system based on natural language processing |
CN110413780A (en) * | 2019-07-16 | 2019-11-05 | 合肥工业大学 | Text emotion analysis method, device, storage medium and electronic equipment |
CN111080055A (en) * | 2019-11-06 | 2020-04-28 | 邱素容 | Hotel scoring method, hotel recommendation method, electronic device and storage medium |
CN112818682A (en) * | 2021-01-22 | 2021-05-18 | 深圳大学 | E-commerce data analysis method, equipment, device and computer-readable storage medium |
CN113377929A (en) * | 2021-08-12 | 2021-09-10 | 北京好欣晴移动医疗科技有限公司 | Unsupervised clustering method, unsupervised clustering device and unsupervised clustering system for special terms |
CN116320626A (en) * | 2023-05-11 | 2023-06-23 | 深圳市兴意腾科技电子有限公司 | Method and system for calculating live broadcast heat of electronic commerce |
CN117217218A (en) * | 2023-11-08 | 2023-12-12 | 中国科学技术信息研究所 | Emotion dictionary construction method and device, electronic equipment and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20130092342A (en) * | 2012-02-09 | 2013-08-20 | 고민수 | System and method for creating emotional word dictionary and computing emotional degrees of documents |
CN103646097A (en) * | 2013-12-18 | 2014-03-19 | 北京理工大学 | Constraint relationship based opinion objective and emotion word united clustering method |
CN105117428A (en) * | 2015-08-04 | 2015-12-02 | 电子科技大学 | Web comment sentiment analysis method based on word alignment model |
CN105718446A (en) * | 2016-03-08 | 2016-06-29 | 徐勇 | UGC fuzzy comprehensive evaluation method based on sentiment analysis |
CN106407177A (en) * | 2016-08-26 | 2017-02-15 | 西南大学 | Emergency online group behavior detection method based on clustering analysis |
CN107369066A (en) * | 2017-06-28 | 2017-11-21 | 东软集团股份有限公司 | A kind of feature between comment object compares method and device |
Non-Patent Citations (3)
Title |
---|
JUN ZHANG: "Research on Building a Chinese Sentiment Lexicon Based on SO-PMI", 《APPLIED MECHANICS AND MATERIALS》 * |
刘沙: "电商网站的产品评价对象抽取研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
牛振东: "基于三层过滤的评价对象抽取", 《北京理工大学学报》 * |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109800418A (en) * | 2018-12-17 | 2019-05-24 | 北京百度网讯科技有限公司 | Text handling method, device and storage medium |
CN109800418B (en) * | 2018-12-17 | 2023-05-05 | 北京百度网讯科技有限公司 | Text processing method, device and storage medium |
CN109933793A (en) * | 2019-03-15 | 2019-06-25 | 腾讯科技(深圳)有限公司 | Text polarity identification method, apparatus, equipment and readable storage medium storing program for executing |
CN109933793B (en) * | 2019-03-15 | 2023-01-06 | 腾讯科技(深圳)有限公司 | Text polarity identification method, device and equipment and readable storage medium |
CN110147552A (en) * | 2019-05-22 | 2019-08-20 | 南京邮电大学 | Educational resource quality evaluation method for digging and system based on natural language processing |
CN110147552B (en) * | 2019-05-22 | 2022-12-06 | 南京邮电大学 | Education resource quality evaluation mining method and system based on natural language processing |
CN110413780B (en) * | 2019-07-16 | 2022-02-22 | 合肥工业大学 | Text emotion analysis method and electronic equipment |
CN110413780A (en) * | 2019-07-16 | 2019-11-05 | 合肥工业大学 | Text emotion analysis method, device, storage medium and electronic equipment |
CN111080055A (en) * | 2019-11-06 | 2020-04-28 | 邱素容 | Hotel scoring method, hotel recommendation method, electronic device and storage medium |
CN112818682A (en) * | 2021-01-22 | 2021-05-18 | 深圳大学 | E-commerce data analysis method, equipment, device and computer-readable storage medium |
CN112818682B (en) * | 2021-01-22 | 2023-01-03 | 深圳大学 | E-commerce data analysis method, equipment, device and computer-readable storage medium |
CN113377929B (en) * | 2021-08-12 | 2021-12-10 | 北京好欣晴移动医疗科技有限公司 | Unsupervised clustering method, unsupervised clustering device and unsupervised clustering system for special terms |
CN113377929A (en) * | 2021-08-12 | 2021-09-10 | 北京好欣晴移动医疗科技有限公司 | Unsupervised clustering method, unsupervised clustering device and unsupervised clustering system for special terms |
CN116320626A (en) * | 2023-05-11 | 2023-06-23 | 深圳市兴意腾科技电子有限公司 | Method and system for calculating live broadcast heat of electronic commerce |
CN116320626B (en) * | 2023-05-11 | 2023-11-14 | 深圳市兴意腾科技电子有限公司 | Method and system for calculating live broadcast heat of electronic commerce |
CN117217218A (en) * | 2023-11-08 | 2023-12-12 | 中国科学技术信息研究所 | Emotion dictionary construction method and device, electronic equipment and storage medium |
CN117217218B (en) * | 2023-11-08 | 2024-01-23 | 中国科学技术信息研究所 | Emotion dictionary construction method and device for science and technology risk event related public opinion |
Also Published As
Publication number | Publication date |
---|---|
CN108763214B (en) | 2021-09-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108763214A (en) | A kind of sentiment dictionary method for auto constructing for comment on commodity | |
US10678816B2 (en) | Single-entity-single-relation question answering systems, and methods | |
Shen et al. | Pragmatically informative text generation | |
Choi et al. | Aila: Attentive interactive labeling assistant for document classification through attention-based deep neural networks | |
Benchimol et al. | Text mining methodologies with R: An application to central bank texts | |
CN110134792B (en) | Text recognition method and device, electronic equipment and storage medium | |
CN106940726B (en) | Creative automatic generation method and terminal based on knowledge network | |
CN108132927A (en) | A kind of fusion graph structure and the associated keyword extracting method of node | |
CN111080055A (en) | Hotel scoring method, hotel recommendation method, electronic device and storage medium | |
CN109947934A (en) | For the data digging method and system of short text | |
CN110222250A (en) | A kind of emergency event triggering word recognition method towards microblogging | |
CN115795030A (en) | Text classification method and device, computer equipment and storage medium | |
TWI254880B (en) | Method for classifying electronic document analysis | |
CN108536781A (en) | A kind of method for digging and system of social networks mood focus | |
Steuber et al. | Topic modeling of short texts using anchor words | |
CN105740225B (en) | A kind of Word sense disambiguation method merging sentence local context and document realm information | |
Yaddarabullah et al. | Classification hoax news of COVID-19 on Instagram using K-nearest neighbor | |
Wang et al. | Distributional modeling on a diet: One-shot word learning from text only | |
CN116304063B (en) | Simple emotion knowledge enhancement prompt tuning aspect-level emotion classification method | |
Kieu et al. | Learning neural textual representations for citation recommendation | |
Bethard | We need to talk about random seeds | |
Zhu et al. | An improved method for k-means clustering based on internal validity indexes and inter-cluster variance | |
Azizov et al. | Frank at checkthat! 2023: Detecting the political bias of news articles and news media | |
CN109800430A (en) | A kind of semantic understanding method and system | |
CN108920475A (en) | A kind of short text similarity calculating method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |