CN108763214A - Method for automatically constructing a sentiment dictionary for product reviews - Google Patents

Info

Publication number
CN108763214A
Authority
CN
China
Prior art keywords
emotion
word
evaluation object
comment
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810539447.4A
Other languages
Chinese (zh)
Other versions
CN108763214B (en)
Inventor
冯钧
贡诚
李晓东
邹希
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hohai University HHU
Priority to CN201810539447.4A
Publication of CN108763214A
Application granted
Publication of CN108763214B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0282 Rating or review of business operators or products

Abstract

The invention discloses a method for automatically constructing a sentiment dictionary for product reviews, comprising text preprocessing, semantic relation mining, and emotion word clustering. Text preprocessing cleans the product reviews and extracts the emotion words and evaluation objects contained in the reviews of a given class of products. Semantic relation mining mines the semantic relations between emotion words and evaluation objects and represents them in matrix form. Emotion word clustering performs unsupervised clustering of the emotion words according to their mutual distances in the emotion matrix space, dividing them into k classes. Targeting the characteristics of product review text, the invention builds a domain sentiment dictionary that divides emotion words into multiple classes rather than the traditional two classes of commendatory and derogatory. For the product review domain, such a domain sentiment dictionary has a clear advantage over existing general-purpose sentiment dictionaries in tasks such as sentiment classification.

Description

A method for automatically constructing a sentiment dictionary for product reviews
Technical field
The present invention relates to a method for automatically constructing a sentiment dictionary for the product review domain, and belongs to the technical field of computer information processing.
Background technology
With the growth of shopping websites, large numbers of reviews of all kinds of products have appeared online, and people can consult them at any time. Identifying the sentiment orientation of these reviews is important to both merchants and consumers, and a good sentiment dictionary is the basis for sentiment orientation analysis of text. It is well known that sentiment analysis of text must take into account the domain the text belongs to. Existing sentiment dictionaries, however, are general-purpose, and there is no sentiment dictionary dedicated to the product review domain, so analyzing product review text with a traditional sentiment dictionary is inappropriate. As a result, methods for automatically constructing sentiment dictionaries, especially domain-specific ones, have attracted growing attention and research from experts and scholars.
Existing sentiment dictionary construction methods, for both Chinese and English, fall into two broad classes: corpus-based and knowledge-base-based. For corpus-based construction, the most common approach is to select seed words and determine the sentiment polarity of a word of unknown polarity by computing its relation to the seed words, namely the PMI value. The common-sense knowledge bases available for Chinese are very limited, so little research builds Chinese sentiment dictionaries from knowledge bases. When building a sentiment dictionary for the product review domain, evaluation objects need special consideration. An evaluation object is a particular feature of the product being evaluated; for a mobile phone, for example, the evaluation object may be the screen or the battery.
On the other hand, existing sentiment dictionaries generally contain only a set of emotion words, divided into two classes: commendatory and derogatory. Some scholars instead divide emotion into six classes: happiness, sadness, fear, surprise, anger, and envy. In short, existing schemes all determine the emotion categories into which emotion words can be divided from human empirical knowledge.
Because many emotion words show different sentiment orientations in different domains, accurately identifying these emotion words together with their evaluation objects, or the topic of the domain, is particularly important, especially for product reviews. Building a domain sentiment dictionary by relying on information experts or crowdsourcing services is very difficult. Shi et al. extract key information from domain text by combining an association rule algorithm with supervised machine learning. Zhang et al. use pointwise mutual information (PMI) and an association rule algorithm to extract the evaluation objects of products. Considering the ordering of evaluation objects, Qiu et al. propose a double propagation algorithm based on the relations between emotion words and product features. Mishne selects evaluation objects using the part of speech and frequency of words.
PMI is a common measure of the degree of association between two words. Turney and Littman use PMI and LSA to compute the degree of association between two words; this method of using PMI to compute the relation between a word and seed words is commonly called SO-PMI. Yang et al. build on SO-PMI by incorporating users' behavioral habits and propose a new sentiment dictionary construction method. Islam and Inkpen improve PMI and propose SOC-PMI. Sentiment classification is a basic task of sentiment analysis, and the quality of the classification results directly reflects the performance of the sentiment dictionary used. Pang treats sentiment classification as a text categorization task and tests three classifiers: naive Bayes, support vector machines, and maximum entropy. Li and Hao extend evaluation objects using spectral clustering. Yang et al. use Word2vec to compute the cosine similarity between words and seed words.
Existing sentiment dictionary construction methods mostly target general-purpose dictionaries, which are not well suited to analyzing domain-specific text such as product reviews. Building a sentiment dictionary suited to a specific domain is therefore particularly important.
Invention content
Object of the invention: the invention aims to overcome the shortcomings of the prior art by providing a sentiment dictionary construction method for the product review domain. The invention clusters emotion words into a suitable number of classes. Unlike existing general-purpose sentiment dictionaries, the classes are divided according to the relations between emotion words and evaluation objects, rather than being limited to fixed sentiment polarities such as commendatory and derogatory.
Technical solution: a method for automatically constructing a sentiment dictionary for product reviews, comprising the following steps in order:
(1) Preprocess the original product review text, typically by Chinese word segmentation, stop-word filtering, and similar means, and determine the emotion words and evaluation objects contained in the text of the designated domain. Emotion words and evaluation objects are determined by part of speech: nouns in the review text are chosen as evaluation objects, and adjectives, adverbs, and verbs are taken as emotion words.
(2) Mine the relations between the emotion words and evaluation objects obtained in step (1), and generate an emotion matrix representing these relations.
(3) Screen the evaluation objects obtained in step (1), keeping only the key evaluation objects.
(4) As in step (2), consider the relations between the emotion words and the key evaluation objects, and generate an emotion matrix representing the relations between the two.
(5) Mine the correlation between the key evaluation objects screened in step (3) and the original evaluation objects obtained in step (1), and generate a correlation matrix representing the correlation between the two.
(6) Using the two emotion matrices obtained in steps (2) and (4) and the correlation matrix obtained in step (5), generate a new emotion matrix representing the relations between the emotion words and the key evaluation objects.
(7) Cluster the emotion words: according to the distances between emotion words in the emotion matrix of step (6), divide them into several classes to obtain the domain sentiment dictionary.
(8) Apply the sentiment dictionary to a sentiment classification task and, for each domain, determine an optimal k by cross-validation or a similar method; the emotion words are divided into k classes.
Further, in step (2), the relation between an emotion word and an evaluation object directly reflects how strongly the emotion word modifies the evaluation object:
(2.1) The relation between the two is quantified by the co-occurrence of the emotion word and the evaluation object; PMI is used here to compute the relation between them.
(2.2) This relation is represented by a matrix (the emotion matrix): each row corresponds to an emotion word, each column to an evaluation object, and each cell holds the PMI value of the corresponding emotion word and evaluation object.
Further, in step (3), the screening of evaluation objects borrows the idea of tf-idf, specifically:
(3.1) The reviews of the same class of products are merged into one document, and the tf-idf value of each word is computed from its number of occurrences in the different documents (that is, the review sets of different products) and its inverse document frequency.
(3.2) The tf-idf values of all words are computed, the tf-idf values of the evaluation objects are sorted, and a threshold is set; only the evaluation objects that reach the threshold are kept as final evaluation objects.
Further, step (4) builds an emotion matrix in the same way as step (2). The only difference is that the emotion matrix in step (4) contains the relations between the emotion words and the evaluation objects left after screening, rather than the relations between the emotion words and all evaluation objects.
Further, in step (5), the relations between all evaluation objects and the screened evaluation objects are mined, and a correlation matrix between the key evaluation objects and all evaluation objects is generated, as follows:
(5.1) The key evaluation objects are the evaluation objects obtained by the screening of step (3); the full set of evaluation objects is all nouns contained in the original product reviews.
(5.2) The degree of correlation between a key evaluation object and any evaluation object can be understood as a kind of synonymy within the corpus. It is expressed as a value in [0, 1], where 0 means no correlation and 1 means the highest correlation; the closer the value is to 1, the higher the correlation.
Further, in step (6), the final emotion matrix is generated from the two emotion matrices and the correlation matrix constructed above, based on EPMI, an improved PMI algorithm proposed here:
(6.1) The EPMI algorithm:
$$\mathrm{EPMI}(e_i, m'_j) = \mathrm{PMI}(e_i, m'_j) + \sum_{k=1}^{p} u_{jk}\, \mathrm{PMI}(e_i, m_k)$$
That is, when computing the relation between emotion word e_i and key evaluation object m′_j, not only the relation between the two must be considered, but also the evaluation objects m_k related to m′_j; this degree of correlation is denoted u_jk in the formula.
(6.2) The relations between emotion words and evaluation objects are mined with the new EPMI algorithm to build a new emotion matrix. The new emotion matrix can be obtained directly by the EPMI algorithm from the two emotion matrices and the correlation matrix obtained before.
Further, in step (7), the emotion words are clustered. In the emotion matrix each emotion word is represented as a vector, so the emotion words can be clustered without supervision according to the distances between vectors.
Further, in step (8), because the clustering is unsupervised, the number of clusters k is not known in advance. The optimal k differs across product reviews and text analysis tasks, and a k with relatively stable performance can be chosen by cross-validation.
Beneficial effects: compared with the prior art, the invention provides a method for automatically building a sentiment dictionary for the product review domain; within a specific domain, a domain sentiment dictionary often outperforms a general-purpose one. The invention provides a method for building a sentiment dictionary from a review corpus. In addition, unlike traditional sentiment dictionaries, the dictionary built here divides emotion words into a variable number k of classes rather than a few fixed classes such as commendatory and derogatory; such a higher-dimensional domain sentiment dictionary can perform better in tasks such as sentiment classification.
Description of the drawings
Fig. 1 is a flow chart of the text preprocessing module;
Fig. 2 is a schematic diagram of semantic relation mining;
Fig. 3 is a flow chart of the selection of the k value.
Specific implementation mode
The invention is further explained below with reference to specific embodiments. It should be understood that these embodiments only illustrate the invention and do not limit its scope; after reading this disclosure, modifications of the invention in various equivalent forms by those skilled in the art fall within the scope defined by the appended claims.
First, to ease understanding of the invention, the following explanations are given:
1. Text preprocessing:
The text preprocessing module preprocesses the original review text, as shown in Fig. 1. Chinese text requires word segmentation, for which many open-source tools exist, such as the jieba segmenter and the IK segmenter. For the best segmentation results, users often also need to add a custom dictionary to recognize uncommon domain terms. English text needs no segmentation step because words are separated by spaces, but English words involve inflections such as tense, so they must be reduced to their stems. Many open-source tools support stemming of English words, for example the widely used Natural Language Toolkit (NLTK).
Besides basic preprocessing of the review text, the main purpose of this module is to obtain a preliminary set of emotion words and evaluation objects, which are extracted by part-of-speech analysis. Emotion words express the user's attitude toward something and carry a strong subjective coloring; adjectives, adverbs, and verbs are chosen as emotion words. An evaluation object is, as the name implies, the thing an emotion word modifies: a product or a particular feature of a product. Nouns are chosen as evaluation objects.
Of course, both English and Chinese text contain many words that carry no meaning, such as "Yes" and the like in Chinese, and "is", "the" and the like in English. Stop-word processing is needed to filter these words out and reduce the negative effect of such meaningless high-frequency words on the text analysis.
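The part-of-speech selection and stop-word filtering described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the text has already been segmented and POS-tagged by some external tool, and the tag prefixes (`n`, `a`, `d`, `v`) and the function name `extract_candidates` are hypothetical choices made for the example.

```python
# Assumed tag prefixes (n = noun, a = adjective, d = adverb, v = verb);
# real taggers use their own tag sets, so adjust these to the tool in use.
NOUN_PREFIXES = ("n",)
EMOTION_PREFIXES = ("a", "d", "v")


def extract_candidates(tagged_comment, stop_words):
    """Split one POS-tagged comment into emotion words and evaluation objects.

    tagged_comment: list of (word, pos_tag) pairs from any segmenter/tagger.
    stop_words: set of words to discard before selection.
    """
    emotion_words, evaluation_objects = [], []
    for word, tag in tagged_comment:
        if word in stop_words:
            continue  # stop-word filtering
        if tag.startswith(NOUN_PREFIXES):
            evaluation_objects.append(word)  # nouns become evaluation objects
        elif tag.startswith(EMOTION_PREFIXES):
            emotion_words.append(word)  # adjectives, adverbs, verbs
    return emotion_words, evaluation_objects


comment = [("the", "x"), ("screen", "n"), ("is", "v"), ("very", "d"), ("clear", "a")]
emotions, objects = extract_candidates(comment, stop_words={"the", "is"})
```

Keeping the tagger outside the function means the same selection rule applies to Chinese and English input once each has been tokenized by a suitable tool.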
2. Semantic relation mining
The semantic relation mining module is the core module of the invention: only by fully mining the relations between emotion words and evaluation objects can a good foundation be laid for the subsequent clustering of emotion words. The final emotion matrix is obtained in three main stages, as shown in Fig. 2.
The relation between an emotion word and an evaluation object is one of modifier and modified, and in review text it appears as co-occurrence: the more often the two co-occur, the closer their relation is taken to be. PMI computes the relation between two words precisely from their co-occurrence, so the initial emotion matrix is first computed with PMI.
At this point, however, the dimensionality is too high: in review text the number of evaluation objects is usually far larger than the number of emotion words. In the emotion matrix each emotion word is represented as a vector, and the dimension of this vector is very high, which greatly harms the accuracy and performance of clustering the emotion words by vector distance. The evaluation objects are therefore screened using the idea of tf-idf. An evaluation object is the product, or a feature of the product, that the user comments on, and such product features are domain-dependent. In mobile phone reviews, for example, users focus on features such as "mobile phone", "screen", and "battery", while in hotel reviews they focus on features such as "hotel", "toilet", and "air conditioner". Features such as "mobile phone", "screen", and "battery" appear frequently in mobile phone reviews and rarely in hotel reviews; features such as "hotel", "toilet", and "air conditioner" appear frequently in hotel reviews and not in mobile phone reviews. The idea of tf-idf can therefore be borrowed to select the real evaluation objects.
tf-idf is usually applied to documents, whereas the objects handled here are individual reviews. The reviews of the same class of products are therefore merged into one document, the review document of that product class, so the number of documents equals the number of product classes for which reviews were collected. From these documents, the tf-idf value of every word in the review text can be computed, and nouns with high tf-idf values are usually the evaluation objects of interest.
A threshold α is set, and only nouns whose tf-idf value reaches the threshold are confirmed as evaluation objects; these are called key evaluation objects. As with the initial emotion matrix, the relation between emotion words and key evaluation objects is considered, that is, the PMI value of each pair is computed, to build a new emotion matrix. Each emotion word is again represented as a vector in this matrix, but because the evaluation objects have been screened with tf-idf, the vector dimension is now far smaller than in the original emotion matrix.
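The tf-idf screening described above can be sketched as follows. The text does not fix a particular tf-idf variant, so this example assumes relative term frequency and the unsmoothed `log(N / df)` inverse document frequency; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def tfidf_by_category(category_docs):
    """category_docs maps a product category to the token list obtained by
    merging all reviews of that category into one document.
    Returns {category: {word: tf-idf value}}."""
    n_docs = len(category_docs)
    doc_freq = Counter()
    for tokens in category_docs.values():
        doc_freq.update(set(tokens))  # count each word once per document
    scores = {}
    for name, tokens in category_docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return scores


def key_evaluation_objects(scores, nouns, alpha):
    """Keep the nouns whose tf-idf value reaches the threshold alpha."""
    return sorted(w for w in nouns if scores.get(w, 0.0) >= alpha)


docs = {
    "phone": ["screen", "battery", "screen", "price"],
    "hotel": ["toilet", "price", "toilet", "aircon"],
}
phone_scores = tfidf_by_category(docs)["phone"]
key_objects = key_evaluation_objects(
    phone_scores, {"screen", "battery", "price"}, alpha=0.1
)
```

In this toy corpus "price" occurs in both documents, so its inverse document frequency is zero and it is screened out, while "screen" and "battery" are kept as key evaluation objects for the phone category.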
PMI considers only the co-occurrence of an emotion word and an evaluation object, which loses some semantics. When computing the relation between an emotion word and an evaluation object, not only their direct relation but also the relations with other evaluation objects related to that evaluation object should be considered. The correlation between the key evaluation objects and all evaluation objects is therefore computed to obtain a correlation matrix. The correlation between a key evaluation object and an evaluation object is computed as follows:
$$u_{ij} = \mathrm{Normal}\left(\left|\{\, d \in D : (m'_i, m_j) \in d \,\}\right|\right), \qquad m'_i \in M',\; m_j \in M$$
where M is the set of all product features, M′ is the set of key evaluation objects after screening, D is the set of reviews, and Normal() is a simple normalization function that scales the value of each dimension of a vector onto the interval [0, 1]. (m′_i, m_j) ∈ d means that key evaluation object m′_i and evaluation object m_j occur in the same review d; the more often two evaluation objects occur in the same review, the higher their correlation is taken to be.
From the above formula, a correlation matrix is obtained. Using the two emotion matrices obtained earlier together with the correlation matrix, the final emotion matrix is obtained by formula (5) according to the EPMI algorithm.
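The co-occurrence-based correlation between key evaluation objects and all evaluation objects can be sketched as follows. The text only says that Normal() maps values onto [0, 1]; dividing each row by its maximum is an assumption made for this example, as are the function name and the toy reviews.

```python
def correlation_matrix(comments, key_objects, all_objects):
    """Build C with t rows (key objects) and p columns (all objects).

    comments: list of token lists, one per review.
    Each entry counts the reviews in which both objects appear, then the
    row is normalised onto [0, 1].
    """
    comment_sets = [set(tokens) for tokens in comments]
    matrix = []
    for key in key_objects:
        row = [
            sum(1 for s in comment_sets if key in s and obj in s)
            for obj in all_objects
        ]
        # Assumed normalisation: divide by the row maximum.
        peak = max(row)
        matrix.append([value / peak if peak else 0.0 for value in row])
    return matrix


comments = [
    ["screen", "display", "clear"],
    ["screen", "display", "battery"],
    ["battery", "charger"],
]
C = correlation_matrix(
    comments,
    key_objects=["screen", "battery"],
    all_objects=["screen", "display", "battery", "charger"],
)
```

Here "display" always appears alongside "screen", so its correlation with the key object "screen" is 1.0, which matches the reading of correlation as a kind of synonymy within the corpus.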
3. Emotion word clustering
Once the final emotion matrix is obtained, the emotion words can be clustered. Each emotion word is represented as a vector in the matrix, and clustering is based on the distances between these vectors. Because the clustering is unsupervised, cross-validation is used, with sentiment classification as the example task, to choose a suitable k. The selection process is shown in Fig. 3.
The review data set is divided into m parts; m-1 parts are used as the training set and the remaining part as the test set, and the classification accuracy of each candidate k is computed on the test set. Each k is tested with different test and training sets over m rounds of experiments, so m accuracy values are obtained for each k, and the k with the highest average accuracy is chosen as the final k.
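The m-fold selection of k described above can be sketched as follows. The `evaluate` callback is hypothetical: it stands for the whole pipeline of building a k-class dictionary, training a sentiment classifier on the training folds, and scoring it on the held-out fold. The toy callback below only exists to make the example runnable.

```python
def choose_k(samples, candidate_ks, m, evaluate):
    """Return the k with the highest mean accuracy over m folds.

    evaluate(k, train, test) -> accuracy is a caller-supplied function
    (an assumption of this sketch, not something defined in the text).
    """
    folds = [samples[i::m] for i in range(m)]  # m roughly equal parts
    mean_accuracy = {}
    for k in candidate_ks:
        accuracies = []
        for i in range(m):
            test = folds[i]  # one part held out for testing
            train = [x for j, fold in enumerate(folds) if j != i for x in fold]
            accuracies.append(evaluate(k, train, test))
        mean_accuracy[k] = sum(accuracies) / m
    best_k = max(mean_accuracy, key=mean_accuracy.get)
    return best_k, mean_accuracy


def toy_evaluate(k, train, test):
    """Stand-in for the real pipeline: accuracy peaks at k = 4."""
    return 1.0 - abs(k - 4) / 10


best_k, scores = choose_k(
    list(range(20)), [2, 3, 4, 5, 6], m=5, evaluate=toy_evaluate
)
```

Averaging over all m rounds before comparing candidates is what makes the chosen k "relatively stable" rather than tuned to a single split.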
The method for automatically constructing a sentiment dictionary for product reviews comprises the following steps in order:
(1) Preprocess the original product review text, typically by Chinese word segmentation, stop-word filtering, and similar means, and determine the emotion words and evaluation objects contained in the text of the designated domain.
(2) Mine the relations between the emotion words and evaluation objects obtained in step (1), and generate an emotion matrix representing these relations.
The relation between an emotion word and an evaluation object directly reflects how strongly the emotion word modifies the evaluation object:
(2.1) The relation between the two is quantified by the co-occurrence of the emotion word and the evaluation object; PMI is used here to compute the relation between them.
The relation between an emotion word and an evaluation object is computed with pointwise mutual information (PMI), which is defined as follows:
$$\mathrm{PMI}(\mathrm{word}_1, \mathrm{word}_2) = \log \frac{p(\mathrm{word}_1, \mathrm{word}_2)}{p(\mathrm{word}_1)\, p(\mathrm{word}_2)}, \qquad p(\mathrm{word}_1, \mathrm{word}_2) = \frac{\mathrm{count}(\mathrm{word}_1, \mathrm{word}_2)}{N}, \qquad p(\mathrm{word}) = \frac{\mathrm{count}(\mathrm{word})}{N}$$
where p(word_1, word_2) is the probability that the two words word_1 and word_2 co-occur in the same window of the product review text, N is the number of words contained in the product reviews under consideration, count(word_1, word_2) is the number of times word_1 and word_2 co-occur in the same window of the product reviews, and count(word) is the number of times the word occurs in the product review text.
(2.2) This relation is represented by a matrix (the emotion matrix): each row corresponds to an emotion word, each column to an evaluation object, and each cell holds the PMI value of the corresponding emotion word and evaluation object.
The emotion matrix between emotion words and evaluation objects is defined as matrix A:
$$A = \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1p} \\ w_{21} & w_{22} & \cdots & w_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{np} \end{pmatrix}$$
The emotion matrix A has n rows and p columns. The n rows correspond to the n emotion words e_1 to e_n, and the p columns to the p evaluation objects m_1 to m_p, where p is far larger than n. w_ij is the PMI value between emotion word e_i and evaluation object m_j: w_ij = PMI(e_i, m_j).
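The construction of the PMI-based emotion matrix can be sketched as follows. Several details are assumptions of this example rather than statements of the text: the window size of 5, the natural logarithm, taking N as the total number of tokens, and assigning 0 to pairs that never co-occur. The function name `pmi_matrix` and the toy reviews are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_matrix(comments, emotion_words, objects, window=5):
    """Rows are emotion words, columns are evaluation objects."""
    word_count = Counter()
    pair_count = Counter()
    for tokens in comments:
        word_count.update(tokens)
        for i, j in combinations(range(len(tokens)), 2):
            if j - i < window:  # both tokens fall inside one window
                pair_count[frozenset((tokens[i], tokens[j]))] += 1
    n = sum(word_count.values())  # N, read here as the total token count

    def pmi(a, b):
        joint = pair_count[frozenset((a, b))]
        if joint == 0:
            return 0.0  # assumed convention for pairs that never co-occur
        return math.log((joint / n) / ((word_count[a] / n) * (word_count[b] / n)))

    return [[pmi(e, m) for m in objects] for e in emotion_words]


comments = [
    ["screen", "clear"],
    ["battery", "durable"],
    ["screen", "clear", "battery"],
]
A = pmi_matrix(comments, ["clear", "durable"], ["screen", "battery"])
```

Each row of `A` is the vector that represents one emotion word; with real data the number of columns p is far larger than the number of rows n, which is the dimensionality problem the tf-idf screening addresses.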
(3) Screen the evaluation objects obtained in step (1), keeping only the key evaluation objects.
The screening of evaluation objects borrows the idea of tf-idf, specifically:
(3.1) The reviews of the same class of products are merged into one document, and the tf-idf value of each word is computed from its number of occurrences in the different documents (that is, the review sets of different products) and its inverse document frequency.
(3.2) The tf-idf values of all words are computed, the tf-idf values of the evaluation objects are sorted, and a threshold is set; only the t evaluation objects that reach the threshold are kept as final evaluation objects.
(4) As in step (2), consider the relations between the emotion words and the key evaluation objects, and generate an emotion matrix representing the relations between the two. The only difference from step (2) is that the emotion matrix in step (4) contains the relations between the emotion words and the evaluation objects left after screening, rather than the relations between the emotion words and all evaluation objects.
The resulting emotion matrix B has n rows and t columns: the n rows again correspond to the n emotion words, and the t columns to the t key evaluation objects.
(5) Mine the correlation between the key evaluation objects screened in step (3) and the original evaluation objects obtained in step (1), and generate a correlation matrix representing the correlation between the two.
In step (5), the relations between all evaluation objects and the screened evaluation objects are mined, and a correlation matrix between the key evaluation objects and all evaluation objects is generated, as follows:
(5.1) The key evaluation objects are the evaluation objects obtained by the screening of step (3); the full set of evaluation objects is all nouns contained in the original product reviews.
(5.2) The degree of correlation between a key evaluation object and any evaluation object can be understood as a kind of synonymy within the corpus. It is expressed as a value in [0, 1], where 0 means no correlation and 1 means the highest correlation.
The correlation matrix C is constructed as follows:
$$C = \begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1p} \\ u_{21} & u_{22} & \cdots & u_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ u_{t1} & u_{t2} & \cdots & u_{tp} \end{pmatrix}$$
The correlation matrix C has t rows and p columns: the t rows correspond to the t key evaluation objects after screening, and the p columns to all p evaluation objects. u_ij is the correlation between key evaluation object m′_i and evaluation object m_j, computed from the number of reviews in the product review text in which the two appear together.
(6) Using the two emotion matrices obtained in steps (2) and (4) and the correlation matrix obtained in step (5), generate a new emotion matrix representing the relations between the emotion words and the key evaluation objects.
The final emotion matrix is generated from the two emotion matrices and the correlation matrix constructed above by the EPMI algorithm:
$$\mathrm{EPMI}(e_i, m'_j) = \mathrm{PMI}(e_i, m'_j) + \sum_{k=1}^{p} u_{jk}\, \mathrm{PMI}(e_i, m_k)$$
That is, when computing the relation between emotion word e_i and key evaluation object m′_j, not only the relation between the two must be considered, but also the evaluation objects m_k related to m′_j; this degree of correlation is denoted u_jk in the formula.
The emotion matrix D is computed as follows:
$$D_{n \times t} = B_{n \times t} + A_{n \times p}\, C_{t \times p}^{\mathsf{T}} \qquad (5)$$
Like matrices A and B, matrix D represents the relations between emotion words and evaluation objects. The difference is that A and B compute these relations with the traditional PMI algorithm, whereas D uses EPMI, the improved PMI algorithm proposed here.
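Formula (5) can be evaluated entry by entry, since D[i][j] equals B[i][j] plus the sum over k of A[i][k] · C[j][k]. The following pure-Python sketch shows this; the function name `epmi_matrix` and the small matrices are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def epmi_matrix(A, B, C):
    """Formula (5): D = B + A * C^T.

    A: n x p PMI matrix (emotion words x all evaluation objects)
    B: n x t PMI matrix (emotion words x key evaluation objects)
    C: t x p correlation matrix (key objects x all evaluation objects)
    Returns D as an n x t list of lists.
    """
    n, t, p = len(A), len(C), len(C[0])
    return [
        [B[i][j] + sum(A[i][k] * C[j][k] for k in range(p)) for j in range(t)]
        for i in range(n)
    ]


A = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 3.0]]
B = [[1.0, 0.0],
     [0.0, 3.0]]
C = [[1.0, 0.5, 0.0],
     [0.0, 0.5, 1.0]]
D = epmi_matrix(A, B, C)
```

The result keeps the reduced width t of matrix B while still drawing on every column of A, weighted by how strongly each evaluation object is correlated with the key object in question.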
(7) Cluster the emotion words: according to the distances between emotion words in the emotion matrix of step (6), divide them into several classes to obtain the domain sentiment dictionary.
In emotion matrix D, each emotion word is represented by a row of the matrix, that is, a vector. According to the distances between these vectors in the matrix space, the emotion words can be grouped into several classes with a clustering method such as k-means. The result is a domain sentiment dictionary that divides the emotion words into a number of classes.
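The clustering step can be sketched with a plain Lloyd-style k-means over the rows of D. This is an illustration under stated assumptions: Euclidean distance, initial centroids taken from the first k rows (a deterministic choice made only for this example), and hypothetical emotion words and matrix values.

```python
import math


def kmeans(vectors, k, max_iterations=100):
    """Return one cluster label per vector."""
    centroids = [list(v) for v in vectors[:k]]  # assumed initialisation
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


words = ["clear", "sharp", "durable", "lasting"]
D = [[3.0, 1.0], [2.8, 1.2], [0.5, 6.5], [0.4, 6.0]]
labels = kmeans(D, k=2)

# Group the emotion words by cluster label to form the dictionary.
dictionary = {}
for word, label in zip(words, labels):
    dictionary.setdefault(label, []).append(word)
```

The cluster labels carry no fixed meaning such as commendatory or derogatory; each class simply groups emotion words that relate to the key evaluation objects in a similar way.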
(8) Apply the sentiment dictionary to a sentiment classification task and, for each domain, determine an optimal k by cross-validation or a similar method; the emotion words are divided into k classes.

Claims (8)

1. An automatic construction method of a sentiment dictionary for commodity comments, characterized by comprising the following steps in order:
(1) preprocessing the original commodity comment text, and determining the emotion words and evaluation objects contained in the text of the specified domain;
(2) mining the relationship between the emotion words and the evaluation objects obtained in step (1), and generating an emotion matrix representing this relationship;
(3) screening the evaluation objects obtained in step (1), and retaining the key evaluation objects;
(4) considering the relationship between the emotion words and the key evaluation objects, and generating an emotion matrix representing the relationship between the two;
(5) mining the correlation between the key evaluation objects screened in step (3) and the original evaluation objects obtained in step (1), and generating a correlation matrix representing the correlation between the two;
(6) using the two emotion matrices obtained in steps (2) and (4) and the correlation matrix obtained in step (5) to generate a new emotion matrix representing the relationship between the emotion words and the key evaluation objects;
(7) clustering the emotion words: dividing the emotion words into several classes according to the distances between them in the emotion matrix of step (6), to obtain a domain sentiment dictionary;
(8) applying the sentiment dictionary to a sentiment classification task, determining an optimal value of k for each domain by a method such as cross-validation, and dividing the emotion words into k classes.
2. The automatic construction method of a sentiment dictionary for commodity comments according to claim 1, characterized in that, in step (1), the emotion words and evaluation objects are determined by part of speech: nouns contained in the comment text are chosen as evaluation objects, while adjectives, adverbs, and verbs in the comment text are used as emotion words.
3. The automatic construction method of a sentiment dictionary for commodity comments according to claim 1, characterized in that, in step (2), the relationship between an emotion word and an evaluation object directly reflects the degree to which the emotion word modifies the evaluation object:
(2.1) the relationship between the two is quantified by the co-occurrence of the emotion word and the evaluation object, and PMI is used to calculate the relationship between the emotion word and the evaluation object;
The PMI calculation formula is defined as follows:
where p(word1, word2) is the probability that the two words word1 and word2 co-occur in the same window in the commodity comment text; N is the number of different words contained in the commodity comments under consideration; count(word1, word2) is the number of times word1 and word2 co-occur in the same window in the commodity comments; and count(word) is the number of times the word occurs in the commodity comment text;
(2.2) this relationship is represented by a matrix (the emotion matrix): the rows of the matrix represent all emotion words, the columns represent the evaluation objects, and each cell holds the PMI value of the corresponding emotion word and evaluation object;
The emotion matrix between emotion words and evaluation objects is defined as matrix A as follows:
The emotion matrix A consists of n rows and p columns, where the n rows represent the n emotion words e_1 to e_n and the p columns represent the p evaluation objects m_1 to m_p; w_ij denotes the PMI value between emotion word e_i and evaluation object m_j, i.e., w_ij = PMI(e_i, m_j).
4. The automatic construction method of a sentiment dictionary for commodity comments according to claim 1, characterized in that, in step (3), the screening of evaluation objects draws on the idea of tf-idf, specifically:
(3.1) the comments on products of the same class are merged into one document, and the tf-idf value of each word is calculated from the number of times the word occurs in the different documents and its inverse document frequency;
(3.2) the tf-idf values of all words are calculated, the tf-idf values of the evaluation objects are sorted, and a threshold is set; only the t evaluation objects that reach this threshold are selected and taken as the final evaluation objects.
5. The automatic construction method of a sentiment dictionary for commodity comments according to claim 1, characterized in that, in step (4), the emotion matrix is constructed in the same way as in step (2); the only difference is that the emotion matrix in step (4) contains the relationships between the emotion words and the evaluation objects retained after screening, rather than the relationships between the emotion words and all evaluation objects;
The constructed emotion matrix B has n rows and t columns; the n rows represent the n emotion words, and the t columns represent the t key evaluation objects.
6. The automatic construction method of a sentiment dictionary for commodity comments according to claim 1, characterized in that, in step (5), the relationships between all evaluation objects and the evaluation objects retained after screening are mined, and a correlation matrix between the key evaluation objects and all evaluation objects is generated; the correlation matrix C is as follows:
where the correlation matrix C consists of t rows and p columns; the t rows represent the t key evaluation objects retained after screening, and the p columns represent all p evaluation objects; u_ij denotes the correlation between evaluation object m_i and evaluation object m_j, calculated from the number of times m_i and m_j appear together in the same comment in the commodity comment text.
7. The automatic construction method of a sentiment dictionary for commodity comments according to claim 1, characterized in that, in step (6), the final emotion matrix is generated by the EPMI algorithm from the two emotion matrices and the correlation matrix constructed above;
The EPMI algorithm:
when calculating the relationship between emotion word e_i and evaluation object m_j, it is necessary to consider not only the relationship between the two, but also the evaluation objects that are related to m_j; the degree of this relatedness is denoted u_jk in the formula;
The new emotion matrix D is calculated as follows:
$D_{n \times t} = B_{n \times t} + A_{n \times p}\, C_{t \times p}^{\mathsf{T}}$    (5).
8. The automatic construction method of a sentiment dictionary for commodity comments according to claim 1, characterized in that, in step (7), the emotion words are clustered: in emotion matrix D, each row represents one emotion word as a vector; according to the distances between these vectors, the emotion words are grouped into several classes using the k-means clustering method, finally yielding a domain sentiment dictionary in which the emotion words are divided into a number of classes.
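The PMI-based emotion matrix A of claim 3 can be sketched as below. The PMI formula itself did not survive in this text, so the sketch assumes the standard definition PMI(w1, w2) = log2(p(w1, w2) / (p(w1) p(w2))), with p(w1, w2) = count(w1, w2) / N and p(w) = count(w) / N. It further assumes that the co-occurrence window is a single comment, that N is the total number of tokens, and that a pair with no co-occurrence gets the value 0. The function name `pmi_matrix` and the sample comments are hypothetical.

```python
import math
from collections import Counter


def pmi_matrix(comments, emotion_words, objects):
    """Build the n x p emotion matrix A with A[i][j] = PMI(e_i, m_j)."""
    word_count = Counter(tok for c in comments for tok in c)
    N = sum(word_count.values())  # assumption: N = total token count
    pair_count = Counter()
    for c in comments:
        present = set(c)  # assumption: the window is one whole comment
        for e in emotion_words:
            for m in objects:
                if e in present and m in present:
                    pair_count[(e, m)] += 1
    A = []
    for e in emotion_words:
        row = []
        for m in objects:
            co = pair_count[(e, m)]
            if co == 0:
                row.append(0.0)  # assumption: no co-occurrence maps to 0
            else:
                p_joint = co / N
                p_e, p_m = word_count[e] / N, word_count[m] / N
                row.append(math.log2(p_joint / (p_e * p_m)))
        A.append(row)
    return A


# Hypothetical tokenised comments; objects are nouns, emotion words adjectives
comments = [["screen", "clear"], ["screen", "clear"],
            ["battery", "weak"], ["battery", "clear"]]
emotion_words = ["clear", "weak"]
objects = ["screen", "battery"]
A = pmi_matrix(comments, emotion_words, objects)
print(A)
```

Rows of `A` correspond to emotion words and columns to evaluation objects, matching the layout described in claim 3. Restricting `objects` to the key evaluation objects would give matrix B of claim 5 in the same way.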
CN201810539447.4A 2018-05-30 2018-05-30 Automatic construction method of emotion dictionary for commodity comments Active CN108763214B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810539447.4A CN108763214B (en) 2018-05-30 2018-05-30 Automatic construction method of emotion dictionary for commodity comments

Publications (2)

Publication Number Publication Date
CN108763214A (en) 2018-11-06
CN108763214B (en) 2021-09-24

Family

ID=64004195

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810539447.4A Active CN108763214B (en) 2018-05-30 2018-05-30 Automatic construction method of emotion dictionary for commodity comments

Country Status (1)

Country Link
CN (1) CN108763214B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109800418A (en) * 2018-12-17 2019-05-24 北京百度网讯科技有限公司 Text handling method, device and storage medium
CN109933793A (en) * 2019-03-15 2019-06-25 腾讯科技(深圳)有限公司 Text polarity identification method, apparatus, equipment and readable storage medium storing program for executing
CN110147552A (en) * 2019-05-22 2019-08-20 南京邮电大学 Educational resource quality evaluation method for digging and system based on natural language processing
CN110413780A (en) * 2019-07-16 2019-11-05 合肥工业大学 Text emotion analysis method, device, storage medium and electronic equipment
CN111080055A (en) * 2019-11-06 2020-04-28 邱素容 Hotel scoring method, hotel recommendation method, electronic device and storage medium
CN112818682A (en) * 2021-01-22 2021-05-18 深圳大学 E-commerce data analysis method, equipment, device and computer-readable storage medium
CN113377929A (en) * 2021-08-12 2021-09-10 北京好欣晴移动医疗科技有限公司 Unsupervised clustering method, unsupervised clustering device and unsupervised clustering system for special terms
CN116320626A (en) * 2023-05-11 2023-06-23 深圳市兴意腾科技电子有限公司 Method and system for calculating live broadcast heat of electronic commerce
CN117217218A (en) * 2023-11-08 2023-12-12 中国科学技术信息研究所 Emotion dictionary construction method and device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130092342A (en) * 2012-02-09 2013-08-20 고민수 System and method for creating emotional word dictionary and computing emotional degrees of documents
CN103646097A (en) * 2013-12-18 2014-03-19 北京理工大学 Constraint relationship based opinion objective and emotion word united clustering method
CN105117428A (en) * 2015-08-04 2015-12-02 电子科技大学 Web comment sentiment analysis method based on word alignment model
CN105718446A (en) * 2016-03-08 2016-06-29 徐勇 UGC fuzzy comprehensive evaluation method based on sentiment analysis
CN106407177A (en) * 2016-08-26 2017-02-15 西南大学 Emergency online group behavior detection method based on clustering analysis
CN107369066A (en) * 2017-06-28 2017-11-21 东软集团股份有限公司 A kind of feature between comment object compares method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JUN ZHANG: "Research on Building a Chinese Sentiment Lexicon Based on SO-PMI", 《APPLIED MECHANICS AND MATERIALS》 *
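刘沙 (Liu Sha): "Research on the extraction of product evaluation objects from e-commerce websites", 《China Master's Theses Full-text Database, Information Science and Technology》 *
牛振东 (Niu Zhendong): "Evaluation object extraction based on three-layer filtering", 《Transactions of Beijing Institute of Technology》 *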

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109800418A (en) * 2018-12-17 2019-05-24 北京百度网讯科技有限公司 Text handling method, device and storage medium
CN109800418B (en) * 2018-12-17 2023-05-05 北京百度网讯科技有限公司 Text processing method, device and storage medium
CN109933793A (en) * 2019-03-15 2019-06-25 腾讯科技(深圳)有限公司 Text polarity identification method, apparatus, equipment and readable storage medium storing program for executing
CN109933793B (en) * 2019-03-15 2023-01-06 腾讯科技(深圳)有限公司 Text polarity identification method, device and equipment and readable storage medium
CN110147552A (en) * 2019-05-22 2019-08-20 南京邮电大学 Educational resource quality evaluation method for digging and system based on natural language processing
CN110147552B (en) * 2019-05-22 2022-12-06 南京邮电大学 Education resource quality evaluation mining method and system based on natural language processing
CN110413780B (en) * 2019-07-16 2022-02-22 合肥工业大学 Text emotion analysis method and electronic equipment
CN110413780A (en) * 2019-07-16 2019-11-05 合肥工业大学 Text emotion analysis method, device, storage medium and electronic equipment
CN111080055A (en) * 2019-11-06 2020-04-28 邱素容 Hotel scoring method, hotel recommendation method, electronic device and storage medium
CN112818682A (en) * 2021-01-22 2021-05-18 深圳大学 E-commerce data analysis method, equipment, device and computer-readable storage medium
CN112818682B (en) * 2021-01-22 2023-01-03 深圳大学 E-commerce data analysis method, equipment, device and computer-readable storage medium
CN113377929B (en) * 2021-08-12 2021-12-10 北京好欣晴移动医疗科技有限公司 Unsupervised clustering method, unsupervised clustering device and unsupervised clustering system for special terms
CN113377929A (en) * 2021-08-12 2021-09-10 北京好欣晴移动医疗科技有限公司 Unsupervised clustering method, unsupervised clustering device and unsupervised clustering system for special terms
CN116320626A (en) * 2023-05-11 2023-06-23 深圳市兴意腾科技电子有限公司 Method and system for calculating live broadcast heat of electronic commerce
CN116320626B (en) * 2023-05-11 2023-11-14 深圳市兴意腾科技电子有限公司 Method and system for calculating live broadcast heat of electronic commerce
CN117217218A (en) * 2023-11-08 2023-12-12 中国科学技术信息研究所 Emotion dictionary construction method and device, electronic equipment and storage medium
CN117217218B (en) * 2023-11-08 2024-01-23 中国科学技术信息研究所 Emotion dictionary construction method and device for science and technology risk event related public opinion

Also Published As

Publication number Publication date
CN108763214B (en) 2021-09-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant