CN112883145A - Emotion multi-tendency classification method for Chinese comments - Google Patents

Emotion multi-tendency classification method for Chinese comments

Info

Publication number
CN112883145A
Authority
CN
China
Prior art keywords
morpheme
emotion
node
nodes
emotional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011547122.4A
Other languages
Chinese (zh)
Other versions
CN112883145B (en)
Inventor
张少中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Wanli University
Original Assignee
Zhejiang Wanli University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Wanli University
Priority to CN202011547122.4A (patent CN112883145B)
Publication of CN112883145A
Application granted
Publication of CN112883145B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides an emotion multi-tendency classification method for Chinese comments, comprising three steps: first, extracting morpheme words and emotion words; second, constructing similarity relations between morpheme emotion variables; and finally, calculating morpheme emotion tight paths. The morpheme emotion variables are regarded as nodes in a directed weighted acyclic graph, directed weighted relations are constructed between the morpheme emotion nodes to serve as directed weighted link edges, and effective paths that satisfy certain weight conditions are searched for on the basis of these edges. By combining a directed weighted acyclic graph model with emotion tendency analysis, the invention realizes emotion multi-tendency classification of comments through the three steps of extracting the morpheme emotions of the comments, analyzing the similarity relations among the morpheme emotions, and calculating morpheme emotion tight paths. It thereby distinguishes more accurately the multiple attitudes that a user expresses toward an object and reflects the user's opinions on the object's attributes and characteristics.

Description

Emotion multi-tendency classification method for Chinese comments
Technical Field
The invention relates to emotion tendency classification, and in particular to an emotion multi-tendency classification method for Chinese comments.
Background
With the rapid popularization and development of applications such as blogs, microblogs and comment systems, online comments have become an important way for users to express opinions and communicate. Comment information on the network typically expresses a user's opinion of something in the form of short text, such as a comment on a news event or on the performance of a product. This comment information is published by a large number of users, who state their own opinions and positions from different sides and perspectives. As it accumulates day by day, it forms a data set with complex structure, varied content and many combinations of emotions.
The comments that users make on things that interest them are an important reflection of their opinions on the attributes and characteristics of those things. Through comments, users express their attitudes toward events, the performance of products, the quality of services, and so on. Existing research on classifying the emotion tendency of comments mainly divides emotion tendencies into positive, negative and neutral emotions; some studies divide them into several grades, such as highly favorable, neutral, unfavorable and highly unfavorable. Because these methods divide emotion tendency into a few fixed types, they have difficulty handling more complex emotion classification situations.
Different users may experience events, things and services differently, because their knowledge of an event, understanding of a thing, or experience of a service can vary widely. These differing experiences lead users to express a wide variety of emotions and attitudes in their comments. In a single comment on an object (an event, thing, service, etc.), a user sometimes expresses a single attitude, such as approval or disapproval, which is an overall evaluation of the object and expresses one emotion tendency. However, because human emotions are rich and complex, users often comment on different aspects of an object separately; evaluating a product, for example, involves aspects such as price, performance and appearance, and the user may express a different attitude toward each. As a result, the emotion tendencies expressed in a single comment are not always of one type. In many cases a user approves of some aspects of a thing and disapproves of others, rather than approving or disapproving of the thing as a whole. These different attitudes give a more comprehensive description of the object and express a multi-faceted emotional orientation. To distinguish more accurately the multiple attitudes that users express toward objects, their comments need to be classified into finer-grained emotion tendencies.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing an emotion multi-tendency classification method for Chinese comments that distinguishes more accurately the multiple attitudes users express toward objects.
The emotion multi-tendency classification method for Chinese comments of the invention comprises the following steps:
S1, extracting morpheme emotion variables: extracting from the comment text, according to a Chinese morpheme lexicon and an emotion corpus lexicon, the morpheme words and emotion words related to the commented object; calculating the association coefficient between the morpheme words and the emotion words by the Pearson correlation coefficient method; and forming morpheme emotion variables from the association coefficient;
S2, constructing similarity relations between morpheme emotion variables: calculating the approximate relation between two morpheme emotion variables with a conditional mutual information formula, which describes the relation between the morpheme emotion variables;
S3, calculating morpheme emotion tight paths: regarding the morpheme emotion variables as nodes in a directed weighted acyclic graph, called morpheme emotion nodes or morpheme emotion node variables; constructing directed weighted relations between the morpheme emotion nodes to serve as directed weighted link edges; and, on the basis of these edges, designing an improved shortest path search algorithm based on the directed weighted acyclic graph model to search for effective paths that satisfy certain weight conditions, each path being one emotion tendency classification.
Preferably, in step S1, the Chinese morphemes are divided into noun-class morphemes and emotion-class morphemes, and the two types are combined in one or more of the compound modes of Chinese word formation: union (coordinate), bias (modifier-head), domination (verb-object), statement (subject-predicate) and supplement (complement). The morphemes in the comment text are extracted by a supervised machine learning method, and morphemes are paired with emotions using the Pearson correlation coefficient between them as the association coefficient, so as to construct morpheme emotion variables.
Preferably, in step S2, two morpheme emotion nodes that are similar are connected by a directed edge to form a directed link; the direction of the directed link is determined by the order in which the morpheme emotion variables occur in the comments.
Preferably, in step S3, after the directed link edges between all morpheme emotion nodes are obtained, the shortest paths from a start node to all end nodes are found; the morpheme emotion nodes on each shortest path form a strongest emotion tendency set, which represents one emotion tendency; and by setting a reasonable empirical threshold on the maximum path length, the paths that meet the emotion intensity requirement are found, the morpheme emotion nodes and directed weighted edges on those paths forming the effective emotion tendency classifications.
Preferably, in step S1, the extraction of the morpheme emotion variables comprises the following steps:
a1, selecting a comment training sample set and, with reference to an existing Chinese morpheme library, finding all Chinese noun-class morphemes and recording them in a morpheme set $M$;
a2, selecting a comment training sample set and, with reference to an existing emotion corpus, finding all Chinese emotion-class morphemes and recording them in an emotion set $S$;
a3, forming an independent morpheme emotion variable $v_i$ from a morpheme element in the morpheme set $M$ and an emotion element in the emotion set $S$, and calculating the Pearson correlation coefficient $r$ between each morpheme element and emotion element; setting a threshold $r_\theta$, and recording each morpheme emotion variable $v_i$ that satisfies $r \ge r_\theta$ in the effective morpheme emotion variable set $V$:
$$V = \{v_1, v_2, \ldots, v_n\} \tag{1}$$
where $n$ in formula (1) is the number of effective morpheme emotion variables;
a4, executing a3 in a loop until all elements in the morpheme set and the emotion set have been processed.
Preferably, the Pearson correlation coefficient $r$ between the morphemes and the emotions is calculated as:
$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{M_i-\bar{M}}{\sigma_M}\right)\left(\frac{S_i-\bar{S}}{\sigma_S}\right) \tag{2}$$
where, in formula (2), $\frac{M_i-\bar{M}}{\sigma_M}$, $\bar{M}$ and $\sigma_M$ are respectively the standard score, the mean and the standard deviation of the $M_i$ samples (and likewise for $S_i$), and $n$ is the number of comment training samples.
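To make steps a3 and a4 concrete, the following minimal Python sketch computes the Pearson coefficient in its standard-score form and keeps only the pairs that reach the threshold $r_\theta$. It is an illustration under stated assumptions, not the patented implementation: the text does not say how the samples $M_i$ and $S_i$ are encoded, so the sketch assumes 0/1 occurrence indicators per training comment, and the names `pearson_r`, `effective_variables` and the toy words are hypothetical.

```python
import math

def pearson_r(m, s):
    """Sample Pearson correlation as a mean product of standard scores."""
    n = len(m)
    if n != len(s) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    mean_m, mean_s = sum(m) / n, sum(s) / n
    sd_m = math.sqrt(sum((x - mean_m) ** 2 for x in m) / (n - 1))
    sd_s = math.sqrt(sum((y - mean_s) ** 2 for y in s) / (n - 1))
    if sd_m == 0 or sd_s == 0:
        return 0.0  # assumption: a constant column is treated as uncorrelated
    return sum(((x - mean_m) / sd_m) * ((y - mean_s) / sd_s)
               for x, y in zip(m, s)) / (n - 1)

def effective_variables(morphemes, emotions, r_theta):
    """Return {(morpheme, emotion): r} for every pair with r >= r_theta."""
    kept = {}
    for m_name, m in morphemes.items():
        for s_name, s in emotions.items():
            r = pearson_r(m, s)
            if r >= r_theta:
                kept[(m_name, s_name)] = r
    return kept

# Hypothetical toy data: entry k is 1 if the word occurs in comment k (assumed encoding).
morphemes = {"price": [1, 1, 0, 0, 1, 0], "screen": [0, 1, 1, 0, 0, 1]}
emotions = {"cheap": [1, 1, 0, 0, 1, 0], "blurry": [0, 0, 1, 0, 0, 1]}
V = effective_variables(morphemes, emotions, r_theta=0.6)
print(sorted(V))
```

With these toy columns only the pairs ("price", "cheap") and ("screen", "blurry") pass the threshold; the negatively correlated pairs are discarded.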
Preferably, in step S2, the morpheme emotion variables are regarded as nodes in a directed weighted acyclic graph, called morpheme emotion nodes or morpheme emotion node variables, and calculating the approximate relation between morpheme emotion nodes comprises the following steps:
b1, finding the child node set of each morpheme emotion node and constructing a directed acyclic graph of the morpheme emotion nodes;
first, initializing the child node sets by emptying the child node set of every morpheme emotion node; then calculating the conditional mutual information of each pair of morpheme emotion nodes $v_i$ and $v_j$ and, when the conditional mutual information is greater than a preset empirical value, regarding $v_j$ as a child node of $v_i$; finally, outputting the child node sets of all morpheme emotion nodes and the directed acyclic graph, denoted $G = (V, D)$, where $v_i$ and $v_j$ are morpheme emotion nodes, $G$ is the directed acyclic graph, $V$ is the set of effective morpheme emotion nodes, and $D$ is the set of directed edges from parent nodes to child nodes;
the conditional mutual information of each pair of morpheme emotion nodes is calculated by formula (3):
[Formula (3): image BDA0002856612800000035]
where $f(G)$ in formula (3) is the conditional mutual information, $p(v_i, v_j)$ is the joint probability density function, and $\mathrm{Chirld}(v_i)$ is the set of child nodes of node $v_i$; $i$ ranges over $[1, n-1]$ and $j$ over $[i+1, n]$;
b2, calculating the similarity weights between morpheme emotion nodes by formula (4), executing in a loop until all morpheme emotion nodes have been traversed:
[Formula (4): image BDA0002856612800000041]
where, in formula (4), $W_{i,j}$ is the weight of the similarity relation between two morpheme emotion nodes in a parent-child relation, $N(v_i)$ and $N(v_j)$ are the numbers of times that nodes $v_i$ and $v_j$ respectively appear in the comment texts, and $N(v_i, v_j)$ is the number of times the two appear together in the same comment text.
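The bookkeeping in steps b1 and b2 can be sketched as follows. Because formulas (3) and (4) are not reproduced in this text, the sketch deliberately leaves both open: `score` is a hypothetical callable standing in for the conditional mutual information $f(G)$, and only the counts $N(v_i)$, $N(v_j)$ and $N(v_i, v_j)$ that formula (4) takes as inputs are computed, not $W_{i,j}$ itself. The comparison uses "not less than" as in the algorithm listing; all names and data are assumptions for illustration.

```python
from itertools import combinations

def build_child_sets(n, score, eps):
    """Step b1 sketch: add v_j to the child set of v_i (i < j) when score(i, j) >= eps."""
    child = {i: set() for i in range(n)}  # every child set starts empty
    for i in range(n - 1):
        for j in range(i + 1, n):
            if score(i, j) >= eps:
                child[i].add(j)
    return child

def cooccurrence_counts(comments):
    """Count N(v) per node and N(v_i, v_j) per pair, at most once per comment."""
    single, pair = {}, {}
    for nodes in comments:
        present = sorted(set(nodes))
        for v in present:
            single[v] = single.get(v, 0) + 1
        for a, b in combinations(present, 2):
            pair[(a, b)] = pair.get((a, b), 0) + 1
    return single, pair

# Hypothetical precomputed scores standing in for formula (3).
scores = {(0, 1): 0.90, (0, 2): 0.40, (1, 2): 0.88}
child = build_child_sets(3, lambda i, j: scores[(i, j)], eps=0.85)

# Each inner list holds the node indices found in one comment (toy data).
single, pair = cooccurrence_counts([[0, 1], [0, 1, 2], [1, 2], [2]])
print(child, single, pair)
```

Since edges only ever run from a lower index $i$ to a higher index $j$, the resulting graph cannot contain a cycle, which is what makes it a directed acyclic graph.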
Preferably, in step S3, the calculation of the morpheme emotion tight path comprises the following steps:
c1, calculating the lengths of the directed link edges of the directed weighted acyclic graph by converting each similarity weight into a directed edge length, $L_{i,j} = -\ln W_{i,j}$, where $L_{i,j}$ is the length of the directed edge;
c2, calculating the emotion tendency classification paths, initializing the variables and executing the following steps in sequence:
c21, selecting from the morpheme emotion variable set $V$ a morpheme emotion node variable that has no parent node as the start node, denoted $v_s$;
c22, initializing the child node of the start node to itself, and initializing the child nodes of the other morpheme emotion nodes in the morpheme emotion variable set $V$ to null;
c23, letting $D_{i,j}$ be the path length from morpheme emotion node $v_i$ to morpheme emotion node $v_j$; the path length from the start node to itself is 0, and the initial path length from the start node to every other morpheme emotion node is infinity; the path length between $v_i$ and $v_j$ equals the algebraic sum of the lengths of all directed edges on the path between them;
c24, initializing the classification and the candidate node set, $C_k = \{v_s\}$; $Q = \{v_s\}$; where $C_k$ is the $k$-th emotion tendency classification, $Q$ is the candidate node set, and $v_s$ is the start node;
c3, while the morpheme emotion node variable set $V$ is not empty, searching the morpheme emotion nodes in the candidate node set $Q$, finding the one with the shortest path length, and executing the following steps:
c31, when morpheme emotion node variables $v_i$ and $v_j$ are both in the candidate node set $Q$ and $i \ne j$, if the path length from the start node $v_s$ to $v_i$ is less than or equal to the path length from $v_s$ to $v_j$, deleting the node $v_i$ with the shortest path length from $Q$;
c32, adding the morpheme emotion node $v_i$ with the shortest path length to the shortest path set;
c33, for each directed edge from $v_i$ to a morpheme emotion node $v_t$: when the path length from $v_s$ to $v_i$ plus the length of the directed edge from $v_i$ to $v_t$ is less than the path length from $v_s$ to $v_t$, updating the shortest path length $D_{s,t}$ with that sum and setting the successor node of $v_i$ with the shortest path length to $v_t$; if $v_t$ is not in the candidate node set $Q$, adding $v_t$ to $Q$;
c34, when $v_i$ has no successor node, searching for the next classification;
c35, if $v_i$ belongs to the morpheme emotion node set $V$, deleting from $V$ the node $v_i$ whose shortest path has been found;
c4, if the path length is less than the set maximum path length threshold, the classification is valid, and the algorithm ends.
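Steps c1 to c3 amount to a single-source shortest-path search over edge lengths $L_{i,j} = -\ln W_{i,j}$. The sketch below uses textbook Dijkstra with a binary heap as a stand-in; it is not the improved algorithm of the invention, whose bookkeeping (classification sets $C_k$, the counter $k$) is omitted. It assumes weights in $(0, 1]$ so that every length is non-negative, which the text does not state explicitly. Function names and the example weights are hypothetical.

```python
import heapq
import math

def edge_lengths(weights):
    """Step c1: convert similarity weights W[i, j] into lengths -ln W[i, j]."""
    lengths = {}
    for (i, j), w in weights.items():
        if not 0.0 < w <= 1.0:  # assumption: weights behave like similarities in (0, 1]
            raise ValueError("weight must lie in (0, 1]")
        lengths.setdefault(i, {})[j] = -math.log(w)
    return lengths

def shortest_paths(lengths, start):
    """Textbook Dijkstra: returns (dist, pred) for all nodes reachable from start."""
    dist = {start: 0.0}          # D[s, s] = 0; unreached nodes count as infinity
    pred = {}
    heap = [(0.0, start)]        # plays the role of the candidate set Q
    done = set()
    while heap:
        d, u = heapq.heappop(heap)   # candidate with the shortest path length
        if u in done:
            continue
        done.add(u)
        for v, length in lengths.get(u, {}).items():
            if d + length < dist.get(v, math.inf):   # relaxation, as in step c33
                dist[v] = d + length
                pred[v] = u
                heapq.heappush(heap, (dist[v], v))
    return dist, pred

# Hypothetical similarity weights on a three-node graph.
W = {("a", "b"): 0.8, ("b", "c"): 0.5, ("a", "c"): 0.3}
dist, pred = shortest_paths(edge_lengths(W), "a")
print(dist, pred)
```

In this example the two-edge route from "a" to "c" (weights 0.8 and 0.5) is shorter than the direct edge (weight 0.3), so the search prefers the chain of stronger similarities.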
Compared with the prior art, the emotion multi-tendency classification method for Chinese comments has the following notable advantages:
The invention combines a directed weighted acyclic graph model with emotion tendency analysis and introduces Chinese morphemes, dividing traditional Chinese morphemes into morpheme and emotion types. It realizes emotion multi-tendency classification of comments through the three steps of extracting the morpheme emotions of the comments, analyzing the similarity relations between the morpheme emotions, and calculating morpheme emotion tight paths, thereby distinguishing more accurately the multiple attitudes a user expresses toward an object and reflecting the user's opinions on the object's attributes and characteristics.
Drawings
FIG. 1 shows the emotion multi-tendency classification model for Chinese comments.
FIG. 2 compares the convergence time of the algorithm for ε = 0.85 and different values of ξ in an embodiment of the invention.
FIG. 3 compares the convergence time of the algorithm for ξ = 2000 and different values of ε in an embodiment of the invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
As shown in FIG. 1, the invention provides an emotion multi-tendency classification method for Chinese comments, which comprises the following steps:
First, morpheme emotion variables are extracted. The morpheme words and emotion words related to the commented object are extracted from the comment text according to an existing Chinese morpheme lexicon and an existing emotion corpus lexicon. In line with how Chinese describes objects, the association coefficient between morpheme words and emotion words is calculated by the Pearson correlation coefficient method, and morpheme emotion variables are formed from this coefficient. A morpheme emotion variable can serve as independent emotion content describing a certain emotion type, and it is treated as an independent emotion unit when emotion relations are calculated.
Then, the similarity relations between morpheme emotion variables are constructed. A conditional mutual information formula is used to describe the relation between morpheme emotion variables. The morpheme emotion variables are regarded as independent entities: several independent morpheme emotions are extracted from the comments, and the user's emotion tendencies are expressed through them. Morpheme emotions may be similar to one another; morpheme emotion variables that are similar describe similar emotion tendencies, and together they can reflect a certain type of emotion tendency.
Finally, the morpheme emotion tight paths are calculated. The similarity relation obtained from the conditional mutual information is a direct relation between morpheme emotion variables; whether a set of directly related morpheme emotion variables can express a user's emotion tendency depends on the strength of the global relation they form across the whole set. On the basis of the directed weighted acyclic graph model, the morpheme emotion variables are regarded as nodes in the directed weighted acyclic graph and are called morpheme emotion nodes or morpheme emotion node variables (both terms are used in the context of the graph; in the context of the morpheme emotion set, only the term morpheme emotion node variable is used). Directed weighted relations are constructed between the morpheme emotion nodes to serve as directed weighted link edges. On the basis of these edges, an improved shortest path search algorithm is designed based on the directed weighted acyclic graph model to search for effective paths that satisfy certain weight conditions. The set of all morpheme emotion nodes visited along a path represents one emotion type; by setting a path length limit, as many paths as needed can be found, and each path is one emotion tendency classification.
The invention addresses the emotion multi-tendency classification of Chinese comments. Given the characteristics of Chinese grammar and morphemes, Chinese comments generally have specific topics and objects, which can be obtained from the related topics of the website or platform hosting the comments, such as topics on microblogs, specific events in blogs, or products and services in e-commerce. The titles and specific descriptions related to these topics, objects, products and services constitute metadata about the objects and can be regarded as the objects and topics. In most online comments, the object and topic of the comment can be clearly determined.
Objects and topics consist of a number of aspects, which are the parts that describe the various components of the object. In a comment, a user sometimes addresses both the object as a whole and particular aspects of it. These aspects are the core content through which the user's emotion is expressed, so the aspects describing the object need to be extracted. The aspects of an object are typically expressed by morphemes, which denote attributes and characteristics of the object and topic; monosyllabic, disyllabic and polysyllabic words in a comment may all be treated as morphemes.
The invention divides Chinese morphemes into two types: noun-class morphemes and emotion-class morphemes. Nominal descriptions in a comment of things, or of the sides, functions, attributes and features of an object, are noun-class morphemes, referred to simply as morphemes; content expressing emotion, attitude, preference, mood and the like is an emotion-class morpheme, referred to simply as an emotion. The invention takes the morpheme emotion as the most basic unit of emotion analysis: a morpheme emotion is an indivisible part of the emotion tendency a user expresses, and it represents the user's emotion tendency with respect to that morpheme.
Because the morpheme emotion is the basic unit for judging a user's emotion tendency, extracting and mining effective morpheme emotions is the primary task of comment emotion multi-tendency classification. In Chinese sentence structure, the two types of morphemes can be combined in compound modes such as union, bias, domination, statement and supplement. Morpheme emotion mining finds the effective morphemes in the comments and the emotions closely related to them, and links the two as the basic elements of overall emotion classification. The method extracts the morphemes in the comment text by supervised machine learning and pairs morphemes with emotions using the Pearson correlation coefficient between them as the association coefficient, thereby constructing morpheme emotion variables.
Next, the similarity relations between morpheme emotions are analyzed. The extracted morpheme emotion variables are regarded as effective emotion tendency nodes, and morpheme emotion nodes that are similar share a similar emotion tendency in some respect. Emotion similarity calculation determines whether different morpheme emotion nodes are similar. The invention uses conditional mutual information to calculate the approximate relation between two morpheme emotion nodes. Two similar nodes are connected by a directed edge to form a directed link, whose direction is determined by the order in which the morpheme emotion variables occur in the comments.
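The rule that edge direction follows order of occurrence can be illustrated with a small Python sketch. It makes several simplifying assumptions that the text does not specify: each morpheme emotion variable is represented by a single label, only one comment is considered, and a variable's position is its first occurrence. How conflicting orders across different comments are resolved is left open here. The function names and labels are hypothetical.

```python
def first_positions(comment_units, nodes):
    """Map each node label to the index of its first occurrence in the comment."""
    pos = {}
    for idx, unit in enumerate(comment_units):
        if unit in nodes and unit not in pos:
            pos[unit] = idx
    return pos

def orient_edges(similar_pairs, pos):
    """Direct each similar pair from the earlier-occurring node to the later one."""
    edges = []
    for a, b in similar_pairs:
        if a not in pos or b not in pos:
            continue  # pair not observed in this comment; left undirected in this sketch
        edges.append((a, b) if pos[a] < pos[b] else (b, a))
    return edges

# Hypothetical comment, already segmented into morpheme emotion labels.
comment = ["screen_clear", "price_high", "screen_clear", "service_slow"]
nodes = {"screen_clear", "price_high", "service_slow"}
pairs = [("price_high", "screen_clear"), ("service_slow", "price_high")]

edges = orient_edges(pairs, first_positions(comment, nodes))
print(edges)
```

Orienting every edge by a single consistent ordering of the nodes is also one simple way to guarantee that the resulting graph has no cycles.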
Finally, the emotion tendency classification paths are calculated and the effective emotion tendency classifications are determined. Once the directed link edges between all morpheme emotion nodes have been obtained, an improved shortest path algorithm finds the shortest paths from a start node to all end nodes. The morpheme emotion nodes on each shortest path form a strongest emotion tendency set, which represents one emotion tendency. By setting a reasonable empirical threshold on the maximum path length, the paths that meet the emotion intensity requirement are found, and the morpheme emotion nodes and directed weighted edges on those paths form the effective emotion tendency classifications. The number of such paths gives the number of emotion tendency classifications, which realizes the comment emotion multi-tendency classification of the invention.
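The link between a shortest path and a strongest emotion tendency set follows from the edge length definition $L_{i,j} = -\ln W_{i,j}$. The short derivation below spells this out for a path $P$ from $v_s$ to $v_t$; it assumes $0 < W_{i,j} \le 1$ so that all lengths are non-negative, and the symbol $D_{\max}$ for the maximum path length threshold is introduced here only for illustration.

```latex
D_{s,t} \;=\; \sum_{(i,j)\in P} L_{i,j}
        \;=\; \sum_{(i,j)\in P} \bigl(-\ln W_{i,j}\bigr)
        \;=\; -\ln \prod_{(i,j)\in P} W_{i,j}
\qquad\Longrightarrow\qquad
D_{s,t} < D_{\max} \;\iff\; \prod_{(i,j)\in P} W_{i,j} > e^{-D_{\max}}
```

Minimizing the path length therefore maximizes the product of the similarity weights along the path, and a cap on path length acts as a floor on that product.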
The emotion multi-tendency classification algorithm for Chinese comments provided by the invention is as follows:
Input: Chinese comment text data sets (training samples, test samples);
Output: the emotion multi-tendency classification set;
Step1, selecting a comment training sample set and, with reference to an existing Chinese morpheme library, finding all Chinese noun-class morphemes and recording them in a morpheme set $M$;
Step2, selecting a comment training sample set and, with reference to an existing emotion corpus, finding all Chinese emotion-class morphemes and recording them in an emotion set $S$;
Step3, executing in a loop until all elements in the morpheme set and the emotion set have been processed:
calculating the Pearson correlation coefficient $r$ between each morpheme and emotion:
$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{M_i-\bar{M}}{\sigma_M}\right)\left(\frac{S_i-\bar{S}}{\sigma_S}\right)$$
where $\frac{M_i-\bar{M}}{\sigma_M}$, $\bar{M}$ and $\sigma_M$ are respectively the standard score, the mean and the standard deviation of the $M_i$ samples (and likewise for $S_i$);
Step4, forming an independent morpheme emotion variable $v_i$ from the morpheme element and the emotion element, setting a threshold $r_\theta$, and recording the morpheme emotion variables that satisfy $r \ge r_\theta$ in the effective morpheme emotion variable set:
$$V = \{v_1, v_2, \ldots, v_n\}$$
Step5, regarding the effective morpheme emotion variables $v_i$ as nodes in the directed weighted acyclic graph, finding the child node set of each node, constructing the directed acyclic graph $G$ of the morpheme emotion nodes, and executing Step5-1 to Step5-3:
Step5-1, initializing the child node sets by first setting the child node set of every morpheme emotion node to empty: for $i = 1$ to $n$, executing $\mathrm{Chirld}(v_i) \leftarrow \varnothing$;
Step5-2, calculating the conditional mutual information function $f(G)$: for $i = 1$ to $n-1$ and $j = i+1$ to $n$, executing Step5-2-1 to Step5-2-2 in a loop:
Step5-2-1, calculating the conditional mutual information of each pair of morpheme emotion nodes:
[Conditional mutual information formula: image BDA0002856612800000085]
where $f(G)$ is the conditional mutual information, $p(v_i, v_j)$ is the joint probability density function, and $\mathrm{Chirld}(v_i)$ is the set of child nodes of morpheme emotion node $v_i$;
Step5-2-2, if $f(G) \ge \varepsilon$, then $\mathrm{Chirld}(v_i) \leftarrow v_j$, where $\varepsilon$ is an empirical constant; that is, if the conditional mutual information of morpheme emotion nodes $v_i$ and $v_j$ is not less than this empirical value, $v_j$ is regarded as a child node of $v_i$;
Step5-3, outputting the child node sets $\mathrm{Chirld}(v_i)$ of all nodes and the directed acyclic graph $G = (V, D)$, where $D$ is the set of directed edges from parent nodes to child nodes;
Step6, calculating the similarity weights between morpheme emotion nodes, executing in a loop until all nodes have been traversed: for $i = 1$ to $n-1$ and $j = i+1$ to $n$, do:
[Similarity weight formula: image BDA0002856612800000091]
where $W_{i,j}$ is the weight of the similarity relation between two morpheme emotion nodes in a parent-child relation, $N(v_i)$ and $N(v_j)$ are the numbers of times that nodes $v_i$ and $v_j$ respectively appear in the comment texts, and $N(v_i, v_j)$ is the number of times the two appear together in the same comment text;
Step7, calculating the length of the directed link edge between two morpheme emotion nodes in the directed weighted acyclic graph, $L_{i,j} = -\ln W_{i,j}$; that is, the similarity weight is converted into a directed edge length, where $L_{i,j}$ is the length of the directed edge;
Step8, calculating the emotion tendency classification paths, initializing the variables and executing Step8-1 to Step8-5 in sequence:
Step8-1, $k = 1$, $C_k = \varnothing$, where $C_k$ is the $k$-th emotion tendency classification;
Step8-2, selecting from the morpheme emotion variable set $V$ a node variable that has no parent node as the start node, denoted $v_s$;
Step8-3, initializing the child node of the start node $v_s$ to itself and the child nodes of the other nodes $v_j$ in the morpheme emotion variable set $V$ to null: $\mathrm{Chirld}(v_s) = v_s$; $\mathrm{Chirld}(v_j) = \varnothing$;
Step8-4, $D_{i,j}$ is the path length from morpheme emotion node $v_i$ to morpheme emotion node $v_j$; the path length from the start node to itself is 0, $D_{s,s} = 0$; the initial path length from the start node to every other morpheme emotion node is infinity, $D_{s,j} = \infty$; the path length between two morpheme emotion nodes equals the algebraic sum of the lengths of the directed edges between all nodes on the path, $D_{i,j} = L_{i,1} + L_{1,2} + \ldots + L_{j-1,j}$;
Step8-5, initializing the classification and the candidate node set: $C_k = \{v_s\}$; $Q = \{v_s\}$, where $C_k$ is the $k$-th emotion tendency classification, $Q$ is the candidate node set, and $v_s$ is the start node;
Step 9, while the morpheme emotion node set $V$ is not empty, i.e. $V \neq \emptyset$, search the candidate node set $Q$ for the morpheme emotion node with the shortest path length and execute Step 9-1 to Step 9-5:
Step 9-1, for $v_i, v_j \in Q$ with $i \neq j$, if $D_{s,i} \le D_{s,j}$, then $Q = Q - \{v_i\}$; that is, for morpheme emotion nodes $v_i$ and $v_j$ that are both in the candidate node set $Q$ with $i \neq j$, if the path length from the start node to $v_i$ is less than or equal to the path length from the start node to $v_j$, the morpheme emotion node $v_i$ with the shortest path length is deleted from the candidate node set $Q$;
Step 9-2, $C_k = C_k \cup \{v_i\}$; update the emotion classification set $C_k$ by adding the morpheme emotion node $v_i$ with the shortest path length to it;
Step 9-3, for each directed edge from morpheme emotion node $v_i$ to a morpheme emotion node $v_t$, i.e. for all $v_t \in \mathrm{Chirld}(v_i)$, when the path length satisfies $D_{s,i} + L_{i,t} < D_{s,t}$, execute Step 9-3-1 to Step 9-3-2:
Step 9-3-1, $D_{s,t} = D_{s,i} + L_{i,t}$; $\mathrm{Next}(v_i) = v_t$; that is, for each directed edge from morpheme emotion node $v_i$ to morpheme emotion node $v_t$, when the path length from the start node $v_s$ to $v_i$ plus the length of the directed edge from $v_i$ to $v_t$ is less than the path length from $v_s$ to $v_t$, the shortest path length $D_{s,t}$ is updated with that sum; $\mathrm{Next}(v_i) = v_t$ denotes that $v_t$ is the successor node of morpheme emotion node $v_i$ reached over the shortest directed edge;
Step 9-3-2, if $v_t \notin Q$, then execute $Q = Q \cup \{v_t\}$; that is, if morpheme emotion node $v_t$ is not in the candidate node set $Q$, add $v_t$ to $Q$;
Step 9-4, if $\mathrm{Next}(v_i) = \emptyset$, execute $k = k + 1$; that is, when morpheme emotion node $v_i$ has no successor node, start searching for the next classification;
Step 9-5, if $v_i \in V$, execute $V = V - \{v_i\}$; that is, if morpheme emotion node $v_i$ belongs to the morpheme emotion node set $V$, delete the node $v_i$ whose shortest path has been found from $V$;
Step 10, if $D_{s,t} \le \xi$, output all $C_k$ sets and end the algorithm; a classification is valid if its path length is within the preset maximum path length threshold $\xi$.
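The search in Steps 7 to 10 can be illustrated with a minimal Python sketch. It is one possible reading of the steps, not the patented procedure itself: it assumes the similarity weights are already given and lie in (0, 1], uses a standard priority-queue shortest-path search in place of the candidate set handling of Step 9, and treats every path from the start node to a node without children as one classification. The function names, node names and weights are hypothetical.

```python
import heapq
import math


def edge_lengths(weights):
    """Step 7: turn similarity weights W[i][j] in (0, 1] into lengths -ln W."""
    return {
        i: {j: -math.log(w) for j, w in children.items()}
        for i, children in weights.items()
    }


def classify_paths(weights, start, xi):
    """Return one node path per childless node reachable within length xi."""
    lengths = edge_lengths(weights)
    dist = {start: 0.0}      # D[s][s] = 0; a missing entry stands for infinity
    parent = {}              # predecessor of each node on its shortest path
    queue = [(0.0, start)]   # candidate node set Q, ordered by path length
    done = set()
    while queue:
        d, i = heapq.heappop(queue)   # candidate with the shortest path length
        if i in done:
            continue
        done.add(i)
        for t, length in lengths.get(i, {}).items():
            if d + length < dist.get(t, math.inf):   # relaxation of Step 9-3
                dist[t] = d + length
                parent[t] = i
                heapq.heappush(queue, (dist[t], t))
    paths = []
    for node in sorted(done):
        has_children = bool(lengths.get(node))
        if not has_children and node != start and dist[node] <= xi:  # Step 10
            path = [node]
            while path[-1] != start:
                path.append(parent[path[-1]])
            paths.append(path[::-1])
    return paths


# Hypothetical toy graph; names and weights are made up for illustration.
toy_weights = {
    "screen/clear": {"camera/sharp": 0.8, "battery/weak": 0.1},
    "camera/sharp": {"photo/good": 0.9},
}
print(classify_paths(toy_weights, "screen/clear", xi=1.0))
# → [['screen/clear', 'camera/sharp', 'photo/good']]
```

Because the edge length is the negative logarithm of the weight, a short path corresponds to a chain of strongly similar nodes; in the toy graph the weakly related node "battery/weak" is excluded by the threshold.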
The following are specific examples of the present invention and further describe the technical solutions of the present invention, but the present invention is not limited to these examples.
The technical effect experiment uses self-collected data. The data set consists of user comments about mobile phones from an online shopping mall, covering the period from May 2019 to October 2019. The data used in the experiment were preliminarily screened so that every consumer and every product has at least 5 comments. Each comment record consists of a comment name, a product number, a comment text and a score. The detailed structure of the data set is shown in Table 1.
TABLE 1 data Structure of comments
Figure BDA0002856612800000111
From the review data, 1,000 reviews covering 100 mobile phone models were selected as test data and labeled manually. A comment may express emotions about several aspects, so it may need several labels depending on its specific content. Table 2 shows a sample of the manually assigned labels.
TABLE 2 Emotional tendency labels of the review data
Figure BDA0002856612800000112
Figure BDA0002856612800000121
Technical test method
In the experimental test, the records of the data set were divided equally into 5 parts of 200 reviews each. Accuracy in the experiment is measured by precision and recall. First, one part of the data set is used as the test set and the remaining 4 parts as the training set, and the precision, recall and CPU time consumed are calculated. Then another part is selected as the test set, the remaining 4 parts are used as the training set, and the precision, recall and CPU time consumed are calculated again, until each of the 5 parts has served as the test set once.
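The evaluation protocol above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source does not say how precision and recall are aggregated over multi-label comments, so micro-averaging over label sets is assumed here, and the function names are hypothetical.

```python
def five_fold_splits(records, folds=5):
    """Yield (train, test) pairs; each contiguous fold is the test set once."""
    size = len(records) // folds
    for k in range(folds):
        test = records[k * size:(k + 1) * size]
        train = records[:k * size] + records[(k + 1) * size:]
        yield train, test


def precision_recall(predicted, actual):
    """Micro-averaged precision and recall over per-comment label sets."""
    hits = sum(len(p & a) for p, a in zip(predicted, actual))
    n_pred = sum(len(p) for p in predicted)
    n_true = sum(len(a) for a in actual)
    precision = hits / n_pred if n_pred else 0.0
    recall = hits / n_true if n_true else 0.0
    return precision, recall


# 1,000 placeholder records give five 800/200 train/test splits.
records = list(range(1000))
splits = list(five_fold_splits(records))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # → 5 800 200
```

Averaging the per-fold precision and recall then gives one figure per parameter setting.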
Testing the effects
In the accuracy test, the control parameter ε of the conditional mutual information takes the values 0.65, 0.75, 0.85 and 0.95, and the maximum path length ξ takes the values 1000, 2000, 3000, 4000 and 5000. The results are shown in Table 3:
TABLE 3 average of precision, recall and F values for the algorithm when the control parameters ε and ξ take different values, respectively
Figure BDA0002856612800000122
As can be seen from Table 3, precision is highest when ε is 0.95 and ξ is 2000, but recall is very low: most of the labels the algorithm assigns are correct, but many labels are missed. When ε is 0.85 and ξ is 5000, recall is high but precision drops: few labels are missed, but more of the assigned labels are wrong. Neither precision nor recall alone is therefore a sufficient criterion. Considered overall, ε = 0.85 and ξ = 2000 gives the best balance of precision and recall.
When ε is fixed and ξ varies, precision first rises slightly as ξ increases and then falls after reaching a peak. With the node similarity relations fixed, increasing the maximum path length to a certain degree yields higher precision, but increasing it without limit also adds loosely related nodes to the paths, which lowers the label classification precision. Recall reflects the number of missed labels: the larger ξ is, the more labels are added, the fewer labels are missed, and the higher the recall.
When ξ is fixed and ε varies, precision rises as ε increases, because a larger mutual information value finds more accurate labels. Recall, however, reaches a maximum within a certain range: the number of missed labels decreases at first and then increases beyond a certain point, mainly because overemphasizing strong similarity relations between nodes causes correct labels to be missed.
In the time efficiency test, the control parameter ε is fixed at 0.85 and ξ takes the values 2000, 3000 and 5000; the convergence time of the algorithm in these three cases is shown in Figure 2. As can be seen from Figure 2, when ε is fixed, the convergence time of the algorithm increases with ξ. Then the control parameter ξ is fixed at 2000 and ε takes the values 0.75, 0.85 and 0.95; the convergence time of the algorithm in these three cases is shown in Figure 3. As can be seen from Figure 3, when ξ is fixed, the convergence time of the algorithm decreases as ε increases.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention, and therefore, the scope of the present invention shall be subject to the claims.

Claims (8)

1. An emotion multi-tendency classification method for Chinese comments, characterized by comprising the following steps:
S1, extracting morpheme emotion variables; extracting the morpheme words and emotion words related to the commented object in a comment text according to a Chinese morpheme lexicon and an emotion corpus lexicon, calculating the correlation coefficient between the morpheme words and the emotion words by the Pearson correlation coefficient method, and forming morpheme emotion variables from the correlation coefficient;
S2, constructing the similarity relations between morpheme emotion variables; calculating the similarity relation of two morpheme emotion variables with a conditional mutual information formula, and describing the relations between the morpheme emotion variables;
S3, calculating the morpheme emotion tight paths; regarding the morpheme emotion variables as nodes in a directed weighted acyclic graph, the nodes being called morpheme emotion nodes or morpheme emotion node variables; constructing directed weighted relation connections among the morpheme emotion nodes as directed weighted link edges; on this basis, designing an improved shortest path search algorithm based on the directed weighted acyclic graph model and searching for effective paths that satisfy a given weight condition, each such path being one emotional tendency classification.
2. The method for classifying emotional multi-tendency to Chinese comments as claimed in claim 1, wherein: in step S1, Chinese morphemes are divided into noun morphemes and emotion morphemes, which combine in one or more of the coordinate (union), modifier-head (bias), verb-object (domination), subject-predicate (statement) and complement (supplement) patterns; the morphemes in the comment text are extracted by a supervised machine learning method, and the Pearson correlation coefficient between a morpheme and an emotion is used as the association coefficient to associate the two, thereby constructing a morpheme emotion variable.
3. The method for classifying emotional multi-tendency to Chinese comments as claimed in claim 1, wherein: in step S2, two morpheme emotion nodes with similarity are connected by using a directed edge to form a directed link; the direction of the directional link is determined according to the sequence of occurrence of the morpheme emotional variables in the comments, and the sequence determines the connection direction of the link edges.
4. The method for classifying emotional multi-tendency to Chinese comments as claimed in claim 1, wherein: in step S3, after directional link edges between all morpheme emotion nodes are obtained, the shortest paths from a certain start node to all end nodes are found, the morpheme emotion nodes on each shortest path form the strongest emotion tendency set representing an emotion tendency, and by setting a reasonable maximum path length empirical threshold, those paths meeting the emotion intensity requirement are found, and the morpheme emotion nodes and directional weighted edges on these paths form an effective emotion tendency classification.
5. The method for classifying emotional multi-tendency to Chinese comments as claimed in claim 1, wherein: in step S1, the extraction of the morpheme emotion variables includes the following steps:
a1, selecting a comment training sample set and, with reference to an existing Chinese morpheme library, finding all Chinese noun morphemes, recorded as morpheme set $M$;
a2, selecting a comment training sample set and, with reference to an existing emotion corpus, finding all Chinese emotion morphemes, recorded as emotion set $S$;
a3, each morpheme element in morpheme set $M$ and each emotion element in emotion set $S$ form an independent morpheme emotion variable $v_i$; calculating the Pearson correlation coefficient $r$ between each morpheme element and emotion element; setting a threshold $r_\theta$, and recording the morpheme emotion variables $v_i$ that satisfy $r \ge r_\theta$ into the effective morpheme emotion variable set $V$:
Figure FDA0002856612790000021
Wherein n in the formula (1) is the number of effective morpheme emotion variables;
a4, step a3 is executed repeatedly until all elements in the morpheme set and the emotion set have been processed.
6. The method for classifying emotional multi-tendency to Chinese comments as claimed in claim 5, wherein: the calculation formula of the Pearson correlation coefficient r between the morphemes and the emotions is as follows:
Figure FDA0002856612790000022
wherein, in the formula (2)
Figure FDA0002856612790000023
Figure FDA0002856612790000024
and $\sigma_M$ are, respectively, the standard score, the mean and the standard deviation of $M_i$, and $n$ is the number of comment training samples.
7. The method for classifying emotional multi-tendency to Chinese comments as claimed in claim 1, wherein: in step S2, regarding the morpheme emotion variables as nodes in the directed weighted acyclic graph, which are called morpheme emotion nodes or morpheme emotion node variables, and calculating the approximate relationship of the morpheme emotion nodes includes the following steps:
b1, finding the child node set of each morpheme emotion node and constructing a directed acyclic graph of the morpheme emotion nodes;
first, initializing the child node sets by emptying the child node set of every morpheme emotion node; then calculating the conditional mutual information of each pair of morpheme emotion nodes $v_i$ and $v_j$, and, when the conditional mutual information is greater than a preset empirical value, regarding morpheme emotion node $v_j$ as a child node of morpheme emotion node $v_i$; finally, outputting the child node sets of all morpheme emotion nodes and the directed acyclic graph, denoted $G = (V, D)$, where $v_i$ and $v_j$ are morpheme emotion nodes, $G$ is the directed acyclic graph, $V$ is the set of effective morpheme emotion nodes, and $D$ is the set of directed edges from parent nodes to child nodes;
the conditional mutual information of each pair of morpheme emotion nodes is calculated as:
Figure FDA0002856612790000031
wherein, in formula (3), $f(G)$ is the conditional mutual information, $p(v_i, v_j)$ is the joint probability density function, and $\mathrm{Chirld}(v_i)$ is the set of child nodes of node $v_i$; $i$ ranges over $[1, n-1]$ and $j$ ranges over $[i+1, n]$;
b2, calculating similarity weights among morpheme emotional nodes, and executing in a circulating mode until all the morpheme emotional nodes are traversed;
Figure FDA0002856612790000032
wherein, in formula (4), $W_{i,j}$ is the weight of the similarity relation between two morpheme emotion nodes that have a parent-child relation, $N(v_i)$ and $N(v_j)$ are the numbers of times nodes $v_i$ and $v_j$, respectively, appear in the comment texts, and $N(v_i, v_j)$ is the number of times the two nodes occur together in the same comment text.
8. The method for classifying emotional multi-tendency to Chinese comments as claimed in claim 7, wherein: in step S3, the calculation of the morpheme emotion tight paths includes the following steps:
c1, calculating the lengths of the directed link edges of the directed weighted acyclic graph by converting the similarity weights into directed edge lengths, $L_{i,j} = -\ln W_{i,j}$, where $L_{i,j}$ is the length of the directed edge;
c2, calculating an emotional tendency classification path, initializing variables, and sequentially executing the following steps:
c21, selecting a morpheme emotion node variable without a parent node from the morpheme emotion variable set $V$ as the start node, denoted $v_s$;
c22, initializing the child nodes of the start node to be self, and initializing the child nodes of other morpheme emotion nodes in the morpheme emotion variable set V to be null;
c23, the path length from morpheme emotion node $v_i$ to morpheme emotion node $v_j$ is $D_{i,j}$; the path length from the start node to itself is 0, and the initial path length from the start node to every other morpheme emotion node is infinite; the path length between morpheme emotion node $v_i$ and morpheme emotion node $v_j$ equals the algebraic sum of the lengths of all directed edges between the two nodes;
c24, initializing the classification and the candidate node set, $C_k = \{v_s\}$; $Q = \{v_s\}$; where $C_k$ is the $k$-th emotional tendency classification, $Q$ is the candidate node set, and $v_s$ is the start node;
c3, when the morpheme emotion node variable set V is not empty, searching the morpheme emotion nodes in the candidate node set Q, finding out the morpheme emotion node with the shortest path length, and executing the following steps:
c31, when morpheme emotion node variables $v_i$ and $v_j$ are both in the candidate node set $Q$ and $i \neq j$, if the path length from the start node $v_s$ to morpheme emotion node $v_i$ is less than or equal to the path length from the start node $v_s$ to morpheme emotion node $v_j$, deleting the morpheme emotion node $v_i$ with the shortest path length from the candidate node set $Q$;
c32, adding the morpheme emotion node $v_i$ with the shortest path length to the shortest path set;
c33, for each directed edge from morpheme emotion node $v_i$ to a morpheme emotion node $v_t$, when the path length from the start node $v_s$ to $v_i$ plus the length of the directed edge from $v_i$ to $v_t$ is less than the path length from the start node $v_s$ to $v_t$, updating the shortest path length $D_{s,t}$ with that sum and setting the successor node of morpheme emotion node $v_i$ with the shortest path length to morpheme emotion node $v_t$; if morpheme emotion node $v_t$ is not in the candidate node set $Q$, adding $v_t$ to $Q$;
c34, when morpheme emotion node $v_i$ has no successor node, searching for the next classification;
c35, if morpheme emotion node $v_i$ belongs to the morpheme emotion node set $V$, deleting the morpheme emotion node $v_i$ whose shortest path has been found from $V$;
c4, if the path length is less than the set maximum path length threshold value, the classification is valid, and the algorithm is finished.
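The variable extraction of steps a1 to a4 in claim 5 can be illustrated with a minimal Python sketch. It assumes, which the source does not state, that each morpheme and each emotion word is represented by its per-comment occurrence counts over the training samples; the function names, words and counts are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def effective_variables(morphemes, emotions, r_theta):
    """Keep (morpheme, emotion) pairs whose correlation reaches r_theta."""
    return [
        (m, s)
        for m, m_counts in morphemes.items()
        for s, s_counts in emotions.items()
        if pearson(m_counts, s_counts) >= r_theta
    ]


# Hypothetical occurrence counts over five comments.
morphemes = {"screen": [1, 1, 0, 0, 1], "battery": [0, 0, 1, 1, 0]}
emotions = {"clear": [1, 1, 0, 0, 1], "weak": [0, 1, 1, 1, 0]}
print(effective_variables(morphemes, emotions, r_theta=0.5))
# → [('screen', 'clear'), ('battery', 'weak')]
```

Pairs that rarely occur together, such as "screen" and "weak" in this toy data, have a negative coefficient and are filtered out by the threshold.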
CN202011547122.4A 2020-12-24 2020-12-24 Emotion multi-tendency classification method for Chinese comments Active CN112883145B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011547122.4A CN112883145B (en) 2020-12-24 2020-12-24 Emotion multi-tendency classification method for Chinese comments

Publications (2)

Publication Number Publication Date
CN112883145A true CN112883145A (en) 2021-06-01
CN112883145B CN112883145B (en) 2022-10-11

Family

ID=76043452

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011547122.4A Active CN112883145B (en) 2020-12-24 2020-12-24 Emotion multi-tendency classification method for Chinese comments

Country Status (1)

Country Link
CN (1) CN112883145B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080249764A1 (en) * 2007-03-01 2008-10-09 Microsoft Corporation Smart Sentiment Classifier for Product Reviews
CN101667194A (en) * 2009-09-29 2010-03-10 北京大学 Automatic abstracting method and system based on user comment text feature
CN102682130A (en) * 2012-05-17 2012-09-19 苏州大学 Text sentiment classification method and system
CN103646088A (en) * 2013-12-13 2014-03-19 合肥工业大学 Product comment fine-grained emotional element extraction method based on CRFs and SVM
CN103699663A (en) * 2013-12-27 2014-04-02 中国科学院自动化研究所 Hot event mining method based on large-scale knowledge base
US20160163332A1 (en) * 2014-12-04 2016-06-09 Microsoft Technology Licensing, Llc Emotion type classification for interactive dialog system
CN106547866A (en) * 2016-10-24 2017-03-29 西安邮电大学 A kind of fine granularity sensibility classification method based on the random co-occurrence network of emotion word
US20190318407A1 (en) * 2015-07-17 2019-10-17 Devanathan GIRIDHARI Method for product search using the user-weighted, attribute-based, sort-ordering and system thereof
CN111914185A (en) * 2020-07-06 2020-11-10 华中科技大学 Graph attention network-based text emotion analysis method in social network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王志涛等: ""基于词典和规则集的中文微博情感分析"", 《计算机工程与应用》 *

Also Published As

Publication number Publication date
CN112883145B (en) 2022-10-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant